
Speech Intelligence

Transcribe audio and video files with automatic language detection, segment-level timestamps, and export to SRT/VTT — powered by GPU infrastructure.

What it is

Speech Intelligence in WAYSCloud transcribes audio and video files into text with segment-level timestamps. Upload any file with an audio track and get back a structured transcript with optional subtitle exports.

It supports automatic language detection and manual language selection across 30+ languages.

When to use

Use this when you need:

  • Meeting and interview transcription
  • Subtitle generation for video content (SRT, VTT)
  • Audio content indexing and search
  • Podcast and media processing pipelines

When NOT to use:

  • Real-time speech-to-text — Speech Intelligence is file-based, not streaming
  • Text generation — use LLM API

How it works

Speech Intelligence uses a job-based workflow:

  1. Create a job and receive a presigned upload URL
  2. Upload your audio/video file (up to 2 GB)
  3. Confirm upload to start processing
  4. Poll until status is ready, then read segments or export subtitles

Processing time depends on file length — typically 1-2x real-time.
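
Step 4 of the workflow can be sketched as a small polling helper. This is a minimal sketch under stated assumptions, not a WAYSCloud client: `poll_until_ready` and its arguments are hypothetical names, and the caller supplies the command that prints the job's current status, because the job-status endpoint is not shown on this page. The helper recognises only `ready` and `expired`, the two statuses named here, and treats anything else as still in progress.

```bash
# poll_until_ready FETCH_CMD [MAX_ATTEMPTS] [DELAY_SECONDS]
# FETCH_CMD is the name of a command or function (hypothetical, supplied by
# you) that prints the job's current status on stdout.
# Exit codes: 0 ready, 1 expired, 2 fetch failed, 3 gave up after MAX_ATTEMPTS.
poll_until_ready() {
  fetch_cmd=$1
  max_attempts=${2:-60}
  delay=${3:-10}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    job_status=$("$fetch_cmd") || return 2
    case $job_status in
      ready)   echo "ready";   return 0 ;;
      expired) echo "expired"; return 1 ;;
    esac
    attempt=$((attempt + 1))
    # Sleep only if another attempt follows.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "timeout"
  return 3
}

# Example (hypothetical): wrap your own status lookup in a function.
#   get_status() { ...fetch the job and print its status field...; }
#   poll_until_ready get_status 120 10
```

Choose the attempt count and delay so the total wait comfortably covers the expected processing time for your file length.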

Features

WAYSCloud Speech Intelligence is built for accuracy, flexibility, and batch processing.

Transcription

  • Automatic language detection (or specify an ISO 639-1 code)
  • Segment-level timestamps with confidence scores
  • Support for 30+ languages

Export

  • Built-in TXT and JSON exports
  • On-demand SRT and VTT subtitle generation
  • Structured segments for custom processing
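
As an illustration of custom processing on structured segments, the sketch below turns segments into SRT cues locally. It assumes the segments have already been flattened to tab-separated lines of start time, end time, and text, with times in seconds. That layout and the name `segments_to_srt` are assumptions for this example; the API's actual segment schema is not shown on this page. For standard subtitles, the built-in SRT export is likely the simpler route.

```bash
# segments_to_srt: reads "start<TAB>end<TAB>text" lines from stdin
# (times in seconds, an assumed layout) and writes numbered SRT cues.
segments_to_srt() {
  awk -F '\t' '
    # Format seconds as an SRT timestamp, HH:MM:SS,mmm.
    function ts(sec,   ms, h, m, s) {
      ms = int(sec * 1000 + 0.5)
      h = int(ms / 3600000); ms -= h * 3600000
      m = int(ms / 60000);   ms -= m * 60000
      s = int(ms / 1000);    ms -= s * 1000
      return sprintf("%02d:%02d:%02d,%03d", h, m, s, ms)
    }
    # Skip lines that do not have all three fields.
    NF >= 3 {
      n++
      printf "%d\n%s --> %s\n%s\n\n", n, ts($1), ts($2), $3
    }
  '
}

# Example:
#   printf '0\t1.5\tHello\n1.5\t3.25\tWorld\n' | segments_to_srt
```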

Input

  • Any file with an audio track (MP3, WAV, M4A, MP4, WEBM, etc.)
  • Up to 2 GB file size
  • Presigned upload URLs for secure transfer

Getting started

All Speech Intelligence functionality is available via the WAYSCloud API.

Create transcription job

Create a transcription job and receive a presigned upload URL.

The returned upload_url is a presigned S3 PUT URL valid for 1 hour. Upload your file directly to this URL, then call the upload-complete endpoint to start processing.

If the file is not uploaded within 1 hour, the job expires automatically (status becomes expired).

Content validation: The server validates the file using ffprobe after upload — the file must contain at least one audio stream. The content_type field is metadata only; actual format detection is server-side.
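
A local pre-check can catch files without an audio stream before you spend time uploading them. The sketch below is an approximation only: it assumes `ffprobe` (part of FFmpeg) is installed locally, the function names are hypothetical, and the server's exact ffprobe invocation is not documented here, so passing this check does not guarantee the server will accept the file.

```bash
# Succeeds if stdin (one codec_type per line) lists at least one audio stream.
contains_audio() {
  grep -q '^audio'
}

# Prints the codec_type of every stream in FILE, one per line.
probe_stream_types() {
  ffprobe -v error -show_entries stream=codec_type -of csv=p=0 "$1"
}

# Succeeds if FILE has at least one audio stream according to local ffprobe.
has_audio_track() {
  probe_stream_types "$1" | contains_audio
}

# Example:
#   has_audio_track meeting-recording.mp3 || echo "no audio stream found" >&2
```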

```bash
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "meeting-recording.mp3",
    "content_type": "audio/mpeg",
    "file_size_bytes": 15728640,
    "language": "auto"
  }'
```
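
To avoid hand-writing the request body, the payload can be built from the file itself. This is a minimal sketch: `create_job_payload` and `json_escape` are hypothetical helper names, the escaping covers only backslashes and double quotes in the filename, and the four fields are the ones shown in the request above.

```bash
# Escape backslashes and double quotes so a value is safe inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# create_job_payload FILE CONTENT_TYPE [LANGUAGE]
# Prints the JSON body for the create-job request; LANGUAGE defaults to "auto".
create_job_payload() {
  file=$1
  content_type=$2
  language=${3:-auto}
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }
  size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  name=$(json_escape "$(basename "$file")")
  printf '{"filename":"%s","content_type":"%s","file_size_bytes":%d,"language":"%s"}\n' \
    "$name" "$content_type" "$size" "$language"
}

# Usage with the create-job endpoint:
#   curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
#     -H "X-API-Key: YOUR_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d "$(create_job_payload meeting-recording.mp3 audio/mpeg)"
#
# Then upload to the returned upload_url; curl --upload-file issues a PUT.
# UPLOAD_URL is a hypothetical variable holding that value.
#   curl --upload-file meeting-recording.mp3 "$UPLOAD_URL"
```

Presigned URLs can require specific request headers depending on how they were signed, so treat the upload command as a starting point and check the API reference if the PUT is rejected.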

List transcription jobs

List the transcription jobs for your account, with an optional status filter.

```bash
curl https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: YOUR_API_KEY"
```

See all 6 endpoints in the Speech Intelligence API reference.

Limits and quotas

Limits and quotas depend on the selected plan and region. See the dashboard or API for current constraints.


WAYSCloud AS