Speech Intelligence

Transcribe audio and video files with automatic language detection, segment-level timestamps, and export to SRT/VTT — powered by GPU infrastructure.

What it is

Speech Intelligence in WAYSCloud transcribes audio and video files into text with segment-level timestamps. Upload any file with an audio track and get back a structured transcription, with optional subtitle exports.

It supports automatic language detection and manual language selection for more than 30 languages.

When to use

Use this when you need:

Meeting and interview transcription
Subtitle generation for video content (SRT, VTT)
Audio content indexing and search
Podcast and media processing pipelines

When NOT to use:

Real-time speech-to-text — Speech Intelligence is file-based, not streaming
Text generation — use LLM API

How it works

Speech Intelligence uses a job-based workflow:

1. Create a job and receive a presigned upload URL
2. Upload your audio/video file (up to 2 GB)
3. Confirm the upload to start processing
4. Poll until the status is ready, then read segments or export subtitles

Processing time depends on file length — typically 1-2x real-time.
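The polling step can be sketched as follows. The status endpoint is not shown in this section, so the sketch takes a caller-supplied `fetch_status` function (hypothetical) that returns the job's current status string. The `ready` and `expired` values come from this page; the default interval and timeout are assumptions, not documented values.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it returns "ready".

    fetch_status is a hypothetical callable wrapping the job status request.
    Raises RuntimeError if the job expires, TimeoutError if it never finishes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "ready":
            return status
        if status == "expired":
            raise RuntimeError("job expired")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        # Wait before asking again so the API is not hammered.
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to test; in normal use the default `time.sleep` is enough.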

Features

WAYSCloud Speech Intelligence is built for accuracy, flexibility, and batch processing.

Transcription

Automatic language detection (or specify an ISO 639-1 code)
Segment-level timestamps with confidence scores
Support for 30+ languages

Export

Built-in TXT and JSON exports
On-demand SRT and VTT subtitle generation
Structured segments for custom processing
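As an illustration of custom processing on structured segments, the sketch below renders a list of segments as SRT text locally. The segment shape is an assumption: each segment is taken to be a dict with `start` and `end` in seconds and a `text` string, which may not match the actual response fields. The built-in SRT export remains the simpler route when no custom filtering or merging is needed.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as SRT cues, numbered from 1.

    Assumed (hypothetical) segment keys: "start", "end", "text".
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```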

Input

Any file with an audio track (MP3, WAV, M4A, MP4, WEBM, etc.)
Up to 2 GB file size
Presigned upload URLs for secure transfer

Getting started

All Speech Intelligence functionality is available via the WAYSCloud API.

Create transcription job

Create a transcription job and receive a presigned upload URL.

The returned upload_url is a presigned S3 PUT URL valid for 1 hour. Upload your file directly to this URL, then call the upload-complete endpoint to start processing.

If the file is not uploaded within 1 hour, the job expires automatically (status becomes expired).

Content validation: The server validates the file using ffprobe after upload — the file must contain at least one audio stream. The content_type field is metadata only; actual format detection is server-side.
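To catch files without an audio track before uploading, a comparable check can be run locally. This is a minimal sketch, assuming ffprobe is installed on the client machine. It approximates the kind of check described above and is not the server's actual validation logic; the function names are illustrative.

```python
import json
import subprocess


def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if ffprobe's JSON output lists at least one audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return any(s.get("codec_type") == "audio" for s in streams)


def probe(path: str) -> str:
    """Run ffprobe on a local file and return its stream listing as JSON text."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For a local file, `has_audio_stream(probe("meeting-recording.mp3"))` gives a quick yes/no before a job is created.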

```bash
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "meeting-recording.mp3",
    "content_type": "audio/mpeg",
    "file_size_bytes": 15728640,
    "language": "auto"
  }'
```
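The same request can be prepared from Python. The sketch below only builds the request object, using the URL, header, and JSON fields from the curl example. The size check reads "2 GB" as 2 GiB, which is an assumption, and the function name is illustrative.

```python
import json
import urllib.request

API_URL = "https://api.wayscloud.services/v1/transcript/jobs"
MAX_FILE_SIZE_BYTES = 2 * 1024**3  # assumption: "2 GB" interpreted as 2 GiB


def build_create_job_request(
    api_key: str,
    filename: str,
    content_type: str,
    file_size_bytes: int,
    language: str = "auto",
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a transcription job."""
    if file_size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError("file exceeds the 2 GB upload limit")
    body = json.dumps(
        {
            "filename": filename,
            "content_type": content_type,
            "file_size_bytes": file_size_bytes,
            "language": language,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Sending the request with `urllib.request.urlopen` and parsing the JSON response should yield the `upload_url` described above. Any other response fields are not documented in this section.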

List transcription jobs

List the transcription jobs for your account, with an optional status filter.

```bash
curl https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: YOUR_API_KEY"
```

See all 6 endpoints in the Speech Intelligence API reference.

Limits and quotas

Limits and quotas depend on the selected plan and region. See the dashboard or API for current constraints.

Open Speech Intelligence in dashboard
