Speech Intelligence
Transcribe audio and video files with automatic language detection, segment-level timestamps, and export to SRT/VTT — powered by GPU infrastructure.
What it is
Speech Intelligence in WAYSCloud transcribes audio and video files into text with segment-level timestamps. Upload any file with an audio track and get back a structured transcript with optional subtitle exports.
Supports automatic language detection and manual language selection for 30+ languages.
When to use
Use this when you need:
- Meeting and interview transcription
- Subtitle generation for video content (SRT, VTT)
- Audio content indexing and search
- Podcast and media processing pipelines
When NOT to use:
- Real-time speech-to-text — Speech Intelligence is file-based, not streaming
- Text generation — use LLM API
How it works
Speech Intelligence uses a job-based workflow:
- Create a job and receive a presigned upload URL
- Upload your audio/video file (up to 2 GB)
- Confirm upload to start processing
- Poll until status is ready, then read segments or export subtitles
Processing time depends on file length — typically 1-2x real-time.
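The polling step above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the official client: `get_status` is a hypothetical callable you supply that fetches the job and returns its status string, and `failed` is an assumed status name (only `ready` and `expired` appear on this page).

```python
import time
from typing import Callable

# "expired" comes from this page; "failed" is an assumed status name.
TERMINAL_FAILURES = {"expired", "failed"}


def wait_until_ready(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns "ready".

    get_status is a hypothetical callable that fetches the job and
    returns its current status string. Any status other than "ready"
    or a terminal failure is treated as still in progress.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"job ended with status {status!r}")
        if waited >= timeout_s:
            raise TimeoutError(f"job not ready after {waited:.0f} s")
        sleep(interval_s)
        waited += interval_s
```

Passing the status lookup in as a callable keeps the loop independent of any particular endpoint path or HTTP library, and makes it easy to exercise without network access.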
Features
WAYSCloud Speech Intelligence is built for accuracy, flexibility, and batch processing.
Transcription
- Automatic language detection (or specify ISO 639-1 code)
- Segment-level timestamps with confidence scores
- Support for 30+ languages
Export
- Built-in TXT and JSON exports
- On-demand SRT and VTT subtitle generation
- Structured segments for custom processing
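As an illustration of custom processing on structured segments, the sketch below renders segments as SRT text locally. The segment field names (`start` and `end` in seconds, plus `text`) are assumptions made for this example; check the API reference for the actual response shape. The built-in SRT export remains the simpler route when it fits.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments as SRT cues.

    Each segment is assumed (hypothetically) to look like
    {"start": 0.0, "end": 1.5, "text": "Hello"}.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

The same loop is a natural place to add custom rules, such as skipping segments below a chosen confidence score.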
Input
- Any file with an audio track (MP3, WAV, M4A, MP4, WEBM, etc.)
- Up to 2 GB file size
- Presigned upload URLs for secure transfer
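A client can check the size limit locally before creating a job. The sketch below builds a request body using the same fields as the create-job example in Getting started. It assumes that "2 GB" means 2 * 1024**3 bytes, which this page does not confirm, and the function name is hypothetical.

```python
import mimetypes

# Assumption: "2 GB" is read as 2 * 1024**3 bytes; the exact cutoff is not stated.
MAX_FILE_SIZE_BYTES = 2 * 1024**3


def build_job_request(filename: str, file_size_bytes: int, language: str = "auto") -> dict:
    """Build the JSON body for a transcription job, rejecting oversized files early."""
    if file_size_bytes <= 0:
        raise ValueError("file is empty")
    if file_size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError(
            f"file is {file_size_bytes} bytes; limit is {MAX_FILE_SIZE_BYTES} bytes"
        )
    # content_type is metadata only; the server detects the real format itself.
    content_type, _ = mimetypes.guess_type(filename)
    return {
        "filename": filename,
        "content_type": content_type or "application/octet-stream",
        "file_size_bytes": file_size_bytes,
        "language": language,
    }
```

Checking the size before the request avoids creating a job that can never be completed.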
Getting started
All Speech Intelligence functionality is available via the WAYSCloud API.
Create transcription job
Create a transcription job and receive a presigned upload URL.
The returned upload_url is a presigned S3 PUT URL valid for 1 hour. Upload your file directly to this URL, then call the upload-complete endpoint to start processing.
If the file is not uploaded within 1 hour, the job expires automatically (status becomes expired).
Content validation: The server validates the file using ffprobe after upload — the file must contain at least one audio stream. The content_type field is metadata only; actual format detection is server-side.
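Once a job exists, the upload itself is a plain HTTP PUT to the presigned URL. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and whether the Content-Type header must match the value declared when creating the job is an assumption that depends on how the URL was signed.

```python
import os
import urllib.request
from typing import BinaryIO


def build_upload_request(
    upload_url: str, body: BinaryIO, size: int, content_type: str
) -> urllib.request.Request:
    """Build a PUT request that streams a file object to a presigned URL."""
    return urllib.request.Request(
        upload_url,
        data=body,
        method="PUT",
        headers={
            # An explicit Content-Length lets urllib stream the file
            # instead of falling back to chunked transfer encoding.
            "Content-Length": str(size),
            # Set explicitly; otherwise urllib adds a form-encoded default.
            "Content-Type": content_type,
        },
    )


def upload_file(path: str, upload_url: str, content_type: str) -> int:
    """Upload the file at path and return the HTTP status code."""
    with open(path, "rb") as f:
        req = build_upload_request(upload_url, f, os.path.getsize(path), content_type)
        with urllib.request.urlopen(req) as resp:
            return resp.status
```

Streaming from the open file handle avoids loading large recordings into memory. No API key is sent here, since a presigned URL carries its own authorization.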
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "meeting-recording.mp3",
    "content_type": "audio/mpeg",
    "file_size_bytes": 15728640,
    "language": "auto"
  }'
List transcription jobs
List transcription jobs for your account with optional status filter.
curl https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: YOUR_API_KEY"
See all 6 endpoints in the Speech Intelligence API reference.
Limits and quotas
Limits and quotas depend on the selected plan and region. See the dashboard or API for current constraints.