Speech Intelligence
Transcribe audio and video files into text with segment-level timestamps, automatic language detection, and subtitle export in SRT, VTT, and TXT formats.
Best for: meeting recordings, podcast transcription, video subtitles, and any workflow that needs accurate text from spoken audio.
What this is
WAYSCloud Speech Intelligence is a job-based audio transcription API. You create a job, upload an audio or video file (up to 2 GB), confirm the upload, and poll until the transcript is ready. The service detects language automatically (30+ languages supported) and returns segment-level timestamps. You can export the transcript as SRT subtitles, VTT subtitles, or plain text.
Processing takes roughly 10-30% of the audio duration. A 10-minute recording completes in 1-3 minutes.
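The estimate is plain proportional arithmetic. As a rough sketch, a shell function can turn an audio length in seconds into an expected processing window. The helper name `estimate_processing_window` is made up for illustration, and the 10% and 30% bounds come straight from the figure above; actual times may vary.

```shell
#!/usr/bin/env bash
# Rough processing-time window for a recording, based on the 10-30% figure.
# estimate_processing_window is a hypothetical helper, not part of the API.
estimate_processing_window() {
  local audio_sec=$1                      # audio length in whole seconds
  local low=$(( audio_sec * 10 / 100 ))   # 10% of the duration
  local high=$(( audio_sec * 30 / 100 ))  # 30% of the duration
  echo "${low}-${high}"
}

# A 10-minute (600 s) recording:
estimate_processing_window 600   # prints 60-180 (seconds, i.e. 1-3 minutes)
```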
When to use it
Use this when:
- You need transcripts of meetings, interviews, calls, or podcasts
- You want subtitles for video content in SRT or VTT format
- You need to search or analyze spoken content as text
- You have audio in any of 30+ languages and want automatic detection
Consider something else when:
- You need real-time speech-to-text during a live call — Speech Intelligence processes uploaded files, not live streams
- You need text generation or summarization — use LLM API on the transcript output
What you get
- Job-based pipeline: create, upload, confirm, poll, export
- Segment-level timestamps: start and end time for each spoken segment
- Automatic language detection across 30+ languages
- 3 export formats: SRT, VTT, TXT
- Large file support: up to 2 GB per file
- Wide format support: mp3, wav, m4a, ogg, flac, aac, mp4, webm, mkv, avi
- Presigned upload URL: secure, time-limited upload link (valid 1 hour)
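Checking a file against these limits before creating a job can save a wasted upload. The following is a minimal sketch of such a check. It assumes "2 GB" means 2 GiB and that extensions are matched case-insensitively; the page confirms neither. `is_supported_upload` is a hypothetical local helper, not an API call.

```shell
#!/usr/bin/env bash
# Pre-upload check against the format list and size limit above.
# Assumption: "2 GB" is read as 2 GiB (2 * 1024^3 bytes).
MAX_BYTES=$(( 2 * 1024 * 1024 * 1024 ))

# Returns 0 if the filename has a listed extension and the size fits.
is_supported_upload() {
  local filename=$1 size_bytes=$2 ext
  case "$filename" in
    *.*) ext=${filename##*.} ;;   # text after the last dot
    *)   return 1 ;;              # no extension at all
  esac
  # Assumption: lower-case the extension so "Talk.MP3" is accepted too.
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|ogg|flac|aac|mp4|webm|mkv|avi) ;;
    *) return 1 ;;
  esac
  [ "$size_bytes" -le "$MAX_BYTES" ]
}

# Usage (not run here):
#   is_supported_upload team-standup.mp3 "$(wc -c < team-standup.mp3)" \
#     && echo "ok to upload"
```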
Pricing
All prices exclude VAT.
| Item | EUR | NOK | SEK | DKK |
|---|---|---|---|---|
| Speech Intelligence (Pay-as-you-go) | 0.05 | 0.50 | 0.50 | 0.35 |
How it works
- Create a job by calling POST /v1/transcript/jobs with the filename and an optional language hint.
- Receive a presigned upload URL (valid 1 hour).
- Upload the file using a PUT request to the presigned URL.
- Confirm the upload by calling POST /v1/transcript/jobs/{id}/upload-complete.
- Poll for completion via GET /v1/transcript/jobs/{id}. Status progresses: created, then queued, then processing, then ready.
- Export the transcript as SRT, VTT, or TXT via POST /v1/transcript/jobs/{id}/export.
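The polling part of this flow is the one most likely to need a loop in a script. The sketch below shows one possible shape, under several assumptions. `API_KEY` is expected to be set by the caller. The helper names, the 5-second interval, and the attempt cap are illustrative choices, not documented values. The `failed` status is taken from the dashboard status list, not from the progression described here.

```shell
#!/usr/bin/env bash
# Sketch: poll a transcription job until it is ready or failed.
# Assumptions: API_KEY is set by the caller; all helper names are hypothetical.
API_BASE=${API_BASE:-https://api.wayscloud.services}

# Pull the "status" string out of a job JSON document read from stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Fetch the job and print its current status.
get_status() {
  curl -sS "$API_BASE/v1/transcript/jobs/$1" -H "X-API-Key: $API_KEY" | extract_status
}

# Poll until ready (returns 0), failed (returns 1), or attempts run out (returns 2).
wait_for_job() {
  local job_id=$1 interval=${2:-5} max_attempts=${3:-120}
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=$(get_status "$job_id")
    case "$status" in
      ready)  echo "ready";  return 0 ;;
      failed) echo "failed"; return 1 ;;
    esac
    sleep "$interval"   # created / queued / processing: keep waiting
  done
  echo "timeout"
  return 2
}

# Usage (not run here):
#   wait_for_job "$JOB_ID" && echo "transcript is ready to export"
```

Parsing with sed keeps the sketch free of extra dependencies; a JSON-aware tool such as jq would be more robust where it is available.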
What you see in the dashboard
- Job list: filename, language, status (created / queued / processing / ready / failed), duration, cost
- Transcript viewer: full text with clickable timestamps per segment
- Language badge: auto-detected or manually specified language
- Export buttons: download as SRT, VTT, TXT, or JSON
- Usage this month: total minutes transcribed and cost
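Segment start and end times appear in the API response as decimal seconds, while SRT writes timestamps as HH:MM:SS,mmm (VTT uses a period before the milliseconds). The export endpoint already produces these files, so the following is only an illustrative sketch of how the two representations relate. `to_srt_time` and `srt_cue` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Convert decimal seconds (a segment's "start" or "end") to an SRT timestamp.
# to_srt_time and srt_cue are hypothetical helpers for illustration only.
to_srt_time() {
  awk -v s="$1" 'BEGIN {
    ms = int(s * 1000 + 0.5)   # round to whole milliseconds
    printf "%02d:%02d:%02d,%03d\n", int(ms / 3600000), int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000
  }'
}

# Print one SRT cue: index, time range, text, then a blank line.
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(to_srt_time "$2")" "$(to_srt_time "$3")" "$4"
}

srt_cue 1 0.0 3.8 "Thank you for joining us today."
# prints:
# 1
# 00:00:00,000 --> 00:00:03,800
# Thank you for joining us today.
```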
Fastest way to get started
Dashboard
- Open my.wayscloud.services and go to AI & Machine Learning then Speech Intelligence
- Click Activate and copy your API key
- Create a job via the API and upload your first file
API
# Step 1: Create a job
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
-H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
-H "Content-Type: application/json" \
-d '{"original_filename": "team-standup.mp3", "language": "auto"}'
# Step 2: Upload the file to the presigned URL from the response
curl -X PUT "https://storage.wayscloud.services/transcript-originals/..." \
-H "Content-Type: audio/mpeg" \
--data-binary @team-standup.mp3
# Step 3: Confirm upload (replace {job_id} with the job_id returned in step 1)
curl -X POST https://api.wayscloud.services/v1/transcript/jobs/{job_id}/upload-complete \
-H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET"
Example request and response
Request: Create a transcription job
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
-H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
-H "Content-Type: application/json" \
-d '{
"original_filename": "customer-interview-march.m4a",
"language": "auto"
}'
Response:
{
"job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
"upload_url": "https://storage.wayscloud.services/transcript-originals/b2c3d4e5...",
"upload_expires_in": 3600
}
Poll for completed transcript:
curl https://api.wayscloud.services/v1/transcript/jobs/b2c3d4e5-f6a7-8901-bcde-f23456789012 \
-H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET"
Response (ready):
{
"job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
"status": "ready",
"language": "en",
"original_filename": "customer-interview-march.m4a",
"audio_duration_sec": 2340.7,
"processing_time_ms": 58200,
"segments": [
{"start": 0.0, "end": 3.8, "text": "Thank you for joining us today."},
{"start": 4.1, "end": 8.5, "text": "I'd like to start by asking about your experience with our platform."},
{"start": 9.0, "end": 14.2, "text": "Sure, I've been using it for about six months now and overall it's been very positive."}
]
}
Export as SRT:
curl -X POST https://api.wayscloud.services/v1/transcript/jobs/b2c3d4e5.../export \
-H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
-H "Content-Type: application/json" \
-d '{"format": "srt"}'
Common use cases
- Meeting transcription — transcribe team meetings and generate searchable archives
- Video subtitles — export SRT or VTT files for video hosting platforms
- Podcast processing — convert episodes to text for show notes, SEO, and search
- Interview analysis — transcribe customer or user research interviews for qualitative analysis
- Compliance recording — transcribe call recordings for audit and review
Related services
- LLM API — summarize, translate, or analyze transcripts with language models
- GPU Studio — generate visual content to complement audio content
- Object Storage — store audio files and transcripts long-term
Related documentation
- Transcribe Audio — step-by-step guide
- Speech Intelligence API reference — all 6 endpoints
- API Keys — managing API credentials
- Getting Started — platform overview