Speech Intelligence

Transcribe audio and video files into text with segment-level timestamps, automatic language detection, and export as SRT or VTT subtitles or plain text (TXT).

Best for: meeting recordings, podcast transcription, video subtitles, and any workflow that needs accurate text from spoken audio.


What this is

WAYSCloud Speech Intelligence is a job-based audio transcription API. You create a job, upload an audio or video file (up to 2 GB), confirm the upload, and poll until the transcript is ready. The service detects language automatically (30+ languages supported) and returns segment-level timestamps. You can export the transcript as SRT subtitles, VTT subtitles, or plain text.

Processing takes roughly 10-30% of the audio duration. A 10-minute recording completes in 1-3 minutes.
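
As a rough planning aid, the 10-30% rule of thumb can be turned into a time window. The sketch below is plain shell arithmetic. `estimate_processing_seconds` is a hypothetical helper name, not part of the API, and actual processing times may differ.

```bash
# Hypothetical helper: print the estimated minimum and maximum processing
# time (in seconds) for an audio duration given in whole seconds,
# using the 10-30% rule of thumb stated above.
estimate_processing_seconds() {
  duration_sec=$1
  echo "$(( duration_sec * 10 / 100 )) $(( duration_sec * 30 / 100 ))"
}

estimate_processing_seconds 600   # 10-minute recording, prints: 60 180
```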


When to use it

Use this when:

  • You need transcripts of meetings, interviews, calls, or podcasts
  • You want subtitles for video content in SRT or VTT format
  • You need to search or analyze spoken content as text
  • You have audio in any of 30+ languages and want automatic detection

Consider something else when:

  • You need real-time speech-to-text during a live call — Speech Intelligence processes uploaded files, not live streams
  • You need text generation or summarization — use LLM API on the transcript output

What you get

  • Job-based pipeline: create, upload, confirm, poll, export
  • Segment-level timestamps: start and end time for each spoken segment
  • Automatic language detection across 30+ languages
  • 3 export formats: SRT, VTT, TXT
  • Large file support: up to 2 GB per file
  • Wide format support: mp3, wav, m4a, ogg, flac, aac, mp4, webm, mkv, avi
  • Presigned upload URL: secure, time-limited upload link (valid 1 hour)
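
A local pre-flight check against the format list and size limit above can catch unsupported files before a job is created. This is a minimal sketch: the function names are hypothetical, and it assumes "2 GB" means 2 GiB, which the service may define differently.

```bash
# Assumption: "2 GB" is read as 2 GiB; the service may count bytes differently.
MAX_UPLOAD_BYTES=$(( 2 * 1024 * 1024 * 1024 ))

# Succeeds if the filename ends in one of the listed extensions (case-insensitive).
is_supported_format() {
  case "$1" in
    *.*) ;;            # has an extension
    *) return 1 ;;     # no extension at all
  esac
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|ogg|flac|aac|mp4|webm|mkv|avi) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if a size in bytes is within the upload limit.
is_within_size_limit() {
  [ "$1" -le "$MAX_UPLOAD_BYTES" ]
}
```

Both functions only report success or failure through their exit status, so they can be combined with `&&` in front of the job-creation request.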

Pricing

All prices exclude VAT.

Item                                  EUR    NOK    SEK    DKK
Speech Intelligence (Pay-as-you-go)   0.05   0.50   0.50   0.35


How it works

  1. Create a job by calling POST /v1/transcript/jobs with the filename and optional language hint.
  2. Receive a presigned upload URL (valid 1 hour).
  3. Upload the file using a PUT request to the presigned URL.
  4. Confirm the upload by calling POST /v1/transcript/jobs/{id}/upload-complete.
  5. Poll for completion via GET /v1/transcript/jobs/{id}. Status progresses: created → queued → processing → ready. A job that cannot be processed ends in failed.
  6. Export the transcript as SRT, VTT, or TXT via POST /v1/transcript/jobs/{id}/export.
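
Step 5 can be sketched as a polling loop. `wait_for_job` and its arguments are hypothetical. The command that fetches the status is passed in by name, so the loop can be tried without network access. The loop treats `ready` and `failed` as terminal statuses and gives up after a fixed number of polls.

```bash
# Hypothetical helper: poll until the job reaches a terminal status.
#   $1 = name of a command or function that prints the current job status
#   $2 = seconds to wait between polls
#   $3 = maximum number of polls before giving up
# Prints the final outcome and returns 0 (ready), 1 (failed) or 2 (timeout).
wait_for_job() {
  fetch_status=$1
  poll_interval=$2
  max_polls=$3
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    job_status=$("$fetch_status")
    case "$job_status" in
      ready)  echo "ready";  return 0 ;;
      failed) echo "failed"; return 1 ;;
    esac
    polls=$(( polls + 1 ))
    sleep "$poll_interval"
  done
  echo "timeout"
  return 2
}
```

In real use, the status command would wrap the GET request from step 5 and print the `status` field of the response, for example with a JSON tool such as jq.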

What you see in the dashboard

  • Job list: filename, language, status (created / queued / processing / ready / failed), duration, cost
  • Transcript viewer: full text with clickable timestamps per segment
  • Language badge: auto-detected or manually specified language
  • Export buttons: download as SRT, VTT, TXT, or JSON
  • Usage this month: total minutes transcribed and cost

Fastest way to get started

Dashboard

  1. Open my.wayscloud.services and go to AI & Machine Learning → Speech Intelligence
  2. Click Activate and copy your API key
  3. Create a job via the API and upload your first file

API

bash
# Step 1: Create a job
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"original_filename": "team-standup.mp3", "language": "auto"}'

# Step 2: Upload the file to the presigned URL from the response
curl -X PUT "https://storage.wayscloud.services/transcript-originals/..." \
  -H "Content-Type: audio/mpeg" \
  --data-binary @team-standup.mp3

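# Step 3: Confirm upload (set JOB_ID to the job_id returned in step 1;
# a literal {job_id} in the URL would be treated by curl as a glob pattern)
JOB_ID="YOUR_JOB_ID"
curl -X POST "https://api.wayscloud.services/v1/transcript/jobs/$JOB_ID/upload-complete" \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET"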
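
The upload step above sends `Content-Type: audio/mpeg` for an mp3 file. For other supported formats, a small lookup can supply a conventional MIME type. This is a sketch only: `content_type_for` is a hypothetical helper, and whether the presigned URL requires a particular Content-Type is not stated here.

```bash
# Hypothetical helper: print a conventional MIME type for a file,
# based on its extension (case-insensitive).
content_type_for() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3)  echo "audio/mpeg" ;;
    wav)  echo "audio/wav" ;;
    m4a)  echo "audio/mp4" ;;
    ogg)  echo "audio/ogg" ;;
    flac) echo "audio/flac" ;;
    aac)  echo "audio/aac" ;;
    mp4)  echo "video/mp4" ;;
    webm) echo "video/webm" ;;
    mkv)  echo "video/x-matroska" ;;
    avi)  echo "video/x-msvideo" ;;
    *)    echo "application/octet-stream" ;;   # generic fallback
  esac
}
```

The result can be dropped into the upload request as `-H "Content-Type: $(content_type_for team-standup.mp3)"`.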

Example request and response

Request: Create a transcription job

bash
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "original_filename": "customer-interview-march.m4a",
    "language": "auto"
  }'

Response:

json
{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "upload_url": "https://storage.wayscloud.services/transcript-originals/b2c3d4e5...",
  "upload_expires_in": 3600
}
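
The `job_id` and `upload_url` values from this response feed the upload and confirm steps. The sketch below pulls them out with sed. `json_string_field` is a hypothetical helper that only handles simple, flat string fields like the ones shown. A real JSON parser such as jq is more robust.

```bash
# Hypothetical helper: read JSON on stdin and print the value of a
# top-level string field. It does not handle escaped quotes or nesting.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response copied from the example above.
response='{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "upload_url": "https://storage.wayscloud.services/transcript-originals/b2c3d4e5...",
  "upload_expires_in": 3600
}'

JOB_ID=$(printf '%s\n' "$response" | json_string_field job_id)
UPLOAD_URL=$(printf '%s\n' "$response" | json_string_field upload_url)
echo "$JOB_ID"
```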

Poll for completed transcript:

bash
curl https://api.wayscloud.services/v1/transcript/jobs/b2c3d4e5-f6a7-8901-bcde-f23456789012 \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET"

Response (ready):

json
{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "status": "ready",
  "language": "en",
  "original_filename": "customer-interview-march.m4a",
  "audio_duration_sec": 2340.7,
  "processing_time_ms": 58200,
  "segments": [
    {"start": 0.0, "end": 3.8, "text": "Thank you for joining us today."},
    {"start": 4.1, "end": 8.5, "text": "I'd like to start by asking about your experience with our platform."},
    {"start": 9.0, "end": 14.2, "text": "Sure, I've been using it for about six months now and overall it's been very positive."}
  ]
}

Export as SRT:

bash
curl -X POST https://api.wayscloud.services/v1/transcript/jobs/b2c3d4e5.../export \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"format": "srt"}'
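
The export response body is not shown here. For orientation, SRT cues use `HH:MM:SS,mmm` timestamps, and the sketch below shows how a segment's `start` and `end` values in seconds map to that notation. It is a local illustration with hypothetical helper names, not the service's export logic.

```bash
# Hypothetical helper: convert seconds (e.g. 3.8) to an SRT timestamp HH:MM:SS,mmm.
srt_timestamp() {
  awk -v t="$1" 'BEGIN {
    ms = int(t * 1000 + 0.5)                 # round to whole milliseconds
    h  = int(ms / 3600000)
    m  = int((ms % 3600000) / 60000)
    s  = int((ms % 60000) / 1000)
    printf "%02d:%02d:%02d,%03d\n", h, m, s, ms % 1000
  }'
}

# Hypothetical helper: print one SRT cue from index, start, end and text.
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_timestamp "$2")" "$(srt_timestamp "$3")" "$4"
}

# First segment from the example transcript.
srt_cue 1 0.0 3.8 "Thank you for joining us today."
```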

Common use cases

  • Meeting transcription — transcribe team meetings and generate searchable archives
  • Video subtitles — export SRT or VTT files for video hosting platforms
  • Podcast processing — convert episodes to text for show notes, SEO, and search
  • Interview analysis — transcribe customer or user research interviews for qualitative analysis
  • Compliance recording — transcribe call recordings for audit and review

Related services

  • LLM API — summarize, translate, or analyze transcripts with language models
  • GPU Studio — generate visual content to complement audio content
  • Object Storage — store audio files and transcripts long-term

WAYSCloud AS