Speech Intelligence

Transcribe audio and video files into text with segment-level timestamps, automatic language detection, and subtitle export in SRT, VTT, and TXT formats.

Best for: meeting recordings, podcast transcription, video subtitles, and any workflow that needs accurate text from spoken audio.

Activate Speech Intelligence | Speech API reference

What this is

WAYSCloud Speech Intelligence is a job-based audio transcription API. You create a job, upload an audio or video file (up to 2 GB), confirm the upload, and poll until the transcript is ready. The service detects language automatically (30+ languages supported) and returns segment-level timestamps. You can export the transcript as SRT subtitles, VTT subtitles, or plain text.

Processing takes roughly 10-30% of the audio duration. A 10-minute recording completes in 1-3 minutes.
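For planning purposes, the rule of thumb above can be turned into a quick estimate. The following is a minimal sketch: the function name estimate_processing_seconds is illustrative and not part of the WAYSCloud API, and real processing times may differ from this range.

```shell
# Hypothetical helper: estimate the processing-time range for a recording,
# using the 10-30% rule of thumb stated on this page.
# Usage: estimate_processing_seconds <audio_duration_in_seconds>
estimate_processing_seconds() {
  # Integer arithmetic; prints "<low> <high>" in seconds.
  echo "$(( $1 * 10 / 100 )) $(( $1 * 30 / 100 ))"
}

# A 10-minute (600 s) recording:
estimate_processing_seconds 600   # prints "60 180", i.e. 1-3 minutes
```

The upper bound is a reasonable starting point for a polling timeout.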

When to use it

Use this when:

You need transcripts of meetings, interviews, calls, or podcasts
You want subtitles for video content in SRT or VTT format
You need to search or analyze spoken content as text
You have audio in any of 30+ languages and want automatic detection

Consider something else when:

You need real-time speech-to-text during a live call — Speech Intelligence processes uploaded files, not live streams
You need text generation or summarization — use LLM API on the transcript output

What you get

Job-based pipeline: create, upload, confirm, poll, export
Segment-level timestamps: start and end time for each spoken segment
Automatic language detection across 30+ languages
3 export formats: SRT, VTT, TXT
Large file support: up to 2 GB per file
Wide format support: mp3, wav, m4a, ogg, flac, aac, mp4, webm, mkv, avi
Presigned upload URL: secure, time-limited upload link (valid 1 hour)
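Before creating a job, it can help to check locally that a file uses one of the formats listed above. This is a sketch under assumptions: is_supported_format is a hypothetical helper, it looks only at the file extension (not the actual container or codec), and it does not check the 2 GB limit because the exact byte count behind that figure is not stated here.

```shell
# Hypothetical pre-flight check: is the file extension one of the formats
# listed on this page? Returns 0 (supported) or 1 (unsupported).
is_supported_format() {
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Lower-case the extension so "CALL.MP3" and "call.mp3" are treated alike.
  _ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$_ext" in
    mp3|wav|m4a|ogg|flac|aac|mp4|webm|mkv|avi) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_format "team-standup.mp3"; then
  echo "supported"
else
  echo "unsupported"
fi
```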

Pricing

All prices exclude VAT.

Item	EUR	NOK	SEK	DKK
Speech Intelligence (Pay-as-you-go)	0.05	0.50	0.50	0.35

View all plans in dashboard

How it works

Create a job by calling POST /v1/transcript/jobs with the filename and optional language hint.
Receive a presigned upload URL (valid 1 hour).
Upload the file using a PUT request to the presigned URL.
Confirm the upload by calling POST /v1/transcript/jobs/{id}/upload-complete.
Poll for completion via GET /v1/transcript/jobs/{id}. Status progresses: created → queued → processing → ready. A job that cannot be completed is reported as failed.
Export the transcript as SRT, VTT, or TXT via POST /v1/transcript/jobs/{id}/export.
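The polling step above can be sketched as a small shell loop. Everything named here is an assumption for illustration: the API_BASE and WAYSCLOUD_API_KEY environment variables, the function names, the 10-second default interval, and the attempt limit. The status extraction uses plain pattern matching; a JSON parser such as jq would be more robust if it is available.

```shell
# Sketch only. API_BASE and WAYSCLOUD_API_KEY are assumed environment
# variables; wait_for_job, fetch_job and extract_status are hypothetical names.
API_BASE="${API_BASE:-https://api.wayscloud.services}"

# Fetch the job document for a job id (GET /v1/transcript/jobs/{id}).
fetch_job() {
  curl -sS "$API_BASE/v1/transcript/jobs/$1" -H "X-API-Key: $WAYSCLOUD_API_KEY"
}

# Crude extraction of the first "status" string field from JSON on stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll until the job is ready or failed, or until max_attempts is reached.
# Usage: wait_for_job <job_id> [interval_seconds] [max_attempts]
# Prints the final state; returns 0 (ready), 1 (failed) or 2 (timeout).
wait_for_job() {
  _interval="${2:-10}"
  _max="${3:-90}"
  _n=0
  while [ "$_n" -lt "$_max" ]; do
    _status=$(fetch_job "$1" | extract_status)
    case "$_status" in
      ready)  echo "ready";  return 0 ;;
      failed) echo "failed"; return 1 ;;
    esac
    _n=$((_n + 1))
    sleep "$_interval"
  done
  echo "timeout"
  return 2
}
```

A call such as wait_for_job "$job_id" 10 90 would check every 10 seconds for up to 15 minutes. Stopping on failed as well as ready avoids polling forever on a job that will never complete.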

What you see in the dashboard

Job list: filename, language, status (created / queued / processing / ready / failed), duration, cost
Transcript viewer: full text with clickable timestamps per segment
Language badge: auto-detected or manually specified language
Export buttons: download as SRT, VTT, TXT, or JSON
Usage this month: total minutes transcribed and cost

Fastest way to get started

Dashboard

Open my.wayscloud.services and go to AI & Machine Learning → Speech Intelligence
Click Activate and copy your API key
Create a job via the API and upload your first file

API

bash

# Step 1: Create a job
curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"original_filename": "team-standup.mp3", "language": "auto"}'

# Step 2: Upload the file to the presigned URL from the response
curl -X PUT "https://storage.wayscloud.services/transcript-originals/..." \
  -H "Content-Type: audio/mpeg" \
  --data-binary @team-standup.mp3

# Step 3: Confirm upload
curl -X POST https://api.wayscloud.services/v1/transcript/jobs/{job_id}/upload-complete \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET"

Example request and response

Request: Create a transcription job

bash

curl -X POST https://api.wayscloud.services/v1/transcript/jobs \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "original_filename": "customer-interview-march.m4a",
    "language": "auto"
  }'

Response:

json

{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "upload_url": "https://storage.wayscloud.services/transcript-originals/b2c3d4e5...",
  "upload_expires_in": 3600
}
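The job_id and upload_url from this response are needed for the upload, confirm, and poll calls. One way to carry them forward in a script is sketched below. The helper json_string_field is hypothetical and relies on simple pattern matching: it handles string values only (so not upload_expires_in) and does not cope with escaped quotes. A JSON parser such as jq is the more robust choice.

```shell
# Hypothetical helper: print the value of a string field from a JSON
# document on stdin, using crude pattern matching.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response copied from this page (the elided URL is left as-is).
response='{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "upload_url": "https://storage.wayscloud.services/transcript-originals/b2c3d4e5...",
  "upload_expires_in": 3600
}'

job_id=$(printf '%s\n' "$response" | json_string_field job_id)
upload_url=$(printf '%s\n' "$response" | json_string_field upload_url)
echo "job_id=$job_id"
echo "upload_url=$upload_url"
```

In a real script, response would be the output of the create-job curl call, and $upload_url would be passed to the PUT request before the hour-long validity window ends.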

Poll for completed transcript:

bash

curl https://api.wayscloud.services/v1/transcript/jobs/b2c3d4e5-f6a7-8901-bcde-f23456789012 \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET"

Response (ready):

json

{
  "job_id": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
  "status": "ready",
  "language": "en",
  "original_filename": "customer-interview-march.m4a",
  "audio_duration_sec": 2340.7,
  "processing_time_ms": 58200,
  "segments": [
    {"start": 0.0, "end": 3.8, "text": "Thank you for joining us today."},
    {"start": 4.1, "end": 8.5, "text": "I'd like to start by asking about your experience with our platform."},
    {"start": 9.0, "end": 14.2, "text": "Sure, I've been using it for about six months now and overall it's been very positive."}
  ]
}
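The start and end values are seconds from the beginning of the recording. To show how they relate to subtitle timecodes, here is a minimal sketch that formats one segment as an SRT cue. The helpers srt_time and srt_cue are hypothetical, and this is not necessarily how the export endpoint builds its output; for real subtitle files, the export call is the simpler route.

```shell
# Hypothetical helper: format seconds (e.g. 4.1) as an SRT timecode
# in the form HH:MM:SS,mmm.
srt_time() {
  awk -v s="$1" 'BEGIN {
    ms  = int(s * 1000 + 0.5)          # round to whole milliseconds
    h   = int(ms / 3600000)
    m   = int((ms % 3600000) / 60000)
    sec = int((ms % 60000) / 1000)
    printf "%02d:%02d:%02d,%03d\n", h, m, sec, ms % 1000
  }'
}

# Print one SRT cue: index, "start --> end" line, text, blank line.
# Usage: srt_cue <index> <start_seconds> <end_seconds> <text>
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_time "$2")" "$(srt_time "$3")" "$4"
}

# Second segment from the response above:
srt_cue 2 4.1 8.5 "I'd like to start by asking about your experience with our platform."
```

For that segment, the timing line reads 00:00:04,100 --> 00:00:08,500.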

Export as SRT:

bash

curl -X POST https://api.wayscloud.services/v1/transcript/jobs/b2c3d4e5.../export \
  -H "X-API-Key: wayscloud_speech_abc12_YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"format": "srt"}'

Common use cases

Meeting transcription — transcribe team meetings and generate searchable archives
Video subtitles — export SRT or VTT files for video hosting platforms
Podcast processing — convert episodes to text for show notes, SEO, and search
Interview analysis — transcribe customer or user research interviews for qualitative analysis
Compliance recording — transcribe call recordings for audit and review

Related services

LLM API: summarize, translate, or analyze transcripts with language models
GPU Studio: generate visual content to complement audio content
Object Storage: store audio files and transcripts long-term

Related documentation

Transcribe Audio: step-by-step guide
Speech Intelligence API reference: all 6 endpoints
API Keys: managing API credentials
Getting Started: platform overview

Open in dashboard
