# Transcribe Audio
Upload audio or video files and receive accurate transcriptions with speaker detection, word-level timestamps, and export in multiple formats.
## What you are building
A transcription pipeline where you upload a file, the system processes it with speech recognition, and you get back timestamped text with speaker labels.
## When to use this approach
- Meeting recordings, interviews, or podcasts
- Video subtitling (export as SRT or VTT)
- Audio content indexing and search
- Compliance recordings that need written records
## What you need
- A WAYSCloud account
- An audio or video file (max 2 GB)
## Step 1 — Create a transcription job
### Dashboard
- Open Services → AI & Machine Learning → Speech Intelligence in the dashboard
- Click New Transcription
- Upload your file
- Select language (or leave on auto-detect)
### API
Create the job and receive a presigned upload URL:
```bash
curl -X POST https://api.wayscloud.services/v1/dashboard/transcript/jobs \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "meeting-2025-12-15.mp3",
    "content_type": "audio/mpeg",
    "file_size_bytes": 52428800,
    "language": "auto"
  }'
```

Response:
```json
{
  "job_id": "job-abc123",
  "upload_url": "https://storage.wayscloud.services/uploads/job-abc123?X-Amz-Signature=...",
  "upload_expires_in": 3600
}
```

Upload the file to the presigned URL:
```bash
curl -X PUT "https://storage.wayscloud.services/uploads/job-abc123?X-Amz-Signature=..." \
  -H "Content-Type: audio/mpeg" \
  --data-binary @meeting-2025-12-15.mp3
```

Confirm the upload:
```bash
curl -X POST https://api.wayscloud.services/v1/dashboard/transcript/jobs/job-abc123/upload-complete \
  -H "Authorization: Bearer $JWT_TOKEN"
```

## Step 2 — Wait for processing
The job moves through three states: `queued` → `processing` → `completed`.
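In a script, this lifecycle is usually handled with a polling loop. A minimal sketch, assuming a caller-supplied `fetch_job` callable that wraps the GET request shown below and returns the job JSON (a `failed` status is handled defensively here even though this guide only lists the three states above):

```python
import time

def wait_for_job(fetch_job, poll_interval=5.0, timeout=3600.0):
    """Poll fetch_job() until the job completes.

    fetch_job is any callable returning the job dict, e.g. a wrapper
    around GET /v1/dashboard/transcript/jobs/{job_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":  # assumed terminal state, not documented above
            raise RuntimeError(f"transcription job failed: {job}")
        time.sleep(poll_interval)
    raise TimeoutError("job did not complete within the timeout")
```

A fixed interval of a few seconds is usually enough; long recordings can take minutes, so set `timeout` generously.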
Poll the job status:
```bash
curl https://api.wayscloud.services/v1/dashboard/transcript/jobs/job-abc123 \
  -H "Authorization: Bearer $JWT_TOKEN"
```

Response (completed):
```json
{
  "job_id": "job-abc123",
  "status": "completed",
  "language_detected": "en",
  "duration_seconds": 3600,
  "word_count": 4523,
  "segments": [
    {
      "start_time": 0.0,
      "end_time": 5.5,
      "text": "Hello, welcome to the meeting.",
      "confidence": 0.98,
      "speaker_id": 1
    }
  ]
}
```

## Step 3 — Export the result
Generate an export in your preferred format:
```bash
curl -X POST https://api.wayscloud.services/v1/dashboard/transcript/jobs/job-abc123/export \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "srt"}'
```

Response:
```json
{
  "artifact_id": "artifact-xyz",
  "format": "srt",
  "download_url": "https://storage.wayscloud.services/artifacts/artifact-xyz.srt"
}
```

Supported formats: `txt` (plain text), `json` (full data), `srt` (subtitles), `vtt` (web subtitles).
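If you want subtitles without a second API call, the `segments` array from Step 2 already carries everything SRT needs. A minimal local-formatting sketch (no API calls; uses only the `start_time`, `end_time`, and `text` fields shown above):

```python
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm form SRT requires."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render transcript segments as a numbered SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start_time'])} --> "
            f"{srt_timestamp(seg['end_time'])}\n{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"
```

The export endpoint remains the simpler option; this is mainly useful when you want to post-process segments (for example, merging short ones) before rendering.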
## Step 4 — Next steps
- Analyze with AI — Feed transcriptions to the LLM API for summarization. See Run an LLM Request.
- Store results — Save transcription files in Object Storage. See Store Files.
- Automate — Use the API to build automated transcription pipelines.
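As a starting point for such a pipeline, here is a sketch of the whole flow from Steps 1–3, with the HTTP transport injected as plain callables so the control flow stays visible. The function and parameter names (`transcribe`, `api_post`, `api_get`, `upload_put`) are illustrative, not part of the API; only the paths and request bodies come from the steps above:

```python
import mimetypes
import os
import time

MAX_BYTES = 2 * 1024**3  # 2 GB upload limit

def transcribe(path, api_post, api_get, upload_put, language="auto"):
    """Run create -> upload -> confirm -> poll -> export for one file.

    api_post(path, body) / api_get(path) call the WAYSCloud API with the
    bearer token attached; upload_put(url, data, content_type) PUTs the
    raw bytes to the presigned URL. All three are caller-supplied.
    """
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError("file exceeds the 2 GB limit")
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"

    # Step 1: create the job and get a presigned upload URL
    job = api_post("/v1/dashboard/transcript/jobs", {
        "filename": os.path.basename(path),
        "content_type": content_type,
        "file_size_bytes": size,
        "language": language,
    })
    with open(path, "rb") as f:
        upload_put(job["upload_url"], f.read(), content_type)
    api_post(f"/v1/dashboard/transcript/jobs/{job['job_id']}/upload-complete", None)

    # Step 2: wait for processing to finish
    while True:
        status = api_get(f"/v1/dashboard/transcript/jobs/{job['job_id']}")
        if status["status"] == "completed":
            break
        time.sleep(5)

    # Step 3: export and return the artifact URL
    export = api_post(
        f"/v1/dashboard/transcript/jobs/{job['job_id']}/export",
        {"format": "srt"},
    )
    return export["download_url"]
```

Injecting the transport also makes the pipeline easy to unit-test with fakes before pointing it at the live API.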
## Related services
- Speech Intelligence — full service documentation
- LLM API — process transcriptions with language models
- Object Storage — store audio files and results
## Related guides
- Authentication — API key setup
## API reference
- Speech Intelligence API — all transcription endpoints