Transcribe Audio

Upload audio or video files and receive accurate transcriptions with speaker detection, word-level timestamps, and export in multiple formats.

What you are building

A transcription pipeline where you upload a file, the system processes it with speech recognition, and you get back timestamped text with speaker labels.

When to use this approach

  • Meeting recordings, interviews, or podcasts
  • Video subtitling (export as SRT or VTT)
  • Audio content indexing and search
  • Compliance recordings that need written records

What you need

  • A WAYSCloud account
  • An audio or video file (max 2 GB)

Step 1 — Create a transcription job

Dashboard

  1. Open Services → AI & Machine Learning → Speech Intelligence in the dashboard
  2. Click New Transcription
  3. Upload your file
  4. Select language (or leave on auto-detect)

API

Request a presigned upload URL:

```bash
curl -X POST https://api.wayscloud.services/v1/dashboard/transcript/jobs \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filename": "meeting-2025-12-15.mp3",
    "content_type": "audio/mpeg",
    "file_size_bytes": 52428800,
    "language": "auto"
  }'
```

Response:

```json
{
  "job_id": "job-abc123",
  "upload_url": "https://storage.wayscloud.services/uploads/job-abc123?X-Amz-Signature=...",
  "upload_expires_in": 3600
}
```

Upload the file to the presigned URL:

```bash
curl -X PUT "https://storage.wayscloud.services/uploads/job-abc123?X-Amz-Signature=..." \
  -H "Content-Type: audio/mpeg" \
  --data-binary @meeting-2025-12-15.mp3
```

Confirm the upload:

```bash
curl -X POST https://api.wayscloud.services/v1/dashboard/transcript/jobs/job-abc123/upload-complete \
  -H "Authorization: Bearer $JWT_TOKEN"
```
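
The three API calls in this step can be sketched end to end in Python. This is a sketch using only the standard library; the endpoint paths and request fields come from the examples above, while the helper names, the `language: auto` default, and the absence of error handling and retries are assumptions of this sketch:

```python
import json
import mimetypes
import os
import urllib.request

API_BASE = "https://api.wayscloud.services/v1/dashboard/transcript"


def upload_metadata(path):
    """Derive the job-creation request fields from a local file."""
    content_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "content_type": content_type or "application/octet-stream",
        "file_size_bytes": os.path.getsize(path),
        "language": "auto",
    }


def _post_json(url, token, body=None):
    """POST a JSON body (or an empty request) and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method="POST", headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"{}")


def transcribe(path, token):
    meta = upload_metadata(path)
    # 1. Create the job; the response carries a presigned upload URL.
    job = _post_json(f"{API_BASE}/jobs", token, body=meta)
    # 2. PUT the raw file bytes to the presigned URL.
    with open(path, "rb") as f:
        urllib.request.urlopen(urllib.request.Request(
            job["upload_url"], data=f.read(), method="PUT",
            headers={"Content-Type": meta["content_type"]}))
    # 3. Confirm the upload so processing can start.
    _post_json(f"{API_BASE}/jobs/{job['job_id']}/upload-complete", token)
    return job["job_id"]
```

Note that the PUT to the presigned URL carries no `Authorization` header; the signature in the URL itself authorizes the upload, matching the curl example above.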

Step 2 — Wait for processing

The job moves through: `queued` → `processing` → `completed`.

Poll the job status:

```bash
curl https://api.wayscloud.services/v1/dashboard/transcript/jobs/job-abc123 \
  -H "Authorization: Bearer $JWT_TOKEN"
```

Response (completed):

```json
{
  "job_id": "job-abc123",
  "status": "completed",
  "language_detected": "en",
  "duration_seconds": 3600,
  "word_count": 4523,
  "segments": [
    {
      "start_time": 0.0,
      "end_time": 5.5,
      "text": "Hello, welcome to the meeting.",
      "confidence": 0.98,
      "speaker_id": 1
    }
  ]
}
```
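
A polling loop for this step can be sketched as follows. The status fetcher is injected as a callable so the loop is independent of any particular HTTP client, and the interval and timeout defaults are illustrative assumptions, not documented limits:

```python
import time


def wait_for_completion(fetch_job, poll_interval=5.0, timeout=1800.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job() until the job leaves the queued/processing states.

    fetch_job returns the job JSON (a dict with a "status" key), e.g. a
    wrapper around GET /v1/dashboard/transcript/jobs/{job_id}.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job()
        if job["status"] not in ("queued", "processing"):
            # "completed", or a failure status to be handled by the caller.
            return job
        if clock() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        sleep(poll_interval)
```

For long recordings, a webhook or a longer poll interval is kinder to the API than tight polling.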

Step 3 — Export the result

Generate an export in your preferred format:

```bash
curl -X POST https://api.wayscloud.services/v1/dashboard/transcript/jobs/job-abc123/export \
  -H "Authorization: Bearer $JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"format": "srt"}'
```

Response:

```json
{
  "artifact_id": "artifact-xyz",
  "format": "srt",
  "download_url": "https://storage.wayscloud.services/artifacts/artifact-xyz.srt"
}
```

Supported formats: `txt` (plain text), `json` (full transcript data), `srt` (subtitles), `vtt` (web subtitles).
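
If you prefer to build subtitles yourself from the `segments` array in Step 2 rather than request an export, the mapping to SRT is straightforward: each segment becomes a numbered cue with `HH:MM:SS,mmm` timestamps. A sketch (field names follow the segment object shown above):

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render transcript segments as an SRT document with 1-based cue numbers."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n"
                    f"{srt_timestamp(seg['start_time'])} --> "
                    f"{srt_timestamp(seg['end_time'])}\n"
                    f"{seg['text']}\n")
    return "\n".join(cues)
```

Splitting long segments into shorter cues (subtitles are usually kept under two lines) is left out of this sketch.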


Step 4 — Next steps

  • Analyze with AI — Feed transcriptions to the LLM API for summarization. See Run an LLM Request.
  • Store results — Save transcription files in Object Storage. See Store Files.
  • Automate — Use the API to build automated transcription pipelines
