Transcribe audio
Upload an audio file and receive a text transcription. More than 100 languages are supported; choose one with the language parameter (the model is always valsea-transcribe). Language correction and semantic tagging are enabled by default and can be turned off per request.
This endpoint is OpenAI SDK compatible. You can use the official OpenAI TypeScript or Python
client libraries by setting the base URL to https://api.valsea.ai/v1. Authentication is required:
include your API key in the Authorization header as a Bearer token, or pass it via the SDK's apiKey
parameter (api_key in the Python client).
Form Fields
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | string (binary) | Yes | The audio file to transcribe (WAV, MP3, M4A, FLAC, OGG, WEBM). |
| model | valsea-transcribe | Yes | The transcription model to use. Always use valsea-transcribe. Language routing is determined by the language parameter. |
| language | string | Yes | Language of the audio. 100+ languages supported. See the full supported languages list below. |
| response_format | json or verbose_json | No | Response verbosity level. Default: json |
| enable_correction | boolean | No | Enable language/grammar correction on the transcript. Default: true |
| enable_tags | boolean | No | Enable semantic tagging of the transcript. Default: true |
| diarize | boolean | No | Enable speaker diarization. Requires response_format=verbose_json. Default: false |
| diarization_min_speakers | integer | No | Minimum expected speaker count for diarization. Default: 2 |
| diarization_max_speakers | integer | No | Maximum expected speaker count for diarization. Default: 6 |
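As a rough illustration of how the fields in the table fit together, the sketch below assembles the non-file form fields as strings and fills in the documented defaults. The helper name build_form_fields is hypothetical, and serializing booleans as lowercase true/false is an assumption based on the diarize=true notation used on this page.

```python
def build_form_fields(language, response_format="json", enable_correction=True,
                      enable_tags=True, diarize=False,
                      diarization_min_speakers=2, diarization_max_speakers=6):
    """Return the non-file multipart fields as a dict of strings.

    Defaults mirror the Form Fields table. The file itself is attached
    separately by whatever HTTP client sends the request.
    """
    if response_format not in ("json", "verbose_json"):
        raise ValueError("response_format must be 'json' or 'verbose_json'")

    def flag(value):
        # Assumption: booleans are sent as lowercase "true"/"false".
        return "true" if value else "false"

    fields = {
        "model": "valsea-transcribe",  # the only documented model value
        "language": language,
        "response_format": response_format,
        "enable_correction": flag(enable_correction),
        "enable_tags": flag(enable_tags),
        "diarize": flag(diarize),
    }
    if diarize:
        # The speaker range only matters when diarization is on.
        fields["diarization_min_speakers"] = str(diarization_min_speakers)
        fields["diarization_max_speakers"] = str(diarization_max_speakers)
    return fields
```

Omitting the speaker-range fields when diarize is off is a design choice of this sketch, not documented API behavior.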
Speaker Diarization
Set diarize=true to add speaker labels to the verbose response. Speaker diarization
uses a Google Speech-to-Text diarization pass and returns speaker IDs on word objects plus grouped
utterances.
Speaker diarization is only available with response_format=verbose_json. Requests with
diarize=true and response_format=json are rejected with 400 because minimal output cannot include
speaker-level metadata.
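A client can catch this combination before uploading anything. The sketch below mirrors the documented rule locally; diarization_error is a hypothetical helper, not part of any SDK, and the server remains the source of truth.

```python
def diarization_error(diarize, response_format):
    """Return a message for the combination the API rejects with 400, else None."""
    if diarize and response_format != "verbose_json":
        return (
            "diarize=true requires response_format=verbose_json "
            f"(got {response_format!r})"
        )
    return None
```

Checking locally avoids spending an upload on a request that is known to fail.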
Speaker diarization bills at 2x the normal transcription credit cost. If you also use paid rate-limit bypass, the request bills at 4x normal transcription credits.
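The billing rule above amounts to doubling once per paid option. A minimal sketch, with credit_multiplier as a hypothetical helper name:

```python
def credit_multiplier(diarize=False, bypass_rate_limit=False):
    """Multiplier applied to the normal transcription credit cost."""
    multiplier = 1
    if diarize:
        multiplier *= 2  # diarization bills at 2x
    if bypass_rate_limit:
        multiplier *= 2  # paid rate-limit bypass doubles the cost again
    return multiplier
```

For example, a request that would normally cost 10 credits (an illustrative figure, not a published price) would cost 20 with diarization alone and 40 with both options enabled.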
The API playgrounds include a multi-speaker sample clip that automatically sets
response_format=verbose_json, diarize=true, and a 2-speaker range for quick testing.
Paid Rate-Limit Bypass
If you need to exceed the transcription RPM limit for a request, you can bypass the rate-limit check by sending one of these opt-in flags:
- Header: X-Bypass-Rate-Limit: true
- Query parameter: bypass_rate_limit=true
- Query parameter: bypassRateLimit=true
Bypass applies only to the per-organization rate-limit check. Authentication, credit checks, upload
size limits, and concurrent upload limits still apply. Requests using this bypass are billed at 2x
the normal transcription credit cost and include X-Credits-Multiplier: 2 in the response headers. If
speaker diarization is also enabled, the multiplier becomes 4.
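To make the opt-in and the billing signal concrete, the sketch below builds request headers using the header form of the flag and reads the multiplier back from a response. Both function names are hypothetical, and treating a missing X-Credits-Multiplier header as 1 is an assumption, since the header is only documented for bypass requests.

```python
def bypass_headers(api_key):
    """Request headers that opt in to the paid rate-limit bypass."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Bypass-Rate-Limit": "true",
    }


def credits_multiplier(response_headers):
    """Read the billing multiplier from a mapping of response headers.

    Assumption: when the header is absent, the request billed at 1x.
    Plain dicts are case-sensitive, so pass headers keyed exactly as
    shown here or use a case-insensitive mapping from your HTTP client.
    """
    value = response_headers.get("X-Credits-Multiplier")
    return int(value) if value is not None else 1
```

Logging the returned multiplier per request is one way to reconcile usage against the credit balance.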
Code Examples
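The curl request below uploads audio.wav and asks for a diarized verbose response. Because the endpoint is OpenAI SDK compatible, the same request can also be made with the official OpenAI client libraries by pointing them at https://api.valsea.ai/v1.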
curl -X POST https://api.valsea.ai/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@audio.wav" \
-F "model=valsea-transcribe" \
-F "language=english" \
-F "response_format=verbose_json" \
-F "diarize=true"
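The same request might look as follows with the official OpenAI Python client. This is a sketch under assumptions: diarize is not a named parameter of the SDK's transcription method, so it is passed through extra_body, the SDK's generic option for additional request fields, and whether the API receives it this way has not been verified here. The function names are hypothetical.

```python
def transcription_kwargs(audio_file):
    """Arguments for client.audio.transcriptions.create(), mirroring the curl call."""
    return {
        "file": audio_file,
        "model": "valsea-transcribe",
        "language": "english",
        "response_format": "verbose_json",
        # Assumption: Valsea-specific fields are forwarded via extra_body.
        "extra_body": {"diarize": "true"},
    }


def transcribe(path, api_key):
    """Upload the file at `path` and return the SDK's response object."""
    # Imported here so the helper above works without the SDK installed.
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(api_key=api_key, base_url="https://api.valsea.ai/v1")
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(**transcription_kwargs(audio_file))
```

Calling transcribe("audio.wav", "YOUR_API_KEY") would send the request; nothing is sent when the module is merely imported.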
Response
Minimal (json)
| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text. |
Verbose (verbose_json)
| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text. |
| raw_transcript | string | Original transcript before corrections. |
| detected_languages | array | Languages detected in the audio. |
| corrections | array | List of corrections applied. |
| semantic_tags | array | Semantic tags extracted from the text. |
| annotated_text | string | Text with inline annotations. |
| clarified_text | string | Clarified version of the transcript. |
| words | array | Word-level timing and zero-based speaker labels when diarize=true. |
| utterances | array | Contiguous speaker turns with speaker, timing, transcript, and words when diarize=true. |
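A diarized verbose response could be turned into a readable dialogue roughly as shown below. The exact key names inside each utterance are not spelled out on this page, so speaker (an integer, zero-based) and transcript are assumptions inferred from the field description, and the sample payload is invented for illustration.

```python
def format_utterances(response):
    """Render one 'Speaker N: text' line per utterance in a verbose response."""
    lines = []
    for utterance in response.get("utterances", []):
        # Speaker labels are zero-based; add 1 for display.
        label = utterance["speaker"] + 1
        lines.append(f"Speaker {label}: {utterance['transcript']}")
    return lines


# Hypothetical payload, not real API output.
sample = {
    "text": "Hello there. Hi.",
    "utterances": [
        {"speaker": 0, "transcript": "Hello there."},
        {"speaker": 1, "transcript": "Hi."},
    ],
}
```

A response without utterances, such as one produced with diarize=false, yields an empty list rather than an error.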
Errors
| Status | Description |
|---|---|
| 400 | Invalid request, for example diarize=true with response_format=json |
| 401 | Missing or invalid API key |
| 402 | Insufficient credits |
| 429 | Rate limit exceeded |
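One way a client might act on these statuses is sketched below. The mapping covers the statuses mentioned on this page, including the 400 described under Speaker Diarization. The helper names and the choice to treat only 429 as retryable are assumptions, not documented behavior.

```python
ERROR_MESSAGES = {
    400: "Invalid request (for example, diarize=true with response_format=json)",
    401: "Missing or invalid API key",
    402: "Insufficient credits",
    429: "Rate limit exceeded",
}


def describe_error(status):
    """Map an HTTP status code to a human-readable message."""
    return ERROR_MESSAGES.get(status, f"Unexpected HTTP status {status}")


def is_retryable(status):
    # Assumption: only rate limiting is worth retrying automatically;
    # request, auth, and credit failures need user action first.
    return status == 429
```

A caller could wait and retry when is_retryable returns True, and surface describe_error(status) to the user otherwise.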
List of Languages
Southeast Asia
Singlish — singlish
Indonesian — indonesian
Malaysian — malay
Vietnamese — vietnamese
Thai — thai
Javanese — javanese
Lao — lao
Khmer — khmer
Filipino/Tagalog — filipino
English (Philippines) — english-philippines
Middle East & North Africa
Arabic — arabic arabic-algeria arabic-bahrain arabic-egypt arabic-israel arabic-jordan arabic-kuwait arabic-lebanon arabic-mauritania arabic-morocco arabic-oman arabic-palestine arabic-qatar arabic-saudi arabic-syria arabic-tunisia arabic-uae arabic-yemen
Persian — persian
Hebrew — hebrew
Amharic — amharic
Wolof — wolof
Sub-Saharan Africa
Swahili — swahili swahili-ke
Afrikaans — afrikaans
Akan — akan
Bemba — bemba
Fulani — fulani
Ga — ga
Hausa — hausa
Igbo — igbo
Luganda — luganda
Xhosa — xhosa
Yoruba — yoruba
Zulu — zulu
Northern Sotho — northern-sotho
Nyankole — nyankole
Oromo — oromo
Pidgin — pidgin
Kinyarwanda — kinyarwanda
Shona — shona
Sotho — sotho
Tswana — tswana
Twi — twi
South Asia
Bengali — bengali-bd bengali-in
Hindi — hindi
Gujarati — gujarati
Kannada — kannada
Malayalam — malayalam
Marathi — marathi
Nepali — nepali
Oriya — oriya
Punjabi — punjabi
Sinhala — sinhala
Tamil — tamil
Telugu — telugu
Assamese — assamese
East Asia
Chinese — cantonese chinese chinese-simplified chinese-traditional
Covers a wide range of regional accents and dialects, including those from Anhui, Beijing, Chongqing, Gansu, Guangdong, Guangxi, Guizhou, Hangzhou, Hebei, Henan, Hong Kong, Hubei, Jiangsu, Jianghuai, Jiaoliao, Jilu, Lanyin, Nanjing, Northeast, Ningxia, Shaanxi, Shandong, Sichuan, Taiwan, Tianjin, and Yunnan.
Japanese — japanese
Korean — korean
Mongolian — mongolian
Central & Western Asia
Azerbaijani — azerbaijani
Armenian — armenian
Georgian — georgian
Kazakh — kazakh
Kurdish — kurdish
Kyrgyz — kyrgyz
Uzbek — uzbek
Turkish — turkish
Western Europe
English — english english-au english-gb english-in english-philippines english-us
French — french french-ca
Spanish — spanish spanish-es spanish-mexico spanish-us
Portuguese — portuguese portuguese-br
German — german
Dutch — dutch
Italian — italian
Catalan — catalan
Galician — galician
Asturian — asturian
Basque — basque
Welsh — welsh
Luxembourgish — luxembourgish
Maltese — maltese
Northern Europe
Danish — danish
Finnish — finnish
Icelandic — icelandic
Norwegian — norwegian
Swedish — swedish
Estonian — estonian
Latvian — latvian
Lithuanian — lithuanian
Eastern Europe & Balkans
Bulgarian — bulgarian
Croatian — croatian
Czech — czech
Hungarian — hungarian
Macedonian — macedonian
Polish — polish
Romanian — romanian
Russian — russian
Serbian — serbian
Slovak — slovak
Slovenian — slovenian
Ukrainian — ukrainian
Albanian — albanian
Greek — greek
Pacific & Oceania
Maori — maori