Text to speech

Generate spoken audio from text using Valsea voice aliases. The endpoint is synchronous and returns the generated audio bytes directly.

This endpoint is compatible with the OpenAI SDK. Set the OpenAI client `baseURL` to `https://api.valsea.ai/v1` and call `client.audio.speech.create(...)`.

Voices

Valsea exposes stable voice aliases instead of provider-specific speaker IDs. The same alias maps to the correct underlying speaker for the requested language.

| Voice | Description |
| --- | --- |
| `valsea-neutral` | Default balanced voice |
| `valsea-male` | Male voice |
| `valsea-female` | Female voice |

Supported languages are vietnamese, english, english-in, hindi, bengali-in, kannada, malayalam, marathi, odia, oriya, punjabi, tamil, telugu, and gujarati.
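The voice aliases and language names above can be captured as literal types so that typos are caught before a request is sent. The sketch below is illustrative only: `VOICES`, `LANGUAGES`, `isVoice`, and `isLanguage` are hypothetical names, not part of any Valsea or OpenAI SDK.

```typescript
// Voice aliases and language names copied from the lists above.
const VOICES = ['valsea-neutral', 'valsea-male', 'valsea-female'] as const;
const LANGUAGES = [
  'vietnamese', 'english', 'english-in', 'hindi', 'bengali-in',
  'kannada', 'malayalam', 'marathi', 'odia', 'oriya',
  'punjabi', 'tamil', 'telugu', 'gujarati',
] as const;

type Voice = (typeof VOICES)[number];
type Language = (typeof LANGUAGES)[number];

// Hypothetical helpers: narrow an arbitrary string (for example, user input)
// to a known voice alias or language name.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).includes(value);
}

function isLanguage(value: string): value is Language {
  return (LANGUAGES as readonly string[]).includes(value);
}
```

A caller might check `isLanguage(userChoice)` and simply omit `language` from the request when the check fails, letting the API apply its own default.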

Request body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | `valsea-tts` | Yes | Public TTS model name. |
| `input` | string | Yes | Text to synthesize, up to 10,000 characters. Requests exceeding this limit are rejected with a 400 error (not truncated). |
| `voice` | `valsea-neutral` \| `valsea-male` \| `valsea-female` | Yes | Stable Valsea voice alias. |
| `language` | `vietnamese` \| `english` \| `english-in` \| `hindi` \| `bengali-in` \| `kannada` \| `malayalam` \| `marathi` \| `odia` \| `oriya` \| `punjabi` \| `tamil` \| `telugu` \| `gujarati` | No | Language for voice alias routing. Default: `vietnamese`. |
| `response_format` | `mp3` \| `wav` | No | Audio response format. Default: `mp3`. |
| `speed` | number | No | Playback speed from `0.25` to `4`. Default: `1`. |
| `normalization` | `no` \| `basic` \| `advanced` | No | Text normalization level. Default: `basic`. |
| `audio_quality` | integer | No | Bitrate/quality value. Supported: `32`, `64`, `128`, `192`, `256`, `320`. Default: `64`. |
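Because oversized input is rejected with a 400 rather than truncated, it can be worth checking the documented limits on the client before sending. The following is a minimal sketch under stated assumptions: `SpeechRequest` and `validateSpeechRequest` are hypothetical names, and the length check counts UTF-16 code units, which may not match how the server counts characters.

```typescript
// Request shape transcribed from the parameter table above.
interface SpeechRequest {
  model: 'valsea-tts';
  input: string;
  voice: 'valsea-neutral' | 'valsea-male' | 'valsea-female';
  language?: string;
  response_format?: 'mp3' | 'wav';
  speed?: number;
  normalization?: 'no' | 'basic' | 'advanced';
  audio_quality?: number;
}

const MAX_INPUT_CHARS = 10_000;
const AUDIO_QUALITIES: readonly number[] = [32, 64, 128, 192, 256, 320];

// Hypothetical client-side check. Returns a list of problems; empty means
// the request is within the documented limits.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const problems: string[] = [];
  // Assumption: String.length (UTF-16 code units) approximates the server's count.
  if (req.input.length > MAX_INPUT_CHARS) {
    problems.push(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  // Written as a negated range check so that NaN is also rejected.
  if (req.speed !== undefined && !(req.speed >= 0.25 && req.speed <= 4)) {
    problems.push('speed must be between 0.25 and 4');
  }
  if (req.audio_quality !== undefined && !AUDIO_QUALITIES.includes(req.audio_quality)) {
    problems.push(`audio_quality must be one of ${AUDIO_QUALITIES.join(', ')}`);
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every issue at once; the server remains the authority on what is actually accepted.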

Code examples

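import fs from 'fs/promises';
import OpenAI from 'openai';

// Point the OpenAI client at the Valsea API.
const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.valsea.ai/v1',
});

const response = await client.audio.speech.create({
  model: 'valsea-tts',
  voice: 'valsea-neutral',
  input: 'Xin chao, day la giong noi Valsea.',
  response_format: 'mp3',
  // `language` is a Valsea-specific field. The Node SDK sends unknown body
  // fields as-is; `extra_body` is the Python SDK's mechanism, not Node's.
  // @ts-expect-error `language` is not in the OpenAI type definitions
  language: 'vietnamese',
});

// The response body is the raw audio; write it to disk.
const buffer = Buffer.from(await response.arrayBuffer());
await fs.writeFile('speech.mp3', buffer);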
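
For callers who are not using the OpenAI SDK, the same request can be sketched as plain HTTP. This is an assumption-laden illustration, not a documented contract: the `/audio/speech` path and the `Authorization: Bearer` header are inferred from what the OpenAI SDK sends against a custom base URL, and `buildSpeechRequest` is a hypothetical helper.

```typescript
interface RawSpeechRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the HTTP request that the SDK call above is
// assumed to produce, with `language` as a top-level body field.
function buildSpeechRequest(apiKey: string, text: string, language: string): RawSpeechRequest {
  return {
    // Assumption: the path mirrors OpenAI's POST /audio/speech under the Valsea base URL.
    url: 'https://api.valsea.ai/v1/audio/speech',
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'valsea-tts',
      voice: 'valsea-neutral',
      input: text,
      response_format: 'mp3',
      language,
    }),
  };
}
```

The result can be passed to `fetch(req.url, req)`; since the endpoint returns audio bytes directly, the body would then be read with `arrayBuffer()` rather than parsed as JSON.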

Billing

TTS is billed by the duration of the generated audio, rounded up to the next whole minute. The response includes the standard credit headers, such as `X-Credits-Used`.
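
The rounding rule can be illustrated with a short sketch. `billedMinutes` and `creditsUsed` are hypothetical helpers; the code assumes that a duration that is an exact multiple of 60 seconds is not rounded up further, and that `X-Credits-Used` carries a plain number, neither of which is specified above.

```typescript
// Minimal structural type so the helper accepts a fetch Headers object or a test stub.
interface HeaderReader {
  get(name: string): string | null;
}

// Billed minutes for one request: audio duration rounded up to a whole minute.
// Assumption: exactly 60 s bills as 1 minute, not 2.
function billedMinutes(durationSeconds: number): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative finite number');
  }
  return Math.ceil(durationSeconds / 60);
}

// Reads X-Credits-Used from response headers.
// Returns null when the header is missing or not numeric.
function creditsUsed(headers: HeaderReader): number | null {
  const raw = headers.get('X-Credits-Used');
  if (raw === null || raw.trim() === '') return null;
  const value = Number(raw);
  return Number.isNaN(value) ? null : value;
}
```

Under this rule a 5-second clip and a 60-second clip both bill as one minute, while a 61-second clip bills as two, so batching short strings into fewer requests may lower cost. With a `fetch` response, `creditsUsed(response.headers)` reads the reported value.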