Text to speech

    Generate spoken audio from text using Valsea voice aliases. The endpoint is synchronous and returns the generated audio bytes directly.

    Voices

    Valsea exposes stable voice aliases instead of provider-specific speaker IDs. The same alias maps to the correct underlying speaker for the requested language.

    Voice            Description
    valsea-neutral   Default balanced voice
    valsea-male      Male voice
    valsea-female    Female voice

    Supported languages are vietnamese and english, selected with the language request parameter (default: vietnamese).
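    Because the alias is resolved per language, a client can store a single voice preference and vary only the language. The sketch below assumes the aliases and languages listed above are the complete set. The constant and helper names (VALSEA_VOICES, isValseaVoice, selectionsFor) are hypothetical and not part of any Valsea SDK.

```typescript
// Hypothetical constants mirroring the aliases and languages listed above.
const VALSEA_VOICES = ['valsea-neutral', 'valsea-male', 'valsea-female'] as const;
const VALSEA_LANGUAGES = ['vietnamese', 'english'] as const;

type ValseaVoice = (typeof VALSEA_VOICES)[number];
type ValseaLanguage = (typeof VALSEA_LANGUAGES)[number];

// Narrow an arbitrary string (e.g. a saved user setting) to a known alias.
function isValseaVoice(value: string): value is ValseaVoice {
  return (VALSEA_VOICES as readonly string[]).includes(value);
}

// One alias, every supported language: only `language` changes, and the
// server picks the matching underlying speaker.
function selectionsFor(voice: ValseaVoice): { voice: ValseaVoice; language: ValseaLanguage }[] {
  return VALSEA_LANGUAGES.map((language) => ({ voice, language }));
}
```

    For example, selectionsFor('valsea-female') yields one request setting per language, both using the same alias.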


    Request body

    Parameter        Type                                          Required  Description
    model            valsea-tts                                    Yes       Public TTS model name.
    input            string                                        Yes       Text to synthesize, up to 10,000 characters.
    voice            valsea-neutral | valsea-male | valsea-female  Yes       Stable Valsea voice alias.
    language         vietnamese | english                          No        Language used to route the voice alias. Default: vietnamese.
    response_format  mp3 | wav                                     No        Audio format of the response. Default: mp3.
    speed            number                                        No        Playback speed, from 0.25 to 4. Default: 1.
    normalization    no | basic | advanced                         No        Text normalization level. Default: basic.
    audio_quality    integer                                       No        Bitrate/quality value. Supported: 32, 64, 128, 192, 256, 320. Default: 64.
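    The defaults and limits in the table can be enforced on the client before a request is sent. This is a minimal sketch under two assumptions: that the speed range is inclusive at both ends, and that the server counts characters the way JavaScript's String.length does (UTF-16 code units). The TtsRequest type and resolveTtsRequest helper are hypothetical.

```typescript
// Hypothetical request shape built from the parameter table above.
type TtsRequest = {
  model: 'valsea-tts';
  input: string;
  voice: 'valsea-neutral' | 'valsea-male' | 'valsea-female';
  language?: 'vietnamese' | 'english';
  response_format?: 'mp3' | 'wav';
  speed?: number;
  normalization?: 'no' | 'basic' | 'advanced';
  audio_quality?: number;
};

const AUDIO_QUALITY_VALUES: readonly number[] = [32, 64, 128, 192, 256, 320];
const MAX_INPUT_CHARS = 10_000;

// Fill in the documented defaults and reject out-of-range values.
function resolveTtsRequest(req: TtsRequest): Required<TtsRequest> {
  // Assumption: the server's character count matches String.length.
  if (req.input.length > MAX_INPUT_CHARS) {
    throw new RangeError(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  const resolved: Required<TtsRequest> = {
    model: req.model,
    input: req.input,
    voice: req.voice,
    language: req.language ?? 'vietnamese',
    response_format: req.response_format ?? 'mp3',
    speed: req.speed ?? 1,
    normalization: req.normalization ?? 'basic',
    audio_quality: req.audio_quality ?? 64,
  };
  // Negated range check so NaN is rejected as well (assumed inclusive range).
  if (!(resolved.speed >= 0.25 && resolved.speed <= 4)) {
    throw new RangeError('speed must be between 0.25 and 4');
  }
  if (!AUDIO_QUALITY_VALUES.includes(resolved.audio_quality)) {
    throw new RangeError(`audio_quality must be one of ${AUDIO_QUALITY_VALUES.join(', ')}`);
  }
  return resolved;
}
```

    Using ?? instead of an object spread keeps a default in place even when a caller passes a field explicitly set to undefined.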

    Code examples

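    import fs from 'fs/promises';
    import OpenAI from 'openai';

    // Point the OpenAI SDK at the Valsea API.
    const client = new OpenAI({
      apiKey: 'YOUR_API_KEY',
      baseURL: 'https://api.valsea.ai/v1',
    });

    const response = await client.audio.speech.create({
      model: 'valsea-tts',
      voice: 'valsea-neutral',
      input: 'Xin chào, đây là giọng nói Valsea.',
      response_format: 'mp3',
      // `language` is a Valsea-specific field. The Node SDK sends extra params
      // as-is (TypeScript needs `// @ts-expect-error` on this line). `extra_body`
      // is a Python SDK option; Node would send it as a literal field.
      language: 'vietnamese',
    });

    // The endpoint is synchronous: the response body is the audio file itself.
    const buffer = Buffer.from(await response.arrayBuffer());
    await fs.writeFile('speech.mp3', buffer);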
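    The same request can be sent without the SDK. This sketch assumes the route and authentication are the ones the OpenAI SDK uses by default (a POST to /audio/speech under the base URL, with a Bearer token); the page does not state them directly. SPEECH_URL, buildSpeechRequest, and synthesize are hypothetical names.

```typescript
// Assumption: path and auth scheme mirror what the OpenAI SDK sends.
const SPEECH_URL = 'https://api.valsea.ai/v1/audio/speech';

// Build the fetch options for a JSON request body.
function buildSpeechRequest(apiKey: string, body: Record<string, unknown>) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
}

// Synchronous endpoint: a successful response body is the encoded audio.
async function synthesize(apiKey: string, body: Record<string, unknown>): Promise<Uint8Array> {
  const res = await fetch(SPEECH_URL, buildSpeechRequest(apiKey, body));
  if (!res.ok) {
    throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

    Usage, in an ES module on Node 18 or later: call synthesize('YOUR_API_KEY', { model: 'valsea-tts', voice: 'valsea-neutral', input: 'Hello from Valsea.', language: 'english' }) and write the returned bytes to a file. Here language is an ordinary top-level JSON field, so no SDK workaround is needed.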
    

    Billing

    TTS is billed by the duration of the generated audio, rounded up to the next whole minute. Responses include the standard credit headers, such as X-Credits-Used.
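    As a worked example of the rounding rule, 61 seconds of audio bills as 2 minutes and 30 seconds bills as 1. The sketch below assumes ordinary ceiling rounding (so exactly 60 seconds bills as 1 minute) and that X-Credits-Used holds a plain number; both helpers are hypothetical.

```typescript
// Billing rule from this page: audio duration rounded up to whole minutes.
function billedMinutes(durationSeconds: number): number {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('duration must be a non-negative number of seconds');
  }
  return Math.ceil(durationSeconds / 60);
}

// Read X-Credits-Used from response headers (header names are
// case-insensitive). Assumption: the value is numeric; null if absent or not.
function creditsUsed(headers: Headers): number | null {
  const raw = headers.get('X-Credits-Used');
  if (raw === null) return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}
```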
