Voicebot realtime flow

    Build a realtime voicebot by composing Valsea speech-to-text, Valsea's OpenAI-compatible chat completions, and Valsea realtime text-to-speech.

    Flow

    1. Capture microphone audio in the browser.
    2. Send each detected utterance to POST /v1/audio/transcriptions.
    3. Send the transcript, together with the earlier turns of the conversation, to POST /v1/chat/completions as OpenAI-compatible messages.
    4. Stream the assistant text to WS /v1/realtime/tts for low-latency speech playback.
    5. Keep conversation state and API calls on your server so browser clients never hold your long-lived API key.
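    The five steps can be wired together on the server roughly as follows. This is a minimal sketch, not Valsea's reference implementation. `createVoicebot`, `handleUtterance`, and the injected `transcribe`, `complete`, and `speak` functions are hypothetical names. You supply those functions for steps 2 to 4. The sketch only orders the calls and keeps per-session history in memory (step 5).

```javascript
// Hypothetical server-side orchestration of one voicebot turn.
// transcribe, complete, and speak are functions you supply (steps 2-4);
// this sketch only wires them together and keeps per-session history (step 5).
function createVoicebot({ systemPrompt, transcribe, complete, speak }) {
  // sessionId -> array of OpenAI-style chat messages (in memory only).
  const sessions = new Map();

  return async function handleUtterance(sessionId, audio) {
    if (!sessions.has(sessionId)) {
      sessions.set(sessionId, [{ role: 'system', content: systemPrompt }]);
    }
    const messages = sessions.get(sessionId);

    // Step 2: utterance audio -> transcript text.
    const transcript = await transcribe(audio);
    if (!transcript || !transcript.trim()) {
      return null; // Nothing recognized; skip this turn.
    }

    // Step 3: request a reply using the history plus the new user turn.
    const userMessage = { role: 'user', content: transcript };
    const assistantText = await complete([...messages, userMessage]);

    // Record the turn only after the reply succeeded.
    messages.push(userMessage, { role: 'assistant', content: assistantText });

    // Step 4: hand the reply to realtime TTS.
    await speak(assistantText);
    return assistantText;
  };
}
```

    The history is written only after `complete` resolves. A failed request therefore leaves no dangling user message behind. For production use, you would likely move the `Map` into a shared store and serialize concurrent utterances within a session.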

    Speech To Text

    Use the standard transcription endpoint for utterance-level recognition.

    cURL

    curl -X POST https://api.valsea.ai/v1/audio/transcriptions \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -F file=@utterance.wav \
      -F model=valsea-transcribe \
      -F language=vietnamese
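    The same request can be sent from JavaScript, for example from a Node 18+ server as step 5 recommends. This is a sketch that mirrors the cURL fields above. The function name `transcribeUtterance` and the injectable `fetchImpl` parameter are hypothetical. The sketch also assumes an OpenAI-compatible response body with the transcript in a `text` field, so confirm that field against the transcription API reference.

```javascript
// Sketch: the cURL transcription request above, sent with fetch + FormData.
// Assumes the JSON response carries the transcript in a `text` field.
async function transcribeUtterance(audioBlob, { apiKey, language = 'vietnamese', fetchImpl = fetch } = {}) {
  const form = new FormData();
  form.append('file', audioBlob, 'utterance.wav');
  form.append('model', 'valsea-transcribe');
  form.append('language', language);

  const response = await fetchImpl('https://api.valsea.ai/v1/audio/transcriptions', {
    method: 'POST',
    // No Content-Type header: fetch sets multipart/form-data with its boundary.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Transcription failed: HTTP ${response.status}`);
  }
  const result = await response.json();
  return result.text;
}
```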
    

    Conversation Turn

    Use chat completions for the voicebot response.

    JavaScript

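    // transcriptText holds the transcript from the speech-to-text step.
    // Run this on your server so YOUR_API_KEY is never sent to the browser.
    const response = await fetch('https://api.valsea.ai/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'valsea-fast',
        messages: [
          { role: 'system', content: 'You are a helpful voice assistant.' },
          { role: 'user', content: transcriptText },
        ],
      }),
    });

    if (!response.ok) {
      throw new Error(`Chat completion failed: HTTP ${response.status}`);
    }

    const completion = await response.json();
    const assistantText = completion.choices[0].message.content;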
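    The request above sends a single user message. A multi-turn voicebot also needs the earlier turns, and bounding them keeps each request small. The following is a minimal sketch of that idea. `buildMessages` and its `maxTurns` limit are hypothetical, and the sketch assumes `history` holds alternating user/assistant messages without the system prompt.

```javascript
// Sketch: build the `messages` array for one chat completion request.
// Keeps the system prompt plus the last `maxTurns` user/assistant pairs.
// Assumes `history` alternates user/assistant and excludes the system prompt.
function buildMessages(systemPrompt, history, transcriptText, maxTurns = 10) {
  // Each turn is two messages; slice(-0) would return everything, so guard 0.
  const recent = maxTurns > 0 ? history.slice(-maxTurns * 2) : [];
  return [
    { role: 'system', content: systemPrompt },
    ...recent,
    { role: 'user', content: transcriptText },
  ];
}
```

    After each reply, append both `{ role: 'user', content: transcriptText }` and `{ role: 'assistant', content: assistantText }` to `history`, so the next call sees the completed turn.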
    

    Realtime Text To Speech

    Open a realtime TTS WebSocket, authenticate, and stream assistant text to receive audio chunks.
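    Text often arrives in fragments, for example from a streamed completion. In that case, one way to cut time-to-first-audio is to forward each complete sentence as soon as it is ready instead of waiting for the whole reply. This is a sketch under two assumptions: that the realtime TTS socket accepts several `speak` messages on one connection, and that the simple punctuation rule below is good enough for your languages. `createSentenceChunker` is a hypothetical helper.

```javascript
// Sketch: turn a stream of text fragments into whole sentences for TTS.
// Boundary heuristic: terminal punctuation followed by whitespace, so "3.5" is not split.
function createSentenceChunker(onSentence) {
  const boundary = /[.!?…]+\s+/g;
  let buffer = '';

  return {
    // Add a fragment; emit every sentence that is now complete.
    push(fragment) {
      buffer += fragment;
      let consumed = 0;
      let match;
      boundary.lastIndex = 0;
      while ((match = boundary.exec(buffer)) !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(consumed, end).trim();
        if (sentence) onSentence(sentence);
        consumed = end;
      }
      buffer = buffer.slice(consumed);
    },
    // Emit whatever is left; call this when the reply is finished.
    flush() {
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}
```

    A sentence ending at the very end of a fragment is held until more text arrives or `flush()` is called, because the chunker cannot yet tell whether it is a boundary. Once the socket has received `successful-authentication`, `onSentence` could send each sentence as a `speak` message with the same `voice` and `audio_format` fields used for the full reply.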

    JavaScript

    const socket = new WebSocket('wss://api.valsea.ai/v1/realtime/tts');
    
    socket.addEventListener('open', () => {
      // Authenticate first; text is sent only after the server confirms.
      socket.send(JSON.stringify({ token: 'YOUR_API_KEY' }));
    });
    
    socket.addEventListener('message', (event) => {
      if (event.data instanceof Blob) {
        // Queue audio chunks for playback.
        return;
      }
    
      const message = JSON.parse(event.data);
      if (message.type === 'successful-authentication') {
        socket.send(
          JSON.stringify({
            type: 'speak',
            text: assistantText,
            voice: 'valsea-default',
            audio_format: 'mp3',
          }),
        );
      }
    });
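    The `// Queue audio chunks for playback.` branch needs chunks to play strictly in arrival order, even when each one takes a different time to play. The following is a minimal sketch of such a queue using a promise chain. `createPlaybackQueue` is a hypothetical helper, and so is the `playChunk` function you pass to it. `playChunk` must return a promise that resolves when the chunk has finished playing. In a browser, it might decode the Blob with Web Audio's `decodeAudioData` and resolve on the source node's `ended` event. Whether individual MP3 chunks decode on their own depends on how the service frames the stream, which is an assumption to verify.

```javascript
// Sketch: play audio chunks one after another, in arrival order.
// playChunk(chunk) is supplied by you and must resolve when playback ends.
function createPlaybackQueue(playChunk, onError = (err) => console.error('Chunk playback failed:', err)) {
  let tail = Promise.resolve();
  return function enqueue(chunk) {
    // Chain after the previous chunk; a failed chunk is reported and skipped.
    tail = tail.then(() => playChunk(chunk)).catch(onError);
    return tail;
  };
}
```

    In the message handler, the Blob branch would call `enqueue(event.data)` before returning.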
    
    API Docs

    Transcription: /docs/api/transcribe
    Chat completions: POST /v1/chat/completions
    Realtime TTS: /docs/realtime-tts
    Realtime sessions: /docs/api/voicebot-livekit
