Live Transcription
VALSEA provides a real-time speech-to-text API via WebSocket, allowing you to stream audio and receive transcriptions with low latency.
Connection
Endpoint: wss://api.valsea.ai/v1/realtime
Authentication
You must authenticate the WebSocket connection by passing your API key in the HTTP headers during the handshake.
Headers:
- Authorization: Bearer YOUR_API_KEY (recommended)
- X-API-Key: YOUR_API_KEY (supported)
For browser-based clients, where custom WebSocket headers may be restricted, ensure your library supports header injection, or contact support about query-parameter authentication.
Message Flow
- Connect: Client establishes the WebSocket connection.
- Session Created: Server sends a session.created event.
- Start Session: Client sends session.start to configure the language and model.
- Session Ready: Server sends session.ready once the engine can accept audio.
- Stream Audio: Client sends audio.append messages with base64-encoded PCM16 audio chunks.
- Receive Transcripts: Server streams transcript.partial and transcript.final events.
- Commit Audio: Client sends audio.commit when the user stops speaking (optional / VAD-dependent).
- Stop Session: Client sends session.stop to end the session.
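The client-side messages in the flow above can be sketched as plain JSON payloads. This is a minimal sketch; the field names follow the message reference later on this page, and the values are illustrative:

```javascript
// Minimal builders for the client-to-server messages in the flow above.
function sessionStart(model, language) {
  return JSON.stringify({ type: 'session.start', model, language });
}

function audioAppend(pcm16Buffer) {
  // pcm16Buffer: raw PCM16 bytes (Node Buffer), base64-encoded for transport
  return JSON.stringify({ type: 'audio.append', audio: pcm16Buffer.toString('base64') });
}

const audioCommit = () => JSON.stringify({ type: 'audio.commit' });
const sessionStop = () => JSON.stringify({ type: 'session.stop' });

// Messages sent in flow order
const outbound = [
  sessionStart('valsea-rtt', 'singlish'),
  audioAppend(Buffer.from([0x00, 0x01])),
  audioCommit(),
  sessionStop(),
];
```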
Partial vs Final (Important)
The realtime API emits two transcript event types for each utterance:
- transcript.partial: low-latency, in-progress text. This can change as more audio arrives.
- transcript.final: stable text for a completed segment. Treat this as the committed result.
Recommended client behavior:
- Keep a temporary currentPartial string updated from transcript.partial events.
- Append only transcript.final text to your persisted transcript history.
- Clear currentPartial when you receive the matching transcript.final.
Do not persist partial text as final output. Partials are intentionally mutable and may be revised by the engine before a final segment is produced.
Client Messages
session.start
Initialize the session with configuration.
{
  "type": "session.start",
  "model": "valsea-rtt",
  "hint_text": "Optional context or vocabulary",
  "enable_correction": true,
  "language": "singlish"
}
| Field | Type | Description |
|---|---|---|
| model | string | Model to use (e.g., valsea-rtt). |
| hint_text | string | Optional words or context to improve accuracy. |
| enable_correction | boolean | Enable post-processing for grammar/language correction (default: true). |
| language | string | Language hint for correction (e.g., singlish, english, chinese, korean). |
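Before sending session.start, a small client-side check can catch configuration typos early. This is a sketch only; the allowed values below are drawn from the table above and are illustrative, and the server remains authoritative:

```javascript
// Validate a session.start payload against the fields documented above.
// The language list mirrors the table's examples; it is not exhaustive.
const LANGUAGES = ['singlish', 'english', 'chinese', 'korean'];

function validateSessionStart(msg) {
  const errors = [];
  if (msg.type !== 'session.start') errors.push('type must be "session.start"');
  if (typeof msg.model !== 'string' || !msg.model) errors.push('model is required');
  if (msg.language !== undefined && !LANGUAGES.includes(msg.language))
    errors.push(`unknown language: ${msg.language}`);
  if (msg.enable_correction !== undefined && typeof msg.enable_correction !== 'boolean')
    errors.push('enable_correction must be a boolean');
  return errors;
}
```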
audio.append
Send audio data.
{
  "type": "audio.append",
  "audio": "BASE64_ENCODED_PCM16_DATA"
}
- Format: Raw PCM 16-bit, 16kHz (recommended), mono.
- Encoding: Base64 string.
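Microphone APIs typically deliver Float32 samples, so a conversion to the PCM16/base64 shape above is usually needed. A minimal Node sketch (resampling to 16kHz is out of scope here):

```javascript
// Convert Float32 samples in [-1, 1] to little-endian PCM16 bytes,
// then base64-encode the result for an audio.append message.
function floatToPcm16Base64(float32Samples) {
  const buf = Buffer.alloc(float32Samples.length * 2);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return buf.toString('base64');
}
```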
audio.commit
Signal the end of a speech segment (e.g., VAD triggered silence).
{
  "type": "audio.commit"
}
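The API leaves voice activity detection to the client. One simple approach (an assumption, not a prescribed method) is an RMS energy gate: send audio.commit after a run of consecutive low-energy chunks.

```javascript
// Root-mean-square energy of a Float32 chunk.
function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Returns a detector that fires true exactly once per silence run.
// Thresholds are illustrative; tune for your microphone and environment.
function makeCommitDetector({ threshold = 0.01, silentChunksNeeded = 5 } = {}) {
  let silentRun = 0;
  return (float32Chunk) => {
    if (rms(float32Chunk) < threshold) {
      silentRun++;
      if (silentRun === silentChunksNeeded) return true; // time to send audio.commit
    } else {
      silentRun = 0; // speech resumed, reset the run
    }
    return false;
  };
}
```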
session.stop
End the session gracefully.
{
  "type": "session.stop"
}
Server Messages
session.created
Sent immediately upon connection.
{
  "type": "session.created",
  "sessionId": "rtt_...",
  "supported_models": ["valsea-rtt"]
}
session.ready
Sent when the backend engine is connected and ready to receive audio.
{
  "type": "session.ready",
  "sessionId": "rtt_..."
}
transcript.partial
Intermediate transcription results (low latency, may change).
{
  "type": "transcript.partial",
  "text": "Hello world",
  "isFinal": false,
  "timestampMs": 1230
}
transcript.final
Finalized text for a speech segment.
{
  "type": "transcript.final",
  "text": "Hello, world.",
  "raw_text": "hello world",
  "isFinal": true,
  "timestampMs": 2500,
  "corrections": []
}
error
Sent when an error occurs.
{
  "type": "error",
  "code": "INVALID_MESSAGE",
  "message": "Failed to parse message"
}
Event Handling Pattern
Use this pattern to avoid duplicated or unstable transcript content:
let currentPartial = '';
const finalSegments = [];

ws.on('message', (raw) => {
  const msg = JSON.parse(raw);
  if (msg.type === 'transcript.partial') {
    currentPartial = msg.text || '';
  } else if (msg.type === 'transcript.final') {
    finalSegments.push(msg.text || '');
    currentPartial = '';
  }
});
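At display time, combine the committed segments with the current partial rather than storing the partial. A small sketch building on the pattern above:

```javascript
// Compose the display string: committed finals plus the mutable partial.
// The partial is rendered but never persisted, per the guidance above.
function renderTranscript(finalSegments, currentPartial) {
  const parts = [...finalSegments];
  if (currentPartial) parts.push(currentPartial);
  return parts.join(' ');
}
```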
Example (Node.js)
const WebSocket = require('ws');
const fs = require('fs');

const ws = new WebSocket('wss://api.valsea.ai/v1/realtime', {
  headers: { 'X-API-Key': 'YOUR_KEY' },
});

ws.on('open', () => {
  // 1. Configure the session
  ws.send(
    JSON.stringify({
      type: 'session.start',
      model: 'valsea-rtt',
      language: 'singlish',
    }),
  );
});

ws.on('message', (data) => {
  const msg = JSON.parse(data);
  if (msg.type === 'session.ready') {
    // 2. Start streaming audio (example: raw PCM16 file)
    const audioStream = fs.createReadStream('audio.raw');
    audioStream.on('data', (chunk) => {
      ws.send(
        JSON.stringify({
          type: 'audio.append',
          audio: chunk.toString('base64'),
        }),
      );
    });
    audioStream.on('end', () => {
      // 3. Signal end of speech and close the session
      ws.send(JSON.stringify({ type: 'audio.commit' }));
      ws.send(JSON.stringify({ type: 'session.stop' }));
    });
  } else if (msg.type === 'transcript.final') {
    console.log('Final:', msg.text);
  }
});