Audio Streaming

Receiving Audio

Use the async iterator audio_stream() to receive incoming audio chunks from the caller:

async for audio_chunk in call.audio_stream():
    await process(audio_chunk)

Each audio_chunk is a bytes object containing raw PCM audio data.
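To make the chunk contents concrete, here is a minimal sketch that decodes one chunk into integer samples, assuming the 16-bit signed little-endian mono layout described under Audio Format. The helper names decode_pcm16le and peak_level are illustrative and not part of the SDK.

```python
import struct


def decode_pcm16le(chunk: bytes) -> list[int]:
    """Decode raw 16-bit signed little-endian mono PCM into integer samples."""
    # Ignore a trailing odd byte, if any, so the length is a whole number of samples.
    usable = len(chunk) - (len(chunk) % 2)
    count = usable // 2
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return list(struct.unpack(f"<{count}h", chunk[:usable]))


def peak_level(chunk: bytes) -> int:
    """Return the largest absolute sample value in the chunk (0 for an empty chunk)."""
    return max((abs(sample) for sample in decode_pcm16le(chunk)), default=0)
```

A helper like peak_level could be called inside the receive loop, for example to log input levels while debugging.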

Sending Audio

Use send_audio() to queue audio data for playback to the caller:

await call.send_audio(pcm_audio_bytes)

Audio is placed in an internal ConcurrentByteBuffer. The SDK sends data to the server only when the server requests it (pull-based flow control). This ensures smooth playback without jitter.
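The pull-based idea can be illustrated with a small thread-safe byte buffer: the producer appends whenever it has audio, and the consumer removes only as many bytes as it asks for. This is a conceptual sketch only. PullBuffer and its methods are hypothetical and do not reflect how the SDK's ConcurrentByteBuffer is actually implemented.

```python
import threading


class PullBuffer:
    """Illustrative stand-in for a pull-based send buffer (not the SDK's class)."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()

    def write(self, chunk: bytes) -> None:
        """Producer side: queue audio without sending anything yet."""
        with self._lock:
            self._data.extend(chunk)

    def read(self, max_bytes: int) -> bytes:
        """Consumer side: hand over at most max_bytes when data is requested."""
        with self._lock:
            out = bytes(self._data[:max_bytes])
            del self._data[:max_bytes]
            return out

    def clear(self) -> None:
        """Drop everything that has not been pulled yet."""
        with self._lock:
            self._data.clear()

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)
```

In this model the consumer sets the pace: queued bytes stay in the buffer until a read request arrives.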

Audio Format

All audio sent to and received from AgenTao must use the following format:

Property	Value
Encoding	PCM (raw, uncompressed)
Bit Depth	16-bit (signed integer)
Byte Order	Little-endian
Channels	Mono (single channel)
Sample Rate	24,000 Hz (24 kHz)
MIME Type	`audio/pcm;rate=24000`

This means each audio sample is a 16-bit signed integer in little-endian byte order, producing 48,000 bytes per second of audio (24,000 samples × 2 bytes per sample).
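The same arithmetic can be written as a pair of helpers for converting between byte counts and durations. The constant and function names here are illustrative, not SDK identifiers.

```python
SAMPLE_RATE_HZ = 24_000   # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback duration of num_bytes of raw PCM in this format."""
    return num_bytes / BYTES_PER_SECOND


def pcm_bytes_for_ms(milliseconds: int) -> int:
    """Number of bytes needed to hold the given duration, in whole samples."""
    samples = SAMPLE_RATE_HZ * milliseconds // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS
```

For example, a 20 ms frame holds 480 samples, which is 960 bytes.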

This format is consistent across:

AgenTao media connections (both send and receive)
Google Gemini native audio
Deepgram speech-to-text input

No headers, containers, or codecs are involved. The audio data is raw PCM bytes. If you are generating audio from a TTS engine or other source, make sure to strip any file headers (e.g., WAV headers) before sending.
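One way to strip a WAV header is to let Python's standard wave module parse the file and return only the sample frames. The sketch below also checks that the WAV already matches the required format. It does not resample or convert, and the function name wav_to_raw_pcm is illustrative.

```python
import io
import wave


def wav_to_raw_pcm(wav_bytes: bytes) -> bytes:
    """Return the raw PCM frames from an in-memory WAV file, without the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # Reject audio that would need conversion before it can be sent.
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 24_000:
            raise ValueError("expected a 24,000 Hz sample rate")
        # 16-bit WAV data is stored little-endian, so the frames can be used as-is.
        return wav.readframes(wav.getnframes())
```

The returned bytes could then be passed to send_audio(). Audio in any other sample rate, bit depth, or channel layout would need to be converted first.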

Buffer Management

# Check buffer size
size = call.get_send_audio_buffer_size()

# Clear all queued audio (for interruption handling)
await call.clear_send_audio_buffer()

Why Use a Buffer?

Smooth Playback - Prevents audio jitter by maintaining a steady supply of data for the server
Flow Control - Automatically handles the rate at which audio should be sent
Interruption Handling - If your AI model gets interrupted (e.g., via a Gemini Live interruption event), you can instantly clear the buffer to stop any pending audio from being played

Configuring Buffer Size

The public 0.24.0 docs emphasize the factory helpers for client setup:

config = AgenTaoClientConfig.sandbox(
    api_key="YOUR_API_KEY",
    connector_uuid="YOUR_APP_UUID",
    sample_rate=24000,
)

Handling Interruptions

When your AI model is interrupted (e.g., the caller starts talking while the AI is responding), clear the audio buffer to immediately stop playback:

if event.interrupted:
    await call.clear_send_audio_buffer()
    # Now safe to start sending new audio

This prevents stale audio from playing after the model has already moved on to a new response. Both the Google GenAI SDK and Google ADK surface interruption events that you can use to trigger this.
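The clear-then-resume pattern can be sketched end to end with stand-in objects. Everything except the method names send_audio() and clear_send_audio_buffer() is an assumption made for illustration: ModelEvent is a hypothetical event shape, FakeCall mimics only those two call methods, and a real integration would consume events from the model SDK's own stream.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ModelEvent:
    """Hypothetical event shape: either an interruption flag or an audio payload."""
    interrupted: bool = False
    audio: bytes = b""


class FakeCall:
    """Stand-in exposing only the two call methods used in this section."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    async def send_audio(self, data: bytes) -> None:
        self.buffer.extend(data)

    async def clear_send_audio_buffer(self) -> None:
        self.buffer.clear()


async def forward_model_events(call, events) -> None:
    """Queue model audio for playback, dropping queued audio on interruption."""
    for event in events:
        if event.interrupted:
            # Discard audio that belongs to the response being abandoned.
            await call.clear_send_audio_buffer()
            continue
        if event.audio:
            await call.send_audio(event.audio)
```

Running forward_model_events() with an audio event, an interruption, and another audio event leaves only the last event's audio queued on the stand-in call.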

Next Steps

Call Commands - Commands that control the call
Event Handling - Listen for call events
Google GenAI Integration - Full bidirectional audio example with Gemini
Google ADK Integration - Multi-agent audio pipeline with ADK