Use the async iterator audio_stream() to receive incoming audio chunks from the caller:
Each audio_chunk is a bytes object containing raw PCM audio data.
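A minimal sketch of the receive loop described above. Only `audio_stream()` and the `bytes` chunk type come from the text; the `FakeCall` class and the `call` variable are hypothetical stand-ins so the example runs without the SDK installed.

```python
import asyncio
from typing import AsyncIterator


class FakeCall:
    """Hypothetical stand-in for the SDK's call object (not the real class)."""

    def __init__(self, chunks: list[bytes]) -> None:
        self._chunks = chunks

    async def audio_stream(self) -> AsyncIterator[bytes]:
        # Yield each chunk as if it had just arrived from the caller.
        for chunk in self._chunks:
            await asyncio.sleep(0)  # hand control back to the event loop
            yield chunk


async def receive_audio(call) -> bytes:
    """Collect every incoming chunk into one bytes object."""
    received = bytearray()
    async for audio_chunk in call.audio_stream():
        # Each audio_chunk is raw PCM bytes; forward or buffer it here.
        received.extend(audio_chunk)
    return bytes(received)


def demo_receive() -> bytes:
    call = FakeCall([b"\x00\x00" * 4, b"\x01\x00" * 4])
    return asyncio.run(receive_audio(call))
```

In a real handler you would typically forward each chunk to your model as it arrives rather than accumulating the whole stream.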
Use send_audio() to queue audio data for playback to the caller:
Audio is placed in an internal ConcurrentByteBuffer. The SDK sends data to the server only when the server requests it (pull-based flow control), so playback is paced by the server rather than by how quickly you call send_audio(). This keeps playback smooth and avoids jitter.
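To illustrate pull-based flow control, here is a small sketch of a thread-safe byte buffer. It is not the SDK's ConcurrentByteBuffer: the `PullBuffer` class and its `pull()` and `pending()` methods are assumptions made for illustration, and the real class may behave differently.

```python
import threading


class PullBuffer:
    """Illustrative thread-safe byte buffer (not the SDK's ConcurrentByteBuffer)."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()

    def send_audio(self, pcm: bytes) -> None:
        # Producer side: only queue the bytes, nothing is transmitted yet.
        with self._lock:
            self._data.extend(pcm)

    def pull(self, max_bytes: int) -> bytes:
        # Consumer side: called when the server asks for more audio.
        with self._lock:
            out = bytes(self._data[:max_bytes])
            del self._data[:max_bytes]
            return out

    def pending(self) -> int:
        # Number of queued bytes that have not been pulled yet.
        with self._lock:
            return len(self._data)


def demo_pull():
    buf = PullBuffer()
    buf.send_audio(b"\x01\x02" * 5)   # queue 10 bytes
    first = buf.pull(4)               # server requests 4 bytes
    second = buf.pull(100)            # server requests more than is queued
    return first, second, buf.pending()
```

The producer can queue audio as fast as it is generated; data leaves the buffer only at the rate the consumer pulls it.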
All audio sent to and received from AgenTao must use the following format:
This means each audio sample is a 16-bit signed integer in little-endian byte order. At 24,000 samples per second, that works out to 48,000 bytes per second of audio (24,000 samples × 2 bytes per sample).
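The arithmetic above can be sketched in a few helpers. The constant and function names are illustrative, not part of the SDK; the numbers come directly from the format description.

```python
import struct

SAMPLE_RATE_HZ = 24_000   # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit signed integer
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE  # 48,000


def bytes_for_duration(milliseconds: int) -> int:
    """Number of PCM bytes needed for the given duration."""
    return BYTES_PER_SECOND * milliseconds // 1000


def duration_ms(pcm: bytes) -> float:
    """Playback duration of a raw PCM chunk, in milliseconds."""
    return len(pcm) * 1000 / BYTES_PER_SECOND


def encode_samples(samples: list[int]) -> bytes:
    """Pack integer samples as 16-bit signed little-endian PCM."""
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(samples)}h", *samples)
```

For example, a 20 ms frame is 960 bytes, and the sample value 1000 (0x03E8) is encoded as the two bytes `E8 03`.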
This format is consistent across:
No headers, containers, or codecs are involved. The audio data is raw PCM bytes. If you are generating audio from a TTS engine or other source, make sure to strip any file headers (e.g., WAV headers) before sending.
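One way to strip a WAV header is Python's standard `wave` module, sketched below. The function names are illustrative. The mono check is an assumption inferred from the 48,000 bytes-per-second figure; adjust it if your integration differs.

```python
import io
import wave


def wav_to_raw_pcm(wav_bytes: bytes) -> bytes:
    """Return the raw PCM frames from an in-memory WAV file, dropping the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # Reject audio that does not match the required format.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 24_000:
            raise ValueError("expected 24,000 samples per second")
        if wav.getnchannels() != 1:
            # Assumption: one channel, inferred from the 48,000 bytes/s figure.
            raise ValueError("expected mono audio")
        return wav.readframes(wav.getnframes())


def make_test_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container (used only to exercise the function)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(24_000)
        wav.writeframes(pcm)
    return out.getvalue()
```

Parsing the container is safer than slicing off a fixed number of bytes, since WAV headers can vary in length when extra chunks are present.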
The public 0.24.0 docs emphasize the factory helpers for client setup:
When your AI model is interrupted (e.g., the caller starts talking while the AI is responding), clear the audio buffer to immediately stop playback:
This prevents stale audio from playing after the model has already moved on to a new response. Both the Google GenAI SDK and Google ADK surface interruption events that you can use to trigger this.
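The interruption handling can be sketched as follows. Everything here is a stand-in: `ModelEvent`, `PlaybackBuffer`, and its `clear()` method are hypothetical names chosen for illustration, and the real SDK method for clearing the buffer, as well as the event shape of your model library, may differ.

```python
import threading
from dataclasses import dataclass


@dataclass
class ModelEvent:
    """Hypothetical event shape; real SDK events carry more fields."""
    audio: bytes = b""
    interrupted: bool = False


class PlaybackBuffer:
    """Illustrative stand-in for the SDK's outgoing audio buffer."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._lock = threading.Lock()

    def send_audio(self, pcm: bytes) -> None:
        with self._lock:
            self._data.extend(pcm)

    def clear(self) -> None:
        # Drop everything that has not been pulled for playback yet.
        with self._lock:
            self._data.clear()

    def pending(self) -> int:
        with self._lock:
            return len(self._data)


def handle_event(event: ModelEvent, playback: PlaybackBuffer) -> None:
    if event.interrupted:
        playback.clear()  # stop stale audio before queuing anything new
    if event.audio:
        playback.send_audio(event.audio)


def demo_interrupt() -> int:
    playback = PlaybackBuffer()
    handle_event(ModelEvent(audio=b"\x00" * 960), playback)   # old response
    handle_event(ModelEvent(interrupted=True), playback)      # caller barges in
    handle_event(ModelEvent(audio=b"\x00" * 480), playback)   # new response
    return playback.pending()
```

After the interruption only the 480 bytes of the new response remain queued; the 960 bytes from the abandoned response are discarded.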