Google GenAI SDK (Gemini Live)

This guide shows how to integrate the AgenTao SDK with the Google GenAI SDK for bidirectional real-time audio streaming with Gemini’s native audio model.

Overview

The integration bridges two real-time streams:

Caller audio -> Gemini: Incoming phone audio is forwarded to a Gemini Live session
Gemini audio -> Caller: Gemini’s voice responses are sent back to the caller via send_audio()

The example also handles interruptions: when Gemini reports that the caller is speaking over the model, the outgoing audio buffer is cleared so that queued speech stops playing.

Full Example

import asyncio
import os

from google import genai
from google.genai import types
from agentao_sdk import AgenTaoClient, AgenTaoClientConfig, ActiveCall
import agentao_sdk.events as events

gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"

async def start_gemini_session(call: ActiveCall):
    await call.answer()

    async with gemini_client.aio.live.connect(model=MODEL) as session:
        async def stream_to_gemini():
            # Caller -> Gemini: forward each incoming PCM chunk as realtime input.
            async for chunk in call.audio_stream():
                await session.send_realtime_input(
                    audio=types.Blob(
                        data=chunk, mime_type="audio/pcm;rate=24000"
                    )
                )

        async def receive_from_gemini():
            # Gemini -> Caller. session.receive() stops iterating once a model
            # turn completes, so loop to keep handling the turns that follow.
            while True:
                async for response in session.receive():
                    if content := response.server_content:
                        if content.interrupted:
                            # The caller spoke over the model: drop queued audio.
                            await call.clear_send_audio_buffer()
                        elif content.model_turn:
                            for part in content.model_turn.parts:
                                if part.inline_data:
                                    await call.send_audio(part.inline_data.data)

        # Run both directions concurrently for the lifetime of the call.
        await asyncio.gather(stream_to_gemini(), receive_from_gemini())

async def main():
    config = AgenTaoClientConfig.sandbox(
        api_key=os.getenv("WSS_API_KEY"),
        connector_uuid=os.getenv("WSS_CONNECTOR_UUID"),
        sample_rate=24000,  # matches the rate declared in the Blob MIME type
    )

    async with AgenTaoClient(config) as client:
        @client.on(events.INCOMING_CALL)
        async def on_call(call: ActiveCall):
            await start_gemini_session(call)

        await client.run_forever()

if __name__ == "__main__":
    asyncio.run(main())

How It Works

Stream to Gemini

The stream_to_gemini() coroutine reads audio chunks from call.audio_stream() and forwards each one to the Gemini Live session with send_realtime_input(). Each chunk is wrapped in a types.Blob whose MIME type, audio/pcm;rate=24000, declares raw PCM at 24 kHz.

Receive from Gemini

The receive_from_gemini() coroutine listens for Gemini responses:

Interruption: When content.interrupted is True, the caller has started speaking over the model. The handler calls clear_send_audio_buffer() to discard audio that is queued but not yet played.
Model audio: When content.model_turn contains inline_data, the raw audio bytes are sent to the caller via send_audio().
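
The two branches above can be tried without a live call. The following is a minimal sketch, not AgenTao's implementation: OutgoingAudioBuffer and FakeServerContent are hypothetical stand-ins that model only the fields and calls the example relies on (an interrupted flag, optional audio bytes, and a send queue that can be cleared).

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeServerContent:
    """Hypothetical, simplified stand-in for the fields the example reads."""
    interrupted: bool = False
    audio: Optional[bytes] = None


class OutgoingAudioBuffer:
    """Hypothetical stand-in for a call's outgoing audio queue."""

    def __init__(self) -> None:
        self._chunks = deque()  # chunks queued but not yet played

    def send_audio(self, chunk: bytes) -> None:
        # Queue a chunk for playback to the caller.
        self._chunks.append(chunk)

    def clear(self) -> int:
        # Drop everything still queued and report how many chunks were dropped.
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped

    def pending(self) -> int:
        return len(self._chunks)


def handle(content: FakeServerContent, buffer: OutgoingAudioBuffer) -> None:
    # Same branching as receive_from_gemini(): interruption wins over audio.
    if content.interrupted:
        buffer.clear()
    elif content.audio is not None:
        buffer.send_audio(content.audio)


buffer = OutgoingAudioBuffer()
events = [
    FakeServerContent(audio=b"\x01\x00" * 4),
    FakeServerContent(audio=b"\x02\x00" * 4),
    FakeServerContent(interrupted=True),       # caller starts talking
    FakeServerContent(audio=b"\x03\x00" * 4),  # model's next reply
]
for event in events:
    handle(event, buffer)

print(buffer.pending())  # only the chunk received after the interruption remains
```

Clearing on interruption means the caller never hears the tail of a reply they have already talked over; audio that arrives afterwards is queued as usual.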

Concurrency

Both coroutines run concurrently via asyncio.gather(). This allows the system to simultaneously listen to the caller and send AI responses without blocking.
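
As an illustration of why asyncio.gather() keeps both directions moving, here is a small self-contained sketch. The two coroutines are hypothetical placeholders for stream_to_gemini() and receive_from_gemini(); asyncio.sleep(0) stands in for awaiting network I/O.

```python
import asyncio


async def fake_stream_to_gemini(log: list) -> None:
    # Placeholder for the caller -> Gemini direction.
    for i in range(3):
        await asyncio.sleep(0)  # yield control, as awaiting I/O would
        log.append(f"caller->gemini {i}")


async def fake_receive_from_gemini(log: list) -> None:
    # Placeholder for the Gemini -> caller direction.
    for i in range(3):
        await asyncio.sleep(0)
        log.append(f"gemini->caller {i}")


async def run_both() -> list:
    log = []
    # Both coroutines are scheduled together; neither waits for the other to finish.
    await asyncio.gather(fake_stream_to_gemini(log), fake_receive_from_gemini(log))
    return log


log = asyncio.run(run_both())
for entry in log:
    print(entry)
```

Entries from the two directions appear interleaved in the log instead of one direction finishing before the other starts, which is the behavior the integration depends on.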

Environment Variables

Variable	Description
`GEMINI_API_KEY`	Google API key with Gemini API access
`WSS_API_KEY`	AgenTao API key
`WSS_CONNECTOR_UUID`	AgenTao connector UUID
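
Because os.getenv() returns None for an unset variable, a missing key would otherwise only surface later as an authentication or connection error. One possible way to fail early is sketched below; missing_vars() and load_settings() are hypothetical helpers, not part of either SDK.

```python
import os

REQUIRED_VARS = ("GEMINI_API_KEY", "WSS_API_KEY", "WSS_CONNECTOR_UUID")


def missing_vars(env=os.environ) -> list:
    # Names of required variables that are unset or empty, in table order.
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_settings(env=os.environ) -> dict:
    # Return the three values, or raise with a message naming what is missing.
    missing = missing_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() at the top of main() would stop the program before any connection is attempted if a variable is absent.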

Audio Format

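AgenTao (configured here with sample_rate=24000) and Gemini’s native audio output both use 16-bit linear PCM, 24 kHz, mono (audio/pcm;rate=24000), so Gemini’s audio can be passed to send_audio() without transcoding. Caller audio is sent to Gemini at the same rate; the rate in the Blob MIME type tells the Live API what it is receiving, and the API resamples input that is not at its native 16 kHz.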
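
For sizing buffers or logging, it can help to convert between chunk sizes and durations. The arithmetic follows directly from the format above (24,000 samples per second, 2 bytes per sample, 1 channel); the helper names are illustrative only.

```python
SAMPLE_RATE_HZ = 24_000   # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit linear PCM
CHANNELS = 1              # mono

BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS  # 48,000


def chunk_duration_ms(chunk: bytes) -> float:
    # How many milliseconds of audio a raw PCM chunk holds.
    return len(chunk) * 1000 / BYTES_PER_SECOND


def bytes_for_duration(ms: int) -> int:
    # How many bytes are needed to hold `ms` milliseconds of audio.
    return BYTES_PER_SECOND * ms // 1000


print(bytes_for_duration(20))           # 960
print(chunk_duration_ms(bytes(4800)))   # 100.0
```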

Next Steps

Google ADK Integration - For structured multi-agent orchestration
Audio Streaming - Buffer management and interruption handling
Use Cases - Apply this integration to real-world scenarios