Google GenAI SDK (Gemini Live)

This guide shows how to integrate the AgenTao SDK with the Google GenAI SDK for bidirectional real-time audio streaming with Gemini’s native audio model.

Overview

The integration bridges two real-time streams:

  • Caller audio -> Gemini: Incoming phone audio is forwarded to a Gemini Live session
  • Gemini audio -> Caller: Gemini’s voice responses are sent back to the caller via send_audio()

The example also handles interruptions: when Gemini reports that the caller is speaking over the model, the handler calls clear_send_audio_buffer() to drop any audio still queued for the caller.

Full Example

import asyncio
import os
from google import genai
from google.genai import types
from agentao_sdk import AgenTaoClient, AgenTaoClientConfig, ActiveCall
import agentao_sdk.events as events

gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
MODEL = "gemini-2.5-flash-native-audio-preview-12-2025"

async def start_gemini_session(call: ActiveCall):
    await call.answer()

    async with gemini_client.aio.live.connect(model=MODEL) as session:
        async def stream_to_gemini():
            # Caller audio -> Gemini
            async for chunk in call.audio_stream():
                await session.send_realtime_input(
                    audio=types.Blob(
                        data=chunk, mime_type="audio/pcm;rate=24000"
                    )
                )

        async def receive_from_gemini():
            # Gemini audio -> caller
            async for response in session.receive():
                if content := response.server_content:
                    if content.interrupted:
                        # Caller spoke over the model: drop queued audio
                        await call.clear_send_audio_buffer()
                    elif content.model_turn:
                        for part in content.model_turn.parts:
                            if part.inline_data:
                                await call.send_audio(part.inline_data.data)

        # Run both directions at the same time
        await asyncio.gather(stream_to_gemini(), receive_from_gemini())

async def main():
    config = AgenTaoClientConfig.sandbox(
        api_key=os.getenv("WSS_API_KEY"),
        connector_uuid=os.getenv("WSS_CONNECTOR_UUID"),
        sample_rate=24000,
    )

    async with AgenTaoClient(config) as client:
        @client.on(events.INCOMING_CALL)
        async def on_call(call: ActiveCall):
            await start_gemini_session(call)

        await client.run_forever()

if __name__ == "__main__":
    asyncio.run(main())

How It Works

Stream to Gemini

The stream_to_gemini() coroutine reads audio chunks from call.audio_stream() and forwards each one to the Gemini Live session using send_realtime_input(). Each chunk is wrapped in a types.Blob whose MIME type, audio/pcm;rate=24000, declares the sample rate configured on the AgenTao client.
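The forwarding loop can be tried without either SDK installed. The sketch below is an illustration only: fake_call_audio() and fake_send() are hypothetical stand-ins for call.audio_stream() and session.send_realtime_input(), and the empty-chunk guard is an assumption rather than documented SDK behavior. It derives the MIME string from one sample-rate constant so the value cannot drift from the sample_rate passed to the client config.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

SAMPLE_RATE_HZ = 24000  # same value as sample_rate in the client config
PCM_MIME_TYPE = f"audio/pcm;rate={SAMPLE_RATE_HZ}"


async def forward_audio(
    chunks: AsyncIterator[bytes],
    send: Callable[[bytes, str], Awaitable[None]],
) -> int:
    """Forward every non-empty chunk to `send`; return how many were forwarded."""
    forwarded = 0
    async for chunk in chunks:
        if not chunk:
            # Assumed guard: skip empty chunks rather than sending empty blobs.
            continue
        await send(chunk, PCM_MIME_TYPE)
        forwarded += 1
    return forwarded


async def fake_call_audio() -> AsyncIterator[bytes]:
    """Hypothetical stand-in for call.audio_stream()."""
    for chunk in (b"\x00\x01" * 4, b"", b"\x02\x03" * 4):
        yield chunk


async def demo() -> list[tuple[int, str]]:
    sent: list[tuple[int, str]] = []

    async def fake_send(data: bytes, mime_type: str) -> None:
        # Stand-in for session.send_realtime_input(audio=types.Blob(...)).
        sent.append((len(data), mime_type))

    await forward_audio(fake_call_audio(), fake_send)
    return sent
```

Calling asyncio.run(demo()) returns one (length, MIME type) pair per forwarded chunk, which makes it easy to confirm that the loop ends when the audio source is exhausted.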

Receive from Gemini

The receive_from_gemini() coroutine listens for Gemini responses:

  • Interruption: When content.interrupted is True, the caller has started speaking over the model. clear_send_audio_buffer() is called to immediately stop any queued audio.
  • Model audio: When content.model_turn contains inline_data, the raw audio bytes are sent to the caller via send_audio().
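The branching described above can be exercised in isolation. This is a minimal sketch, not the SDK's implementation: FakeCall is a hypothetical stand-in for ActiveCall, and the responses are built from SimpleNamespace objects that only mimic the attributes the example reads (server_content, interrupted, model_turn, parts, inline_data).

```python
import asyncio
from types import SimpleNamespace


class FakeCall:
    """Hypothetical stand-in for ActiveCall that records each action."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def send_audio(self, data: bytes) -> None:
        self.log.append(f"send:{len(data)}")

    async def clear_send_audio_buffer(self) -> None:
        self.log.append("clear")


async def handle_response(call, response) -> None:
    """Apply the same branching as receive_from_gemini() to one response."""
    content = response.server_content
    if not content:
        return  # nothing audio-related in this message
    if content.interrupted:
        await call.clear_send_audio_buffer()
    elif content.model_turn:
        for part in content.model_turn.parts:
            if part.inline_data:
                await call.send_audio(part.inline_data.data)


def audio_response(data: bytes) -> SimpleNamespace:
    """Build a fake response carrying one audio part."""
    part = SimpleNamespace(inline_data=SimpleNamespace(data=data))
    turn = SimpleNamespace(parts=[part])
    content = SimpleNamespace(interrupted=False, model_turn=turn)
    return SimpleNamespace(server_content=content)


def interrupted_response() -> SimpleNamespace:
    """Build a fake response that signals an interruption."""
    content = SimpleNamespace(interrupted=True, model_turn=None)
    return SimpleNamespace(server_content=content)


async def demo() -> list[str]:
    call = FakeCall()
    responses = (
        audio_response(b"\x00" * 960),
        interrupted_response(),
        SimpleNamespace(server_content=None),
    )
    for response in responses:
        await handle_response(call, response)
    return call.log
```

Pulling the branching into a function that handles a single response keeps it testable without a live session; the real loop would call it once per item from session.receive().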

Concurrency

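Both coroutines run concurrently via asyncio.gather(), so the application can listen to the caller and play Gemini's responses at the same time without either direction blocking the other. Note that gather() returns only after both coroutines have finished.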
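Because gather() waits for every coroutine, one direction ending (for example, the caller hanging up) does not by itself stop the other. Whether that matters depends on how the SDKs end their streams, which this guide does not specify. As a hedged alternative, the sketch below uses only asyncio; run_until_first_done() is a hypothetical helper, not part of either SDK, that cancels the remaining tasks as soon as one finishes.

```python
import asyncio


async def run_until_first_done(*coros) -> None:
    """Run coroutines concurrently; once one finishes, cancel the rest."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()
        # Wait for cancellations to complete so no task is left dangling.
        await asyncio.gather(*tasks, return_exceptions=True)
    for task in done:
        task.result()  # re-raise an exception from a finished task, if any


async def demo() -> list[str]:
    events: list[str] = []

    async def caller_audio() -> None:
        # Ends quickly, like an audio stream closing on hangup.
        await asyncio.sleep(0.01)
        events.append("caller stream ended")

    async def model_audio() -> None:
        try:
            await asyncio.sleep(3600)  # would otherwise keep waiting
        except asyncio.CancelledError:
            events.append("receiver cancelled")
            raise

    await run_until_first_done(caller_audio(), model_audio())
    return events
```

In the full example this would replace the asyncio.gather(...) line with run_until_first_done(stream_to_gemini(), receive_from_gemini()), under the assumption that ending the session when either side stops is the desired behavior.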

Environment Variables

Variable            Description
GEMINI_API_KEY      Google API key with Gemini API access
WSS_API_KEY         AgenTao API key
WSS_CONNECTOR_UUID  AgenTao connector UUID
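os.getenv() returns None for an unset variable, so a missing key would only surface later as an authentication or connection error. A small startup check, sketched here with hypothetical helper names (missing_env_vars and require_env are not part of either SDK), fails early with a clear message instead.

```python
import os

REQUIRED_ENV_VARS = ("GEMINI_API_KEY", "WSS_API_KEY", "WSS_CONNECTOR_UUID")


def missing_env_vars(environ=os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def require_env(environ=os.environ) -> dict[str, str]:
    """Return the required variables, or raise if any is missing."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_ENV_VARS}
```

Calling require_env() at the top of main() would stop the program before any connection is attempted.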

Audio Format

Both AgenTao (with sample_rate=24000) and Gemini native audio use 16-bit linear PCM at 24 kHz, mono (audio/pcm;rate=24000), so no transcoding is needed in either direction.
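The format fixes the data rate, which helps when reasoning about chunk sizes and buffering. The arithmetic below follows directly from 16-bit mono at 24 kHz; the helper names are illustrative, and the 20 ms figure is just an example duration, not a chunk size either SDK is documented to use.

```python
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono

# 24,000 samples/s * 2 bytes * 1 channel = 48,000 bytes per second
BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def chunk_duration_ms(chunk: bytes) -> float:
    """Playback duration of a raw PCM chunk in milliseconds."""
    return len(chunk) * 1000 / BYTES_PER_SECOND


def bytes_for_duration_ms(duration_ms: int) -> int:
    """Number of bytes needed to hold `duration_ms` of audio."""
    return BYTES_PER_SECOND * duration_ms // 1000
```

For example, 20 ms of audio in this format occupies 960 bytes, and one second occupies 48,000 bytes.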

Next Steps

  • Google ADK Integration - For structured multi-agent orchestration
  • Audio Streaming - Buffer management and interruption handling
  • Use Cases - Apply this integration to real-world scenarios