# Voice Overview

> Talk to ZeroTwo in real time using your microphone — no push-to-talk required, with transcripts saved automatically to your chat history.

ZeroTwo supports real-time voice conversations powered by WebRTC and the OpenAI Realtime API. Click the microphone icon in the prompt bar, start speaking, and ZeroTwo responds with a natural AI voice. The entire conversation is transcribed and saved to your chat history automatically — no extra steps required.

***

## How Voice works

ZeroTwo uses **WebRTC** for low-latency, bidirectional audio streaming directly in your browser. There is no file to upload, and because the connection stays open for the entire session, no time is lost reconnecting between turns.

**Server-side Voice Activity Detection (VAD)** automatically detects when you start and stop speaking, so you never need to press and hold a button. When you pause at the end of what you are saying, VAD marks the end of your turn and ZeroTwo starts responding right away.

```
You speak → VAD detects end of turn → ZeroTwo processes → ZeroTwo responds with voice
```
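ZeroTwo's VAD runs on the server, and its algorithm and settings are not described here. As a rough mental model only, the sketch below implements a simple energy-threshold detector: it reports speech when a frame's loudness crosses a threshold and ends the turn after a stretch of silence. The `EnergyVad` class, the 0.02 threshold, and the 500 ms silence window are hypothetical values chosen for illustration.

```typescript
// Illustrative energy-threshold turn detector. NOT ZeroTwo's actual VAD:
// the class name, threshold, and silence window are hypothetical.
type VadEvent = "speech_started" | "speech_stopped" | null;

class EnergyVad {
  private speaking = false;
  private silentMs = 0;
  private readonly threshold: number; // RMS level (0..1) treated as speech
  private readonly silenceMs: number; // quiet time that ends a turn
  private readonly sampleRate: number; // samples per second

  constructor(threshold = 0.02, silenceMs = 500, sampleRate = 24000) {
    this.threshold = threshold;
    this.silenceMs = silenceMs;
    this.sampleRate = sampleRate;
  }

  // Feed one frame of PCM16 samples; returns an event only when the state changes.
  push(frame: Int16Array): VadEvent {
    let sumSquares = 0;
    for (let i = 0; i < frame.length; i++) {
      const x = frame[i] / 32768; // normalize to roughly [-1, 1)
      sumSquares += x * x;
    }
    const rms = frame.length > 0 ? Math.sqrt(sumSquares / frame.length) : 0;
    const frameMs = (frame.length * 1000) / this.sampleRate;

    if (rms >= this.threshold) {
      this.silentMs = 0; // any loud frame resets the silence timer
      if (!this.speaking) {
        this.speaking = true;
        return "speech_started";
      }
    } else if (this.speaking) {
      this.silentMs += frameMs;
      if (this.silentMs >= this.silenceMs) {
        this.speaking = false;
        this.silentMs = 0;
        return "speech_stopped"; // end of turn: a response would begin here
      }
    }
    return null;
  }
}
```

With 20 ms frames at 24 kHz (480 samples each), 25 consecutive quiet frames end the turn. Real detectors are generally far more robust to background noise than a plain loudness threshold.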

You can **interrupt ZeroTwo at any time** by simply speaking while it is responding. The current response stops and ZeroTwo addresses what you just said, making conversations feel natural rather than rigidly turn-based.
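The interruption behavior can be pictured as a small turn-taking state machine. The sketch below is hypothetical: the state names, event names, and the `step` function are illustrative and are not part of ZeroTwo or the OpenAI Realtime API.

```typescript
// Hypothetical turn-taking model illustrating barge-in (interrupting a response).
type TurnState = "idle" | "userSpeaking" | "assistantSpeaking";
type TurnEvent = "userSpeechStarted" | "userSpeechStopped" | "responseDone";

interface Transition {
  next: TurnState;
  action: "none" | "startResponse" | "cancelResponse";
}

function step(state: TurnState, event: TurnEvent): Transition {
  if (event === "userSpeechStarted") {
    // Speaking over the assistant cancels the response in progress.
    return {
      next: "userSpeaking",
      action: state === "assistantSpeaking" ? "cancelResponse" : "none",
    };
  }
  if (event === "userSpeechStopped" && state === "userSpeaking") {
    // End of the user's turn: the assistant replies.
    return { next: "assistantSpeaking", action: "startResponse" };
  }
  if (event === "responseDone" && state === "assistantSpeaking") {
    return { next: "idle", action: "none" };
  }
  // Events that do not apply in the current state are ignored.
  return { next: state, action: "none" };
}
```

Speaking while ZeroTwo is responding produces a `cancelResponse` action, which corresponds to the current response stopping so your new input can be handled.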

***

## 10 available voices

ZeroTwo offers 10 distinct AI voices powered by the OpenAI Realtime API:

| Voice       | Character                                  |
| ----------- | ------------------------------------------ |
| **Alloy**   | Balanced, neutral, versatile — the default |
| **Ash**     | Warm, conversational                       |
| **Ballad**  | Expressive, nuanced                        |
| **Cedar**   | Clear, professional, crisp                 |
| **Coral**   | Friendly, approachable, upbeat             |
| **Echo**    | Precise, sharp, technical                  |
| **Marin**   | Calm, smooth, measured                     |
| **Sage**    | Thoughtful, careful, wise                  |
| **Shimmer** | Bright, energetic, lively                  |
| **Verse**   | Natural, flowing, conversational           |

Change your voice in **Settings → Preferences → Voice**. See [Voice Options](/tools/voice/voice-options) for full descriptions and recommendations.
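The voice table maps naturally to a fixed list of identifiers with Alloy as the fallback. This is a hypothetical sketch: the lowercase IDs follow OpenAI's voice naming, and in the Realtime API a voice is selected through the session's `voice` setting, but `VOICES`, `DEFAULT_VOICE`, and `resolveVoice` are illustrative names, not a ZeroTwo API.

```typescript
// The ten voices from the table above as lowercase IDs (assumed format,
// matching OpenAI's voice names). Not an official ZeroTwo API.
const VOICES = [
  "alloy", "ash", "ballad", "cedar", "coral",
  "echo", "marin", "sage", "shimmer", "verse",
] as const;

type VoiceId = (typeof VOICES)[number];

// Alloy is the documented default.
const DEFAULT_VOICE: VoiceId = "alloy";

// Map a stored preference (any casing, possibly missing) to a valid voice ID,
// falling back to the default for unknown or empty values.
function resolveVoice(preference: string | undefined): VoiceId {
  const id = (preference ?? "").trim().toLowerCase();
  const known: readonly string[] = VOICES;
  return known.indexOf(id) !== -1 ? (id as VoiceId) : DEFAULT_VOICE;
}
```

For example, `resolveVoice("Shimmer")` returns `"shimmer"`, while an unrecognized value such as `"robot"` falls back to `"alloy"`.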

***

## Transcripts

Every voice session is fully transcribed. Both your speech and ZeroTwo's replies are saved as text in that conversation's chat history entry. Transcripts are available immediately after each exchange.

<Info>
  Transcripts are saved automatically. You do not need to enable anything — just make sure you are logged in and not using a private/incognito browser session.
</Info>
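Because both sides of each exchange end up in one transcript, a natural model is an ordered list of turns tagged by speaker. The sketch below is hypothetical: the `TranscriptTurn` shape, `addTurn`, and `renderTranscript` are illustrative names, not ZeroTwo's storage format.

```typescript
// Hypothetical shape for a saved voice transcript; field names are illustrative.
type Speaker = "user" | "assistant";

interface TranscriptTurn {
  speaker: Speaker;
  text: string;
}

// Append one finished utterance. Blank or whitespace-only text is skipped
// so the saved history stays readable.
function addTurn(turns: TranscriptTurn[], speaker: Speaker, text: string): TranscriptTurn[] {
  const cleaned = text.trim();
  return cleaned.length === 0 ? turns : turns.concat([{ speaker: speaker, text: cleaned }]);
}

// Render the session as plain text, labeling each side of the conversation.
function renderTranscript(turns: TranscriptTurn[]): string {
  return turns
    .map((t) => (t.speaker === "user" ? "You" : "ZeroTwo") + ": " + t.text)
    .join("\n");
}
```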

***

## Use cases

<CardGroup cols={2}>
  <Card title="Hands-Free Work" icon="mic">
    Great for when your hands are occupied — cooking, commuting, exercising, or taking notes during a meeting.
  </Card>

  <Card title="Brainstorming" icon="zap">
    Thinking out loud is often faster than typing. Use Voice to explore ideas and let the transcript capture everything.
  </Card>

  <Card title="Accessibility" icon="check">
    Voice input removes barriers for users who find typing difficult or slow.
  </Card>

  <Card title="Language Practice" icon="globe">
    Have spoken conversations in a language you are learning — ZeroTwo can respond in kind.
  </Card>

  <Card title="Long Dictation" icon="volume-2">
    Dictate emails, documents, or notes at speaking pace — the transcript captures everything.
  </Card>

  <Card title="Quick Questions" icon="zap">
    Ask something fast without stopping to type — ideal for quick lookups while you're in the middle of something else.
  </Card>
</CardGroup>

***

## Audio technical specs

| Property             | Value                   |
| -------------------- | ----------------------- |
| Audio format         | PCM16                   |
| Sample rate          | 24 kHz                  |
| Channels             | Mono                    |
| Transport            | WebRTC (browser-native) |
| Transcription engine | Whisper-1               |

The PCM16 / 24 kHz format suits real-time streaming: it keeps latency low while preserving clear, intelligible speech. Using a single (mono) channel halves the raw data compared with stereo, without meaningfully affecting voice quality in conversation.
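The values in the spec table translate directly into raw data sizes. The sketch below works through that arithmetic and shows the common way to turn Web Audio's floating-point samples into PCM16. It is illustrative only: in the WebRTC flow the browser captures and transports audio itself, these are uncompressed PCM figures rather than what necessarily crosses the network, and the constant and function names are hypothetical.

```typescript
// Audio format from the spec table: 16-bit PCM, 24 kHz, mono.
const SAMPLE_RATE_HZ = 24000;
const CHANNELS = 1;
const BYTES_PER_SAMPLE = 2; // PCM16 = 16 bits = 2 bytes

// Raw (uncompressed) data rate: 24000 * 1 * 2 = 48000 bytes/s (384 kbit/s).
function bytesPerSecond(): number {
  return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE;
}

// Samples in a frame of the given duration, e.g. 20 ms -> 480 samples.
function samplesPerFrame(frameMs: number): number {
  return Math.round((SAMPLE_RATE_HZ * frameMs) / 1000);
}

// Convert Web Audio float samples in [-1, 1] to signed 16-bit integers.
// Out-of-range values are clamped; Int16Array assignment truncates toward zero.
function floatToPcm16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    // Negative values scale toward -32768, positive values toward 32767.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

A 20 ms frame therefore holds 480 samples, or 960 bytes of raw PCM16 audio.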

***

## Plan availability

Voice is available on **all ZeroTwo plans**.

| Plan           | Voice Access |
| -------------- | ------------ |
| **Free**       | Included     |
| **Pro**        | Included     |
| **Pro 2x**     | Included     |
| **Plus Ultra** | Included     |
| **Business**   | Included     |

***

## Requirements

* A modern browser (Chrome, Edge, Firefox, or Safari)
* Microphone access granted to zerotwo.ai
* An HTTPS connection (all zerotwo.ai pages use HTTPS by default)
* A stable internet connection (WiFi recommended for best quality)
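Most of the requirements above can be checked before a session starts. The sketch below is hypothetical (the `VoiceEnvironment` shape and `missingVoiceRequirements` function are not ZeroTwo code). It takes browser capabilities as plain flags so the logic is easy to test, and the comment shows which real browser properties could supply them. Microphone permission is granted through the browser prompt that appears when `getUserMedia` is called, so it is not checked here.

```typescript
// Hypothetical pre-flight check for the voice requirements listed above.
interface VoiceEnvironment {
  isSecureContext: boolean;   // true on HTTPS pages
  hasGetUserMedia: boolean;   // microphone capture API is present
  hasPeerConnection: boolean; // WebRTC is available
}

// Returns a human-readable list of missing requirements (empty if all are met).
function missingVoiceRequirements(env: VoiceEnvironment): string[] {
  const missing: string[] = [];
  if (!env.isSecureContext) missing.push("secure (HTTPS) connection");
  if (!env.hasGetUserMedia) missing.push("microphone API (getUserMedia)");
  if (!env.hasPeerConnection) missing.push("WebRTC (RTCPeerConnection)");
  return missing;
}

// In a browser, the flags could be read like this (not run here).
// navigator.mediaDevices is only exposed in secure contexts, so on plain HTTP
// hasGetUserMedia is false as well.
// const env: VoiceEnvironment = {
//   isSecureContext: window.isSecureContext,
//   hasGetUserMedia: typeof navigator.mediaDevices?.getUserMedia === "function",
//   hasPeerConnection: typeof RTCPeerConnection !== "undefined",
// };
```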

<Tip>
  For the best experience, use headphones. They stop your microphone from picking up ZeroTwo's audio output, which eliminates echo and gives you cleaner audio overall.
</Tip>

***

## Quick links

<CardGroup cols={3}>
  <Card title="Start a Voice Chat" icon="mic" href="/tools/voice/start-a-voice-chat">
    Step-by-step guide to your first voice conversation
  </Card>

  <Card title="Voice Options" icon="volume-2" href="/tools/voice/voice-options">
    10 voices, audio specs, and how to change your voice
  </Card>

  <Card title="Troubleshooting" icon="alert-triangle" href="/tools/voice/troubleshooting">
    Fix microphone, audio, and connection issues
  </Card>
</CardGroup>
