# Audio Generation Models

> AI audio generation models available in ZeroTwo Studio — music, voice synthesis, and sound effects.

ZeroTwo's audio Studio provides access to multiple AI audio generation models, each optimized for different types of audio output. This page explains the model types available and how to choose between them.

<Info>
  AI audio generation technology is evolving rapidly. New models are added to ZeroTwo regularly. Check the model dropdown in the audio Studio for the current full list.
</Info>

## Model categories

Audio generation models in ZeroTwo fall into three main categories:

### Music generation models

Specialized for generating original music from text descriptions. These models understand genre, mood, instrumentation, tempo, and musical structure.

**Best for:**

* Background music for videos, presentations, and apps
* Ambient soundscapes and atmospheric audio
* Jingles, intros, and branded audio pieces
* Specific genre requests (jazz, classical, electronic, etc.)

**Prompt approach:** Describe genre, mood, instruments, tempo, and duration. Example: `"Upbeat electronic background music, 120 BPM, synthesizer melody, suitable for a tech product demo, 60 seconds"`

### Voice synthesis / text-to-speech models

Generate spoken audio from text input. These models produce natural-sounding narration in various voices and styles.

**Best for:**

* AI narration for videos and presentations
* Podcast-style spoken content
* Accessibility audio (screen reader-style narration)
* Character voices for creative projects

**Prompt approach:** Provide the text to be spoken and describe the voice characteristics — tone, pace, gender, accent, emotional register. Example: `"Narrate the following in a warm, professional female voice at a moderate pace: [text]"`

### Sound effects models

Generate specific, discrete audio events such as clicks, chimes, environmental sounds, and other effects.

**Best for:**

* UI sounds and notification tones
* Environmental and ambient effects
* Production sound design
* Game audio assets

**Prompt approach:** Describe the specific sound event as precisely as possible. Example: `"Two knocks on a wooden door, natural reverb, interior setting"`

## Choosing the right model

| Use case                     | Model type to choose              |
| ---------------------------- | --------------------------------- |
| Background music for a video | Music generation                  |
| Voiceover narration          | Voice synthesis / TTS             |
| App notification sound       | Sound effects                     |
| Ambient environment audio    | Music generation or sound effects |
| Podcast intro                | Music generation                  |
| AI-read article              | Voice synthesis / TTS             |
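
If you script parts of your workflow, the table above can be kept as a small lookup. The sketch below is purely illustrative: the dictionary and `suggest_model_types` are hypothetical names, not part of any ZeroTwo API, and the entries simply mirror the table rows.

```python
# Hypothetical lookup mirroring the table above; not a ZeroTwo API.
MODEL_TYPE_BY_USE_CASE = {
    "background music for a video": ["Music generation"],
    "voiceover narration": ["Voice synthesis / TTS"],
    "app notification sound": ["Sound effects"],
    "ambient environment audio": ["Music generation", "Sound effects"],
    "podcast intro": ["Music generation"],
    "ai-read article": ["Voice synthesis / TTS"],
}


def suggest_model_types(use_case: str) -> list[str]:
    """Return the model types the table suggests, or an empty list if unlisted."""
    key = use_case.strip().lower()
    # Copy the list so callers cannot mutate the shared table.
    return list(MODEL_TYPE_BY_USE_CASE.get(key, []))


print(suggest_model_types("Ambient environment audio"))
```

An unlisted use case returns an empty list, which is a cue to pick the model type by reading the category descriptions instead.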

<Tip>
  For music, describing the genre and mood is the most important part of the prompt. For voice, the most important elements are the text content and the voice tone/style description. For sound effects, precision about the specific sound event produces the best results.
</Tip>

## Output formats

| Format  | Best for                                                 |
| ------- | -------------------------------------------------------- |
| **MP3** | Web sharing, social media, general use                   |
| **WAV** | Professional production, lossless quality, video editing |

Download format options depend on the model selected. MP3 is available from all models; WAV is available from higher-quality models.
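
The format guidance above reduces to a simple rule, sketched here as a hypothetical helper. `model_offers_wav` is an assumption: it stands for whatever you observe in the download options for the model you selected, not a value exposed by ZeroTwo.

```python
def pick_download_format(needs_lossless: bool, model_offers_wav: bool) -> str:
    """Prefer WAV for lossless work when the selected model offers it, else MP3."""
    if needs_lossless and model_offers_wav:
        return "WAV"
    # MP3 is described as available from all models, so it is the fallback.
    return "MP3"
```

If you need lossless output and the function falls back to MP3, that suggests switching to a model that offers WAV before generating.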

## Prompting by model type

Each audio model type responds to different prompt elements:

### Prompting music generation models

Genre, mood, and instrumentation matter most in music prompts; tempo, duration, and purpose narrow the result further:

| Prompt element | Examples                                                                         |
| -------------- | -------------------------------------------------------------------------------- |
| Genre          | `jazz`, `classical`, `electronic`, `ambient`, `folk`, `hip-hop`, `cinematic`     |
| Mood           | `uplifting`, `tense`, `melancholic`, `energetic`, `peaceful`, `mysterious`       |
| Instruments    | `piano`, `acoustic guitar`, `orchestral strings`, `synthesizer`, `drums`, `bass` |
| Tempo          | `120 BPM`, `slow and deliberate`, `fast-paced`, `moderate tempo`                 |
| Duration       | `30 seconds`, `60 seconds`, `2 minutes`                                          |
| Purpose        | `background music for a product video`, `podcast intro`, `game menu music`       |

Strong music prompt: `Cinematic orchestral piece with rising strings and dramatic percussion, building tension over 30 seconds, suitable for a movie trailer`
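
If you generate many music prompts, it can help to assemble them from the elements in the table so none is forgotten. The following is a minimal sketch, assuming you paste the resulting string into the audio Studio prompt field; `build_music_prompt` and its parameters are hypothetical and not part of ZeroTwo.

```python
from typing import Optional, Sequence


def build_music_prompt(
    genre: str,
    mood: str,
    instruments: Sequence[str] = (),
    tempo: Optional[str] = None,
    duration: Optional[str] = None,
    purpose: Optional[str] = None,
) -> str:
    """Join the prompt elements into one comma-separated description."""
    parts = [f"{mood} {genre} music"]
    parts.extend(instruments)
    # Optional elements are appended only when provided.
    for optional in (tempo, duration, purpose):
        if optional:
            parts.append(optional)
    prompt = ", ".join(parts)
    # Capitalize only the first character, leaving terms like "BPM" intact.
    return prompt[:1].upper() + prompt[1:]


print(build_music_prompt(
    genre="electronic",
    mood="upbeat",
    instruments=["synthesizer melody"],
    tempo="120 BPM",
    duration="60 seconds",
    purpose="suitable for a tech product demo",
))
# Upbeat electronic music, synthesizer melody, 120 BPM, 60 seconds, suitable for a tech product demo
```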

### Prompting voice synthesis models

Voice synthesis prompts focus on the text to be spoken and the voice characteristics:

| Prompt element        | Examples                                                                                |
| --------------------- | --------------------------------------------------------------------------------------- |
| Voice characteristics | `warm and friendly`, `authoritative and professional`, `energetic`, `calm and soothing` |
| Gender / age          | `male voice`, `female voice`, `neutral`, `mature`, `young`                              |
| Pace                  | `slow and deliberate`, `conversational pace`, `brisk and confident`                     |
| Accent / style        | `American English`, `British accent`, `news anchor style`                               |

Strong voice prompt: `Read the following in a warm, professional female voice at a conversational pace, with natural pauses: [your text here]`
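
The same pattern can be templated when you narrate several passages in a consistent voice. This is a sketch under stated assumptions: `build_voice_prompt` is a hypothetical helper, the sentence template imitates the example above, and the a/an choice is a rough spelling-based heuristic.

```python
from typing import Optional

_VOWELS = {"a", "e", "i", "o", "u"}


def _article(phrase: str) -> str:
    """Pick 'an' before a vowel letter, else 'a' (spelling-based heuristic)."""
    return "an" if phrase[:1].lower() in _VOWELS else "a"


def build_voice_prompt(
    text: str,
    tone: str,
    voice: str,
    pace: str,
    accent: Optional[str] = None,
) -> str:
    """Combine the voice description and the text to be spoken into one prompt."""
    if not text.strip():
        raise ValueError("text to be spoken must not be empty")
    description = f"{_article(tone)} {tone} {voice} voice"
    if accent:
        description += f" with {_article(accent)} {accent} accent"
    return (
        f"Read the following in {description} "
        f"at {_article(pace)} {pace} pace: {text.strip()}"
    )


print(build_voice_prompt(
    "Welcome to the show.",
    tone="warm, professional",
    voice="female",
    pace="conversational",
))
# Read the following in a warm, professional female voice at a conversational pace: Welcome to the show.
```

Keeping the voice description fixed and varying only `text` is one way to get a consistent narrator across clips, though results still depend on the model selected.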

### Prompting sound effects models

Sound effect prompts should be as specific as possible about the exact sound event:

| Prompt element       | Examples                                                        |
| -------------------- | --------------------------------------------------------------- |
| Sound event          | `door knock`, `coin drop`, `camera click`, `notification chime` |
| Material / character | `wooden`, `metallic`, `glass`, `soft`, `sharp`                  |
| Environment          | `interior`, `outdoor`, `reverberant space`, `dry studio`        |
| Duration             | `brief 1-second burst`, `3-second sustained`                    |

Strong sound effect prompt: `A single metallic coin dropped onto a hardwood floor, brief ring and roll, indoor environment with slight room reverb`
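
Because precision matters most for sound effects, a quick checklist can catch an underspecified request before you submit it. The sketch below is hypothetical: the field names are assumptions that mirror the table rows, and nothing here calls ZeroTwo.

```python
# Field names are illustrative; they mirror the four rows of the table above.
SFX_ELEMENTS = ("sound_event", "material", "environment", "duration")


def missing_sfx_elements(spec: dict[str, str]) -> list[str]:
    """List the prompt elements that are absent or blank in the given spec."""
    return [name for name in SFX_ELEMENTS if not spec.get(name, "").strip()]


spec = {"sound_event": "coin drop", "material": "metallic"}
print(missing_sfx_elements(spec))
# ['environment', 'duration']
```

An empty result means all four elements are covered; it does not guarantee a good prompt, only that none of the elements in the table was left out.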

## Model updates

ZeroTwo's audio model library is updated as new models become available. Check the ZeroTwo changelog for announcements about newly added audio models.

## Related

<CardGroup cols={2}>
  <Card title="Creating audio" icon="music" href="/studio/audio/create-audio">
    Step-by-step guide and prompt examples for all audio types.
  </Card>

  <Card title="Audio troubleshooting" icon="alert-triangle" href="/studio/audio/troubleshooting">
    Fix common issues with audio generation.
  </Card>

  <Card title="Studio overview" icon="sparkles" href="/studio/overview">
    Overview of all three Studio sections — images, video, and audio.
  </Card>

  <Card title="Video generation" icon="video" href="/studio/video/overview">
    Generate AI video clips from text descriptions.
  </Card>
</CardGroup>