# Tokens and Context Limits

> Understand token usage, context windows, and how to optimize your prompts for different AI models

Understanding tokens and context limits is essential for effective use of AI models in ZeroTwo. Tokens determine how much text can be processed in a single conversation, and managing them efficiently ensures you get the best results.

## What are tokens?

Tokens are the fundamental units that AI models use to process text. A token can be a word, part of a word, or even punctuation.

<Info>
  **Token approximation:**

  * 1 token ≈ 4 characters in English
  * 1 token ≈ ¾ of a word on average
  * 100 tokens ≈ 75 words
  * 1,000 tokens ≈ 750 words or 1-2 pages of text
</Info>
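
As a rough illustration, the ratios above can be turned into a pair of conversion helpers. The function names `tokensToWords` and `wordsToTokens` are hypothetical, and the 0.75 ratio is only the English-text average quoted above, not the output of a real tokenizer.

```javascript
// Rule-of-thumb ratio from the approximations above: 1 token ≈ 0.75 English words.
const WORDS_PER_TOKEN = 0.75;

// Hypothetical helper: approximate word count that fits in a token budget.
function tokensToWords(tokens) {
  return Math.round(tokens * WORDS_PER_TOKEN);
}

// Hypothetical helper: approximate tokens needed for a given word count.
function wordsToTokens(words) {
  return Math.ceil(words / WORDS_PER_TOKEN);
}

console.log(tokensToWords(1000)); // 750
console.log(wordsToTokens(75));   // 100
```

Treat the results as ballpark figures only; code and non-English text will usually need more tokens than this suggests.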

### Examples of tokenization

<Tabs>
  <Tab title="Simple words">
    ```text theme={null}
    "Hello world" = 2 tokens
    - "Hello" = 1 token
    - " world" = 1 token (note the space)
    ```
  </Tab>

  <Tab title="Longer words">
    ```text theme={null}
    "artificial intelligence" = 3 tokens
    - "art" = 1 token
    - "ificial" = 1 token
    - " intelligence" = 1 token
    ```
  </Tab>

  <Tab title="Code">
    ```javascript theme={null}
    function greet() { return "Hello"; }
    // Approximately 10-12 tokens

    const user = { name: "John", age: 30 };
    // Approximately 15-17 tokens
    ```
  </Tab>

  <Tab title="Special characters">
    ```text theme={null}
    "user@example.com" = 5-6 tokens
    - "user" = 1 token
    - "@" = 1 token
    - "example" = 1 token
    - "." = 1 token
    - "com" = 1 token
    ```
  </Tab>
</Tabs>

<Tip>
  Code, special characters, and non-English text typically use more tokens per character than plain English text. Exact token boundaries also vary between tokenizers, so treat the counts above as illustrative.
</Tip>

## Context windows by model

Different models support different context window sizes. The context window is the total number of tokens a model can process in a single request, including both input and output.

### Text models

<ResponseField name="GPT-4o" type="128,000 tokens">
  OpenAI's flagship model with extended context support. Suitable for analyzing long documents, large codebases, or extended conversations.
</ResponseField>

<ResponseField name="Claude 3.5 Sonnet" type="200,000 tokens">
  Anthropic's extended context model. Excellent for processing entire books, comprehensive code reviews, or long research documents.
</ResponseField>

<ResponseField name="Gemini 1.5 Pro" type="1,000,000 tokens">
  Google's ultra-long context model. Can process multiple large documents, entire codebases, or very long conversation histories simultaneously.
</ResponseField>

<ResponseField name="GPT-4o-mini" type="128,000 tokens">
  Fast, cost-effective model with substantial context. Good balance for most tasks.
</ResponseField>

<ResponseField name="Claude 3.5 Haiku" type="200,000 tokens">
  Quick responses with extended context. Ideal for rapid iterations over medium-sized contexts.
</ResponseField>

<ResponseField name="o1 and o3 (reasoning models)" type="200,000 tokens">
  Reasoning-focused models with extended context for complex problem-solving.
</ResponseField>

### Token distribution

The context window is shared between input and output:

```text theme={null}
Total context window = Input tokens + Output tokens

Example with GPT-4o (128K tokens):
- System prompt: 1,000 tokens
- Custom instructions: 500 tokens
- Conversation history: 10,000 tokens
- Your current message: 2,000 tokens
- Available for response: 114,500 tokens
```

<Warning>
  If your conversation uses too many input tokens, the model has less space for generating a response. Very long contexts may result in truncated outputs.
</Warning>
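
The arithmetic in the GPT-4o example can be sketched as a small function. This is a minimal illustration, not how ZeroTwo computes budgets internally: `availableForResponse` and the part names are hypothetical, and the sketch ignores any separate per-response output cap a provider may apply.

```javascript
// Hypothetical helper: tokens left for the response once every input part is counted.
function availableForResponse(contextWindow, inputParts) {
  const inputTotal = Object.values(inputParts).reduce((sum, n) => sum + n, 0);
  // Clamp at zero: an over-full context leaves no room for output.
  return Math.max(0, contextWindow - inputTotal);
}

const remaining = availableForResponse(128000, {
  systemPrompt: 1000,
  customInstructions: 500,
  conversationHistory: 10000,
  currentMessage: 2000,
});
console.log(remaining); // 114500
```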

## What consumes tokens?

Every component of your conversation uses tokens:

<Steps>
  <Step title="System prompt">
    ZeroTwo's base instructions that define model behavior.

    **Typical usage**: 500-2,000 tokens
  </Step>

  <Step title="Custom instructions">
    Your personal preferences and [custom instructions](/prompts/custom-instructions-patterns).

    **Typical usage**: 100-1,000 tokens
  </Step>

  <Step title="Conversation history">
    Previous messages in the current conversation thread.

    **Growth**: Increases with each message exchange
  </Step>

  <Step title="File attachments">
    Content from uploaded files, images, or documents.

    **Variable**: Can be significant for large files or many images
  </Step>

  <Step title="Tool outputs">
    Results from [web search](/tools/web-search), [code interpreter](/tools/code-interpreter), or other tools.

    **Variable**: Depends on tool usage and results
  </Step>

  <Step title="Your message">
    Your current prompt or question.

    **Variable**: Depends on complexity and length
  </Step>

  <Step title="Model response">
    The AI's generated response.

    **Variable**: Longer responses use more tokens
  </Step>
</Steps>

## Token optimization strategies

### For prompts

<AccordionGroup>
  <Accordion title="Be concise but specific">
    Remove filler words while maintaining clarity:

    ```text theme={null}
    ❌ Token-heavy (about 26 tokens):
    "I was wondering if you could possibly help me understand how I might be able to implement a feature that allows users to log in."

    ✅ Optimized (about 8 tokens):
    "How do I implement user login functionality?"
    ```
  </Accordion>

  <Accordion title="Remove redundancy">
    ```text theme={null}
    ❌ Redundant (about 23 tokens):
    "Please create a function. The function should validate email addresses. The email validation should check if the email is valid."

    ✅ Concise (about 13 tokens):
    "Create a function to validate email addresses and return true if valid."
    ```
  </Accordion>

  <Accordion title="Use code efficiently">
    When sharing code, include only relevant portions:

    ```javascript theme={null}
    // ❌ Sharing entire 500-line file (5,000+ tokens)

    // ✅ Share relevant function (50 tokens)
    function processUser(user) {
      // ... relevant code only
    }
    ```
  </Accordion>

  <Accordion title="Summarize when possible">
    For long contexts, provide summaries:

    ```text theme={null}
    ❌ Paste entire 10-page document (7,500 tokens)

    ✅ Summarize key points (500 tokens):
    "Document summary: Client wants an e-commerce platform with:
    - Multi-vendor marketplace
    - Real-time inventory sync
    - Mobile-first design
    - Budget: $50K, Timeline: 3 months"
    ```
  </Accordion>
</AccordionGroup>
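
To check whether a rewrite actually saves tokens, you can compare rough estimates for two drafts. The sketch below assumes the 4-characters-per-token rule of thumb; `estimateTokens` and `compareDrafts` are hypothetical helpers, not part of ZeroTwo.

```javascript
// Rough character-based estimate: 1 token ≈ 4 characters of English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical helper: compare two drafts of the same prompt.
function compareDrafts(verbose, concise) {
  const before = estimateTokens(verbose);
  const after = estimateTokens(concise);
  return { before, after, saved: before - after };
}

const result = compareDrafts(
  "I was wondering if you could possibly help me understand how I might be able to implement a feature that allows users to log in.",
  "How do I implement user login functionality?"
);
console.log(result.saved > 0); // true
```

The absolute numbers are approximate, but the difference between two drafts is usually a reasonable signal of which one is leaner.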

### For conversations

<Tip>
  **Managing long conversations:**

  1. **Start fresh for new topics**: Begin a new conversation for unrelated topics
  2. **Summarize history**: Manually summarize previous context when starting a new thread
  3. **Use assistants**: Create [specialized assistants](/assistants/create) with task-specific contexts
  4. **Leverage memory**: Enable [Memory](/tools/memory) to persist important context across conversations
</Tip>

### For file attachments

<Steps>
  <Step title="Preprocess large files">
    Extract relevant sections before uploading:

    * Upload specific chapters, not entire books
    * Include relevant code files, not entire repositories
    * Crop images to relevant areas
  </Step>

  <Step title="Use text formats when possible">
    Plain-text files are usually more token-efficient than PDFs or images that contain text.
  </Step>

  <Step title="Compress images">
    Reduce image resolution for analysis tasks that don't require high detail.
  </Step>
</Steps>

## Token limits and truncation

### Input truncation

When your input exceeds the context window:

<Warning>
  **What happens:**

  * Older messages may be automatically truncated
  * File content might be summarized or omitted
  * Tool outputs could be condensed
  * You'll receive a warning when approaching limits
</Warning>
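
One common way to truncate older messages is a sliding window that keeps only the most recent messages that fit a budget. The sketch below illustrates that general idea under stated assumptions: it is not ZeroTwo's actual truncation logic, `trimHistory` is a hypothetical name, and token counts use the rough 4-characters-per-token estimate.

```javascript
// Rough character-based estimate: 1 token ≈ 4 characters of English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical sketch: keep the newest messages that fit within maxTokens.
function trimHistory(messages, maxTokens) {
  const kept = [];
  let used = 0;
  // Walk from newest to oldest so recent context survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const history = [
  { role: "user", content: "a".repeat(400) },      // ~100 tokens
  { role: "assistant", content: "b".repeat(200) }, // ~50 tokens
  { role: "user", content: "c".repeat(100) },      // ~25 tokens
];
console.log(trimHistory(history, 80).length); // 2
```

Because the oldest message is dropped first, any detail that only appears early in the thread is lost, which is why summarizing important context yourself is safer than relying on automatic truncation.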

### Output truncation

When the model's response would exceed remaining tokens:

```text theme={null}
If a response would need 15,000 tokens but only 10,000 remain:
- The response stops mid-generation
- You can ask the model to "continue" in the next message
- Consider using a model with a larger context window
```

## Estimating token usage

### Quick estimation

<CodeGroup>
  ```javascript Character count theme={null}
  // Rough estimate: about 4 characters per token in English
  const text = "This is a sample text";
  const estimatedTokens = Math.round(text.length / 4);

  console.log(estimatedTokens);
  // 21 characters ÷ 4 ≈ 5 tokens (actual: 5)
  ```

  ```javascript Word count theme={null}
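  // Rough estimate: about 1.3 tokens per English word
  const text = "This is a sample text";
  const words = text.trim().split(/\s+/).length; // split on any whitespace run
  const estimatedTokens = words * 1.3;

  console.log(estimatedTokens);
  // 5 words × 1.3 ≈ 6.5 tokens (actual: 5)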
  ```
</CodeGroup>

### More accurate estimation

For precise token counting, use tokenizer tools specific to the model provider:

* **OpenAI**: the `tiktoken` library
* **Anthropic**: the token counting endpoint of the Anthropic API
* **Google**: the `countTokens` method of the Gemini API

<Info>
  ZeroTwo may display token usage information in the interface for active conversations, helping you monitor your context usage in real time.
</Info>

## Strategies by task type

### Code generation

```text theme={null}
✅ Efficient approach:
"Create a React hook for form validation with Zod schema"

❌ Token-heavy approach:
"I'm building a React application and I need form validation. 
I want to use Zod for schema validation. Can you help me create 
a custom hook that handles validation? It should work with 
multiple form fields and show error messages..."
[continues for 200+ tokens]
```

### Long document analysis

<Tabs>
  <Tab title="For shorter docs (< 50K tokens)">
    Upload the full document and ask specific questions.
  </Tab>

  <Tab title="For longer docs (> 50K tokens)">
    Use models with larger context windows (Claude 3.5 Sonnet, Gemini 1.5 Pro) or break analysis into sections.
  </Tab>

  <Tab title="For multiple documents">
    Consider [Deep Research](/tools/deep-research) for comprehensive multi-document analysis.
  </Tab>
</Tabs>
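
If you break a long document into sections yourself, one simple approach is to group paragraphs until a token budget is reached. The following is a minimal sketch under stated assumptions: `chunkByTokens` is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and token counts use the rough 4-characters-per-token estimate.

```javascript
// Rough character-based estimate: 1 token ≈ 4 characters of English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Hypothetical sketch: group paragraphs into chunks of at most maxTokens each.
function chunkByTokens(text, maxTokens) {
  const paragraphs = text.split(/\n\s*\n/).filter((p) => p.trim() !== "");
  const chunks = [];
  let current = [];
  let used = 0;
  for (const p of paragraphs) {
    const cost = estimateTokens(p);
    // Start a new chunk when adding this paragraph would exceed the budget.
    if (current.length > 0 && used + cost > maxTokens) {
      chunks.push(current.join("\n\n"));
      current = [];
      used = 0;
    }
    current.push(p);
    used += cost;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}

const doc = ["a".repeat(40), "b".repeat(40), "c".repeat(40)].join("\n\n");
console.log(chunkByTokens(doc, 20).length); // 2
```

Note that a single paragraph larger than the budget still becomes its own oversized chunk; splitting inside paragraphs is left out to keep the sketch short.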

### Debugging

```text theme={null}
✅ Efficient debugging prompt (100 tokens):
"This React component throws 'Cannot read property id of undefined'. 
Code: [paste minimal reproduction]
What's the issue?"

❌ Inefficient (500+ tokens):
[Paste entire component + all imports + parent components + 
detailed description of what you tried + full error stack trace + 
unrelated code]
```

## Model selection based on token needs

<CardGroup cols={2}>
  <Card title="Short contexts (< 10K tokens)" icon="gauge">
    Use fast models: GPT-4o-mini, Claude Haiku, or GPT-3.5-turbo

    **Best for**: Quick questions, simple code generation, brief explanations
  </Card>

  <Card title="Medium contexts (10K-50K tokens)" icon="file">
    Use: GPT-4o, Claude 3.5 Sonnet

    **Best for**: Code reviews, document analysis, extended conversations
  </Card>

  <Card title="Long contexts (50K-200K tokens)" icon="files">
    Use: Claude 3.5 Sonnet, o1, o3

    **Best for**: Large codebase analysis, multiple documents, long research
  </Card>

  <Card title="Ultra-long contexts (> 200K tokens)" icon="database">
    Use: Gemini 1.5 Pro

    **Best for**: Entire books, massive codebases, comprehensive research projects
  </Card>
</CardGroup>
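
The tiers above can be expressed as a simple lookup. This sketch only mirrors the cards on this page: `suggestModels` is a hypothetical helper, and the thresholds are the guideline ranges listed above rather than hard limits enforced by ZeroTwo.

```javascript
// Hypothetical helper: map an estimated token count to the tiers listed above.
function suggestModels(estimatedTokens) {
  if (estimatedTokens < 10000) {
    return ["GPT-4o-mini", "Claude Haiku", "GPT-3.5-turbo"]; // short contexts
  }
  if (estimatedTokens <= 50000) {
    return ["GPT-4o", "Claude 3.5 Sonnet"]; // medium contexts
  }
  if (estimatedTokens <= 200000) {
    return ["Claude 3.5 Sonnet", "o1", "o3"]; // long contexts
  }
  return ["Gemini 1.5 Pro"]; // ultra-long contexts
}

console.log(suggestModels(5000)[0]);   // GPT-4o-mini
console.log(suggestModels(500000)[0]); // Gemini 1.5 Pro
```

When estimating, remember that the count should cover the whole conversation, including room for the response, not just the document you plan to upload.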

## Token costs and efficiency

While ZeroTwo abstracts away direct token costs, being efficient with tokens provides benefits:

<Check>
  **Benefits of token efficiency:**

  * Faster response times (less to process)
  * More room for detailed responses
  * Ability to maintain longer conversations
  * Better performance on complex tasks
</Check>

## Advanced techniques

### Summarization chains

For very long tasks, chain conversations:

<Steps>
  <Step title="Analyze first section">
    "Analyze section 1 of this document and provide key findings."
  </Step>

  <Step title="Analyze subsequent sections">
    "Now analyze section 2, considering findings from section 1."
  </Step>

  <Step title="Synthesize results">
    "Based on analysis of all sections, provide comprehensive recommendations."
  </Step>
</Steps>
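
The chain above follows a fixed pattern, so the prompts can be generated for any number of sections. A minimal sketch, assuming you send the prompts one at a time in the same conversation; `buildChainPrompts` is a hypothetical helper and the wording is adapted from the steps above.

```javascript
// Hypothetical sketch: build the ordered prompts for a summarization chain.
function buildChainPrompts(sectionCount) {
  const prompts = [];
  for (let i = 1; i <= sectionCount; i++) {
    prompts.push(
      i === 1
        ? "Analyze section 1 of this document and provide key findings."
        : `Now analyze section ${i}, considering findings from the previous sections.`
    );
  }
  // Final step: synthesize across everything analyzed so far.
  prompts.push("Based on analysis of all sections, provide comprehensive recommendations.");
  return prompts;
}

console.log(buildChainPrompts(3).length); // 4
```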

### Iterative refinement

Build complex outputs incrementally:

```text theme={null}
Round 1: "Create basic structure for user authentication"
[Save response]

Round 2: "Add error handling to the previous code"
[Model remembers context]

Round 3: "Add JWT token generation"
[Continues building on previous work]
```

### External storage

For reference data:

* Store large documentation in [Memory](/tools/memory)
* Use external files or databases
* Reference URLs for documentation
* Summarize large contexts

## Monitoring token usage

<Tip>
  **Best practices:**

  * Monitor conversation length regularly
  * Start new threads for unrelated topics
  * Use [assistants](/assistants/create) for specialized, token-efficient workflows
  * Enable [Memory](/tools/memory) to reduce repeated context
  * Choose appropriate models for your token needs
</Tip>
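
If you track usage yourself, a simple check can flag when a conversation is getting close to the limit. This is a sketch only: `contextUsage` is a hypothetical helper, and the 80% default threshold is an arbitrary assumption, not a ZeroTwo setting.

```javascript
// Hypothetical helper: report how full the context window is.
// warnAt is an assumed threshold (80% by default), not a ZeroTwo value.
function contextUsage(usedTokens, contextWindow, warnAt = 0.8) {
  const fraction = usedTokens / contextWindow;
  return {
    percent: Math.round(fraction * 100),
    shouldStartNewThread: fraction >= warnAt,
  };
}

const usage = contextUsage(13500, 128000);
console.log(usage.percent);              // 11
console.log(usage.shouldStartNewThread); // false
```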

## Next steps

<CardGroup cols={2}>
  <Card title="Custom instructions" icon="sliders" href="/prompts/custom-instructions-patterns">
    Optimize persistent instructions
  </Card>

  <Card title="Models and providers" icon="brain" href="/overview/models-and-providers">
    Choose the right model for your needs
  </Card>

  <Card title="Memory system" icon="database" href="/tools/memory">
    Reduce token usage with Memory
  </Card>

  <Card title="Structured techniques" icon="sitemap" href="/prompts/structured-techniques">
    Write token-efficient structured prompts
  </Card>
</CardGroup>
