# Tokens and Context Limits

> Understand token usage, context windows, and how to optimize your prompts for different AI models

Understanding tokens and context limits is essential for effective use of AI models in ZeroTwo. Tokens determine how much text can be processed in a single conversation, and managing them efficiently ensures you get the best results.

## What are tokens?

Tokens are the fundamental units that AI models use to process text. A token can be a word, part of a word, or even punctuation.

**Token approximation:**

* 1 token ≈ 4 characters in English
* 1 token ≈ ¾ of a word on average
* 100 tokens ≈ 75 words
* 1,000 tokens ≈ 750 words or 1-2 pages of text

### Examples of tokenization

```text
"Hello world" = 2 tokens
- "Hello" = 1 token
- " world" = 1 token (note the space)
```

```text
"artificial intelligence" = 3 tokens
- "art" = 1 token
- "ificial" = 1 token
- " intelligence" = 1 token
```

```javascript
function greet() { return "Hello"; }
// Approximately 10-12 tokens

const user = { name: "John", age: 30 };
// Approximately 15-17 tokens
```

```text
"user@example.com" = 5-6 tokens
- "user" = 1 token
- "@" = 1 token
- "example" = 1 token
- "." = 1 token
- "com" = 1 token
```

Code, special characters, and non-English text typically use more tokens per character than plain English text.

## Context windows by model

Different models support different context window sizes. The context window is the total number of tokens a model can process, including both input and output.

### Text models

OpenAI's flagship model with extended context support. Suitable for analyzing long documents, large codebases, or extended conversations.

Anthropic's extended context model.
Excellent for processing entire books, comprehensive code reviews, or long research documents.

Google's ultra-long context model. Can process multiple large documents, entire codebases, or very long conversation histories simultaneously.

Fast, cost-effective model with substantial context. Good balance for most tasks.

Quick responses with extended context. Ideal for rapid iterations over medium-sized contexts.

Reasoning-focused models with extended context for complex problem-solving.

### Token distribution

The context window is shared between input and output:

```text
Total context window = Input tokens + Output tokens

Example with GPT-4o (128K tokens):
- System prompt: 1,000 tokens
- Custom instructions: 500 tokens
- Conversation history: 10,000 tokens
- Your current message: 2,000 tokens
- Available for response: 114,500 tokens
```

If your conversation uses too many input tokens, the model has less space for generating a response. Very long contexts may result in truncated outputs.

## What consumes tokens?

Every component of your conversation uses tokens:

* ZeroTwo's base instructions that define model behavior. **Typical usage**: 500-2,000 tokens
* Your personal preferences and [custom instructions](/prompts/custom-instructions-patterns). **Typical usage**: 100-1,000 tokens
* Previous messages in the current conversation thread. **Growth**: Increases with each message exchange
* Content from uploaded files, images, or documents. **Variable**: Can be significant for large files or many images
* Results from [web search](/tools/web-search), [code interpreter](/tools/code-interpreter), or other tools. **Variable**: Depends on tool usage and results
* Your current prompt or question. **Variable**: Depends on complexity and length
* The AI's generated response.
**Variable**: Longer responses use more tokens

## Token optimization strategies

### For prompts

Remove filler words while maintaining clarity:

```text
❌ Token-heavy (25 tokens):
"I was wondering if you could possibly help me understand how I might be able to implement a feature that allows users to log in."

✅ Optimized (15 tokens):
"How do I implement user login functionality?"
```

```text
❌ Redundant (30 tokens):
"Please create a function. The function should validate email addresses. The email validation should check if the email is valid."

✅ Concise (18 tokens):
"Create a function to validate email addresses and return true if valid."
```

When sharing code, include only relevant portions:

```javascript
// ❌ Sharing entire 500-line file (5,000+ tokens)

// ✅ Share relevant function (50 tokens)
function processUser(user) {
  // ... relevant code only
}
```

For long contexts, provide summaries:

```text
❌ Paste entire 10-page document (7,500 tokens)

✅ Summarize key points (500 tokens):
"Document summary: Client wants an e-commerce platform with:
- Multi-vendor marketplace
- Real-time inventory sync
- Mobile-first design
- Budget: $50K, Timeline: 3 months"
```

### For conversations

**Managing long conversations:**

1. **Start fresh for new topics**: Begin a new conversation for unrelated topics
2. **Summarize history**: Manually summarize previous context when starting a new thread
3. **Use assistants**: Create [specialized assistants](/assistants/create) with task-specific contexts
4. **Leverage memory**: Enable [Memory](/tools/memory) to persist important context across conversations

### For file attachments

Extract relevant sections before uploading:

* Upload specific chapters, not entire books
* Include relevant code files, not entire repositories
* Crop images to relevant areas

Text files are more token-efficient than PDFs or images with text.

Reduce image resolution for analysis tasks that don't require high detail.

## Token limits and truncation

### Input truncation

When your input exceeds the context window:

**What happens:**

* Older messages may be automatically truncated
* File content might be summarized or omitted
* Tool outputs could be condensed
* You'll receive a warning when approaching limits

### Output truncation

When the model's response would exceed remaining tokens:

```text
If response would be 15,000 tokens but only 10,000 remain:
- Response stops mid-generation
- You can ask the model to "continue" in the next message
- Consider using a model with a larger context window
```

## Estimating token usage

### Quick estimation

```javascript Character count
// Rough estimate: about 4 characters per token in English
const text = "This is a sample text"; // 21 characters
const estimatedTokens = Math.round(text.length / 4);
// 21 ÷ 4 ≈ 5 tokens (actual: 5)
```

```javascript Word count
// Rough estimate: about 1.3 tokens per word
const text = "This is a sample text";
const words = text.trim().split(/\s+/).length; // 5 words
const estimatedTokens = words * 1.3;
// 5 × 1.3 = 6.5 tokens (actual: 5)
```

### More accurate estimation

For precise token counting, use tokenizer tools specific to the model provider:

* **OpenAI**: tiktoken library
* **Anthropic**: Claude tokenizer
* **Google**: Gemini tokenizer

ZeroTwo may display token usage information in the interface for active conversations, helping you monitor your context usage in real time.

## Strategies by task type

### Code generation

```text
✅ Efficient approach:
"Create a React hook for form validation with Zod schema"

❌ Token-heavy approach:
"I'm building a React application and I need form validation. I want to use Zod for schema validation. Can you help me create a custom hook that handles validation? It should work with multiple form fields and show error messages..."
[continues for 200+ tokens]
```

### Long document analysis

Upload the full document and ask specific questions.
Use models with larger context windows (Claude 3.5 Sonnet, Gemini 1.5 Pro) or break analysis into sections.

Consider [Deep Research](/tools/deep-research) for comprehensive multi-document analysis.

### Debugging

```text
✅ Efficient debugging prompt (100 tokens):
"This React component throws 'Cannot read property id of undefined'.
Code: [paste minimal reproduction]
What's the issue?"

❌ Inefficient (500+ tokens):
[Paste entire component + all imports + parent components + detailed description of what you tried + full error stack trace + unrelated code]
```

## Model selection based on token needs

Use fast models: GPT-4o-mini, Claude Haiku, or GPT-3.5-turbo
**Best for**: Quick questions, simple code generation, brief explanations

Use: GPT-4o, Claude 3.5 Sonnet
**Best for**: Code reviews, document analysis, extended conversations

Use: Claude 3.5 Sonnet, o1, o3
**Best for**: Large codebase analysis, multiple documents, long research

Use: Gemini 1.5 Pro
**Best for**: Entire books, massive codebases, comprehensive research projects

## Token costs and efficiency

While ZeroTwo abstracts away direct token costs, being efficient with tokens provides benefits:

**Benefits of token efficiency:**

* Faster response times (less to process)
* More room for detailed responses
* Ability to maintain longer conversations
* Better performance on complex tasks

## Advanced techniques

### Summarization chains

For very long tasks, chain conversations:

1. "Analyze section 1 of this document and provide key findings."
2. "Now analyze section 2, considering findings from section 1."
3. "Based on analysis of all sections, provide comprehensive recommendations."

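
The chained prompts above can also be generated programmatically. The following is a minimal sketch, assuming you carry forward a short findings summary from each step instead of the full section text. The helper names `buildSectionPrompt`, `buildFinalPrompt`, and `estimateTokens` are hypothetical and not part of ZeroTwo, the findings in the usage lines are made-up placeholders, and the token estimate relies on the rough 4-characters-per-token rule of thumb.

```javascript
// Hypothetical helpers for illustration only; not a ZeroTwo API.

// Format earlier findings as short labeled lines to carry into the next prompt.
function formatFindings(findings) {
  return findings.map((f, i) => `Section ${i + 1}: ${f}`).join("\n");
}

// Build the prompt for one section, including summaries of earlier sections.
function buildSectionPrompt(sectionNumber, findings) {
  if (findings.length === 0) {
    return `Analyze section ${sectionNumber} of this document and provide key findings.`;
  }
  return (
    `Now analyze section ${sectionNumber}, considering these earlier findings:\n` +
    formatFindings(findings)
  );
}

// Build the closing prompt that asks for recommendations across all sections.
function buildFinalPrompt(findings) {
  return (
    "Based on analysis of all sections, provide comprehensive recommendations.\n" +
    formatFindings(findings)
  );
}

// Rough size check: about 4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Usage with placeholder findings (assumed values, not real model output).
const findings = ["Login flow lacks rate limiting", "Checkout depends on a legacy API"];
const nextPrompt = buildSectionPrompt(3, findings);
console.log(nextPrompt);
console.log(`Estimated prompt size: ${estimateTokens(nextPrompt)} tokens`);
```

Carrying only one-line findings forward keeps each step's input small, so the conversation history does not grow with the full text of every earlier section.
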
### Iterative refinement

Build complex outputs incrementally:

```text
Round 1: "Create basic structure for user authentication"
[Save response]

Round 2: "Add error handling to the previous code"
[Model remembers context]

Round 3: "Add JWT token generation"
[Continues building on previous work]
```

### External storage

For reference data:

* Store large documentation in [Memory](/tools/memory)
* Use external files or databases
* Reference URLs for documentation
* Summarize large contexts

## Monitoring token usage

**Best practices:**

* Monitor conversation length regularly
* Start new threads for unrelated topics
* Use [assistants](/assistants/create) for specialized, token-efficient workflows
* Enable [Memory](/tools/memory) to reduce repeated context
* Choose appropriate models for your token needs

## Next steps

* Optimize persistent instructions
* Choose the right model for your needs
* Reduce token usage with Memory
* Write token-efficient structured prompts