What are tokens?
Tokens are the fundamental units that AI models use to process text. A token can be a word, part of a word, or even a punctuation mark.

Token approximation:
- 1 token ≈ 4 characters in English
- 1 token ≈ ¾ of a word on average
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words or 1-2 pages of text
Examples of tokenization
- Simple words: short, common words such as “the” or “cat” are usually a single token
- Longer words: rarer or longer words are split into several sub-word tokens
- Code: identifiers, operators, and whitespace each consume tokens, so indentation and boilerplate add up
- Special characters: emoji and non-Latin scripts often require multiple tokens per character
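To inspect how a given string actually splits, you can run a tokenizer locally. A minimal sketch using OpenAI’s tiktoken library (the sample strings and the cl100k_base encoding are illustrative choices):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-era OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

for text in ["cat", "unbelievably", "def add(a, b):", "日本語 🙂"]:
    tokens = enc.encode(text)
    # decode_single_token_bytes shows the raw bytes each token covers
    pieces = [enc.decode_single_token_bytes(t) for t in tokens]
    print(f"{text!r}: {len(tokens)} tokens -> {pieces}")
```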
Context windows by model
Different models support different context window sizes. The context window is the total number of tokens a model can process, including both input and output.

Text models
- GPT-4o: OpenAI’s flagship model with extended context support. Suitable for analyzing long documents, large codebases, or extended conversations.
- Claude 3.5 Sonnet: Anthropic’s extended context model. Excellent for processing entire books, comprehensive code reviews, or long research documents.
- Gemini 1.5 Pro: Google’s ultra-long context model. Can process multiple large documents, entire codebases, or very long conversation histories simultaneously.
- GPT-4o-mini: Fast, cost-effective model with substantial context. A good balance for most tasks.
- Claude Haiku: Quick responses with extended context. Ideal for rapid iterations over medium-sized contexts.
- o1 / o3: Reasoning-focused models with extended context for complex problem-solving.
Token distribution
The context window is shared between input and output: with a 128K-token window, for example, a 100K-token input leaves roughly 28K tokens for the response.

What consumes tokens?
Every component of your conversation uses tokens:

1. System prompt: ZeroTwo’s base instructions that define model behavior. Typical usage: 500-2,000 tokens.
2. Custom instructions: Your personal preferences and custom instructions. Typical usage: 100-1,000 tokens.
3. Conversation history: Previous messages in the current conversation thread. Growth: increases with each message exchange.
4. File attachments: Content from uploaded files, images, or documents. Variable: can be significant for large files or many images.
5. Tool outputs: Results from web search, code interpreter, or other tools. Variable: depends on tool usage and results.
6. Your message: Your current prompt or question. Variable: depends on complexity and length.
7. Model response: The AI’s generated response. Variable: longer responses use more tokens.
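To see how these components add up, here is a minimal sketch that tallies an estimated budget; the 128K window and every component size below are illustrative assumptions:

```python
# Rough context-budget tally; all numbers below are illustrative assumptions.
CONTEXT_WINDOW = 128_000  # assumed total window, input + output

components = {
    "system prompt": 1_200,
    "custom instructions": 400,
    "conversation history": 18_000,
    "file attachments": 35_000,
    "tool outputs": 6_000,
    "your message": 300,
}

used = sum(components.values())
remaining_for_response = CONTEXT_WINDOW - used
print(f"Input tokens used: {used:,}")
print(f"Tokens left for the model's response: {remaining_for_response:,}")
```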
Token optimization strategies
For prompts
- Be concise but specific: remove filler words while maintaining clarity. For example, “Could you please maybe help me write some code that sorts a list?” asks the same thing as “Write a function that sorts a list.”
- Remove redundancy: avoid restating context the model already has from earlier in the conversation.
- Use code efficiently: when sharing code, include only the relevant portions, such as the failing function rather than the entire file.
- Summarize when possible: for long contexts, provide summaries instead of pasting the full text.
For conversations
- Start a new conversation when you switch topics; accumulated history consumes tokens on every message exchange.
- For long threads, ask for a summary of the discussion so far and continue from that summary in a fresh conversation.
For file attachments
1. Preprocess large files. Extract relevant sections before uploading:
   - Upload specific chapters, not entire books
   - Include relevant code files, not entire repositories
   - Crop images to relevant areas
2. Use text formats when possible. Text files are more token-efficient than PDFs or images with text.
3. Compress images. Reduce image resolution for analysis tasks that don’t require high detail (see the sketch below).
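As one way to handle that last step, here is a minimal sketch using the Pillow imaging library; the file names and the 1024-pixel bound are illustrative assumptions:

```python
# pip install Pillow
from PIL import Image

img = Image.open("screenshot.png")   # hypothetical input file
img.thumbnail((1024, 1024))          # shrink in place, keeping aspect ratio
img.save("screenshot_small.png", optimize=True)
print(f"Resized to {img.size}")
```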
Token limits and truncation
Input truncation
When your input exceeds the context window, the oldest conversation history is typically dropped first, or the request is rejected with a context-length error.

Output truncation

When the model’s response would exceed the remaining tokens, generation stops mid-answer; asking the model to continue usually recovers the rest.
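If you are calling a model API directly, output truncation is detectable from the response metadata. A minimal sketch using the OpenAI Python SDK (the model, prompt, and deliberately small max_tokens budget are illustrative; other providers expose similar stop-reason fields):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    max_tokens=200,       # deliberately small output budget
    messages=[{"role": "user", "content": "Summarize the plot of Moby-Dick."}],
)

choice = response.choices[0]
if choice.finish_reason == "length":
    # The reply hit the output budget and was cut off mid-answer.
    print("Response was truncated; ask the model to continue.")
print(choice.message.content)
```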
Estimating token usage

Quick estimation

For a quick estimate, apply the rules of thumb above: divide the character count by 4, or multiply the word count by about 1.33.
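A minimal sketch of that rule of thumb (assuming English prose; the 4-characters-per-token divisor will be off for code or non-English text):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # ~11
```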
More accurate estimation
For precise token counting, use tokenizer tools specific to the model provider:
- OpenAI: tiktoken library
- Anthropic: Claude tokenizer
- Google: Gemini tokenizer
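For example, a small counting helper built on tiktoken (the gpt-4o model name is an illustrative choice; encoding_for_model selects the matching encoding):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Exact token count under a given OpenAI model's tokenizer."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(text))

print(count_tokens("The quick brown fox jumps over the lazy dog."))
```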
ZeroTwo may display token usage information in the interface for active conversations, helping you monitor your context usage in real-time.
Strategies by task type
Code generation
Share only the interfaces, types, and usage examples the new code must fit, not the entire codebase.
Long document analysis
- For shorter docs (< 50K tokens): upload the full document and ask specific questions.
- For longer docs (> 50K tokens): split the document into sections and analyze them sequentially (see Summarization chains below).
- For multiple documents: summarize each document first, or use an ultra-long-context model such as Gemini 1.5 Pro.
Debugging
Include the failing function, the exact error message, and a minimal reproduction rather than the whole project.
Model selection based on token needs
Short contexts (< 10K tokens)
Use fast models: GPT-4o-mini, Claude Haiku, or GPT-3.5-turbo. Best for: quick questions, simple code generation, brief explanations.
Medium contexts (10K-50K tokens)
Use: GPT-4o, Claude 3.5 Sonnet. Best for: code reviews, document analysis, extended conversations.
Long contexts (50K-200K tokens)
Use: Claude 3.5 Sonnet, o1, o3. Best for: large codebase analysis, multiple documents, long research.
Ultra-long contexts (> 200K tokens)
Use: Gemini 1.5 Pro. Best for: entire books, massive codebases, comprehensive research projects.
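These bands translate directly into a simple routing rule. A hypothetical sketch (the pick_model helper and the exact model identifiers it returns are illustrative, not ZeroTwo’s actual routing):

```python
def pick_model(estimated_tokens: int) -> str:
    """Map an estimated context size to a suitable model tier."""
    if estimated_tokens < 10_000:
        return "gpt-4o-mini"        # short contexts: fast, cheap
    if estimated_tokens < 50_000:
        return "gpt-4o"             # medium contexts
    if estimated_tokens < 200_000:
        return "claude-3-5-sonnet"  # long contexts
    return "gemini-1.5-pro"         # ultra-long contexts

print(pick_model(35_000))  # -> gpt-4o
```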
Token costs and efficiency
While ZeroTwo abstracts away direct token costs, being efficient with tokens provides real benefits:
- Faster response times (less to process)
- More room for detailed responses
- Ability to maintain longer conversations
- Better performance on complex tasks
Advanced techniques
Summarization chains
For very long tasks, chain conversations:

1. Analyze the first section: “Analyze section 1 of this document and provide key findings.”
2. Analyze subsequent sections: “Now analyze section 2, considering findings from section 1.”
3. Synthesize results: “Based on analysis of all sections, provide comprehensive recommendations.”
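A minimal sketch of such a chain using the OpenAI Python SDK; the sections list, the prompts, and the model choice are all illustrative assumptions:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()
sections = ["...section 1 text...", "...section 2 text...", "...section 3 text..."]

findings = []
for i, section in enumerate(sections, start=1):
    prior = "\n".join(findings)
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Key findings so far:\n{prior}\n\n"
                       f"Analyze section {i} and provide key findings:\n{section}",
        }],
    )
    findings.append(reply.choices[0].message.content)

# Final synthesis runs over the accumulated findings, not the full documents
summary = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content":
               "Based on these findings, provide comprehensive recommendations:\n"
               + "\n".join(findings)}],
)
print(summary.choices[0].message.content)
```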
Iterative refinement

Build complex outputs incrementally: generate an outline first, then expand one section at a time in follow-up messages.

External storage

For reference data:
- Store large documentation in Memory
- Use external files or databases
- Reference URLs for documentation
- Summarize large contexts

