Understanding tokens and context limits is essential for effective use of AI models in ZeroTwo. Tokens determine how much text can be processed in a single conversation, and managing them efficiently helps you get the best results.

What are tokens?

Tokens are the fundamental units that AI models use to process text. A token can be a word, part of a word, or even punctuation.
Token approximation:
  • 1 token ≈ 4 characters in English
  • 1 token ≈ ¾ of a word on average
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words or 1-2 pages of text

Examples of tokenization

"Hello world" = 2 tokens
- "Hello" = 1 token
- " world" = 1 token (note the space)
Code, special characters, and non-English text typically use more tokens per character than plain English text.

Context windows by model

Different models support different context window sizes. The context window is the total number of tokens a model can process in a single request, including both input and output.

Text models

  • GPT-4o (128,000 tokens): OpenAI’s flagship model with extended context support. Suitable for analyzing long documents, large codebases, or extended conversations.
  • Claude 3.5 Sonnet (200,000 tokens): Anthropic’s extended context model. Excellent for processing entire books, comprehensive code reviews, or long research documents.
  • Gemini 1.5 Pro (1,000,000 tokens): Google’s ultra-long context model. Can process multiple large documents, entire codebases, or very long conversation histories simultaneously.
  • GPT-4o-mini (128,000 tokens): Fast, cost-effective model with substantial context. A good balance for most tasks.
  • Claude 3.5 Haiku (200,000 tokens): Quick responses with extended context. Ideal for rapid iterations over medium-sized contexts.
  • o1 and o3 (reasoning models, 200,000 tokens): Reasoning-focused models with extended context for complex problem-solving.

Token distribution

The context window is shared between input and output:
Total context window = Input tokens + Output tokens

Example with GPT-4o (128K tokens):
- System prompt: 1,000 tokens
- Custom instructions: 500 tokens
- Conversation history: 10,000 tokens
- Your current message: 2,000 tokens
- Available for response: 114,500 tokens
If your conversation uses too many input tokens, the model has less space for generating a response. Very long contexts may result in truncated outputs.
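To make the arithmetic concrete, here is a minimal JavaScript sketch of the same budget calculation. The component names and numbers simply mirror the example above; they are illustrative, not a ZeroTwo API.

// Budget arithmetic for a 128K context window (numbers from the example above)
const CONTEXT_WINDOW = 128_000;

const inputTokens = {
  systemPrompt: 1_000,
  customInstructions: 500,
  conversationHistory: 10_000,
  currentMessage: 2_000,
};

// Sum the input components, then subtract from the total window
const used = Object.values(inputTokens).reduce((sum, n) => sum + n, 0);
const availableForResponse = CONTEXT_WINDOW - used;

console.log(availableForResponse); // 114500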

What consumes tokens?

Every component of your conversation uses tokens:
1. System prompt: ZeroTwo’s base instructions that define model behavior. Typical usage: 500-2,000 tokens
2. Custom instructions: Your personal preferences and standing instructions. Typical usage: 100-1,000 tokens
3. Conversation history: Previous messages in the current conversation thread. Growth: increases with each message exchange
4. File attachments: Content from uploaded files, images, or documents. Variable: can be significant for large files or many images
5. Tool outputs: Results from web search, code interpreter, or other tools. Variable: depends on tool usage and results
6. Your message: Your current prompt or question. Variable: depends on complexity and length
7. Model response: The AI’s generated response. Variable: longer responses use more tokens

Token optimization strategies

For prompts

Remove filler words while maintaining clarity:
❌ Token-heavy (25 tokens):
"I was wondering if you could possibly help me understand how I might be able to implement a feature that allows users to log in."

✅ Optimized (~9 tokens):
"How do I implement user login functionality?"
❌ Redundant (30 tokens):
"Please create a function. The function should validate email addresses. The email validation should check if the email is valid."

✅ Concise (18 tokens):
"Create a function to validate email addresses and return true if valid."
When sharing code, include only relevant portions:
// ❌ Sharing entire 500-line file (5,000+ tokens)

// ✅ Share relevant function (50 tokens)
function processUser(user) {
  // ... relevant code only
}
For long contexts, provide summaries:
❌ Paste entire 10-page document (7,500 tokens)

✅ Summarize key points (500 tokens):
"Document summary: Client wants an e-commerce platform with:
- Multi-vendor marketplace
- Real-time inventory sync
- Mobile-first design
- Budget: $50K, Timeline: 3 months"

For conversations

Managing long conversations:
  1. Start fresh for new topics: Begin a new conversation for unrelated topics
  2. Summarize history: Manually summarize previous context when starting a new thread
  3. Use assistants: Create specialized assistants with task-specific contexts
  4. Leverage memory: Enable Memory to persist important context across conversations

For file attachments

1. Preprocess large files: Extract only the relevant sections before uploading.
  • Upload specific chapters, not entire books
  • Include relevant code files, not entire repositories
  • Crop images to relevant areas
2. Use text formats when possible: Text files are more token-efficient than PDFs or images with text.
3. Compress images: Reduce image resolution for analysis tasks that don’t require high detail.

Token limits and truncation

Input truncation

When your input exceeds the context window:
What happens:
  • Older messages may be automatically truncated
  • File content might be summarized or omitted
  • Tool outputs could be condensed
  • You’ll receive a warning when approaching limits
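If you want more control than automatic truncation, you can trim history yourself before sending it. Below is a minimal JavaScript sketch that drops the oldest messages until an estimated budget fits; the characters-per-token estimate and the { role, content } message shape are illustrative assumptions, not ZeroTwo’s API.

// Rough, conservative estimate: ~4 characters per token (rounds up)
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Keep the newest messages that fit within the input budget; drop the oldest
function trimHistory(messages, inputBudget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > inputBudget) break; // everything older is dropped
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}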

Output truncation

When the model’s response would exceed the remaining tokens:
If the response would be 15,000 tokens but only 10,000 remain:
- The response stops mid-generation
- You can ask the model to "continue" in the next message
- Consider switching to a model with a larger context window
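If you call a model API directly, truncation is usually detectable from response metadata. A sketch assuming an OpenAI-style chat completion, where finish_reason is "length" when generation stopped at the token limit (the response object here is a mock, and field names vary by provider):

const messages = [{ role: "user", content: "Write a detailed migration guide..." }];

// Mock response shaped like an OpenAI-style chat completion
const response = {
  choices: [{ finish_reason: "length", message: { content: "...partial output..." } }],
};

// "length" means generation stopped because the token limit was reached
if (response.choices[0].finish_reason === "length") {
  // Append the partial answer, then ask the model to pick up where it stopped
  messages.push({ role: "assistant", content: response.choices[0].message.content });
  messages.push({ role: "user", content: "Please continue from where you left off." });
}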

Estimating token usage

Quick estimation

// Rough estimate: ~4 characters ≈ 1 token in English
const estimateTokens = (text) => Math.round(text.length / 4);

estimateTokens("This is a sample text");
// 21 characters ÷ 4 ≈ 5 tokens (actual: 5)

More accurate estimation

For precise token counting, use tokenizer tools specific to the model provider:
  • OpenAI: tiktoken library
  • Anthropic: Claude tokenizer
  • Google: Gemini tokenizer
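For example, here is a minimal sketch using the community js-tiktoken package (an npm port of OpenAI’s tiktoken), assuming your installed version recognizes the model name:

import { encodingForModel } from "js-tiktoken";

// Look up the tokenizer used by a specific model
const enc = encodingForModel("gpt-4o");
const tokens = enc.encode("This is a sample text");
console.log(tokens.length); // exact token count for this model's tokenizer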
ZeroTwo may display token usage information in the interface for active conversations, helping you monitor your context usage in real-time.

Strategies by task type

Code generation

✅ Efficient approach:
"Create a React hook for form validation with Zod schema"

❌ Token-heavy approach:
"I'm building a React application and I need form validation. 
I want to use Zod for schema validation. Can you help me create 
a custom hook that handles validation? It should work with 
multiple form fields and show error messages..."
[continues for 200+ tokens]

Long document analysis

Upload the full document once, then ask specific questions against it instead of re-pasting excerpts into each message. Choose a model whose context window comfortably fits the document.

Debugging

✅ Efficient debugging prompt (100 tokens):
"This React component throws 'Cannot read property id of undefined'. 
Code: [paste minimal reproduction]
What's the issue?"

❌ Inefficient (500+ tokens):
[Paste entire component + all imports + parent components + 
detailed description of what you tried + full error stack trace + 
unrelated code]

Model selection based on token needs

Short contexts (< 10K tokens)

Use fast models: GPT-4o-mini, Claude Haiku, or GPT-3.5-turbo
Best for: Quick questions, simple code generation, brief explanations

Medium contexts (10K-50K tokens)

Use: GPT-4o, Claude 3.5 Sonnet
Best for: Code reviews, document analysis, extended conversations

Long contexts (50K-200K tokens)

Use: Claude 3.5 Sonnet, o1, o3
Best for: Large codebase analysis, multiple documents, long research

Ultra-long contexts (> 200K tokens)

Use: Gemini 1.5 Pro
Best for: Entire books, massive codebases, comprehensive research projects
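As a rough decision helper, here is a small JavaScript sketch that maps an estimated token count to the tiers above. The thresholds mirror these headings, and the returned model lists are suggestions, not a ZeroTwo API.

// Suggest models by estimated total token usage (input + output)
function suggestModels(estimatedTokens) {
  if (estimatedTokens < 10_000) return ["GPT-4o-mini", "Claude 3.5 Haiku"];
  if (estimatedTokens < 50_000) return ["GPT-4o", "Claude 3.5 Sonnet"];
  if (estimatedTokens <= 200_000) return ["Claude 3.5 Sonnet", "o1", "o3"];
  return ["Gemini 1.5 Pro"];
}

suggestModels(120_000); // ["Claude 3.5 Sonnet", "o1", "o3"]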

Token costs and efficiency

While ZeroTwo abstracts away direct token costs, being efficient with tokens still provides benefits:
  • Faster response times (less to process)
  • More room for detailed responses
  • Ability to maintain longer conversations
  • Better performance on complex tasks

Advanced techniques

Summarization chains

For very long tasks, chain conversations:
1. Analyze the first section: “Analyze section 1 of this document and provide key findings.”
2. Analyze subsequent sections: “Now analyze section 2, considering findings from section 1.”
3. Synthesize results: “Based on analysis of all sections, provide comprehensive recommendations.”

Iterative refinement

Build complex outputs incrementally:
Round 1: "Create basic structure for user authentication"
[Save response]

Round 2: "Add error handling to the previous code"
[Model remembers context]

Round 3: "Add JWT token generation"
[Continues building on previous work]

External storage

For reference data:
  • Store large documentation in Memory
  • Use external files or databases
  • Reference URLs for documentation
  • Summarize large contexts

Monitoring token usage

Best practices:
  • Monitor conversation length regularly
  • Start new threads for unrelated topics
  • Use assistants for specialized, token-efficient workflows
  • Enable Memory to reduce repeated context
  • Choose appropriate models for your token needs

Next steps