Understanding tokens and context limits is essential for effective use of AI models in ZeroTwo. Tokens determine how much text can be processed in a single conversation, and managing them efficiently helps you get the best results.

What are tokens?

Tokens are the fundamental units that AI models use to process text. A token can be a word, part of a word, or even punctuation.
Token approximation:
  • 1 token ≈ 4 characters in English
  • 1 token ≈ ¾ of a word on average
  • 100 tokens ≈ 75 words
  • 1,000 tokens ≈ 750 words or 1-2 pages of text

Examples of tokenization

"Hello world" = 2 tokens
- "Hello" = 1 token
- " world" = 1 token (note the space)
Code, special characters, and non-English text typically use more tokens per character than plain English text.

Context windows by model

Different models support different context window sizes. The context window is the total number of tokens a model can process in a single request, including both input and output.

Text models

  • GPT-4o (128,000 tokens): OpenAI’s flagship model with extended context support. Suitable for analyzing long documents, large codebases, or extended conversations.
  • Claude 3.5 Sonnet (200,000 tokens): Anthropic’s extended context model. Excellent for processing entire books, comprehensive code reviews, or long research documents.
  • Gemini 1.5 Pro (1,000,000 tokens): Google’s ultra-long context model. Can process multiple large documents, entire codebases, or very long conversation histories simultaneously.
  • GPT-4o-mini (128,000 tokens): Fast, cost-effective model with substantial context. A good balance for most tasks.
  • Claude 3.5 Haiku (200,000 tokens): Quick responses with extended context. Ideal for rapid iterations over medium-sized contexts.
  • o1 and o3 (reasoning models, 200,000 tokens): Reasoning-focused models with extended context for complex problem-solving.

Token distribution

The context window is shared between input and output:
Total context window = Input tokens + Output tokens

Example with GPT-4o (128K tokens):
- System prompt: 1,000 tokens
- Custom instructions: 500 tokens
- Conversation history: 10,000 tokens
- Your current message: 2,000 tokens
- Available for response: 114,500 tokens
If your conversation uses too many input tokens, the model has less space for generating a response. Very long contexts may result in truncated outputs.
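To make the arithmetic concrete, here is a minimal JavaScript sketch of the same budget calculation. The component names and numbers simply mirror the example above; they are illustrative, not a ZeroTwo API.

// Budget arithmetic for a 128K context window (numbers from the example above)
const CONTEXT_WINDOW = 128_000;

const inputTokens = {
  systemPrompt: 1_000,
  customInstructions: 500,
  conversationHistory: 10_000,
  currentMessage: 2_000,
};

// Sum the input components, then subtract from the total window
const used = Object.values(inputTokens).reduce((sum, n) => sum + n, 0);
const availableForResponse = CONTEXT_WINDOW - used;

console.log(availableForResponse); // 114500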

What consumes tokens?

Every component of your conversation uses tokens:
1. System prompt: ZeroTwo’s base instructions that define model behavior. Typical usage: 500-2,000 tokens
2. Custom instructions: Your personal preferences and standing instructions. Typical usage: 100-1,000 tokens
3. Conversation history: Previous messages in the current conversation thread. Growth: increases with each message exchange
4. File attachments: Content from uploaded files, images, or documents. Variable: can be significant for large files or many images
5. Tool outputs: Results from web search, code interpreter, or other tools. Variable: depends on tool usage and results
6. Your message: Your current prompt or question. Variable: depends on complexity and length
7. Model response: The AI’s generated response. Variable: longer responses use more tokens

Token optimization strategies

For prompts

Remove filler words while maintaining clarity:
❌ Token-heavy (25 tokens):
"I was wondering if you could possibly help me understand how I might be able to implement a feature that allows users to log in."

✅ Optimized (~9 tokens):
"How do I implement user login functionality?"
❌ Redundant (30 tokens):
"Please create a function. The function should validate email addresses. The email validation should check if the email is valid."

✅ Concise (18 tokens):
"Create a function to validate email addresses and return true if valid."
When sharing code, include only relevant portions:
// ❌ Sharing entire 500-line file (5,000+ tokens)

// ✅ Share relevant function (50 tokens)
function processUser(user) {
  // ... relevant code only
}
For long contexts, provide summaries:
❌ Paste entire 10-page document (7,500 tokens)

✅ Summarize key points (500 tokens):
"Document summary: Client wants an e-commerce platform with:
- Multi-vendor marketplace
- Real-time inventory sync
- Mobile-first design
- Budget: $50K, Timeline: 3 months"

For conversations

Managing long conversations:
  1. Start fresh for new topics: Begin a new conversation for unrelated topics
  2. Summarize history: Manually summarize previous context when starting a new thread
  3. Use assistants: Create specialized assistants with task-specific contexts
  4. Leverage memory: Enable Memory to persist important context across conversations

For file attachments

1. Preprocess large files: Extract only the relevant sections before uploading.
  • Upload specific chapters, not entire books
  • Include relevant code files, not entire repositories
  • Crop images to relevant areas
2. Use text formats when possible: Text files are more token-efficient than PDFs or images with text.
3. Compress images: Reduce image resolution for analysis tasks that don’t require high detail.

Token limits and truncation

Input truncation

When your input exceeds the context window:
What happens:
  • Older messages may be automatically truncated
  • File content might be summarized or omitted
  • Tool outputs could be condensed
  • You’ll receive a warning when approaching limits
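If you want more control than automatic truncation, you can trim history yourself before sending it. Below is a minimal JavaScript sketch that drops the oldest messages until an estimated budget fits; the characters-per-token estimate and the { role, content } message shape are illustrative assumptions, not ZeroTwo’s API.

// Rough, conservative estimate: ~4 characters per token (rounds up)
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Keep the newest messages that fit within the input budget; drop the oldest
function trimHistory(messages, inputBudget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > inputBudget) break; // everything older is dropped
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}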

Output truncation

When the model’s response would exceed the remaining tokens:
If the response would be 15,000 tokens but only 10,000 remain:
- The response stops mid-generation
- You can ask the model to "continue" in the next message
- Consider switching to a model with a larger context window
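If you call a model API directly, truncation is usually detectable from response metadata. A sketch assuming an OpenAI-style chat completion, where finish_reason is "length" when generation stopped at the token limit (the response object here is a mock, and field names vary by provider):

const messages = [{ role: "user", content: "Write a detailed migration guide..." }];

// Mock response shaped like an OpenAI-style chat completion
const response = {
  choices: [{ finish_reason: "length", message: { content: "...partial output..." } }],
};

// "length" means generation stopped because the token limit was reached
if (response.choices[0].finish_reason === "length") {
  // Append the partial answer, then ask the model to pick up where it stopped
  messages.push({ role: "assistant", content: response.choices[0].message.content });
  messages.push({ role: "user", content: "Please continue from where you left off." });
}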

Estimating token usage

Quick estimation

// Rough estimate: ~4 characters ≈ 1 token in English
const estimateTokens = (text) => Math.round(text.length / 4);

estimateTokens("This is a sample text");
// 21 characters ÷ 4 ≈ 5 tokens (actual: 5)

More accurate estimation

For precise token counting, use tokenizer tools specific to the model provider:
  • OpenAI: tiktoken library
  • Anthropic: Claude tokenizer
  • Google: Gemini tokenizer
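For example, here is a minimal sketch using the community js-tiktoken package (an npm port of OpenAI’s tiktoken), assuming your installed version recognizes the model name:

import { encodingForModel } from "js-tiktoken";

// Look up the tokenizer used by a specific model
const enc = encodingForModel("gpt-4o");
const tokens = enc.encode("This is a sample text");
console.log(tokens.length); // exact token count for this model's tokenizer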
ZeroTwo may display token usage information in the interface for active conversations, helping you monitor your context usage in real-time.

Strategies by task type

Code generation

✅ Efficient approach:
"Create a React hook for form validation with Zod schema"

❌ Token-heavy approach:
"I'm building a React application and I need form validation. 
I want to use Zod for schema validation. Can you help me create 
a custom hook that handles validation? It should work with 
multiple form fields and show error messages..."
[continues for 200+ tokens]

Long document analysis

Upload the full document once, then ask specific questions against it instead of re-pasting excerpts into each message. Choose a model whose context window comfortably fits the document.

Debugging

✅ Efficient debugging prompt (100 tokens):
"This React component throws 'Cannot read property id of undefined'. 
Code: [paste minimal reproduction]
What's the issue?"

❌ Inefficient (500+ tokens):
[Paste entire component + all imports + parent components + 
detailed description of what you tried + full error stack trace + 
unrelated code]

Model selection based on token needs

Short contexts (< 10K tokens)

Use fast models: GPT-4o-mini, Claude Haiku, or GPT-3.5-turbo
Best for: Quick questions, simple code generation, brief explanations

Medium contexts (10K-50K tokens)

Use: GPT-4o, Claude 3.5 Sonnet
Best for: Code reviews, document analysis, extended conversations

Long contexts (50K-200K tokens)

Use: Claude 3.5 Sonnet, o1, o3
Best for: Large codebase analysis, multiple documents, long research

Ultra-long contexts (> 200K tokens)

Use: Gemini 1.5 Pro
Best for: Entire books, massive codebases, comprehensive research projects
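As a rough decision helper, here is a small JavaScript sketch that maps an estimated token count to the tiers above. The thresholds mirror these headings, and the returned model lists are suggestions, not a ZeroTwo API.

// Suggest models by estimated total token usage (input + output)
function suggestModels(estimatedTokens) {
  if (estimatedTokens < 10_000) return ["GPT-4o-mini", "Claude 3.5 Haiku"];
  if (estimatedTokens < 50_000) return ["GPT-4o", "Claude 3.5 Sonnet"];
  if (estimatedTokens <= 200_000) return ["Claude 3.5 Sonnet", "o1", "o3"];
  return ["Gemini 1.5 Pro"];
}

suggestModels(120_000); // ["Claude 3.5 Sonnet", "o1", "o3"]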

Token costs and efficiency

While ZeroTwo abstracts away direct token costs, being efficient with tokens still provides benefits:
  • Faster response times (less to process)
  • More room for detailed responses
  • Ability to maintain longer conversations
  • Better performance on complex tasks

Advanced techniques

Summarization chains

For very long tasks, chain conversations:
1. Analyze the first section: “Analyze section 1 of this document and provide key findings.”
2. Analyze subsequent sections: “Now analyze section 2, considering findings from section 1.”
3. Synthesize results: “Based on analysis of all sections, provide comprehensive recommendations.”

Iterative refinement

Build complex outputs incrementally:
Round 1: "Create basic structure for user authentication"
[Save response]

Round 2: "Add error handling to the previous code"
[Model remembers context]

Round 3: "Add JWT token generation"
[Continues building on previous work]

External storage

For reference data:
  • Store large documentation in Memory
  • Use external files or databases
  • Reference URLs for documentation
  • Summarize large contexts

Monitoring token usage

Best practices:
  • Monitor conversation length regularly
  • Start new threads for unrelated topics
  • Use assistants for specialized, token-efficient workflows
  • Enable Memory to reduce repeated context
  • Choose appropriate models for your token needs

Next steps