How to count tokens in OpenAI API requests using the tiktoken library in Python
Target environment
Ubuntu 24.04 · Python 3.12
Prerequisites
Python 3.10+, tiktoken installed (pip install tiktoken)
What this does
The tiktoken library is OpenAI's fast byte-pair encoding (BPE) tokenizer, so it splits text into tokens the same way OpenAI models do. Counting tokens before sending a request lets developers avoid exceeding context window limits, size batches, and forecast API costs. GPT-4 and GPT-3.5 Turbo models use the cl100k_base encoding; GPT-4o uses the newer o200k_base encoding.
Steps
- Install tiktoken in the active Python environment.
- Import the tiktoken module. Use tiktoken.get_encoding('cl100k_base') to obtain the encoder for GPT-4 and GPT-3.5 Turbo models, or tiktoken.encoding_for_model(model_name) to look up the encoding from a model name.
- For single-string token counting, call encoding.encode(text) and take len() of the returned list of token IDs.
- For chat requests (single- or multi-turn), counting message content alone is not enough. Each message adds a fixed overhead (3 tokens for GPT-4 and gpt-3.5-turbo-0613 and later; 4 tokens for gpt-3.5-turbo-0301) plus the tokens of its role and content strings. OpenAI's cookbook describes this accounting as an estimate that can change between models.
- Write a helper function count_message_tokens(messages) that iterates over the messages list and sums each message's fixed overhead and the token counts of its fields.
- If a message carries a name field (for example, a named participant or a function-role message), encode the name as well and apply the per-name adjustment: +1 token for current models, or -1 for gpt-3.5-turbo-0301, where the name replaces the role.
- Add 3 tokens to the final total: every reply is primed with <|start|>assistant<|message|>, and OpenAI counts those tokens toward the prompt.
- Before sending a request, check the count against the model's context limit, leaving room for the completion, since the context window covers both prompt and output tokens. If the total exceeds the limit, truncate or split the input.
- Cache the encoding object at module level so it is loaded only once per process. The first load is the expensive part: tiktoken reads the encoding's BPE data and downloads it if no local cached copy exists.
Verification
python3 -c "
import tiktoken
enc = tiktoken.get_encoding('cl100k_base')
text = 'The quick brown fox jumps over the lazy dog.'
tokens = enc.encode(text)
print(f'Text: \"{text}\"')
print(f'Token count: {len(tokens)}')
print(f'Token IDs: {tokens}')
decoded = enc.decode(tokens)
assert decoded == text, 'Round-trip decode failed'
print('Round-trip decode: OK')
"
Expected output:
Text: "The quick brown fox jumps over the lazy dog."
Token count: 10
Token IDs: [791, 4062, 14198, 39935, 35308, 927, 279, 16053, 5679, 13]
Round-trip decode: OK
Common failures
- Wrong encoding for model version: using cl100k_base for a model that uses a different encoding (GPT-4o, for example, uses o200k_base) produces inaccurate counts. Solution: look up the encoding with tiktoken.encoding_for_model(model_name) instead of hardcoding it, and verify against a sample with a known token count.
- Forgetting message overhead in chat completions: counting only message content tokens underestimates total usage. Solution: add the fixed per-message overhead (3 tokens for current models, 4 for gpt-3.5-turbo-0301), the role and name tokens, and the 3 reply-priming tokens.
- Encoding object recreated per request: initializing a new encoding for every API call is slow and wasteful. Solution: instantiate the encoding once and reuse it via a module-level variable or singleton pattern.
- Unicode characters inflating token count unexpectedly: emojis and special Unicode characters may encode to multiple tokens. Solution: test token counts with a representative sample of actual input data to calibrate expectations.
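Because one character can span several tokens, the truncate-or-split step from Steps can also cut a character in half if the token list is sliced at an arbitrary index. A minimal sketch, assuming `encoding` is a `tiktoken.Encoding` (any object with `encode()` and `decode_bytes()` works); `truncate_to_tokens`, `chars_per_token`, and `ByteLevelEncoder` are hypothetical names. The second helper supports the calibration advice above.

```python
from typing import Protocol, Sequence


class ByteLevelEncoder(Protocol):
    """The subset of tiktoken.Encoding used below."""

    def encode(self, text: str) -> list[int]: ...

    def decode_bytes(self, tokens: Sequence[int]) -> bytes: ...


def truncate_to_tokens(text: str, max_tokens: int, encoding: ByteLevelEncoder) -> str:
    """Return a prefix of `text` built from at most `max_tokens` of its tokens."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    tokens = encoding.encode(text)
    if len(tokens) <= max_tokens:
        return text
    # The cut may land inside a multi-byte character (e.g. an emoji);
    # errors="ignore" drops the incomplete trailing bytes instead of
    # producing a U+FFFD replacement character.
    return encoding.decode_bytes(tokens[:max_tokens]).decode("utf-8", errors="ignore")


def chars_per_token(samples: Sequence[str], encoding: ByteLevelEncoder) -> float:
    """Average characters per token across representative input samples."""
    total_chars = sum(len(s) for s in samples)
    total_tokens = sum(len(encoding.encode(s)) for s in samples)
    return total_chars / total_tokens if total_tokens else 0.0
```

Running `chars_per_token` over real inputs and comparing it with plain English text gives a rough sense of how much headroom emoji- or script-heavy data needs; a lower ratio means more tokens per character.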