HOW-TO · DEV

How to count tokens in OpenAI API requests using the tiktoken library in Python

Intermediate · 10 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Python 3.12
PREREQUISITES

Python 3.10+, tiktoken installed (pip install tiktoken)

What this does

The tiktoken library provides a fast, accurate token counter for OpenAI models using byte-pair encoding (BPE). Counting tokens before sending a request lets developers avoid exceeding context window limits, size batches, and forecast API costs. tiktoken includes the cl100k_base encoding used by GPT-4 and GPT-3.5 Turbo models, and the o200k_base encoding used by GPT-4o.
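As an illustration of batch sizing, here is a minimal sketch that greedily packs texts into batches under a token budget. The names batch_by_token_budget and make_tiktoken_counter are hypothetical helpers, not part of tiktoken, and the counter is passed in as a plain callable so the packing logic can be exercised without the library.

```python
from typing import Callable, Iterable, List


def batch_by_token_budget(
    texts: Iterable[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[List[str]]:
    """Greedily pack texts into batches whose token totals stay within max_tokens."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        n = count_tokens(text)
        if n > max_tokens:
            # A single oversized text can never fit; the caller must split it first.
            raise ValueError(f"text needs {n} tokens, over the {max_tokens}-token budget")
        if current and used + n > max_tokens:
            # Adding this text would overflow, so close the current batch.
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        batches.append(current)
    return batches


def make_tiktoken_counter(encoding_name: str = "cl100k_base") -> Callable[[str], int]:
    """Return a counter backed by tiktoken; the encoder is built once and reused."""
    import tiktoken  # third-party; imported here so the packing logic has no hard dependency

    encoding = tiktoken.get_encoding(encoding_name)
    return lambda text: len(encoding.encode(text))
```

With tiktoken installed, batch_by_token_budget(texts, 8000, make_tiktoken_counter()) would group texts into batches of at most 8000 tokens each; the budget value is only an example.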

Steps

  1. Install tiktoken in the active Python environment.
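  2. Import the tiktoken module. Use tiktoken.get_encoding('cl100k_base') to obtain the encoder for GPT-4 and GPT-3.5 Turbo models, or tiktoken.encoding_for_model(model_name) to let tiktoken select the encoding for a given model name.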
  3. For single-string token counting, call encoding.encode(text) and retrieve len(tokens).
  4. For chat message token counting (required for multi-turn conversations), sum the tokens of each message's field values (role, content, and name if present) and add a fixed per-message overhead. Per OpenAI's cookbook, the overhead is 4 tokens for gpt-3.5-turbo-0301 and 3 tokens for later GPT-3.5 Turbo and GPT-4 snapshots.
  5. Write a helper function count_message_tokens(messages) that iterates over the messages list and aggregates the per-message token counts plus the per-message overhead.
  6. For tool-call or function-call messages, include the name field in the token count, since its length varies and its presence adjusts the overhead.
  7. Add 3 tokens once to the final count: every reply is primed with an assistant header, which OpenAI's token accounting charges to the prompt for gpt-3.5-turbo-0301 and later models.
  8. Use the token count to check against model context limits before sending a request. If the count exceeds the limit, truncate or split the input.
  9. Cache the encoding object at module level to avoid reinitializing it on every call, which is an expensive operation.
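The steps above can be sketched roughly as follows. The overhead values are parameters because they vary by model snapshot: the defaults (3 per message, 1 per name field, 3 for reply priming) are assumptions drawn from OpenAI's cookbook for gpt-3.5-turbo-0613 and GPT-4 models, and tokens_per_message=4 reproduces the 4-token figure. The names get_encoder and fits_context are hypothetical, as is the optional encode argument that lets a caller supply a different encoder.

```python
from functools import lru_cache
from typing import Callable, Mapping, Optional, Sequence


@lru_cache(maxsize=None)
def get_encoder(encoding_name: str = "cl100k_base"):
    """Build each encoder once and reuse it on later calls (step 9)."""
    import tiktoken  # third-party: pip install tiktoken

    return tiktoken.get_encoding(encoding_name)


def count_message_tokens(
    messages: Sequence[Mapping[str, object]],
    encode: Optional[Callable[[str], Sequence[int]]] = None,
    tokens_per_message: int = 3,  # assumption: use 4 for gpt-3.5-turbo-0301
    tokens_per_name: int = 1,     # assumption: adjustment when "name" is present
    reply_priming: int = 3,       # added once for the assistant reply header
) -> int:
    """Estimate prompt tokens for a list of chat messages (steps 4 to 7)."""
    if encode is None:
        encode = get_encoder().encode
    total = reply_priming
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            if not isinstance(value, str):
                # Non-string values (None content, structured tool calls) are
                # skipped in this sketch, so such messages are undercounted.
                continue
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name
    return total


def fits_context(
    messages: Sequence[Mapping[str, object]],
    context_limit: int,
    reserved_for_reply: int = 0,
    **count_kwargs,
) -> bool:
    """Check the estimate plus a reply reserve against a context limit (step 8)."""
    used = count_message_tokens(messages, **count_kwargs)
    return used + reserved_for_reply <= context_limit
```

A caller would look up the context limit for its model and pass it as context_limit; when fits_context returns False, the input needs truncating or splitting before the request is sent. The estimate should be checked against the usage figures the API returns for a few real requests.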

Verification

python3 -c "
import tiktoken
enc = tiktoken.get_encoding('cl100k_base')
text = 'The quick brown fox jumps over the lazy dog.'
tokens = enc.encode(text)
print(f'Text: \"{text}\"')
print(f'Token count: {len(tokens)}')
print(f'Token IDs: {tokens}')
decoded = enc.decode(tokens)
assert decoded == text, 'Round-trip decode failed'
print('Round-trip decode: OK')
"

Expected output:

Text: "The quick brown fox jumps over the lazy dog."
Token count: 10
Token IDs: [791, 4062, 14198, 39935, 35308, 927, 279, 16053, 5679, 13]
Round-trip decode: OK

Common failures

  • Wrong encoding for model version: using cl100k_base for a model that uses a different encoding produces inaccurate counts. Solution: map model names to encoding names using OpenAI's official table, and verify with a known token count sample.
  • Forgetting message overhead in chat completions: counting only message content tokens underestimates total usage. Solution: implement the standard chat token formula with the per-message overhead (4 tokens for gpt-3.5-turbo-0301, 3 for later snapshots) plus the 3-token reply priming.
  • Encoding object recreated per request: initializing a new encoding for every API call is slow and wasteful. Solution: instantiate the encoding once and reuse it via a module-level variable or singleton pattern.
  • Unicode characters inflating token count unexpectedly: emojis and special Unicode characters may encode to multiple tokens. Solution: test token counts with a representative sample of actual input data to calibrate expectations.
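Two of these failures can be guarded against with small helpers, sketched below under stated assumptions. resolve_encoding and tokens_per_character are hypothetical names; the first relies on tiktoken.encoding_for_model raising KeyError for model names it does not recognize, and the cl100k_base fallback is an assumption that may not suit a given model.

```python
import warnings
from typing import Callable, Dict, Iterable, Sequence


def resolve_encoding(model: str, fallback: str = "cl100k_base"):
    """Let tiktoken pick the encoding for a model, with an explicit fallback."""
    import tiktoken  # third-party: pip install tiktoken

    try:
        return tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: warn loudly rather than silently miscount.
        warnings.warn(f"no known encoding for {model!r}; falling back to {fallback}")
        return tiktoken.get_encoding(fallback)


def tokens_per_character(
    samples: Iterable[str],
    encode: Callable[[str], Sequence[int]],
) -> Dict[str, float]:
    """Report tokens per character for each non-empty sample string."""
    return {s: len(encode(s)) / len(s) for s in samples if s}
```

Running tokens_per_character over a representative sample of real input, with resolve_encoding(model).encode as the encoder, shows which kinds of text (emoji, non-Latin scripts) cost more tokens per character than plain ASCII. Older tiktoken releases may not recognize newer model names, so upgrading the package is worth trying before relying on the fallback.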
