13. Model-Specific: Qwen
Qwen models have distinct training characteristics that affect prompting strategies. Qwen2 and Qwen2.5 follow instructions more reliably than earlier Qwen releases, but the techniques below still help.
Chinese language handling: Qwen was trained on extensive Chinese text. Prompts in Chinese may produce more detailed responses for Chinese-related content, but this can also cause code-switching, where the model mixes Chinese into a response that should be in another language. If you want English output, say so explicitly: "Respond in English only."
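The "Respond in English only" instruction can be backed by a programmatic check on the output. The following is a minimal sketch, assuming that any character in the basic CJK Unified Ideographs block counts as Chinese; the function names and the `max_ratio` parameter are illustrative and not part of any Qwen tooling.

```python
import re

# Basic CJK Unified Ideographs block. Treating this range as "Chinese"
# is an assumption made for this sketch.
_CJK = re.compile(r"[\u4e00-\u9fff]")

def cjk_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if _CJK.match(c)) / len(chars)

def is_english_only(text: str, max_ratio: float = 0.0) -> bool:
    """True if the share of CJK characters does not exceed max_ratio."""
    return cjk_ratio(text) <= max_ratio
```

A response that fails the check can be retried with a stronger language instruction. A nonzero `max_ratio` may be appropriate when the answer legitimately quotes Chinese terms.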
Code generation: Qwen's pretraining data includes a large amount of code, and the Qwen2.5-Coder variants are specialized for it. It responds well to prompts that spell out requirements item by item:
Write a Python function that:
- Takes a list of URLs as input
- Fetches each URL concurrently using asyncio
- Returns a dict mapping URL to status code
- Handles timeouts gracefully
Use type hints and include a docstring.
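Because the prompt above asks for type hints and a docstring, the returned code can be checked against those requirements before it is used. The sketch below relies only on the standard `ast` module; the function name `check_generated_function` is hypothetical, and it inspects only the first function it finds.

```python
import ast

def check_generated_function(source: str) -> list[str]:
    """Return a list of problems found in the first function defined in source."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    # ast.walk is breadth-first, so top-level functions come before nested ones.
    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    if not funcs:
        return ["no function definition found"]
    func = funcs[0]
    problems = []
    if ast.get_docstring(func) is None:
        problems.append("missing docstring")
    if func.returns is None:
        problems.append("missing return annotation")
    for arg in func.args.args:
        if arg.annotation is None and arg.arg not in ("self", "cls"):
            problems.append(f"parameter '{arg.arg}' has no type hint")
    return problems
```

An empty list means the structural requirements were met. It says nothing about whether the code is correct, so generated code still needs tests or review.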
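Extended context window: Qwen2.5 models support contexts of up to 128K tokens, although the smaller variants have shorter limits. Even so, for tasks requiring precise extraction from long documents, chunked processing with explicit overlap can improve accuracy. Note that the chunk sizes below are measured in characters, not tokens: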
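def extract_with_overlap(document, chunk_size=6000, overlap=500):
    # chunk_size and overlap are in characters, not tokens.
    # `model` and `merge_results` are assumed to be defined elsewhere.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), step)]
    results = []
    for i, chunk in enumerate(chunks):
        prompt = f"""Extract information from this chunk (part {i+1}/{len(chunks)}).
If a piece of information appears in multiple chunks, extract it once.
Chunk: {chunk}
Output: [structured format]
"""
        results.append(model.generate(prompt, format="json"))
    # Adjacent chunks overlap, so merge_results must drop duplicates.
    return merge_results(results)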
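The extraction function above leaves `merge_results` undefined. One possible shape for it is sketched below, under two assumptions that the original does not state: each chunk's result is a JSON string encoding a list of items, and two items are duplicates only when they are identical after key sorting.

```python
import json

def merge_results(results: list[str]) -> list:
    """Merge per-chunk JSON arrays, dropping exact duplicates while keeping order."""
    seen = set()
    merged = []
    for raw in results:
        try:
            items = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip chunks whose output is not valid JSON
        if not isinstance(items, list):
            items = [items]  # tolerate a single object instead of a list
        for item in items:
            # Canonical serialization so key order does not affect equality.
            key = json.dumps(item, sort_keys=True, ensure_ascii=False)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Exact-match deduplication is the simplest choice. If the same fact can be extracted with slightly different wording from two overlapping chunks, a fuzzier comparison on selected fields would be needed.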
Mathematical reasoning: Qwen models were trained on extensive mathematical data. For math tasks, explicit step notation improves accuracy:
Solve this problem, showing each step:
Step 1: [operation and reasoning]
Step 2: [operation and reasoning]
Final answer: [value]
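A fixed step format also makes the response easy to parse. The sketch below assumes the model reproduces the "Step N:" and "Final answer:" labels exactly as given in the prompt, which is not guaranteed; the function name `parse_steps` is illustrative.

```python
import re
from typing import Optional

def parse_steps(response: str) -> tuple[list[str], Optional[str]]:
    """Split a response into its 'Step N:' lines and the 'Final answer:' value."""
    steps = re.findall(r"^Step \d+:\s*(.+)$", response, flags=re.MULTILINE)
    match = re.search(r"^Final answer:\s*(.+)$", response, flags=re.MULTILINE)
    final = match.group(1).strip() if match else None
    return steps, final
```

A `None` final answer signals that the model ignored the format or was cut off, and the request can be retried.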
Tool use / function calling: Qwen2.5 has improved function calling capabilities. For structured tool use:
You have access to the following functions:
- get_weather(location: str) -> dict
- get_time(zone: str) -> str
Based on the user request, call the appropriate function with correct arguments.
User: "What's the weather in Berlin?"
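On the application side, the model's reply has to be parsed and routed to real code. The sketch below assumes the model answers with a JSON object of the form {"name": ..., "arguments": {...}}; that shape is a common convention rather than a documented guarantee of this prompt. The two functions are placeholders returning fixed values.

```python
import json

def get_weather(location: str) -> dict:
    # Placeholder: a real implementation would query a weather service.
    return {"location": location, "forecast": "unknown"}

def get_time(zone: str) -> str:
    # Placeholder: a real implementation would look up the current time.
    return f"time in {zone} unavailable"

# Only functions listed here can be called by the model.
TOOLS = {"get_weather": get_weather, "get_time": get_time}

def dispatch(model_output: str):
    """Parse a JSON tool call and invoke the matching function."""
    call = json.loads(model_output)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown function: {name!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**args)
```

Routing through an explicit allowlist means a hallucinated function name raises an error instead of executing arbitrary code.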
Common Qwen failure modes:
Over-elaboration: Qwen may produce verbose responses even when a concise answer is requested. Add explicit constraints: "Limit your response to 3 sentences" or "Provide only the JSON, no explanation."
Code in Markdown blocks: Qwen defaults to wrapping code in Markdown code fences. If you need raw code, specify: "Return code without markdown formatting."
Ambiguous truncation: When output reaches token limits, Qwen may truncate mid-structure. Always validate JSON completeness programmatically.
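The last two failure modes can be handled defensively in code as well as in the prompt. The sketch below strips a single wrapping Markdown fence and then checks that the remainder parses as JSON; the helper names are illustrative. Note that a parse failure cannot distinguish truncation from other malformed output.

```python
import json
import re

FENCE = "`" * 3  # built at runtime to avoid a literal fence inside this example
_FENCED = re.compile(
    r"^" + FENCE + r"[\w+-]*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL
)

def strip_code_fence(text: str) -> str:
    """Remove a single Markdown fence wrapping the whole response, if present."""
    stripped = text.strip()
    match = _FENCED.match(stripped)
    return match.group(1) if match else stripped

def parse_complete_json(text: str):
    """Return parsed JSON, or None if the output is truncated or otherwise invalid."""
    try:
        return json.loads(strip_code_fence(text))
    except json.JSONDecodeError:
        return None
```

When `parse_complete_json` returns `None`, a reasonable response is to retry with a higher output token limit or a request for a smaller structure.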
Test a complex task (code generation, math reasoning, or long-document extraction) with a Qwen model and a Llama model, using the same prompt for both. Compare output quality and identify where each model excels.