Claude AI Tutorial #12: Extended Thinking — Claude's Reasoning Mode

Some problems need thinking. A complex debugging task, a math proof, or a multi-step architecture decision — these benefit from Claude reasoning through the problem before answering. Extended thinking makes this explicit and controllable.

This is Article 12 in the Claude AI — From Zero to Power User series. You should have completed Article 7: Messages API before this article.

By the end of this article, you will know how to enable extended thinking, set thinking budgets, and decide when to use it.

What is Extended Thinking?

Extended thinking lets Claude “think” before responding. When enabled, Claude generates internal reasoning tokens — a chain of thought that you can see in the response. These thinking tokens help Claude work through complex problems step by step.

Without extended thinking:

Claude generates the response immediately
Works well for simple tasks
May make mistakes on complex reasoning

With extended thinking:

Claude thinks first, then responds
Better accuracy on math, logic, and multi-step problems
You can see the thinking process
Costs more (thinking tokens are billed as output tokens)

Enabling Extended Thinking

Python

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {
            "role": "user",
            "content": "I have a Python function that sometimes returns incorrect results. Here is the code:\n\ndef calculate_discount(price, discount_percent, tax_rate, coupon=None):\n    discounted = price * (1 - discount_percent)\n    if coupon:\n        discounted -= coupon\n    taxed = discounted * (1 + tax_rate)\n    return round(taxed, 2)\n\nWhen I call calculate_discount(100, 0.2, 0.08, 5), I expect $81.00 but get $81.22. Find the bug."
        }
    ]
)

# The response has thinking blocks and text blocks
for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
        print("=== END THINKING ===\n")
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [
    {
      role: "user",
      content:
        "I have a Python function that sometimes returns incorrect results...",
    },
  ],
});

for (const block of message.content) {
  if (block.type === "thinking") {
    console.log("=== THINKING ===");
    console.log(block.thinking);
    console.log("=== END THINKING ===\n");
  } else if (block.type === "text") {
    console.log("=== RESPONSE ===");
    console.log(block.text);
  }
}

The response contains two types of content blocks:

thinking — Claude’s internal reasoning (one or more blocks)
text — The final response to the user

Budget Tokens

The budget_tokens parameter controls how much thinking Claude can do.

thinking={
    "type": "enabled",
    "budget_tokens": 5000    # Up to 5,000 thinking tokens
}

Budget	Use Case	Typical Cost (Sonnet 4.6)
1,000-2,000	Simple reasoning, quick analysis	$0.015-0.030
5,000-10,000	Code debugging, moderate complexity	$0.075-0.150
20,000-50,000	Complex math, deep analysis	$0.300-0.750
100,000+	Research-level problems	$1.50+

Claude does not always use the full budget. If a problem is simple, Claude might use only 500 thinking tokens even with a budget of 10,000. The budget is a maximum, not a target.

Important: max_tokens must be greater than budget_tokens. The max_tokens value covers both thinking tokens and response tokens.

Adaptive Thinking

Opus 4.6 and Sonnet 4.6 support adaptive thinking. Instead of setting a fixed budget, Claude decides how much to think based on the problem complexity.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    temperature=1,  # Required for adaptive thinking
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)
# Claude uses very few thinking tokens for this simple question

message2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    temperature=1,
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ]
)
# Claude uses much more thinking budget for this complex question

With adaptive thinking, Claude automatically scales its reasoning effort. Simple questions get minimal thinking. Complex questions get deep reasoning. This saves money because you do not pay for unnecessary thinking tokens.

Note: Temperature must be set to 1 (the default) when using extended thinking. You cannot use temperature=0 with thinking enabled.

When to Use Extended Thinking

Good Use Cases

Complex debugging — Finding subtle bugs in code
Math and logic — Multi-step calculations, proofs
Architecture decisions — Weighing trade-offs between approaches
Code review — Deep analysis of complex code
Data analysis — Finding patterns in complex data
Planning — Multi-step task decomposition

Bad Use Cases

Simple questions — “What is the capital of France?” does not need thinking
Classification — “Is this positive or negative?” is simple enough without thinking
Data extraction — Pulling fields from structured text is straightforward
Low-latency needs — Thinking adds latency (seconds to tens of seconds)

Rule of thumb: If you would need to think about the problem yourself, turn on extended thinking. If the answer is obvious, skip it.

Reading Thinking Blocks

The thinking content shows Claude’s internal reasoning. This is valuable for:

Debugging — Understand why Claude gave a particular answer
Trust — Verify Claude’s reasoning is sound
Education — Learn how to approach similar problems

Python

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    messages=[
        {"role": "user", "content": "Which database should I use for a real-time chat application: PostgreSQL, MongoDB, or Redis? The app needs to handle 10,000 concurrent users."}
    ]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking ({len(block.thinking)} chars):")
        print(block.thinking[:500])  # Print first 500 chars
        print("...")
    elif block.type == "text":
        print(f"\nFinal answer:")
        print(block.text)

# Token usage
print(f"\nTotal output tokens: {message.usage.output_tokens}")

Extended Thinking with Streaming

Thinking tokens can be streamed in real time, so users see the reasoning as it happens.

Python

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    messages=[
        {"role": "user", "content": "Debug this SQL query that returns wrong results: SELECT u.name, COUNT(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.status = 'completed' GROUP BY u.name"}
    ]
) as stream:
    current_block = None

    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                current_block = "thinking"
                print("Thinking: ", end="")
            elif event.content_block.type == "text":
                current_block = "text"
                print("\n\nAnswer: ", end="")

        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

print()

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
  messages: [
    {
      role: "user",
      content: "Debug this SQL query...",
    },
  ],
});

let currentBlock: string | null = null;

for await (const event of stream) {
  if (event.type === "content_block_start") {
    if (event.content_block.type === "thinking") {
      currentBlock = "thinking";
      process.stdout.write("Thinking: ");
    } else if (event.content_block.type === "text") {
      currentBlock = "text";
      process.stdout.write("\n\nAnswer: ");
    }
  } else if (event.type === "content_block_delta") {
    if (event.delta.type === "thinking_delta") {
      process.stdout.write(event.delta.thinking);
    } else if (event.delta.type === "text_delta") {
      process.stdout.write(event.delta.text);
    }
  }
}

console.log();

Streaming thinking tokens is useful for user-facing applications. The user sees Claude “working through” the problem, which builds trust and patience for longer tasks.

Extended Thinking with Tool Use

Claude can think before deciding to call a tool. This improves tool selection accuracy for complex tasks.

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "run_query",
        "description": "Run a SQL query against the production database",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "SQL query to execute"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "run_test",
        "description": "Run a specific test file or test function",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_path": {"type": "string", "description": "Path to test file or function"}
            },
            "required": ["test_path"]
        }
    }
]

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    tools=tools,
    messages=[
        {"role": "user", "content": "The user registration endpoint returns 500 errors intermittently. Investigate."}
    ]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "tool_use":
        print(f"Tool call: {block.name}({json.dumps(block.input)})")
    elif block.type == "text":
        print(f"Response: {block.text}")

Claude thinks about the problem first, then decides which tool to call. This is especially useful in agent workflows where the right tool choice matters.

Cost Analysis

Extended thinking adds cost because thinking tokens are billed as output tokens.

Cost Comparison (Sonnet 4.6)

A code review task with 2,000 input tokens:

Mode	Thinking Tokens	Output Tokens	Total Cost
No thinking	0	500	$0.0135
2K thinking budget	~1,500	500	$0.0360
10K thinking budget	~5,000	500	$0.0885

Extended thinking roughly doubles to triples the cost per request. But if it catches a bug that would have taken you 30 minutes to find, the $0.05 extra is well worth it.

Cost Optimization Tips

Use appropriate budgets — Do not set 50K budget for simple tasks
Use adaptive thinking — Let Claude decide how much to think
Cache thinking tokens — In multi-turn conversations, previous thinking gets cached
Use Sonnet for most tasks — Reserve Opus for the hardest problems

Extended Thinking vs Chain of Thought Prompting

You might wonder: “Why not just add ‘Think step by step’ to my prompt?” Here is the difference:

Feature	Chain of Thought (prompt)	Extended Thinking
How it works	Claude writes reasoning in the response	Claude has a separate thinking phase
Thinking visible	Mixed with the response text	Separate thinking blocks
Thinking budget	You control via max_tokens	Dedicated budget_tokens parameter
Accuracy	Good for moderate tasks	Better for complex tasks
Cost	All output tokens	Thinking tokens are extra

Chain of thought prompting is free (no extra tokens beyond what Claude writes). Extended thinking costs more but is more powerful for genuinely complex problems.

Recommendation: Use chain of thought prompting (“Think step by step”) for moderate tasks. Use extended thinking for genuinely complex tasks where accuracy matters most.

Multi-Turn with Extended Thinking

When continuing a conversation that used extended thinking, include the thinking blocks in the message history:

# First turn with thinking
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 5000},
    messages=[
        {"role": "user", "content": "Analyze this algorithm for time complexity: ..."}
    ]
)

# Continue the conversation — include thinking blocks
messages = [
    {"role": "user", "content": "Analyze this algorithm for time complexity: ..."},
    {"role": "assistant", "content": response1.content},  # Includes thinking + text
    {"role": "user", "content": "Can you optimize it to O(n log n)?"}
]

response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 5000},
    messages=messages
)

The thinking blocks from previous turns help Claude maintain context. Combined with prompt caching, the thinking tokens from earlier turns get cached automatically.

Summary

Feature	Details
Enable	`thinking: {"type": "enabled", "budget_tokens": N}`
Budget	1K-100K tokens (Claude uses only what it needs)
Adaptive	Opus 4.6 and Sonnet 4.6 scale thinking to problem complexity
Cost	Thinking tokens billed as output tokens
Response format	`thinking` blocks + `text` blocks
Temperature	Must be 1 (default) when thinking is enabled
Best for	Complex debugging, math, architecture, code review

Extended thinking is Claude’s most powerful accuracy feature. Use it when the problem is genuinely hard, and the cost of being wrong is high.

What’s Next?

In the next article, we will cover MCP (Model Context Protocol) — the open standard for connecting Claude to external tools and data sources.

Next: MCP (Model Context Protocol)

What is Extended Thinking?#

Enabling Extended Thinking#

Python#

TypeScript#

Budget Tokens#

Adaptive Thinking#

When to Use Extended Thinking#

Good Use Cases#

Bad Use Cases#

Reading Thinking Blocks#

Python#

Extended Thinking with Streaming#

Python#

TypeScript#

Extended Thinking with Tool Use#

Cost Analysis#

Cost Comparison (Sonnet 4.6)#

Cost Optimization Tips#

Extended Thinking vs Chain of Thought Prompting#

Multi-Turn with Extended Thinking#

Summary#

What’s Next?#

Related Articles#

What is Extended Thinking?

Enabling Extended Thinking

Python

TypeScript

Budget Tokens

Adaptive Thinking

When to Use Extended Thinking

Good Use Cases

Bad Use Cases

Reading Thinking Blocks

Python

Extended Thinking with Streaming

Python

TypeScript

Extended Thinking with Tool Use

Cost Analysis

Cost Comparison (Sonnet 4.6)

Cost Optimization Tips

Extended Thinking vs Chain of Thought Prompting

Multi-Turn with Extended Thinking

Summary

What’s Next?

Related Articles