Some problems need thinking. A complex debugging task, a math proof, or a multi-step architecture decision — these benefit from Claude reasoning through the problem before answering. Extended thinking makes this explicit and controllable.

This is Article 12 in the Claude AI — From Zero to Power User series. You should have completed Article 7: Messages API before this article.

By the end of this article, you will know how to enable extended thinking, set thinking budgets, and decide when to use it.


What is Extended Thinking?

Extended thinking lets Claude “think” before responding. When enabled, Claude generates internal reasoning tokens — a chain of thought that you can see in the response. These thinking tokens help Claude work through complex problems step by step.

Without extended thinking:

  • Claude generates the response immediately
  • Works well for simple tasks
  • May make mistakes on complex reasoning

With extended thinking:

  • Claude thinks first, then responds
  • Better accuracy on math, logic, and multi-step problems
  • You can see the thinking process
  • Costs more (thinking tokens are billed as output tokens)

Enabling Extended Thinking

Python

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[
        {
            "role": "user",
            "content": "I have a Python function that sometimes returns incorrect results. Here is the code:\n\ndef calculate_discount(price, discount_percent, tax_rate, coupon=None):\n    discounted = price * (1 - discount_percent)\n    if coupon:\n        discounted -= coupon\n    taxed = discounted * (1 + tax_rate)\n    return round(taxed, 2)\n\nWhen I call calculate_discount(100, 0.2, 0.08, 5), I expect $81.40 because the coupon should be applied after tax, but I get $81.00. Find the bug."
        }
    ]
)

# The response has thinking blocks and text blocks
for block in message.content:
    if block.type == "thinking":
        print("=== THINKING ===")
        print(block.thinking)
        print("=== END THINKING ===\n")
    elif block.type == "text":
        print("=== RESPONSE ===")
        print(block.text)

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [
    {
      role: "user",
      content:
        "I have a Python function that sometimes returns incorrect results...",
    },
  ],
});

for (const block of message.content) {
  if (block.type === "thinking") {
    console.log("=== THINKING ===");
    console.log(block.thinking);
    console.log("=== END THINKING ===\n");
  } else if (block.type === "text") {
    console.log("=== RESPONSE ===");
    console.log(block.text);
  }
}

The response contains two types of content blocks:

  1. thinking — Claude’s internal reasoning (one or more blocks)
  2. text — The final response to the user
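If downstream code needs the reasoning and the answer separately, a small helper can split the two block types. This is a minimal sketch: `split_blocks` is a hypothetical helper name, and `SimpleNamespace` objects stand in for the SDK's block objects so the example runs without an API call.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Return (thinking_text, answer_text) from a list of content blocks."""
    thinking_parts = []
    answer_parts = []
    for block in content:
        if block.type == "thinking":
            thinking_parts.append(block.thinking)
        elif block.type == "text":
            answer_parts.append(block.text)
        # Any other block type (for example tool_use) is ignored here.
    return "\n".join(thinking_parts), "\n".join(answer_parts)


# Stand-in blocks shaped like message.content (illustrative text only)
fake_content = [
    SimpleNamespace(type="thinking", thinking="Apply the discount, then the coupon."),
    SimpleNamespace(type="text", text="The coupon is applied before tax."),
]

thinking, answer = split_blocks(fake_content)
print("Thinking:", thinking)
print("Answer:", answer)
```

With a real response you would call `split_blocks(message.content)` instead of passing the stand-in list.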

Budget Tokens

The budget_tokens parameter controls how much thinking Claude can do.

thinking={
    "type": "enabled",
    "budget_tokens": 5000    # Up to 5,000 thinking tokens
}

| Budget | Use Case | Max Thinking Cost (Sonnet 4.6) |
| --- | --- | --- |
| 1,024-2,000 | Simple reasoning, quick analysis | $0.015-0.030 |
| 5,000-10,000 | Code debugging, moderate complexity | $0.075-0.150 |
| 20,000-50,000 | Complex math, deep analysis | $0.300-0.750 |
| 100,000+ | Research-level problems | $1.50+ |

Claude does not always use the full budget. If a problem is simple, Claude might use only 500 thinking tokens even with a budget of 10,000. The budget is a maximum, not a target.
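The cost figures in the table follow from simple arithmetic: thinking tokens multiplied by the output-token price. The sketch below assumes a price of $15 per million output tokens, which is the rate implied by the table ($0.015 per 1,000 tokens). Check current pricing before relying on it; `max_thinking_cost` is a hypothetical helper name.

```python
# Assumption: output price implied by the table above, in USD per million tokens
SONNET_OUTPUT_USD_PER_MTOK = 15.00


def max_thinking_cost(budget_tokens, usd_per_mtok=SONNET_OUTPUT_USD_PER_MTOK):
    """Upper bound on thinking cost if Claude uses the whole budget."""
    return budget_tokens * usd_per_mtok / 1_000_000


for budget in (2_000, 10_000, 50_000):
    print(f"{budget:>7,} tokens -> up to ${max_thinking_cost(budget):.3f}")
```

Because the budget is a ceiling, the actual charge is usually lower than this bound.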

Important: max_tokens must be greater than budget_tokens, and budget_tokens must be at least 1,024. The max_tokens value covers both thinking tokens and response tokens, so leave enough headroom for the final answer.
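A quick pre-flight check can catch an invalid pair before the request is sent. This is a sketch under stated assumptions: `check_thinking_limits` is a hypothetical helper, and the 1,024-token minimum budget should be confirmed against the current API reference.

```python
MIN_BUDGET_TOKENS = 1024  # Assumption: smallest budget the API accepts


def check_thinking_limits(max_tokens, budget_tokens):
    """Validate the pair and return the tokens left over for the final answer."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens={budget_tokens} is below the minimum of {MIN_BUDGET_TOKENS}"
        )
    if max_tokens <= budget_tokens:
        raise ValueError(
            f"max_tokens={max_tokens} must be greater than budget_tokens={budget_tokens}"
        )
    # max_tokens covers thinking plus response, so the remainder is answer headroom.
    return max_tokens - budget_tokens


print(check_thinking_limits(max_tokens=16000, budget_tokens=10000))
```

With the values used throughout this article (16,000 and 10,000), the function leaves 6,000 tokens for the visible response.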


Adaptive Thinking

Opus 4.6 and Sonnet 4.6 support adaptive thinking. Instead of setting a fixed budget, you set the thinking type to "adaptive" and Claude decides how much to think based on the problem complexity.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive"  # No budget_tokens: Claude chooses the depth
    },
    temperature=1,  # Thinking requires the default temperature
    messages=[
        {"role": "user", "content": "What is 2 + 2?"}
    ]
)
# Claude uses very few thinking tokens for this simple question

message2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "adaptive"
    },
    temperature=1,
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ]
)
# Claude uses much more thinking for this complex question

With adaptive thinking, Claude automatically scales its reasoning effort. Simple questions get minimal thinking. Complex questions get deep reasoning. This saves money because you do not pay for unnecessary thinking tokens.

Note: Temperature must be set to 1 (the default) when using extended thinking. You cannot use temperature=0 with thinking enabled.


When to Use Extended Thinking

Good Use Cases

  • Complex debugging — Finding subtle bugs in code
  • Math and logic — Multi-step calculations, proofs
  • Architecture decisions — Weighing trade-offs between approaches
  • Code review — Deep analysis of complex code
  • Data analysis — Finding patterns in complex data
  • Planning — Multi-step task decomposition

Bad Use Cases

  • Simple questions — “What is the capital of France?” does not need thinking
  • Classification — “Is this positive or negative?” is simple enough without thinking
  • Data extraction — Pulling fields from structured text is straightforward
  • Low-latency needs — Thinking adds latency (seconds to tens of seconds)

Rule of thumb: If you would need to think about the problem yourself, turn on extended thinking. If the answer is obvious, skip it.
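The rule of thumb can be encoded as a small routing function that turns thinking on only for the task types listed under Good Use Cases. This is a sketch only: the task labels, the budgets (picked from the ranges in the budget table), the `ANSWER_HEADROOM` value, and both function names are hypothetical choices for illustration.

```python
# Hypothetical task labels mapped to illustrative thinking budgets
THINKING_BUDGETS = {
    "debugging": 10_000,
    "math": 20_000,
    "architecture": 10_000,
    "code_review": 10_000,
    "data_analysis": 10_000,
    "planning": 5_000,
}
ANSWER_HEADROOM = 4_000  # Hypothetical: tokens reserved for the visible answer


def thinking_param(task_kind):
    """Return a thinking config, or None when thinking should stay off."""
    budget = THINKING_BUDGETS.get(task_kind)
    if budget is None:
        # Simple questions, classification, extraction, and unknown tasks skip thinking.
        return None
    return {"type": "enabled", "budget_tokens": budget}


def build_request_kwargs(task_kind, prompt):
    """Build keyword arguments for client.messages.create()."""
    kwargs = {
        "model": "claude-sonnet-4-6",
        "max_tokens": ANSWER_HEADROOM,
        "messages": [{"role": "user", "content": prompt}],
    }
    thinking = thinking_param(task_kind)
    if thinking is not None:
        kwargs["thinking"] = thinking
        # max_tokens must exceed the budget, so add the headroom on top of it.
        kwargs["max_tokens"] = thinking["budget_tokens"] + ANSWER_HEADROOM
    return kwargs


print(build_request_kwargs("classification", "Is this review positive or negative?"))
print(build_request_kwargs("math", "Prove that the square root of 2 is irrational."))
```

The result can be passed along as `client.messages.create(**kwargs)`. Defaulting unknown task types to no thinking keeps latency and cost low unless a task is known to need the extra reasoning.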


Reading Thinking Blocks

The thinking content shows Claude’s internal reasoning. This is valuable for:

  1. Debugging — Understand why Claude gave a particular answer
  2. Trust — Verify Claude’s reasoning is sound
  3. Education — Learn how to approach similar problems

Python

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    messages=[
        {"role": "user", "content": "Which database should I use for a real-time chat application: PostgreSQL, MongoDB, or Redis? The app needs to handle 10,000 concurrent users."}
    ]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking ({len(block.thinking)} chars):")
        print(block.thinking[:500])  # Print first 500 chars
        print("...")
    elif block.type == "text":
        print("\nFinal answer:")
        print(block.text)

# Token usage
print(f"\nTotal output tokens: {message.usage.output_tokens}")

Extended Thinking with Streaming

Thinking tokens can be streamed in real time, so users see the reasoning as it happens.

Python

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    messages=[
        {"role": "user", "content": "Debug this SQL query that returns wrong results: SELECT u.name, COUNT(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE o.status = 'completed' GROUP BY u.name"}
    ]
) as stream:
    current_block = None

    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking":
                current_block = "thinking"
                print("Thinking: ", end="")
            elif event.content_block.type == "text":
                current_block = "text"
                print("\n\nAnswer: ", end="")

        elif event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)

print()

TypeScript

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 5000,
  },
  messages: [
    {
      role: "user",
      content: "Debug this SQL query...",
    },
  ],
});

let currentBlock: string | null = null;

for await (const event of stream) {
  if (event.type === "content_block_start") {
    if (event.content_block.type === "thinking") {
      currentBlock = "thinking";
      process.stdout.write("Thinking: ");
    } else if (event.content_block.type === "text") {
      currentBlock = "text";
      process.stdout.write("\n\nAnswer: ");
    }
  } else if (event.type === "content_block_delta") {
    if (event.delta.type === "thinking_delta") {
      process.stdout.write(event.delta.thinking);
    } else if (event.delta.type === "text_delta") {
      process.stdout.write(event.delta.text);
    }
  }
}

console.log();

Streaming thinking tokens is useful for user-facing applications. The user sees Claude “working through” the problem, which builds trust and patience for longer tasks.
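An application that shows the reasoning in one panel and the answer in another needs to accumulate the two delta types separately rather than print them in one stream. The sketch below does that with a hypothetical `collect_stream` helper. It uses `SimpleNamespace` stand-ins for stream events so it runs offline; the event and delta type names match those used in the streaming examples above.

```python
from types import SimpleNamespace


def collect_stream(events):
    """Accumulate thinking and text deltas into two separate strings."""
    thinking_parts = []
    text_parts = []
    for event in events:
        if event.type != "content_block_delta":
            continue  # Start/stop events carry no text to accumulate
        if event.delta.type == "thinking_delta":
            thinking_parts.append(event.delta.thinking)
        elif event.delta.type == "text_delta":
            text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)


def _delta(kind, **fields):
    """Build a stand-in content_block_delta event (illustration only)."""
    return SimpleNamespace(
        type="content_block_delta",
        delta=SimpleNamespace(type=kind, **fields),
    )


fake_events = [
    SimpleNamespace(type="content_block_start"),
    _delta("thinking_delta", thinking="The WHERE clause "),
    _delta("thinking_delta", thinking="filters out NULL rows."),
    _delta("text_delta", text="Move the status filter "),
    _delta("text_delta", text="into the ON clause."),
]

thinking, answer = collect_stream(fake_events)
print("Thinking:", thinking)
print("Answer:", answer)
```

In a real application the `events` argument would be the stream object itself, and each panel would be updated inside the loop instead of after it.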


Extended Thinking with Tool Use

Claude can think before deciding to call a tool. This improves tool selection accuracy for complex tasks.

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "run_query",
        "description": "Run a SQL query against the production database",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "SQL query to execute"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "run_test",
        "description": "Run a specific test file or test function",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_path": {"type": "string", "description": "Path to test file or function"}
            },
            "required": ["test_path"]
        }
    }
]

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000
    },
    tools=tools,
    messages=[
        {"role": "user", "content": "The user registration endpoint returns 500 errors intermittently. Investigate."}
    ]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "tool_use":
        print(f"Tool call: {block.name}({json.dumps(block.input)})")
    elif block.type == "text":
        print(f"Response: {block.text}")

Claude thinks about the problem first, then decides which tool to call. This is especially useful in agent workflows where the right tool choice matters.
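After running the tool, the next request has to return the result to Claude. With thinking enabled, the assistant turn should be sent back unchanged, thinking blocks included, followed by a user turn holding the tool_result blocks. The sketch below builds that message list. `build_tool_followup` is a hypothetical helper, and the block contents, signature, and tool-use ID are placeholders.

```python
def build_tool_followup(messages, assistant_content, tool_results):
    """Return a new message list with the assistant turn and tool results appended.

    tool_results maps each tool_use id to the string output of that tool call.
    """
    result_blocks = [
        {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
        for tool_use_id, output in tool_results.items()
    ]
    return messages + [
        # Pass the assistant content back as received: thinking + tool_use blocks.
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": result_blocks},
    ]


# Placeholder data shaped like a first-turn exchange
history = [{"role": "user", "content": "The user registration endpoint returns 500 errors intermittently. Investigate."}]
assistant_content = [
    {"type": "thinking", "thinking": "Check the registration tests first.", "signature": "placeholder"},
    {"type": "tool_use", "id": "toolu_example", "name": "run_test", "input": {"test_path": "tests/test_registration.py"}},
]

followup = build_tool_followup(
    history, assistant_content, {"toolu_example": "2 passed, 1 failed"}
)
for msg in followup:
    print(msg["role"])
```

With a real response, `assistant_content` would be `message.content` and the ID would come from the `tool_use` block's `id` field.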


Cost Analysis

Extended thinking adds cost because thinking tokens are billed as output tokens.

Cost Comparison (Sonnet 4.6)

A code review task with 2,000 input tokens:

| Mode | Thinking Tokens | Output Tokens | Total Cost |
| --- | --- | --- | --- |
| No thinking | 0 | 500 | $0.0135 |
| 2K thinking budget | ~1,500 | 500 | $0.0360 |
| 10K thinking budget | ~5,000 | 500 | $0.0885 |

In this example, extended thinking raises the cost per request by roughly 2.7x with the 2K budget and 6.6x with the 10K budget. But if it catches a bug that would have taken you 30 minutes to find, the extra $0.02 to $0.08 is well worth it.
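The totals in the comparison can be reproduced with a short calculation. The sketch below assumes prices of $3 per million input tokens and $15 per million output tokens, which are the rates implied by the table's figures. Verify them against current pricing; `request_cost` is a hypothetical helper name.

```python
# Assumptions: prices implied by the table above, in USD per million tokens
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def request_cost(input_tokens, thinking_tokens, output_tokens):
    """Total request cost; thinking tokens are billed at the output rate."""
    billed_output = thinking_tokens + output_tokens
    return (
        input_tokens * INPUT_USD_PER_MTOK + billed_output * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


baseline = request_cost(2_000, 0, 500)
for label, thinking in [("No thinking", 0), ("2K budget", 1_500), ("10K budget", 5_000)]:
    cost = request_cost(2_000, thinking, 500)
    print(f"{label:<12} ${cost:.4f}  ({cost / baseline:.1f}x baseline)")
```

Swapping in your own token counts from `message.usage` gives a per-request estimate for real workloads.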

Cost Optimization Tips

  1. Use appropriate budgets — Do not set 50K budget for simple tasks
  2. Use adaptive thinking — Let Claude decide how much to think
  3. Cache thinking tokens — In multi-turn conversations, previous thinking gets cached
  4. Use Sonnet for most tasks — Reserve Opus for the hardest problems

Extended Thinking vs Chain of Thought Prompting

You might wonder: “Why not just add ‘Think step by step’ to my prompt?” Here is the difference:

| Feature | Chain of Thought (prompt) | Extended Thinking |
| --- | --- | --- |
| How it works | Claude writes reasoning in the response | Claude has a separate thinking phase |
| Thinking visible | Mixed with the response text | Separate thinking blocks |
| Thinking budget | You control via max_tokens | Dedicated budget_tokens parameter |
| Accuracy | Good for moderate tasks | Better for complex tasks |
| Cost | All output tokens | Thinking tokens are extra |

Chain of thought prompting has no separate charge, but it is not free: the reasoning Claude writes in the response is still billed as output tokens. Extended thinking usually costs more but is more powerful for genuinely complex problems.

Recommendation: Use chain of thought prompting (“Think step by step”) for moderate tasks. Use extended thinking for genuinely complex tasks where accuracy matters most.


Multi-Turn with Extended Thinking

When continuing a conversation that used extended thinking, include the thinking blocks in the message history:

# First turn with thinking
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 5000},
    messages=[
        {"role": "user", "content": "Analyze this algorithm for time complexity: ..."}
    ]
)

# Continue the conversation — include thinking blocks
messages = [
    {"role": "user", "content": "Analyze this algorithm for time complexity: ..."},
    {"role": "assistant", "content": response1.content},  # Includes thinking + text
    {"role": "user", "content": "Can you optimize it to O(n log n)?"}
]

response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 5000},
    messages=messages
)

The thinking blocks from previous turns help Claude maintain context. Combined with prompt caching, the thinking tokens from earlier turns get cached automatically.


Summary

| Feature | Details |
| --- | --- |
| Enable | thinking: {"type": "enabled", "budget_tokens": N} |
| Budget | 1K-100K tokens (Claude uses only what it needs) |
| Adaptive | Opus 4.6 and Sonnet 4.6 scale thinking to problem complexity |
| Cost | Thinking tokens billed as output tokens |
| Response format | thinking blocks + text blocks |
| Temperature | Must be 1 (default) when thinking is enabled |
| Best for | Complex debugging, math, architecture, code review |

Extended thinking is one of the most effective ways to improve Claude’s accuracy on hard problems. Use it when the problem is genuinely hard and the cost of being wrong is high.


What’s Next?

In the next article, we will cover MCP (Model Context Protocol) — the open standard for connecting Claude to external tools and data sources.
