
Claude Extended Thinking Guide

ClaudeAIHub. For official extended thinking documentation, visit platform.claude.com.

Extended thinking is an API feature that gives Claude more tokens to reason through a problem before producing a final answer. Rather than responding immediately, Claude works through the problem step by step in a separate “thinking” block, then delivers a final response. This can improve answer quality for complex tasks — at the cost of more output tokens and higher latency.

How Extended Thinking Works

When you enable extended thinking, Claude’s API response includes two types of content blocks:

  • Thinking block: Claude’s internal reasoning process, shown before the final answer.
  • Text block: The final response based on that reasoning.

You control the maximum tokens Claude can use for thinking via the budget_tokens parameter. A larger budget allows more thorough analysis on difficult problems. Claude may not use the entire budget — it stops reasoning when it has enough to answer confidently.
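The sketch below shows the two block types without calling the API. It splits a list of content blocks by their `type` field. The `SimpleNamespace` objects are stand-ins that only copy the shape of the SDK's blocks (a `type` attribute plus either `thinking` or `text`). In real code you would pass `response.content`. `split_content` and `SplitResponse` are hypothetical helper names.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Iterable


@dataclass
class SplitResponse:
    thinking: list[str]  # reasoning text from thinking blocks
    text: list[str]      # final-answer text from text blocks


def split_content(blocks: Iterable) -> SplitResponse:
    """Separate thinking blocks from final-answer text blocks."""
    thinking, text = [], []
    for block in blocks:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            text.append(block.text)
        # Any other block type (e.g. tool_use) is ignored in this sketch.
    return SplitResponse(thinking=thinking, text=text)


# Offline stand-ins shaped like SDK content blocks (assumption, not real API output).
demo = [
    SimpleNamespace(type="thinking", thinking="Compare both schemas..."),
    SimpleNamespace(type="text", text="A document store fits variable attributes."),
]
result = split_content(demo)
print(result.thinking)
print(result.text)
```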

Which Models Support Extended Thinking

Extended thinking support varies by model. As of the current official documentation:

Model | Manual extended thinking | Adaptive thinking | Notes
--- | --- | --- | ---
Claude Opus 4.8 | Not supported | Yes | Manual extended thinking returns a 400 error; use adaptive thinking instead
Claude Opus 4.7 | Not supported | Yes | Manual extended thinking returns a 400 error; use adaptive thinking instead
Claude Sonnet 5 | Not supported | Yes | Manual extended thinking returns an error; use adaptive thinking (effort parameter) instead
Claude Sonnet 4.6 | Deprecated | Yes (recommended) | Manual mode still works but is deprecated
Claude Haiku 4.5 | Yes | No | Supported via manual mode

Important: If you are using Claude Opus 4.8, Claude Opus 4.7, or Claude Sonnet 5, do not use the manual thinking parameter. It is not supported and returns an error (a 400 error on the Opus models), so use adaptive thinking instead. Model support changes with new releases, so check the official Anthropic docs for the latest status.
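One way to avoid the 400 error is to pick the thinking mode from the support table before building a request. The sketch below hard-codes a snapshot of the table above. The dictionary keys are the display names used in this guide, not API model IDs, and the snapshot will go stale as new models ship. `THINKING_SUPPORT` and `choose_thinking_mode` are hypothetical names.

```python
# Snapshot of the support table above. Keys are display names, not API model IDs.
THINKING_SUPPORT = {
    "Claude Opus 4.8": {"manual": "unsupported", "adaptive": True},
    "Claude Opus 4.7": {"manual": "unsupported", "adaptive": True},
    "Claude Sonnet 5": {"manual": "unsupported", "adaptive": True},
    "Claude Sonnet 4.6": {"manual": "deprecated", "adaptive": True},
    "Claude Haiku 4.5": {"manual": "supported", "adaptive": False},
}


def choose_thinking_mode(model_name: str) -> str:
    """Return 'adaptive' or 'manual' for a model, based on the snapshot."""
    support = THINKING_SUPPORT.get(model_name)
    if support is None:
        raise KeyError(f"No support data for {model_name!r}; check the official docs.")
    if support["adaptive"]:
        return "adaptive"  # preferred whenever the model offers it
    if support["manual"] in ("supported", "deprecated"):
        return "manual"
    raise ValueError(f"{model_name} supports neither mode in this snapshot.")


print(choose_thinking_mode("Claude Opus 4.7"))   # adaptive
print(choose_thinking_mode("Claude Haiku 4.5"))  # manual
```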

Adaptive Thinking vs Extended Thinking

Anthropic now recommends adaptive thinking for most current Claude models. You set an effort level instead of a token budget, and Claude decides how much thinking to apply based on the complexity of the task. Adaptive thinking is the preferred approach for Claude Sonnet 5, Claude Sonnet 4.6, Claude Opus 4.7, and Claude Opus 4.8. On Claude Sonnet 5, the effort level defaults to "high" on the Claude API and in Claude Code.

Manual extended thinking (the thinking parameter with budget_tokens) is still supported on Claude Haiku 4.5 and some legacy models, and deprecated but functional on Sonnet 4.6. Claude Sonnet 5 does not support it at all — calling it returns an error, so adaptive thinking is required.
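The difference between the two modes is mostly in the request fields. The sketch below only builds those fields and does not send anything. The adaptive shape is an assumption: a `thinking` object of type "adaptive" plus an effort value under `output_config`. Verify both field names against the current API reference before using them. `build_thinking_params` is a hypothetical helper.

```python
def build_thinking_params(mode: str, effort: str = "high", budget_tokens: int = 8000) -> dict:
    """Return the thinking-related request fields for one mode.

    ASSUMPTION: the adaptive field names ({"type": "adaptive"} and
    output_config.effort) must be checked against the current API docs.
    """
    if mode == "adaptive":
        # Adaptive: Claude chooses how much to think; you set only an effort level.
        return {"thinking": {"type": "adaptive"}, "output_config": {"effort": effort}}
    if mode == "manual":
        # Manual extended thinking: a fixed token budget (Haiku 4.5 and legacy models).
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    raise ValueError(f"Unknown mode: {mode!r}")


params = build_thinking_params("adaptive")
print(params)
```

These fields would be merged into a `client.messages.create(...)` call alongside `model`, `max_tokens`, and `messages`. If your installed SDK version predates a field, the Python SDK's `extra_body` argument can pass it through unchanged.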

How to Use Extended Thinking (API)

For models that support it, enable extended thinking by adding a thinking object to your request. The example below uses Claude Haiku 4.5:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=16000,  # overall output cap; thinking tokens count toward it
    thinking={"type": "enabled", "budget_tokens": 8000},  # budget must be below max_tokens
    messages=[
        {
            "role": "user",
            "content": "Walk through the trade-offs of using a relational vs document database for an e-commerce order system with variable product attributes."
        }
    ]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

The budget_tokens value must be less than max_tokens. For the most complex tasks, a higher budget (10,000–30,000 tokens) allows more thorough analysis. For straightforward tasks, a smaller budget (2,000–5,000 tokens) is sufficient and more cost-efficient.
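The budget rules can be checked before a request is sent. Besides staying below `max_tokens`, `budget_tokens` must be at least 1,024, which is the API's documented minimum. The suggested values below are simply the midpoints of the ranges given above. That is an illustrative choice, not an official recommendation. `validate_budget` and `suggest_budget` are hypothetical helpers.

```python
MIN_BUDGET = 1024  # documented minimum for budget_tokens


def validate_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if a manual thinking budget breaks the API's rules."""
    if budget_tokens < MIN_BUDGET:
        raise ValueError(f"budget_tokens={budget_tokens} is below the {MIN_BUDGET}-token minimum")
    if budget_tokens >= max_tokens:
        raise ValueError(f"budget_tokens={budget_tokens} must be less than max_tokens={max_tokens}")


def suggest_budget(complex_task: bool) -> int:
    """Midpoints of the ranges above: 10,000 to 30,000 (complex), 2,000 to 5,000 (simple)."""
    return 20000 if complex_task else 3500


validate_budget(8000, 16000)  # the Haiku 4.5 example above passes
print(suggest_budget(True), suggest_budget(False))
```

A complex task with the suggested 20,000-token budget also needs `max_tokens` raised above 20,000. `validate_budget(20000, 16000)` would reject the configuration from the earlier example.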

Billing for Extended Thinking

Extended thinking tokens are billed at the model’s standard output token rate. A few important billing notes:

  • You pay for all thinking tokens, including any that were generated internally and summarized — not just the thinking content you see in the response.
  • On Claude 4 models, thinking is summarized by default before being returned. The visible thinking summary will be shorter than the full reasoning Claude performed, but billing reflects the full token count.
  • Setting display to “omitted” reduces latency (faster first-text-token when streaming) but does not reduce costs — you still pay for the thinking tokens generated.

Because thinking adds to output token costs, evaluate whether the quality improvement justifies the additional cost for your specific use case before enabling it at scale.
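A quick way to weigh that cost is to compare output spend with and without thinking. In the sketch below, the token counts and the $5-per-million output rate are made-up illustration values, not Anthropic prices. The key point from the notes above is that the billed count must include the full thinking tokens, not the length of the summary you see. The response's usage data is the natural source for that count, but confirm the field against the billing docs. `estimate_output_cost` is a hypothetical helper.

```python
def estimate_output_cost(billed_output_tokens: int, usd_per_million_output: float) -> float:
    """Output cost in USD.

    billed_output_tokens must include the full thinking tokens generated,
    not just the (shorter) summarized thinking returned in the response.
    """
    return billed_output_tokens * usd_per_million_output / 1_000_000


# Hypothetical request: 6,000 thinking tokens + 800 answer tokens, assumed $5/M output rate.
with_thinking = estimate_output_cost(6000 + 800, 5.0)
without_thinking = estimate_output_cost(800, 5.0)
print(f"{with_thinking:.4f} vs {without_thinking:.4f}")  # 0.0340 vs 0.0040
```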

When Extended Thinking Helps

  • Multi-step logic problems: Tasks that require chaining multiple reasoning steps before arriving at an answer.
  • Technical analysis: Comparing architectural trade-offs, debugging complex issues, or analyzing code logic.
  • Math and formal reasoning: Problems that require intermediate steps, proofs, or verifiable calculations.
  • Planning tasks: Creating structured plans where the quality of the structure depends on considering many constraints.
  • Long document analysis: Synthesizing findings across large documents where key insights may be distributed throughout.

When Extended Thinking May Not Help

  • Simple factual questions: Tasks that have a straightforward answer don’t benefit from extended reasoning.
  • Short-form writing: Creative tasks or summaries where quality depends on style, not logic depth.
  • High-volume, cost-sensitive workloads: Thinking adds significant token cost; weigh this against the quality benefit.
  • Latency-sensitive applications: Thinking increases time-to-first-token; use standard responses where speed matters more than depth.
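The two lists above can be turned into a crude routing rule. The sketch below is a hypothetical rule of thumb, not guidance from Anthropic. It treats latency sensitivity and high volume as reasons to skip thinking, and enables thinking only for multi-step work. In practice, high-volume workloads call for weighing cost against quality rather than a hard cutoff. `Task` and `should_enable_thinking` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Task:
    multi_step: bool         # chained logic, math, planning, technical analysis
    latency_sensitive: bool  # a user is waiting on time-to-first-token
    high_volume: bool        # cost-sensitive bulk workload


def should_enable_thinking(task: Task) -> bool:
    """Hypothetical rule of thumb mirroring the 'helps' / 'may not help' lists."""
    if task.latency_sensitive or task.high_volume:
        return False  # speed or cost outweighs depth here
    return task.multi_step  # simple factual or stylistic tasks skip thinking


print(should_enable_thinking(Task(multi_step=True, latency_sensitive=False, high_volume=False)))
```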

Prompt Examples That Work Well With Deeper Reasoning

These prompting strategies complement deeper model reasoning even without the extended thinking API parameter — they encourage careful analysis in any context:

  • “List your assumptions before answering, then state your conclusion.”
  • “Walk through each step of your reasoning before giving a final recommendation.”
  • “Identify at least three edge cases or risks in this approach.”
  • “Compare two or three options before recommending one, and explain the trade-offs.”
  • “Check your answer against the source material and flag any uncertainties.”
  • “Give a concise reasoning summary — what was the key insight that led to this answer?”
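Because these instructions are plain prompt text, they can be appended to any user message, with or without the thinking parameter. The sketch below is one hypothetical way to do that. `with_reasoning_instructions` is an illustrative helper, and the sample question is invented.

```python
REASONING_INSTRUCTIONS = [
    "List your assumptions before answering, then state your conclusion.",
    "Identify at least three edge cases or risks in this approach.",
]


def with_reasoning_instructions(question: str, instructions: list[str]) -> str:
    """Append reasoning instructions as a bulleted list after the question."""
    lines = [question.strip(), ""] + [f"- {item}" for item in instructions]
    return "\n".join(lines)


prompt = with_reasoning_instructions("Should we shard the orders table?", REASONING_INSTRUCTIONS)
messages = [{"role": "user", "content": prompt}]  # ready for client.messages.create
print(prompt)
```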

Key Limitations

  • Model support varies: Claude Opus 4.8, Claude Opus 4.7, and Claude Sonnet 5 do not support manual extended thinking. Always check the official docs for current model support before deploying.
  • Tool choice restriction: Extended thinking only supports tool_choice: auto or tool_choice: none. Forced tool selection is not compatible.
  • Cannot toggle mid-turn: You cannot switch thinking on and off within a tool use loop — the entire assistant turn must use a single thinking mode.
  • Cache pre-warming incompatibility: Extended thinking cannot be used with max_tokens: 0 cache pre-warming requests.
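Several of these limitations can be caught with a pre-flight check on the request parameters. The sketch below inspects plain request dicts, with `tool_choice` written in the API's object form (for example `{"type": "auto"}`). It flags forced tool choice, `max_tokens: 0` pre-warming, and thinking settings that change between requests in the same tool-use loop. The checker functions are hypothetical and do not cover every rule in the official docs.

```python
import json


def check_thinking_request(params: dict) -> list[str]:
    """Return problems found in a single request dict that enables thinking."""
    problems: list[str] = []
    if "thinking" not in params:
        return problems  # rules below only apply when thinking is on
    tool_type = params.get("tool_choice", {"type": "auto"}).get("type")
    if tool_type not in ("auto", "none"):
        problems.append(f"tool_choice type {tool_type!r} is incompatible with thinking")
    if params.get("max_tokens") == 0:
        problems.append("max_tokens: 0 cache pre-warming cannot use thinking")
    return problems


def thinking_consistent(turn_requests: list[dict]) -> bool:
    """True if every request in one tool-use loop uses the same thinking setting."""
    configs = {json.dumps(req.get("thinking"), sort_keys=True) for req in turn_requests}
    return len(configs) <= 1


forced = {"thinking": {"type": "enabled", "budget_tokens": 2000}, "max_tokens": 4000,
          "tool_choice": {"type": "tool", "name": "get_weather"}}
print(check_thinking_request(forced))
```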

Related Resources