The Messages API is the foundation of everything you build with Claude. Every feature — tool use, vision, streaming, caching — goes through this API. Understanding it deeply will make every other article in this series easier.
This is Article 7 in the Claude AI — From Zero to Power User series. You should have completed Article 2: Getting Started before this article.
By the end of this article, you will know how to build multi-turn conversations, stream responses in real time, and track your token usage.
Messages API Anatomy
Every API call has the same basic structure:
Python
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {"role": "user", "content": "What is a list comprehension in Python?"}
    ],
)
print(message.content[0].text)
```
TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: "You are a helpful coding assistant.",
  messages: [
    { role: "user", content: "What is a list comprehension in Python?" }
  ],
});

if (message.content[0].type === "text") {
  console.log(message.content[0].text);
}
```
The required parameters are:
| Parameter | Type | Description |
|---|---|---|
| model | string | Which Claude model to use |
| max_tokens | integer | Maximum tokens in the response |
| messages | array | The conversation history |
Optional but common:
| Parameter | Type | Description |
|---|---|---|
| system | string | System prompt (instructions for Claude) |
| temperature | float | Randomness (0.0 to 1.0) |
| stop_sequences | array | Stop generating when these strings appear |
| metadata | object | User ID for tracking and abuse detection |
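The API validates every request on its side, but a quick local check can catch mistakes before you spend a network call. The sketch below is a hypothetical helper (validate_request is not part of the SDK) that encodes only the rules in the two tables above: the three required parameters, their types, and the 0.0 to 1.0 temperature range.

```python
from typing import Any

# Hypothetical helper: encodes only the rules from the parameter tables above
REQUIRED = {"model": str, "max_tokens": int, "messages": list}

def validate_request(params: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the request looks well-formed."""
    problems = []
    for name, expected_type in REQUIRED.items():
        if name not in params:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    if isinstance(params.get("max_tokens"), int) and params["max_tokens"] < 1:
        problems.append("max_tokens must be at least 1")
    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be between 0.0 and 1.0")
    return problems

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a helpful coding assistant.",  # optional
    "temperature": 0.2,                                # optional
    "messages": [{"role": "user", "content": "What is a list comprehension in Python?"}],
}
print(validate_request(request))  # → []
```

The same dict can be unpacked straight into the SDK call with client.messages.create(**request).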
The Response Object
Every API call returns a message object with this structure:
```json
{
  "id": "msg_01ABC123",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6",
  "content": [
    {
      "type": "text",
      "text": "A list comprehension is..."
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150
  }
}
```
Key fields:
- id — Unique message identifier
- content — Array of content blocks (text, tool_use, etc.)
- stop_reason — Why Claude stopped generating
- usage — Token counts for cost tracking
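Because content is a list of blocks, content[0].text quietly assumes the first block is text; that is why the TypeScript examples check content[0].type first. A minimal sketch that works on the raw JSON above (trimmed to the fields it uses) joins every text block and skips other block types. The SDK returns typed objects with the same field names (block.type, block.text), so the same loop works there with attribute access.

```python
import json

# Raw JSON shaped like the response object above
raw = """
{
  "id": "msg_01ABC123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "A list comprehension is..."}],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 25, "output_tokens": 150}
}
"""

def response_text(message: dict) -> str:
    """Join the text of every text block; other block types (e.g. tool_use) are skipped."""
    return "".join(
        block["text"] for block in message["content"] if block["type"] == "text"
    )

message = json.loads(raw)
print(response_text(message))             # → A list comprehension is...
print(message["stop_reason"])             # → end_turn
print(message["usage"]["output_tokens"])  # → 150
```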
Stop Reasons
| Stop Reason | Meaning |
|---|---|
| end_turn | Claude finished its response naturally |
| max_tokens | Hit the max_tokens limit (response may be incomplete) |
| stop_sequence | Hit one of your stop sequences |
| tool_use | Claude wants to call a tool |
Always check stop_reason. If it is max_tokens, your response was cut off. Increase max_tokens or continue the conversation.
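One way to act on stop_reason is sketched below with plain dicts in the shape of the response object. next_messages and the "Continue exactly where you left off." wording are illustrative choices, not an SDK feature: for a truncated reply, the sketch keeps the partial text as an assistant turn and adds a short user instruction, so the next call can pick up where the first one stopped.

```python
from typing import Optional

def next_messages(messages: list[dict], response: dict) -> Optional[list[dict]]:
    """Return the messages for a follow-up request, or None if the reply is complete."""
    reply = "".join(b["text"] for b in response["content"] if b["type"] == "text")
    reason = response["stop_reason"]
    if reason in ("end_turn", "stop_sequence"):
        return None  # Claude finished (or hit one of your stop sequences)
    if reason == "max_tokens":
        # Keep the truncated reply in the history, then ask Claude to continue from there
        return messages + [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    # tool_use and any other reason need their own handling
    raise NotImplementedError(f"unhandled stop_reason: {reason!r}")

history = [{"role": "user", "content": "Write a long guide to Python decorators."}]
truncated = {"content": [{"type": "text", "text": "Part 1: what a decorator is..."}],
             "stop_reason": "max_tokens"}
follow_up = next_messages(history, truncated)
print(len(follow_up))  # → 3
print(next_messages(history, {"content": [], "stop_reason": "end_turn"}))  # → None
```

To rebuild the full answer, concatenate the truncated reply with the text of the follow-up response.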
Multi-Turn Conversations
Claude does not remember previous conversations. Each API call is independent. To have a multi-turn conversation, you send the entire message history every time.
Python
```python
import anthropic

client = anthropic.Anthropic()

# Start the conversation
conversation = []

def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="You are a Python tutor. Give short, clear explanations with code examples.",
        messages=conversation,
    )
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

# Turn 1
print(chat("What is a decorator in Python?"))
# Turn 2: Claude remembers the context
print(chat("Can you show me a real-world example?"))
# Turn 3: Claude still has full context
print(chat("How do I add arguments to a decorator?"))
```
TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const conversation: Anthropic.MessageParam[] = [];

async function chat(userMessage: string): Promise<string> {
  conversation.push({ role: "user", content: userMessage });
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: "You are a Python tutor. Give short, clear explanations with code examples.",
    messages: conversation,
  });
  const assistantMessage =
    response.content[0].type === "text" ? response.content[0].text : "";
  conversation.push({ role: "assistant", content: assistantMessage });
  return assistantMessage;
}

// Turn 1
console.log(await chat("What is a decorator in Python?"));
// Turn 2
console.log(await chat("Can you show me a real-world example?"));
// Turn 3
console.log(await chat("How do I add arguments to a decorator?"));
```
Each API call sends the full conversation. On the third call, the request carries 5 messages (3 user + 2 assistant), and the stored history reaches 6 once Claude's reply is appended. This means input token usage grows with each turn.
Cost tip: A 10-turn conversation with Sonnet 4.6 costs approximately $0.03-0.10, depending on response length. Use prompt caching to reduce repeated context costs.
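Because every call resends the whole history, input tokens grow roughly quadratically with the number of turns. The sketch below estimates the total under deliberately simple, assumed sizes (user_tokens and reply_tokens are illustrative inputs, not measurements) at the Sonnet 4.6 rates used in this article, ignoring the system prompt and prompt caching.

```python
# Illustrative assumptions: every user message is user_tokens long, every reply is
# reply_tokens long, and there is no system prompt or prompt caching.
INPUT_PRICE = 3.00 / 1_000_000    # Sonnet 4.6, USD per input token
OUTPUT_PRICE = 15.00 / 1_000_000  # Sonnet 4.6, USD per output token

def conversation_cost(turns: int, user_tokens: int, reply_tokens: int) -> tuple[int, int, float]:
    """Return (total input tokens, total output tokens, total USD) across all calls."""
    total_in = total_out = 0
    for call in range(1, turns + 1):
        # Call k resends k user messages and the k - 1 replies received so far
        total_in += call * user_tokens + (call - 1) * reply_tokens
        total_out += reply_tokens
    return total_in, total_out, total_in * INPUT_PRICE + total_out * OUTPUT_PRICE

tokens_in, tokens_out, cost = conversation_cost(turns=10, user_tokens=50, reply_tokens=200)
print(tokens_in, tokens_out)  # → 11750 2000
print(f"${cost:.3f}")         # → $0.065
```

Under the same assumptions, doubling to 20 turns roughly quadruples the input tokens, which is where prompt caching pays off.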
Streaming Responses
By default, the API waits until Claude finishes generating the entire response before returning. Streaming gives you tokens as they are generated — much better for user-facing applications.
Python
```python
import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain async/await in Python with examples"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()  # New line at the end

    # Get the final message object for usage stats
    final_message = stream.get_final_message()

print(f"\nTokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")
```
TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain async/await in Python with examples" }
  ],
});

for await (const event of stream) {
  if (
    event.type === "content_block_delta" &&
    event.delta.type === "text_delta"
  ) {
    process.stdout.write(event.delta.text);
  }
}
console.log();

const finalMessage = await stream.finalMessage();
console.log(
  `\nTokens used: ${finalMessage.usage.input_tokens} in, ${finalMessage.usage.output_tokens} out`
);
```
Stream Event Types
The streaming API sends these events:
| Event | When | Contains |
|---|---|---|
| message_start | First event | Message metadata (id, model) |
| content_block_start | New content block begins | Block type and index |
| content_block_delta | Token generated | The actual text delta |
| content_block_stop | Content block finished | Block index |
| message_delta | Message metadata update | Stop reason, usage |
| message_stop | Last event | Nothing (signals end) |
For most use cases, you only need content_block_delta events to get the streaming text.
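To see how the events fit together, here is an offline sketch that replays a hand-written event sequence (plain dicts following the shapes in the table; every value is made up) and rebuilds the text, stop reason, and token counts. With the Python SDK you receive events of these same types, for example by iterating over client.messages.create(..., stream=True).

```python
# Illustrative event sequence (values invented) in the shapes the streaming API uses
events = [
    {"type": "message_start", "message": {"id": "msg_01ABC123", "model": "claude-sonnet-4-6",
                                          "usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "async/await "}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lets you..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": None},
     "usage": {"output_tokens": 12}},
    {"type": "message_stop"},
]

def accumulate(events: list[dict]) -> dict:
    """Rebuild the final text, stop reason, and token usage from stream events."""
    result = {"text": "", "stop_reason": None, "input_tokens": 0, "output_tokens": 0}
    for event in events:
        kind = event["type"]
        if kind == "message_start":
            result["input_tokens"] = event["message"]["usage"]["input_tokens"]
        elif kind == "content_block_delta" and event["delta"]["type"] == "text_delta":
            result["text"] += event["delta"]["text"]  # the only event most apps need
        elif kind == "message_delta":
            result["stop_reason"] = event["delta"]["stop_reason"]
            result["output_tokens"] = event["usage"]["output_tokens"]  # cumulative count
    return result

final = accumulate(events)
print(final["text"])         # → async/await lets you...
print(final["stop_reason"])  # → end_turn
```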
Token Usage Tracking
Every response includes token counts. Track them to monitor costs.
Python
```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python function to validate email addresses"}
    ],
)

usage = message.usage
print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Total tokens: {usage.input_tokens + usage.output_tokens}")

# Calculate cost (Sonnet 4.6 pricing: $3 input, $15 output per million tokens)
input_cost = usage.input_tokens * 3.00 / 1_000_000
output_cost = usage.output_tokens * 15.00 / 1_000_000
total_cost = input_cost + output_cost
print(f"Cost: ${total_cost:.6f}")
```
TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Write a Python function to validate email addresses" }
  ],
});

const usage = message.usage;
console.log(`Input tokens: ${usage.input_tokens}`);
console.log(`Output tokens: ${usage.output_tokens}`);
console.log(`Total tokens: ${usage.input_tokens + usage.output_tokens}`);

// Calculate cost (Sonnet 4.6 pricing: $3 input, $15 output per million tokens)
const inputCost = (usage.input_tokens * 3.0) / 1_000_000;
const outputCost = (usage.output_tokens * 15.0) / 1_000_000;
const totalCost = inputCost + outputCost;
console.log(`Cost: $${totalCost.toFixed(6)}`);
```
Pricing Reference (March 2026)
| Model | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Opus 4.6 | $5.00 | $25.00 |
| Sonnet 4.6 | $3.00 | $15.00 |
| Haiku 4.5 | $1.00 | $5.00 |
MTok = million tokens. A typical coding question uses 200-500 input tokens and 300-1000 output tokens.
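The cost snippets above hard-code Sonnet 4.6 rates. A small sketch that looks prices up from the table instead keeps the calculation in one place; it is keyed by the model names as written in the table (mapping your actual model IDs onto them is left to you), and it ignores prompt caching and long-context premiums.

```python
# Prices from the table above (USD per million tokens, March 2026); update if pricing changes
PRICING = {
    "Opus 4.6": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Haiku 4.5": (1.00, 5.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, ignoring caching and long-context premiums."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Upper end of a "typical coding question": 500 input tokens, 1000 output tokens
for model in PRICING:
    print(f"{model}: ${request_cost(model, 500, 1000):.4f}")
```

For that request size the loop prints $0.0275 for Opus 4.6, $0.0165 for Sonnet 4.6 and $0.0055 for Haiku 4.5; output tokens dominate the bill in all three cases.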
Setting max_tokens
The max_tokens parameter controls the maximum length of Claude’s response. It does not affect the input — only the output.
```python
# Assumes `client = anthropic.Anthropic()` from the earlier examples

# Short response: good for classification, extraction
short = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    messages=[{"role": "user", "content": "Is this a bug report or feature request? Text: 'The app crashes when I click save'"}],
)

# Long response: good for code generation, explanations
long = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a complete REST API for user management in FastAPI"}],
)
```
Tips:
- Set max_tokens to the maximum you might need, not the exact length you want.
- You are only charged for tokens actually generated, not for the max_tokens limit.
- If Claude hits the limit, stop_reason will be max_tokens.
- Maximum values: 128K tokens for Opus 4.6, 64K for Sonnet 4.6 and Haiku 4.5.
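Putting the tips together, a generous max_tokens is cheap as long as it stays within the model's ceiling. A minimal clamp sketch follows; it reads the limits quoted above as 128,000 and 64,000 tokens, which is an assumption (check the model documentation for the exact values), and keys them by display name rather than model ID.

```python
# Output ceilings as quoted in the tips (assumed to mean thousands); verify against model docs
MAX_OUTPUT = {"Opus 4.6": 128_000, "Sonnet 4.6": 64_000, "Haiku 4.5": 64_000}

def choose_max_tokens(model: str, wanted: int) -> int:
    """Clamp a requested max_tokens into the range the model accepts (at least 1)."""
    return max(1, min(wanted, MAX_OUTPUT[model]))

print(choose_max_tokens("Sonnet 4.6", 4096))     # → 4096
print(choose_max_tokens("Sonnet 4.6", 200_000))  # → 64000
```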
Stop Sequences
Stop sequences tell Claude to stop generating when it encounters a specific string.
Python
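```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    # A custom marker is safer than "```", which would also match the opening fence
    stop_sequences=["END_OF_CODE"],
    messages=[
        {"role": "user", "content": "Write a Python function to sort a list. Output only plain code with no Markdown fences, then write END_OF_CODE on its own line."}
    ],
)
print(message.content[0].text)  # the marker itself is not included
print(f"Stop reason: {message.stop_reason}")  # "stop_sequence" if the marker was hit
print(f"Matched: {message.stop_sequence}")  # "END_OF_CODE"
```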
Common uses:
- Stop at a closing code block marker
- Stop at a specific delimiter in structured output
- Stop at “END” or similar markers in formatted responses
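An offline model of the behavior helps when choosing which strings to use. Output stops at the first occurrence of any stop sequence, the sequence itself is left out of the text (the response reports it separately in its stop_sequence field), and stop_reason becomes "stop_sequence". The simulation below is a simplification, since the real check runs on text as Claude generates it, but it makes one pitfall visible: a fence stop sequence also matches the opening fence of a code block.

```python
from typing import Optional

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable

def apply_stop_sequences(text: str, stop_sequences: list[str]) -> tuple[str, Optional[str]]:
    """Cut text at the earliest stop sequence; return (visible_text, matched_sequence)."""
    cut, matched = len(text), None
    for seq in stop_sequences:
        pos = text.find(seq)
        if pos != -1 and pos < cut:
            cut, matched = pos, seq
    return text[:cut], matched  # the matched sequence itself is not part of the output

reply = FENCE + "python\ndef sort_list(xs):\n    return sorted(xs)\n" + FENCE
print(apply_stop_sequences(reply, [FENCE]))  # → ('', '```')

marked = "def sort_list(xs):\n    return sorted(xs)\nEND_OF_CODE"
code, hit = apply_stop_sequences(marked, ["END_OF_CODE"])
print(hit)  # → END_OF_CODE
```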
Error Handling
The API returns standard HTTP error codes. Here is how to handle them properly.
Python
```python
import anthropic
import time

client = anthropic.Anthropic()

def call_claude(messages: list, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=messages,
            )
            return response.content[0].text
        except anthropic.RateLimitError:
            # 429: too many requests
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except anthropic.AuthenticationError:
            # 401: bad API key
            print("Invalid API key. Check your ANTHROPIC_API_KEY.")
            raise
        except anthropic.BadRequestError as e:
            # 400: invalid request (bad params, too many tokens, etc.)
            print(f"Bad request: {e}")
            raise
        except anthropic.InternalServerError:
            # 500+: server error. Catching the broader APIError here would
            # also retry 403, 404 and other client errors that cannot succeed.
            if attempt < retries - 1:
                wait = 2 ** attempt
                print(f"Server error. Retrying in {wait}s...")
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")

result = call_claude([{"role": "user", "content": "Hello"}])
print(result)
```
TypeScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callClaude(
  messages: Anthropic.MessageParam[],
  retries = 3
): Promise<string> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const response = await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages,
      });
      return response.content[0].type === "text"
        ? response.content[0].text
        : "";
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError) {
        // 429: exponential backoff (1s, 2s, 4s)
        const wait = Math.pow(2, attempt) * 1000;
        console.log(`Rate limited. Waiting ${wait}ms...`);
        await new Promise((r) => setTimeout(r, wait));
      } else if (error instanceof Anthropic.AuthenticationError) {
        console.log("Invalid API key.");
        throw error;
      } else if (error instanceof Anthropic.BadRequestError) {
        console.log(`Bad request: ${error.message}`);
        throw error;
      } else if (error instanceof Anthropic.InternalServerError) {
        // 500+: retry; other API errors (403, 404, ...) are rethrown below
        if (attempt < retries - 1) {
          const wait = Math.pow(2, attempt) * 1000;
          console.log(`Server error. Retrying in ${wait}ms...`);
          await new Promise((r) => setTimeout(r, wait));
        } else {
          throw error;
        }
      } else {
        throw error;
      }
    }
  }
  throw new Error("Max retries exceeded");
}

const result = await callClaude([{ role: "user", content: "Hello" }]);
console.log(result);
```
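Both SDKs already retry some failures on their own: by default they retry connection errors, 429 rate limits, and 5xx server errors twice with exponential backoff, configurable through max_retries (Python) or maxRetries (TypeScript) on the client. A custom loop like the ones above runs on top of those built-in retries, so lower the client setting if you want your own logic to be the only retry layer. Custom handling gives you more control over logging, wait times, and which errors to give up on.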
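The loops above wait a fixed 1s, 2s, 4s. A common refinement, swapped in here as a general technique rather than something the SDKs are claimed to do, is "full jitter": each wait is a random time between 0 and the exponential cap, so many clients that hit a 429 at the same moment do not all retry in lockstep. backoff_delays is a hypothetical helper; the rng parameter exists only so the schedule can be made deterministic for testing.

```python
import random
from typing import Callable

def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0,
                   rng: Callable[[], float] = random.random) -> list[float]:
    """Full-jitter backoff: delay for attempt n is uniform in [0, min(cap, base * 2**n))."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(retries)]

# A fixed rng makes the schedule visible: half of 1s, 2s, 4s, 8s
print(backoff_delays(4, rng=lambda: 0.5))  # → [0.5, 1.0, 2.0, 4.0]

# In call_claude: compute delays = backoff_delays(retries) before the loop,
# then call time.sleep(delays[attempt]) instead of time.sleep(2 ** attempt).
```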
Metadata for User Tracking
If you are building an app where multiple users call Claude, use the metadata parameter to track which user made each request.
```python
# Assumes `client = anthropic.Anthropic()` from the earlier examples
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    metadata={"user_id": "user_12345"},
    messages=[
        {"role": "user", "content": "Help me debug this code"}
    ],
)
```
Anthropic may use this ID to help detect abuse. It does not affect Claude's response.
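Anthropic's API reference describes metadata.user_id as an opaque identifier, such as a UUID or a hash, and asks that it not contain names, email addresses, or phone numbers. A minimal sketch, assuming you keep a secret key on your server (SECRET_KEY and opaque_user_id are hypothetical names), derives a stable opaque ID with a keyed hash. Keying the hash means outsiders cannot recover a user by hashing guessed emails.

```python
import hashlib
import hmac

# Hypothetical server-side secret; load it from your secret store in real code
SECRET_KEY = b"replace-with-a-long-random-server-side-secret"

def opaque_user_id(internal_id: str) -> str:
    """Map an internal user ID to a stable, non-identifying string for metadata.user_id."""
    return hmac.new(SECRET_KEY, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()

uid = opaque_user_id("alice@example.com")
print(len(uid))                                    # → 64
print(uid == opaque_user_id("alice@example.com"))  # → True (stable across requests)
# Then pass it along: client.messages.create(..., metadata={"user_id": uid}, ...)
```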
Long Context (1M Tokens)
Claude supports up to 1 million tokens of input context. This is a beta feature with specific requirements:
- Access: Tier 4 API access required
- Header: Include anthropic-beta: context-1m-2025-08-07 in your request
- Pricing: Requests over 200K input tokens cost 2x for input, 1.5x for output
- Models: Opus 4.6 and Sonnet 4.6 only
Python
```python
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

# Hypothetical input: the whole codebase concatenated into one text file
large_codebase = Path("codebase.txt").read_text(encoding="utf-8")

# For requests over 200K tokens, add the beta header
message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
    system="You are a code analyst. Analyze the entire codebase provided.",
    messages=[
        {"role": "user", "content": f"Here is the full codebase:\n\n{large_codebase}\n\nFind all security vulnerabilities."}
    ],
)
```
For most use cases, the standard 200K context window is enough. Use 1M context only when you genuinely need it — the cost premium adds up.
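To put a number on that premium, the sketch below applies the multipliers listed above (2x input, 1.5x output) to the Sonnet 4.6 base prices. It assumes the premium covers the whole request once input exceeds 200K tokens, as the pricing bullet implies; confirm against Anthropic's current pricing page before relying on it.

```python
# Sonnet 4.6 base prices (USD per million tokens) and the long-context rule quoted above.
# Assumption: above 200K input tokens, the premium applies to the entire request.
BASE_INPUT, BASE_OUTPUT = 3.00, 15.00
THRESHOLD = 200_000

def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    long_context = input_tokens > THRESHOLD
    in_rate = BASE_INPUT * (2.0 if long_context else 1.0)
    out_rate = BASE_OUTPUT * (1.5 if long_context else 1.0)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"${sonnet_cost(150_000, 4_000):.2f}")  # → $0.51 (standard pricing)
print(f"${sonnet_cost(600_000, 4_000):.2f}")  # → $3.69 (long-context pricing)
```

Four times the input costs about seven times as much here, because the input rate doubles once the request crosses the threshold.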
Practical Pattern: Conversation with Cost Tracking
Here is a complete example that combines multi-turn conversation, streaming, and cost tracking:
Python
```python
import anthropic

client = anthropic.Anthropic()

conversation = []
total_input_tokens = 0
total_output_tokens = 0

def chat(user_message: str) -> str:
    global total_input_tokens, total_output_tokens
    conversation.append({"role": "user", "content": user_message})

    # Stream the reply so the user sees text as it is generated
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system="You are a senior software engineer. Be concise and practical.",
        messages=conversation,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
        print()
        response = stream.get_final_message()

    assistant_text = "".join(b.text for b in response.content if b.type == "text")
    conversation.append({"role": "assistant", "content": assistant_text})

    # Running totals across turns (Sonnet 4.6: $3 input, $15 output per million tokens)
    total_input_tokens += response.usage.input_tokens
    total_output_tokens += response.usage.output_tokens
    cost = (total_input_tokens * 3.00 + total_output_tokens * 15.00) / 1_000_000
    print(f"[Tokens: {total_input_tokens} in, {total_output_tokens} out | Cost: ${cost:.4f}]")
    return assistant_text

# Use it (replies are printed while streaming, so no outer print is needed)
chat("What is the best way to handle errors in a REST API?")
chat("Show me an example in FastAPI")
chat("How do I add logging to this?")
```
This pattern is useful for any interactive application where you want to monitor costs in real time.
Summary
| Feature | How |
|---|---|
| Basic call | client.messages.create(model, max_tokens, messages) |
| Multi-turn | Send full message history in every request |
| Streaming | Use client.messages.stream() (same method in Python and TypeScript) |
| Token tracking | Check response.usage.input_tokens and output_tokens |
| Error handling | Catch specific error types, retry with backoff |
| Long context | Beta header + Tier 4 access for 1M tokens |
The Messages API is simple but powerful. Master these patterns, and every other Claude feature will be easy to integrate.
What’s Next?
In the next article, we will cover Tool Use — teaching Claude to call your functions, get real data, and take actions.
Next: Tool Use (Function Calling)