Using Thinking Models

Both the kimi-k2-thinking and kimi-k2.6 models have powerful thinking capabilities, supporting deep reasoning and multi-step tool use to solve complex problems.

kimi-k2-thinking: A dedicated thinking model; thinking is always enabled and cannot be turned off.

[Recommended] kimi-k2.6: A model whose thinking capability can be turned on or off; it is enabled by default. To disable thinking, set the thinking parameter to {"type": "disabled"}.

If you are benchmarking with the Kimi API, please refer to the benchmark best practices.

Basic use case

Using the kimi-k2-thinking model

To use it, simply set the model parameter to kimi-k2-thinking:


$ curl https://api.moonshot.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $MOONSHOT_API_KEY" \
    -d '{
        "model": "kimi-k2-thinking",
        "messages": [
            {
                "role": "system",
                "content": "You are Kimi."
            },
            {
                "role": "user",
                "content": "Please explain why 1+1=2."
            }
        ],
        "temperature": 1.0
   }'

import os
import openai

client = openai.Client(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.getenv("MOONSHOT_API_KEY"),
)

stream = client.chat.completions.create(
    model="kimi-k2-thinking",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi.",
        },
        {
            "role": "user",
            "content": "Please explain why 1+1=2."
        },
    ],
    max_tokens=1024*32,
    stream=True,
    temperature=1.0,
)

thinking = False
for chunk in stream:
    if chunk.choices:
        choice = chunk.choices[0]
        # reasoning_content is not a declared SDK field; it may be absent or null in a given chunk
        reasoning = getattr(choice.delta, "reasoning_content", None) if choice.delta else None
        if reasoning:
            if not thinking:
                thinking = True
                print("=============Start Reasoning=============")
            print(reasoning, end="")
        if choice.delta and choice.delta.content:
            if thinking:
                thinking = False
                print("\n=============End Reasoning=============")
            print(choice.delta.content, end="")

Using the Kimi K2.6 model with thinking enabled

For the kimi-k2.6 model, thinking is enabled by default, so there is no need to enable it explicitly:


$ curl https://api.moonshot.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $MOONSHOT_API_KEY" \
    -d '{
        "model": "kimi-k2.6",
        "messages": [
            {
                "role": "system",
                "content": "You are Kimi."
            },
            {
                "role": "user",
                "content": "Please explain why 1+1=2."
            }
        ]
   }'

import os
import openai

client = openai.Client(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.getenv("MOONSHOT_API_KEY"),
)

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi.",
        },
        {
            "role": "user",
            "content": "Please explain why 1+1=2."
        },
    ],
    max_tokens=1024*32,
    stream=True,
    # temperature=1.0, # For k2.6 models, use default temperature, no need to explicitly specify
    # No additional parameters needed, thinking is enabled by default
)

thinking = False
for chunk in stream:
    if chunk.choices:
        choice = chunk.choices[0]
        # reasoning_content is not a declared SDK field; it may be absent or null in a given chunk
        reasoning = getattr(choice.delta, "reasoning_content", None) if choice.delta else None
        if reasoning:
            if not thinking:
                thinking = True
                print("=============Start Reasoning=============")
            print(reasoning, end="")
        if choice.delta and choice.delta.content:
            if thinking:
                thinking = False
                print("\n=============End Reasoning=============")
            print(choice.delta.content, end="")

Using the Kimi K2.6 model with thinking disabled

Please refer to the Disable Thinking Capability Example.
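A rough sketch of the request, in case the linked example is not at hand: it assumes kimi-k2.6 accepts the `thinking` request-body field with the `{"type": "disabled"}` value mentioned at the top of this page, passed through the OpenAI SDK's `extra_body` in the same way the Preserved Thinking example below passes `keep`. The helper `disabled_thinking_kwargs` is hypothetical; treat the official Disable Thinking Capability Example as authoritative.

```python
def disabled_thinking_kwargs(messages: list[dict]) -> dict:
    """Build keyword arguments for client.chat.completions.create() with thinking turned off.

    Assumption: kimi-k2.6 accepts {"thinking": {"type": "disabled"}} in the request body,
    sent here via the OpenAI SDK's extra_body parameter (hypothetical helper).
    """
    return {
        "model": "kimi-k2.6",
        "messages": messages,
        "extra_body": {"thinking": {"type": "disabled"}},
    }


kwargs = disabled_thinking_kwargs([
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "Please explain why 1+1=2."},
])
# With a configured client: completion = client.chat.completions.create(**kwargs)
```

The returned dictionary can be unpacked into `client.chat.completions.create(**kwargs)` with a client configured as in the examples above.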

Accessing the reasoning content

In API responses from kimi-k2-thinking or kimi-k2.6 (with thinking enabled), the model's reasoning is returned in the reasoning_content field. Note the following about this field:

In the OpenAI SDK, the ChoiceDelta and ChatCompletionMessage types do not declare a reasoning_content field, so you cannot rely on accessing it as .reasoning_content. Use hasattr(obj, "reasoning_content") to check whether the field exists and, if it does, getattr(obj, "reasoning_content") to read its value.
If you use another framework or call the HTTP API directly, reasoning_content is returned at the same level as the content field and can be read directly.
In streaming output (stream=True), reasoning_content always arrives before content. Your business logic can therefore treat the first content chunk as the signal that reasoning has finished.
Tokens in reasoning_content also count against max_tokens: the combined number of tokens in reasoning_content and content must not exceed max_tokens.
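The streaming rules above can be captured in a small helper that accumulates the two fields separately and notes when reasoning ends. This is a sketch: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks built with `SimpleNamespace` only mimic the attribute shape of the OpenAI SDK's streaming chunks (`chunk.choices[0].delta`) so the logic can be exercised offline.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning_content and content from streamed chat completion chunks.

    Returns (reasoning, content, reasoning_done); reasoning_done becomes True
    as soon as the first non-empty content delta arrives.
    """
    reasoning_parts, content_parts = [], []
    reasoning_done = False
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta is None:
            continue
        # reasoning_content is not a declared SDK field, so read it defensively
        piece = getattr(delta, "reasoning_content", None)
        if piece:
            reasoning_parts.append(piece)
        if getattr(delta, "content", None):
            reasoning_done = True  # content only starts once reasoning has finished
            content_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(content_parts), reasoning_done


def fake_chunk(**delta_fields):
    """Build an object shaped like an SDK streaming chunk (offline testing only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta_fields))])


chunks = [
    fake_chunk(reasoning_content="Recall that 2 is defined "),
    fake_chunk(reasoning_content="as the successor of 1."),
    fake_chunk(content="By definition, "),
    fake_chunk(content="1+1=2."),
]
reasoning, content, done = collect_stream(chunks)
print(reasoning)  # Recall that 2 is defined as the successor of 1.
print(content)    # By definition, 1+1=2.
```

With a real client, pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `chunks`.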

Multi-Step Tool Call

Both kimi-k2-thinking and kimi-k2.6 (with thinking enabled) are designed to perform deep reasoning across multiple tool calls, enabling them to tackle highly complex tasks.

Usage Notes

To get reliable results, whether using kimi-k2-thinking or kimi-k2.6 (with thinking enabled by default), always follow these configuration rules:

Keep the entire reasoning content (the reasoning_content field) of earlier assistant messages in your input. The model decides which parts it needs for further reasoning.
Set max_tokens ≥ 16,000 to ensure the full reasoning_content and final content can be returned without truncation.
Set temperature = 1.0 for the best performance. Note that the kimi-k2.6 model uses a fixed temperature of 1.0.
Enable streaming (stream = true). Because thinking models return both reasoning_content and regular content, the response is larger than usual. Streaming delivers a better user experience and helps avoid network-timeout issues.
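The configuration rules above can be gathered in one place. In this sketch, `thinking_request_kwargs` is a hypothetical helper: the 16,000-token floor, `temperature=1.0`, and `stream=True` come from the notes, while leaving `temperature` unset for kimi-k2.6 follows the comment in the kimi-k2.6 example above (the model uses a fixed 1.0). Raising an error on a low `max_tokens` is a design choice of the sketch, not API behavior.

```python
MIN_MAX_TOKENS = 16_000  # from the Usage Notes: room for reasoning_content + content


def thinking_request_kwargs(model: str, messages: list[dict], max_tokens: int = 32 * 1024) -> dict:
    """Return create() kwargs that follow the thinking-model usage notes (hypothetical helper)."""
    if max_tokens < MIN_MAX_TOKENS:
        raise ValueError(f"max_tokens should be >= {MIN_MAX_TOKENS} for thinking models")
    kwargs = {
        "model": model,
        "messages": messages,  # keep earlier assistant messages, including reasoning_content
        "max_tokens": max_tokens,
        "stream": True,  # responses are large; streaming helps avoid network timeouts
    }
    if model == "kimi-k2-thinking":
        kwargs["temperature"] = 1.0  # recommended value for best performance
    # kimi-k2.6 uses a fixed temperature of 1.0, so temperature is left unset for it
    return kwargs
```

Usage: `stream = client.chat.completions.create(**thinking_request_kwargs("kimi-k2.6", messages))`.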

Complete example

The following complete example shows how to use thinking models together with official tools for multi-step tool calls and extended reasoning. It implements a “Daily News Report Generation” scenario: the model calls official tools in sequence, such as date (to get the current date) and web_search (to search today’s news), and reasons in depth throughout the process.

import os
import json
import httpx
import openai


class FormulaChatClient:
    def __init__(self, base_url: str, api_key: str):
        """Initialize Formula client"""
        self.base_url = base_url
        self.api_key = api_key
        self.openai = openai.Client(
            base_url=base_url,
            api_key=api_key,
        )
        self.httpx = httpx.Client(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )
        # Using kimi-k2-thinking model
        # If using kimi-k2.6 model, change to "kimi-k2.6". Thinking is enabled by default
        self.model = "kimi-k2-thinking"

    def get_tools(self, formula_uri: str):
        """Get tool definitions from Formula API"""
        response = self.httpx.get(f"/formulas/{formula_uri}/tools")
        response.raise_for_status()
        
        try:
            return response.json().get("tools", [])
        except json.JSONDecodeError as e:
            print(f"Error: Unable to parse JSON (status code: {response.status_code})")
            print(f"Response content: {response.text[:500]}")
            raise

    def call_tool(self, formula_uri: str, function: str, args: dict):
        """Call an official tool"""
        response = self.httpx.post(
            f"/formulas/{formula_uri}/fibers",
            json={"name": function, "arguments": json.dumps(args)},
        )
        response.raise_for_status()
        fiber = response.json()
        
        if fiber.get("status", "") == "succeeded":
            return fiber["context"].get("output") or fiber["context"].get("encrypted_output")
        
        if "error" in fiber:
            return f"Error: {fiber['error']}"
        if "error" in fiber.get("context", {}):
            return f"Error: {fiber['context']['error']}"
        return "Error: Unknown error"

    def close(self):
        """Close the client connection"""
        self.httpx.close()


# Initialize client
base_url = os.getenv("MOONSHOT_BASE_URL", "https://api.moonshot.ai/v1")
api_key = os.getenv("MOONSHOT_API_KEY")

if not api_key:
    raise ValueError("MOONSHOT_API_KEY environment variable not set. Please set your API key.")

print(f"Base URL: {base_url}")
print(f"API Key: ...{api_key[-4:]}\n")  # Only show the last 4 characters; never print the full key

client = FormulaChatClient(base_url, api_key)

# Define the official tool Formula URIs to use
formula_uris = [
    "moonshot/date:latest",
    "moonshot/web-search:latest"
]

# Load all tool definitions and build mapping
print("Loading official tools...")
all_tools = []
tool_to_uri = {}  # function.name -> formula_uri

for uri in formula_uris:
    try:
        tools = client.get_tools(uri)
        for tool in tools:
            func = tool.get("function")
            if func:
                func_name = func.get("name")
                if func_name:
                    tool_to_uri[func_name] = uri
                    all_tools.append(tool)
                    print(f"  Loaded tool: {func_name} from {uri}")
    except Exception as e:
        print(f"  Warning: Failed to load tool {uri}: {e}")
        continue

print(f"Loaded {len(all_tools)} tools in total\n")

if not all_tools:
    raise ValueError("No tools loaded. Please check API key and network connection.")

# Initialize message list
messages = [
    {
        "role": "system",
        "content": "You are Kimi, a professional news analyst. You excel at collecting, analyzing, and organizing information to generate high-quality news reports.",
    },
]

# User request to generate today's news report
user_request = "Please help me generate a daily news report including important technology, economy, and society news."
messages.append({
    "role": "user",
    "content": user_request
})

print(f"User request: {user_request}\n")

# Begin multi-step conversation loop
max_iterations = 10  # Prevent infinite loops
for iteration in range(max_iterations):
    try:
        completion = client.openai.chat.completions.create(
            model=client.model,
            messages=messages,
            max_tokens=1024 * 32,
            tools=all_tools,
            temperature=1.0,
        )
    except openai.AuthenticationError as e:
        print(f"Authentication error: {e}")
        print("Please check if the API key is correct and has the required permissions")
        raise
    except Exception as e:
        print(f"Error while calling the model: {e}")
        raise
    
    # Get response
    message = completion.choices[0].message
    
    # Print reasoning process
    if hasattr(message, "reasoning_content"):
        print(f"=============Reasoning round {iteration + 1} starts=============")
        reasoning = getattr(message, "reasoning_content")
        if reasoning:
            print(reasoning[:500] + "..." if len(reasoning) > 500 else reasoning)
        print(f"=============Reasoning round {iteration + 1} ends=============\n")
    
    # Add assistant message to context (preserve reasoning_content)
    messages.append(message)
    
    # If the model did not call any tools, conversation is done
    if not message.tool_calls:
        print("=============Final Answer=============")
        print(message.content)
        break
    
    # Handle tool calls
    print(f"The model decided to call {len(message.tool_calls)} tool(s):\n")
    
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        
        print(f"Calling tool: {func_name}")
        print(f"Arguments: {json.dumps(args, ensure_ascii=False, indent=2)}")
        
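        # Get corresponding formula_uri
        formula_uri = tool_to_uri.get(func_name)
        if not formula_uri:
            print(f"Error: Could not find Formula URI for tool {func_name}")
            # Still answer this tool_call so every call in the assistant message has a matching tool message
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": func_name,
                "content": f"Error: unknown tool {func_name}",
            })
            continue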
        
        # Call the tool
        result = client.call_tool(formula_uri, func_name, args)
        
        # Print result (truncate if too long)
        if len(str(result)) > 200:
            print(f"Tool result: {str(result)[:200]}...\n")
        else:
            print(f"Tool result: {result}\n")
        
        # Add tool result to message list
        tool_message = {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": func_name,
            "content": result
        }
        messages.append(tool_message)

print("\nConversation completed!")

# Cleanup
client.close()

This example demonstrates how the kimi-k2-thinking or kimi-k2.6 (with thinking enabled) model uses deep reasoning to plan and execute a complex multi-step task, with its detailed reasoning (reasoning_content) preserved in the context to keep tool use accurate at every stage.
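The complete example calls the model without streaming for readability, whereas the Usage Notes recommend stream=True. When streaming, tool calls arrive in fragments. The sketch below assumes the OpenAI-compatible delta shape (`delta.tool_calls[i]` carrying `index`, `id`, `function.name`, and pieces of `function.arguments`) and reassembles them. `merge_tool_call_deltas` and `fake_tool_delta` are hypothetical helpers, the `"query"` argument is an illustrative assumption rather than the real web_search schema, and the test chunks are fakes built with `SimpleNamespace`.

```python
import json
from types import SimpleNamespace


def merge_tool_call_deltas(chunks):
    """Reassemble streamed tool-call fragments into complete tool calls, ordered by index."""
    calls = {}  # index -> {"id": ..., "name": ..., "arguments": ...}
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        for tc in getattr(delta, "tool_calls", None) or []:
            entry = calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
            if tc.id:
                entry["id"] = tc.id
            fn = tc.function
            if fn is not None:
                if fn.name:
                    entry["name"] = fn.name  # the name is normally sent once per call
                if fn.arguments:
                    entry["arguments"] += fn.arguments  # JSON arguments arrive in pieces
    return [calls[i] for i in sorted(calls)]


def fake_tool_delta(index, call_id=None, name=None, arguments=None):
    """Build an object shaped like a streaming chunk with one tool-call fragment (testing only)."""
    tc = SimpleNamespace(index=index, id=call_id,
                         function=SimpleNamespace(name=name, arguments=arguments))
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(tool_calls=[tc]))])


chunks = [
    fake_tool_delta(0, call_id="call_1", name="web_search", arguments='{"query": '),
    fake_tool_delta(0, arguments='"today news"}'),
]
calls = merge_tool_call_deltas(chunks)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
# web_search {'query': 'today news'}
```

Each merged entry can then be dispatched with `client.call_tool(...)` and answered with a `"role": "tool"` message, as in the loop above.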

Preserved Thinking

What is Preserved Thinking

Preserved Thinking means passing the reasoning_content of previous turns through to the model in a multi-turn conversation, so that the model can continue its prior chain of thought when reasoning in the current turn. For kimi-k2.6, use the thinking.keep parameter in the request body to control whether historical thinking is preserved:

| Value | Behavior |
| --- | --- |
| `null` / omitted (default) | Historical `reasoning_content` is ignored. Shorter context and lower cost. |
| `"all"` | Historical `reasoning_content` is fully preserved, enabling Preserved Thinking. |

thinking.keep only affects reasoning_content from historical turns; it does not change whether the model generates and outputs thinking content in the current turn (that is controlled by thinking.type). We recommend using keep: "all" together with type: "enabled".

How to use

When using keep: "all", keep the reasoning_content from every historical assistant message in messages as-is. The simplest way is to append the assistant message returned from the previous API call directly back into messages.


$ curl https://api.moonshot.ai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $MOONSHOT_API_KEY" \
    -d '{
        "model": "kimi-k2.6",
        "messages": [
            {"role": "system", "content": "You are Kimi."},
            {"role": "user", "content": "First question..."},
            {
                "role": "assistant",
                "reasoning_content": "<reasoning_content returned by the previous API call>",
                "content": "<final answer returned by the previous API call>"
            },
            {"role": "user", "content": "Please continue the analysis and derive the next step."}
        ],
        "thinking": {
            "type": "enabled",
            "keep": "all"
        }
   }'

import os
import openai

client = openai.Client(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.getenv("MOONSHOT_API_KEY"),
)

# Keep the assistant message (including reasoning_content) from every previous API call in messages
messages = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "First question..."},
    {
        "role": "assistant",
        "reasoning_content": "<reasoning_content returned by the previous API call>",
        "content": "<final answer returned by the previous API call>",
    },
    {"role": "user", "content": "Please continue the analysis and derive the next step."},
]

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    stream=True,
    extra_body={"thinking": {"type": "enabled", "keep": "all"}},
)
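Because the Python call above streams, there is no single message object to append for the next turn. A minimal sketch, assuming you accumulate the streamed reasoning_content and content into two strings yourself, is to rebuild the assistant message as a plain dict carrying both fields. `append_assistant_turn` is a hypothetical helper, not part of any SDK.

```python
def append_assistant_turn(messages: list[dict], reasoning: str, content: str) -> list[dict]:
    """Append one assistant turn so its reasoning can be resent when thinking.keep is "all"."""
    turn = {"role": "assistant", "content": content}
    if reasoning:
        # Keep reasoning_content as returned; kimi-k2.6 ignores it unless thinking.keep is "all"
        turn["reasoning_content"] = reasoning
    messages.append(turn)
    return messages


history = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "First question..."},
]
append_assistant_turn(history, "<streamed reasoning_content>", "<streamed content>")
history.append({"role": "user", "content": "Please continue the analysis and derive the next step."})
# Next request (client configured as above):
# client.chat.completions.create(model="kimi-k2.6", messages=history, stream=True,
#                                extra_body={"thinking": {"type": "enabled", "keep": "all"}})
```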

reasoning_content counts toward token consumption. When Preserved Thinking is enabled, historical thinking content keeps occupying the context window and is billed accordingly, so enable it only when continuity of reasoning across turns is worth the extra cost.

Frequently Asked Questions

Q1: Why should I keep `reasoning_content`?

A: Keeping the reasoning_content ensures the model maintains reasoning continuity in multi-step reasoning scenarios, especially when calling tools. As long as you pass the assistant messages back as returned, the server handles these fields automatically; you do not need to process them manually.

Q2: Does `reasoning_content` consume extra tokens?

A: Yes, reasoning_content counts towards your input/output token quota. For detailed pricing, please refer to MoonshotAI’s pricing documentation.
