# Using Thinking Models

> Both `kimi-k2-thinking` and `kimi-k2.6` support deep reasoning and multi-step tool use for solving complex problems.
>
> * **`kimi-k2-thinking`**: A dedicated thinking model; thinking is always enabled.
> * **\[Recommended] `kimi-k2.6`**: A model whose thinking can be switched on or off. Thinking is enabled by default; to disable it, pass `{"type": "disabled"}` as the `thinking` parameter.

If you are running benchmarks against the Kimi API, please refer to the [benchmark best practices](/guide/benchmark-best-practice).

## Basic use case

### Using the kimi-k2-thinking model

To use it, set the `model` parameter to `kimi-k2-thinking`:

<Tabs>
  <Tab title="curl">
    ```bash theme={null}
    $ curl https://api.moonshot.ai/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $MOONSHOT_API_KEY" \
        -d '{
            "model": "kimi-k2-thinking",
            "messages": [
                {
                    "role": "system",
                    "content": "You are Kimi."
                },
                {
                    "role": "user",
                    "content": "Please explain why 1+1=2."
                }
            ],
            "temperature": 1.0
       }'

    ```
  </Tab>

  <Tab title="python">
    ```python theme={null}
    import os
    import openai

    client = openai.Client(
        base_url="https://api.moonshot.ai/v1",
        api_key=os.getenv("MOONSHOT_API_KEY"),
    )

    stream = client.chat.completions.create(
        model="kimi-k2-thinking",
        messages=[
            {
                "role": "system",
                "content": "You are Kimi.",
            },
            {
                "role": "user",
                "content": "Please explain why 1+1=2."
            },
        ],
        max_tokens=1024*32,
        stream=True,
        temperature=1.0,
    )

    thinking = False
    for chunk in stream:
        if chunk.choices:
            choice = chunk.choices[0]
            # Read reasoning_content defensively: it may be absent or empty on some chunks
            reasoning = getattr(choice.delta, "reasoning_content", None) if choice.delta else None
            if reasoning:
                if not thinking:
                    thinking = True
                    print("=============Start Reasoning=============")
                print(reasoning, end="")
            if choice.delta and choice.delta.content:
                if thinking:
                    thinking = False
                    print("\n=============End Reasoning=============")
                print(choice.delta.content, end="")

    ```
  </Tab>
</Tabs>

### Using the Kimi K2.6 model with thinking enabled

For the `kimi-k2.6` model, thinking is enabled by default, so you do not need to specify it explicitly:

<Tabs>
  <Tab title="curl">
    ```bash theme={null}
    $ curl https://api.moonshot.ai/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $MOONSHOT_API_KEY" \
        -d '{
            "model": "kimi-k2.6",
            "messages": [
                {
                    "role": "system",
                    "content": "You are Kimi."
                },
                {
                    "role": "user",
                    "content": "Please explain why 1+1=2."
                }
            ]
       }'

    ```
  </Tab>

  <Tab title="python">
    ```python theme={null}
    import os
    import openai

    client = openai.Client(
        base_url="https://api.moonshot.ai/v1",
        api_key=os.getenv("MOONSHOT_API_KEY"),
    )

    stream = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {
                "role": "system",
                "content": "You are Kimi.",
            },
            {
                "role": "user",
                "content": "Please explain why 1+1=2."
            },
        ],
        max_tokens=1024*32,
        stream=True,
        # temperature=1.0, # For k2.6 models, use default temperature, no need to explicitly specify
        # No additional parameters needed, thinking is enabled by default
    )

    thinking = False
    for chunk in stream:
        if chunk.choices:
            choice = chunk.choices[0]
            # Read reasoning_content defensively: it may be absent or empty on some chunks
            reasoning = getattr(choice.delta, "reasoning_content", None) if choice.delta else None
            if reasoning:
                if not thinking:
                    thinking = True
                    print("=============Start Reasoning=============")
                print(reasoning, end="")
            if choice.delta and choice.delta.content:
                if thinking:
                    thinking = False
                    print("\n=============End Reasoning=============")
                print(choice.delta.content, end="")
    ```
  </Tab>
</Tabs>

### Using the Kimi K2.6 model with thinking disabled

Please refer to the [Disable Thinking Capability Example](/guide/kimi-k2-6-quickstart#disable-thinking-capability-example).
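
For quick orientation, the sketch below builds the keyword arguments for a `kimi-k2.6` request with thinking disabled. It assumes the `thinking` object is passed through the OpenAI SDK's `extra_body` argument, in the same way the Preserved Thinking example on this page passes it; the helper name `build_chat_kwargs` is hypothetical and not part of any SDK.

```python
def build_chat_kwargs(prompt: str, thinking_enabled: bool = True) -> dict:
    """Build keyword arguments for client.chat.completions.create (hypothetical helper)."""
    kwargs = {
        "model": "kimi-k2.6",
        "messages": [
            {"role": "system", "content": "You are Kimi."},
            {"role": "user", "content": prompt},
        ],
    }
    if not thinking_enabled:
        # Assumption: the thinking switch travels in the request body via extra_body.
        kwargs["extra_body"] = {"thinking": {"type": "disabled"}}
    return kwargs


kwargs = build_chat_kwargs("Please explain why 1+1=2.", thinking_enabled=False)
print(kwargs["extra_body"])
# With a configured client: client.chat.completions.create(**kwargs)
```

Leaving `thinking_enabled` at its default adds nothing to the request, which matches the behavior described above: thinking stays on unless it is explicitly disabled.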

## Accessing the reasoning content

In API responses from `kimi-k2-thinking` or `kimi-k2.6` (with thinking enabled), the model's reasoning is returned in the `reasoning_content` field. Note the following about this field:

* In the OpenAI SDK, the `ChoiceDelta` and `ChatCompletionMessage` types do not declare a `reasoning_content` field, so do not rely on direct `.reasoning_content` access: type checkers do not recognize it, and the attribute may be missing when the field is absent from the response. Instead, use `hasattr(obj, "reasoning_content")` to check whether the field exists, and if so, use `getattr(obj, "reasoning_content")` to retrieve its value.
* If you use another framework or call the HTTP API directly, the `reasoning_content` field appears at the same level as the `content` field.
* In streaming output (`stream=True`), the `reasoning_content` field always appears before the `content` field. In your application logic, you can treat the first `content` output as the signal that reasoning has finished.
* Tokens in `reasoning_content` are also controlled by the `max_tokens` parameter: the sum of tokens in `reasoning_content` and `content` must be less than or equal to `max_tokens`.
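
The points above can be sketched as a small helper that separates reasoning from the final answer while reading a stream. This is a minimal illustration, not an official utility: `split_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the shape of streaming chunks so the helper can be tried without a network call.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Collect reasoning text and answer text from an iterable of streaming chunks."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta is None:
            continue
        # reasoning_content is not declared on the SDK type, so read it defensively.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**delta_fields):
    """Stand-in for a streaming chunk, used only to exercise the helper offline."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning_content="Peano axioms define "),
    fake_chunk(reasoning_content="successor.", content=None),
    fake_chunk(content="1+1=2 because "),
    fake_chunk(content="2 is the successor of 1."),
]
reasoning, answer = split_stream(demo)
print(reasoning)
print(answer)
```

With a real client, the `stream` object returned by `client.chat.completions.create(..., stream=True)` can be passed in place of `demo`.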

## Multi-Step Tool Call

Both `kimi-k2-thinking` and `kimi-k2.6` (with thinking enabled) are designed to perform deep reasoning across multiple tool calls, enabling them to tackle highly complex tasks.

### Usage Notes

To get reliable results, **whether using `kimi-k2-thinking` or `kimi-k2.6` (with thinking enabled by default), always follow these configuration rules:**

* Include the entire reasoning content from the context (the `reasoning_content` field) in your input. The model will decide which parts are necessary and forward them for further reasoning.
* Set `max_tokens` ≥ 16,000 to ensure the full `reasoning_content` and final `content` can be returned without truncation.
* **Set `temperature` = 1.0 to get the best performance. Note that the `kimi-k2.6` model uses a fixed temperature of 1.0.**
* Enable streaming (`stream=True`). Because thinking models return both `reasoning_content` and regular `content`, the response is larger than usual. Streaming delivers a better user experience and helps avoid network-timeout issues.
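
As an illustration, the numeric and boolean rules above can be expressed as a small pre-flight check on request parameters. The helper `check_thinking_params` is hypothetical and only mirrors the list above; it is a sketch, not part of the Kimi API or the OpenAI SDK.

```python
def check_thinking_params(params: dict) -> list[str]:
    """Return a list of warnings for parameters that deviate from the usage notes."""
    problems = []
    # A missing max_tokens is flagged too, since the notes ask for an explicit budget.
    if params.get("max_tokens", 0) < 16000:
        problems.append("max_tokens should be at least 16000")
    # An omitted temperature is accepted; an explicit value other than 1.0 is flagged.
    if params.get("temperature", 1.0) != 1.0:
        problems.append("temperature should be 1.0")
    if not params.get("stream", False):
        problems.append("stream should be enabled")
    return problems


good = {"model": "kimi-k2-thinking", "max_tokens": 1024 * 32, "temperature": 1.0, "stream": True}
bad = {"model": "kimi-k2-thinking", "max_tokens": 1024, "temperature": 0.2}

print(check_thinking_params(good))
for warning in check_thinking_params(bad):
    print("warning:", warning)
```

The first rule (passing `reasoning_content` back in the input) concerns the message history rather than request parameters, so it is not covered by this check.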

### Complete example

This section walks through a complete example that shows how to use thinking models together with official tools for multi-step tool calls and extended reasoning.

The example below demonstrates a "Daily News Report Generation" scenario. The model will sequentially call official tools like `date` (to get the date) and `web_search` (to search today's news), and will present deep reasoning throughout this process.

```python theme={null}
import os
import json
import httpx
import openai


class FormulaChatClient:
    def __init__(self, base_url: str, api_key: str):
        """Initialize Formula client"""
        self.base_url = base_url
        self.api_key = api_key
        self.openai = openai.Client(
            base_url=base_url,
            api_key=api_key,
        )
        self.httpx = httpx.Client(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )
        # Using kimi-k2-thinking model
        # If using kimi-k2.6 model, change to "kimi-k2.6". Thinking is enabled by default
        self.model = "kimi-k2-thinking"

    def get_tools(self, formula_uri: str):
        """Get tool definitions from Formula API"""
        response = self.httpx.get(f"/formulas/{formula_uri}/tools")
        response.raise_for_status()
        
        try:
            return response.json().get("tools", [])
        except json.JSONDecodeError as e:
            print(f"Error: Unable to parse JSON (status code: {response.status_code})")
            print(f"Response content: {response.text[:500]}")
            raise

    def call_tool(self, formula_uri: str, function: str, args: dict):
        """Call an official tool"""
        response = self.httpx.post(
            f"/formulas/{formula_uri}/fibers",
            json={"name": function, "arguments": json.dumps(args)},
        )
        response.raise_for_status()
        fiber = response.json()
        
        if fiber.get("status", "") == "succeeded":
            return fiber["context"].get("output") or fiber["context"].get("encrypted_output")
        
        if "error" in fiber:
            return f"Error: {fiber['error']}"
        if "error" in fiber.get("context", {}):
            return f"Error: {fiber['context']['error']}"
        return "Error: Unknown error"

    def close(self):
        """Close the client connection"""
        self.httpx.close()


# Initialize client
base_url = os.getenv("MOONSHOT_BASE_URL", "https://api.moonshot.ai/v1")
api_key = os.getenv("MOONSHOT_API_KEY")

if not api_key:
    raise ValueError("MOONSHOT_API_KEY environment variable not set. Please set your API key.")

print(f"Base URL: {base_url}")
# Mask the key before printing; short keys are never printed in full
masked_key = f"{api_key[:10]}...{api_key[-10:]}" if len(api_key) > 20 else "***"
print(f"API Key: {masked_key}\n")

client = FormulaChatClient(base_url, api_key)

# Define the official tool Formula URIs to use
formula_uris = [
    "moonshot/date:latest",
    "moonshot/web-search:latest"
]

# Load all tool definitions and build mapping
print("Loading official tools...")
all_tools = []
tool_to_uri = {}  # function.name -> formula_uri

for uri in formula_uris:
    try:
        tools = client.get_tools(uri)
        for tool in tools:
            func = tool.get("function")
            if func:
                func_name = func.get("name")
                if func_name:
                    tool_to_uri[func_name] = uri
                    all_tools.append(tool)
                    print(f"  Loaded tool: {func_name} from {uri}")
    except Exception as e:
        print(f"  Warning: Failed to load tool {uri}: {e}")
        continue

print(f"Loaded {len(all_tools)} tools in total\n")

if not all_tools:
    raise ValueError("No tools loaded. Please check API key and network connection.")

# Initialize message list
messages = [
    {
        "role": "system",
        "content": "You are Kimi, a professional news analyst. You excel at collecting, analyzing, and organizing information to generate high-quality news reports.",
    },
]

# User request to generate today's news report
user_request = "Please help me generate a daily news report including important technology, economy, and society news."
messages.append({
    "role": "user",
    "content": user_request
})

print(f"User request: {user_request}\n")

# Begin multi-step conversation loop
max_iterations = 10  # Prevent infinite loops
for iteration in range(max_iterations):
    try:
        completion = client.openai.chat.completions.create(
            model=client.model,
            messages=messages,
            max_tokens=1024 * 32,
            tools=all_tools,
            temperature=1.0,
        )
    except openai.AuthenticationError as e:
        print(f"Authentication error: {e}")
        print("Please check if the API key is correct and has the required permissions")
        raise
    except Exception as e:
        print(f"Error while calling the model: {e}")
        raise
    
    # Get response
    message = completion.choices[0].message
    
    # Print reasoning process
    if hasattr(message, "reasoning_content"):
        print(f"=============Reasoning round {iteration + 1} starts=============")
        reasoning = getattr(message, "reasoning_content")
        if reasoning:
            print(reasoning[:500] + "..." if len(reasoning) > 500 else reasoning)
        print(f"=============Reasoning round {iteration + 1} ends=============\n")
    
    # Add assistant message to context (preserve reasoning_content)
    messages.append(message)
    
    # If the model did not call any tools, conversation is done
    if not message.tool_calls:
        print("=============Final Answer=============")
        print(message.content)
        break
    
    # Handle tool calls
    print(f"The model decided to call {len(message.tool_calls)} tool(s):\n")
    
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        
        print(f"Calling tool: {func_name}")
        print(f"Arguments: {json.dumps(args, ensure_ascii=False, indent=2)}")
        
        # Get corresponding formula_uri
        formula_uri = tool_to_uri.get(func_name)
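        if not formula_uri:
            print(f"Error: Could not find Formula URI for tool {func_name}")
            # Every tool call needs a matching tool message, so report the failure to the model
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": func_name,
                "content": f"Error: Unknown tool {func_name}",
            })
            continue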
        
        # Call the tool
        result = client.call_tool(formula_uri, func_name, args)
        
        # Print result (truncate if too long)
        if len(str(result)) > 200:
            print(f"Tool result: {str(result)[:200]}...\n")
        else:
            print(f"Tool result: {result}\n")
        
        # Add tool result to message list
        tool_message = {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": func_name,
            "content": result
        }
        messages.append(tool_message)

print("\nConversation completed!")

# Cleanup
client.close()
```

This process demonstrates how the `kimi-k2-thinking` or `kimi-k2.6` (with thinking enabled) model uses deep reasoning to plan and execute complex multi-step tasks, with detailed reasoning steps (`reasoning_content`) preserved in the context to ensure accurate tool use at every stage.
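
The example above uses non-streaming requests for brevity. If you enable streaming in a tool-calling loop, the assistant message has to be reassembled from deltas before it is appended to `messages`. The sketch below shows one way to do that, assuming the stream follows the OpenAI streaming format in which tool-call fragments carry an `index` and the `arguments` string arrives in pieces. `merge_stream`, `chunk`, and `call_part` are hypothetical helpers, and the tool arguments in the demo are invented.

```python
import json
from types import SimpleNamespace


def merge_stream(chunks) -> dict:
    """Merge streamed deltas into one assistant message dict."""
    reasoning, content, calls = [], [], {}
    for item in chunks:
        if not item.choices:
            continue
        delta = item.choices[0].delta
        if delta is None:
            continue
        if getattr(delta, "reasoning_content", None):
            reasoning.append(delta.reasoning_content)
        if getattr(delta, "content", None):
            content.append(delta.content)
        for part in getattr(delta, "tool_calls", None) or []:
            # Fragments of one call share an index; id and name arrive once,
            # while the JSON arguments string is concatenated piece by piece.
            call = calls.setdefault(part.index, {"id": "", "name": "", "arguments": ""})
            if getattr(part, "id", None):
                call["id"] = part.id
            fn = getattr(part, "function", None)
            if fn is not None:
                if getattr(fn, "name", None):
                    call["name"] = fn.name
                if getattr(fn, "arguments", None):
                    call["arguments"] += fn.arguments
    message = {"role": "assistant", "content": "".join(content)}
    if reasoning:
        # Keep the reasoning so it can be sent back in the next request.
        message["reasoning_content"] = "".join(reasoning)
    if calls:
        message["tool_calls"] = [
            {"id": c["id"], "type": "function",
             "function": {"name": c["name"], "arguments": c["arguments"]}}
            for _, c in sorted(calls.items())
        ]
    return message


def chunk(**fields):
    """Offline stand-in for a streaming chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**fields))])


def call_part(index, id=None, name=None, arguments=None):
    """Offline stand-in for one streamed tool-call fragment."""
    fn = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(index=index, id=id, function=fn)


demo = [
    chunk(reasoning_content="I should search for today's news."),
    chunk(tool_calls=[call_part(0, id="call_1", name="web_search", arguments="")]),
    chunk(tool_calls=[call_part(0, arguments='{"query": ')]),
    chunk(tool_calls=[call_part(0, arguments='"news"}')]),
]
message = merge_stream(demo)
print(json.loads(message["tool_calls"][0]["function"]["arguments"]))
```

Because the result is a plain dict rather than an SDK object, a loop built on it would read `message.get("tool_calls")` and `message["content"]` instead of using attribute access.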

## Preserved Thinking

### What is Preserved Thinking

Preserved Thinking means passing the `reasoning_content` of previous turns through to the model in a multi-turn conversation, so that the model can continue its prior chain of thought when reasoning in the current turn.

For `kimi-k2.6`, use the `thinking.keep` parameter in the request body to control whether historical thinking is preserved:

| Value                      | Behavior                                                                        |
| -------------------------- | ------------------------------------------------------------------------------- |
| `null` / omitted (default) | Historical `reasoning_content` is ignored. Shorter context and lower cost.      |
| `"all"`                    | Historical `reasoning_content` is fully preserved, enabling Preserved Thinking. |

<Note>
  `thinking.keep` only affects `reasoning_content` from historical turns; it does **not** change whether the model generates thinking content in the current turn (that is controlled by `thinking.type`). We recommend using `keep: "all"` together with `type: "enabled"`.
</Note>

### How to use

When using `keep: "all"`, keep the `reasoning_content` from every historical assistant message in `messages` as-is. The simplest way is to append the assistant message returned from the previous API call directly back into `messages`.

<Tabs>
  <Tab title="curl">
    ```bash theme={null}
    $ curl https://api.moonshot.ai/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $MOONSHOT_API_KEY" \
        -d '{
            "model": "kimi-k2.6",
            "messages": [
                {"role": "system", "content": "You are Kimi."},
                {"role": "user", "content": "First question..."},
                {
                    "role": "assistant",
                    "reasoning_content": "<reasoning_content returned by the previous API call>",
                    "content": "<final answer returned by the previous API call>"
                },
                {"role": "user", "content": "Please continue the analysis and derive the next step."}
            ],
            "thinking": {
                "type": "enabled",
                "keep": "all"
            }
       }'
    ```
  </Tab>

  <Tab title="python">
    ```python theme={null}
    import os
    import openai

    client = openai.Client(
        base_url="https://api.moonshot.ai/v1",
        api_key=os.getenv("MOONSHOT_API_KEY"),
    )

    # Keep the assistant message (including reasoning_content) from every previous API call in messages
    messages = [
        {"role": "system", "content": "You are Kimi."},
        {"role": "user", "content": "First question..."},
        {
            "role": "assistant",
            "reasoning_content": "<reasoning_content returned by the previous API call>",
            "content": "<final answer returned by the previous API call>",
        },
        {"role": "user", "content": "Please continue the analysis and derive the next step."},
    ]

    response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=messages,
        stream=True,
        extra_body={"thinking": {"type": "enabled", "keep": "all"}},
    )
    ```
  </Tab>
</Tabs>
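
When the history is built programmatically, the previous assistant turn can be converted into a plain dict that keeps `reasoning_content` before the next user message is added. The sketch below illustrates this; `assistant_to_dict` and `next_turn` are hypothetical helpers, the `SimpleNamespace` object only stands in for an SDK message, and tool calls are left out for brevity.

```python
from types import SimpleNamespace


def assistant_to_dict(message) -> dict:
    """Convert an assistant message object into a dict, keeping reasoning_content."""
    entry = {"role": "assistant", "content": message.content or ""}
    # reasoning_content is not declared on the SDK type, so read it defensively.
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        entry["reasoning_content"] = reasoning
    return entry


def next_turn(messages: list, assistant_message, user_text: str) -> list:
    """Return a new history containing the previous answer and the next user message."""
    return messages + [
        assistant_to_dict(assistant_message),
        {"role": "user", "content": user_text},
    ]


history = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "First question..."},
]
previous = SimpleNamespace(content="First answer.", reasoning_content="Step-by-step thoughts.")
history = next_turn(history, previous, "Please continue the analysis and derive the next step.")
print(history[2])
```

The resulting `history` can be passed as `messages` together with `extra_body={"thinking": {"type": "enabled", "keep": "all"}}`, as in the Python tab above.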

<Warning>
  `reasoning_content` counts toward token consumption. When Preserved Thinking is enabled, historical thinking content keeps occupying the context window and is billed accordingly. Use it wisely.
</Warning>

## Frequently Asked Questions

### Q1: Why should I keep `reasoning_content`?

A: Keeping `reasoning_content` lets the model maintain reasoning continuity in multi-step scenarios, especially when calling tools. You only need to pass the assistant messages back unchanged; the server decides which parts of the reasoning to use, so you do not need to filter or edit these fields yourself.

### Q2: Does `reasoning_content` consume extra tokens?

A: Yes, `reasoning_content` counts towards your input/output token quota. For detailed pricing, please refer to MoonshotAI's pricing documentation.
