kimi-k2.7-code (latest): code-focused; thinking is always on, and Preserved Thinking is always on. Its high-speed variant kimi-k2.7-code-highspeed is the same model with identical thinking behavior, and everything on this page applies to it as well.
kimi-k2.6: the general-purpose thinking model; thinking is on by default, can be disabled, and supports Preserved Thinking.
kimi-k2.5: a general-purpose thinking model; thinking is on by default and can be disabled, but does not support Preserved Thinking.
The thinking parameter behaves differently across these models:
| thinking field | kimi-k2.7-code | kimi-k2.6 | kimi-k2.5 |
| --- | --- | --- | --- |
| type (thinking switch) | Only "enabled"; always thinks. Passing "disabled" errors | "enabled" (default) / "disabled" | "enabled" (default) / "disabled" |
| keep (Preserved Thinking) | Omitting it or passing the valid value "all" is treated as "all" (always on, cannot be turned off); any other invalid value errors | null (default, not kept) / "all" (enables it) | No such parameter; not supported |
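The table above can be read as a set of validation rules. The sketch below encodes them client-side so that an unsupported `thinking` value is caught before a request is sent. The helper name `check_thinking` and the `THINKING_RULES` dictionary are illustrative and not part of any SDK; the server remains the authority. Rejecting `keep` locally for kimi-k2.5 is an assumption, since the table only states that the parameter is not supported.

```python
from typing import Optional

# Allowed values per model, transcribed from the comparison table.
# A "keep" entry of None means the model has no such parameter.
THINKING_RULES = {
    "kimi-k2.7-code": {"type": {"enabled"}, "keep": {None, "all"}},
    "kimi-k2.7-code-highspeed": {"type": {"enabled"}, "keep": {None, "all"}},
    "kimi-k2.6": {"type": {"enabled", "disabled"}, "keep": {None, "all"}},
    "kimi-k2.5": {"type": {"enabled", "disabled"}, "keep": None},
}


def check_thinking(model: str, thinking: Optional[dict]) -> Optional[dict]:
    """Hypothetical client-side check; returns `thinking` unchanged if it is valid."""
    if thinking is None:
        return None  # omitting the parameter is valid for every model
    rules = THINKING_RULES[model]
    kind = thinking.get("type", "enabled")  # "enabled" is the default
    if kind not in rules["type"]:
        raise ValueError(f"{model} does not accept thinking.type={kind!r}")
    if "keep" in thinking:
        if rules["keep"] is None:
            # Assumption: treat an unsupported parameter as a local error.
            raise ValueError(f"{model} has no thinking.keep parameter")
        if thinking["keep"] not in rules["keep"]:
            raise ValueError(f"{model} does not accept thinking.keep={thinking['keep']!r}")
    return thinking
```

For example, `check_thinking("kimi-k2.6", {"type": "disabled"})` passes, while the same value with `"kimi-k2.7-code"` raises `ValueError`.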
If you are running benchmarks against the Kimi API, please refer to this benchmark best practice.
kimi-k2.7-code is the latest code-focused thinking model. It shares the same thinking mechanism as kimi-k2.6 (reasoning_content, multi-step tool calls, streaming, etc.); the only difference is in the thinking parameter (see the comparison table above).

When using kimi-k2.7-code you do not need to (and should not) pass the thinking parameter: just switch the model, and the model always emits reasoning_content. Because Preserved Thinking is always on, in multi-turn conversations you must keep the reasoning_content of every historical assistant message in messages as-is.
```python
import os
import openai

client = openai.Client(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.getenv("MOONSHOT_API_KEY"),
)

stream = client.chat.completions.create(
    model="kimi-k2.7-code",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi.",
        },
        {
            "role": "user",
            "content": "Implement quicksort in Python."
        },
    ],
    max_tokens=1024*32,
    stream=True,
    # temperature is not modifiable and thinking is always on; neither needs to be set
)

thinking = False
for chunk in stream:
    if chunk.choices:
        choice = chunk.choices[0]
        if choice.delta and hasattr(choice.delta, "reasoning_content"):
            if not thinking:
                thinking = True
                print("=============Start Reasoning=============")
            print(getattr(choice.delta, "reasoning_content"), end="")
        if choice.delta and choice.delta.content:
            if thinking:
                thinking = False
                print("\n=============End Reasoning=============")
            print(choice.delta.content, end="")
```
kimi-k2.6 is the general-purpose thinking model. Thinking is enabled by default, so the basic call below outputs reasoning content without passing the thinking parameter (to disable thinking or enable Preserved Thinking, see The thinking parameter below):
```python
import os
import openai

client = openai.Client(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.getenv("MOONSHOT_API_KEY"),
)

stream = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {
            "role": "system",
            "content": "You are Kimi.",
        },
        {
            "role": "user",
            "content": "Please explain why 1+1=2."
        },
    ],
    max_tokens=1024*32,
    stream=True,
    # temperature is not modifiable, so no need to set it; thinking is enabled by default, no extra parameters needed
)

thinking = False
for chunk in stream:
    if chunk.choices:
        choice = chunk.choices[0]
        if choice.delta and hasattr(choice.delta, "reasoning_content"):
            if not thinking:
                thinking = True
                print("=============Start Reasoning=============")
            print(getattr(choice.delta, "reasoning_content"), end="")
        if choice.delta and choice.delta.content:
            if thinking:
                thinking = False
                print("\n=============End Reasoning=============")
            print(choice.delta.content, end="")
```
kimi-k2.6 controls thinking behavior via the thinking parameter, which has two sub-fields:
thinking.type: "enabled" (default) | "disabled" — controls whether thinking is on. Since it defaults to "enabled", the example above thinks without passing it explicitly; for a disable example see Disable Thinking Capability Example.
thinking.keep: null (default, ignores historical turns’ thinking) | "all" (keeps previous turns’ reasoning_content, enabling Preserved Thinking — see that section for usage).
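As a minimal sketch of the two sub-fields, the helper below builds keyword arguments for `client.chat.completions.create`, passing the `thinking` object through `extra_body` (the mechanism the Preserved Thinking example on this page uses). The function name `k26_request` and its two flags are hypothetical; only the `thinking.type` and `thinking.keep` values come from the list above.

```python
def k26_request(messages: list, think: bool = True, keep_history: bool = False) -> dict:
    """Build create() kwargs for kimi-k2.6 (illustrative helper, not an SDK API)."""
    thinking = {"type": "enabled" if think else "disabled"}
    if keep_history:
        # "all" turns on Preserved Thinking; omitting keep leaves the default (null).
        thinking["keep"] = "all"
    return {
        "model": "kimi-k2.6",
        "messages": messages,
        "max_tokens": 1024 * 32,
        "extra_body": {"thinking": thinking},
    }


# Thinking switched off for this request.
kwargs = k26_request([{"role": "user", "content": "Hello"}], think=False)
print(kwargs["extra_body"])  # {'thinking': {'type': 'disabled'}}
```

The resulting dictionary can be unpacked into the call as `client.chat.completions.create(**kwargs)`.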
In the API response for thinking models such as kimi-k2.7-code and kimi-k2.6 (with thinking enabled), we use the reasoning_content field as the carrier for the model’s reasoning. About the reasoning_content field:
In the OpenAI SDK, ChoiceDelta and ChatCompletionMessage types do not provide a reasoning_content field directly, so you cannot access it via .reasoning_content. You must use hasattr(obj, "reasoning_content") to check if the field exists, and if so, use getattr(obj, "reasoning_content") to retrieve its value.
If you use other frameworks or directly interface with the HTTP API, you can directly obtain the reasoning_content field at the same level as the content field.
In streaming output (stream=True), the reasoning_content field will always appear before the content field. In your business logic, you can detect if the content field has been output to determine if the reasoning (inference process) is finished.
Tokens in reasoning_content are also controlled by the max_tokens parameter: the sum of tokens in reasoning_content and content must be less than or equal to max_tokens.
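The access pattern described above can be sketched as a small accumulator that separates the two fields of a streamed response. `getattr(obj, "reasoning_content", None)` folds the `hasattr` check and the `getattr` call into one step. The names `split_stream` and `fake_chunk` are hypothetical, and the fake chunks are stand-ins that only imitate the shape of SDK stream chunks so the helper can be run offline.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def split_stream(chunks: Iterable) -> Tuple[str, str]:
    """Accumulate reasoning_content and content from streamed chunks."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta is None:
            continue
        # reasoning_content is not a declared SDK field, so read it defensively.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**delta_fields):
    """Stand-in for a streamed chunk, used only to exercise the helper offline."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning_content="Compare "),
    fake_chunk(reasoning_content="both sides."),
    fake_chunk(content="1+1=2"),
]
reasoning, answer = split_stream(demo)
```

With a real stream, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo`.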
kimi-k2.7-code and kimi-k2.6 (with thinking enabled) are designed to perform deep reasoning across multiple tool calls, enabling them to tackle highly complex tasks.
To get reliable results, when using thinking models such as kimi-k2.7-code and kimi-k2.6, always follow these configuration rules:
Within a single task (the multi-step reasoning produced during one tool-call loop), keep every reasoning_content field already in the context and send it back with each request; the model selects the parts it needs for further reasoning. Whether historical thinking is preserved across turns is controlled by thinking.keep (kimi-k2.6 defaults to null and does not keep it, while kimi-k2.7-code always keeps it).
Set max_tokens >= 16000 to ensure the full reasoning_content and content can be returned without truncation.
Do not set temperature. For kimi-k2.7-code and kimi-k2.6, temperature is not modifiable — use the default and do not pass it explicitly (see Model Parameter Reference).
Enable streaming (stream=True). Because thinking models return both reasoning_content and regular content, the response is larger than usual. Streaming delivers a better user experience and helps avoid network-timeout issues.
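The rules above can be collected into one request builder, sketched below. The helper name `thinking_request` is hypothetical. Silently dropping `temperature` and raising a too-small `max_tokens` to the floor are design choices of this sketch, not documented SDK behavior; a stricter variant could raise an error instead. `stream` defaults to `True` but stays overridable.

```python
MIN_MAX_TOKENS = 16000  # floor recommended in the rules above


def thinking_request(model: str, messages: list, **overrides) -> dict:
    """Assemble create() kwargs that follow the configuration rules (hypothetical helper)."""
    kwargs = {
        "model": model,
        "messages": messages,
        "max_tokens": 1024 * 32,
        "stream": True,
    }
    kwargs.update(overrides)
    # temperature is not modifiable for these models, so never send it.
    kwargs.pop("temperature", None)
    # Leave enough room for reasoning_content plus content.
    kwargs["max_tokens"] = max(kwargs["max_tokens"], MIN_MAX_TOKENS)
    return kwargs


request = thinking_request(
    "kimi-k2.6",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,   # removed by the helper
    max_tokens=2048,   # raised to the floor
)
```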
We walk through a complete example that shows how to properly use thinking models together with official tools for multi-step tool calls and extended reasoning.

The example below demonstrates a “Daily News Report Generation” scenario. The model will sequentially call official tools like date (to get the date) and web_search (to search today’s news), and will present deep reasoning throughout this process.
```python
import os
import json
import httpx
import openai


class FormulaChatClient:
    def __init__(self, base_url: str, api_key: str):
        """Initialize Formula client"""
        self.base_url = base_url
        self.api_key = api_key
        self.openai = openai.Client(
            base_url=base_url,
            api_key=api_key,
        )
        self.httpx = httpx.Client(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30.0,
        )
        # Using kimi-k2.6 model. Thinking is enabled by default
        self.model = "kimi-k2.6"

    def get_tools(self, formula_uri: str):
        """Get tool definitions from Formula API"""
        response = self.httpx.get(f"/formulas/{formula_uri}/tools")
        response.raise_for_status()
        try:
            return response.json().get("tools", [])
        except json.JSONDecodeError as e:
            print(f"Error: Unable to parse JSON (status code: {response.status_code})")
            print(f"Response content: {response.text[:500]}")
            raise

    def call_tool(self, formula_uri: str, function: str, args: dict):
        """Call an official tool"""
        response = self.httpx.post(
            f"/formulas/{formula_uri}/fibers",
            json={"name": function, "arguments": json.dumps(args)},
        )
        response.raise_for_status()
        fiber = response.json()
        if fiber.get("status", "") == "succeeded":
            return fiber["context"].get("output") or fiber["context"].get("encrypted_output")
        if "error" in fiber:
            return f"Error: {fiber['error']}"
        if "error" in fiber.get("context", {}):
            return f"Error: {fiber['context']['error']}"
        return "Error: Unknown error"

    def close(self):
        """Close the client connection"""
        self.httpx.close()


# Initialize client
base_url = os.getenv("MOONSHOT_BASE_URL", "https://api.moonshot.ai/v1")
api_key = os.getenv("MOONSHOT_API_KEY")
if not api_key:
    raise ValueError("MOONSHOT_API_KEY environment variable not set. Please set your API key.")

print(f"Base URL: {base_url}")
print(f"API Key: {api_key[:10]}...{api_key[-10:] if len(api_key) > 20 else api_key}\n")

client = FormulaChatClient(base_url, api_key)

# Define the official tool Formula URIs to use
formula_uris = [
    "moonshot/date:latest",
    "moonshot/web-search:latest"
]

# Load all tool definitions and build mapping
print("Loading official tools...")
all_tools = []
tool_to_uri = {}  # function.name -> formula_uri
for uri in formula_uris:
    try:
        tools = client.get_tools(uri)
        for tool in tools:
            func = tool.get("function")
            if func:
                func_name = func.get("name")
                if func_name:
                    tool_to_uri[func_name] = uri
                    all_tools.append(tool)
                    print(f"  Loaded tool: {func_name} from {uri}")
    except Exception as e:
        print(f"  Warning: Failed to load tool {uri}: {e}")
        continue

print(f"Loaded {len(all_tools)} tools in total\n")
if not all_tools:
    raise ValueError("No tools loaded. Please check API key and network connection.")

# Initialize message list
messages = [
    {
        "role": "system",
        "content": "You are Kimi, a professional news analyst. You excel at collecting, analyzing, and organizing information to generate high-quality news reports.",
    },
]

# User request to generate today's news report
user_request = "Please help me generate a daily news report including important technology, economy, and society news."
messages.append({
    "role": "user",
    "content": user_request
})
print(f"User request: {user_request}\n")

# Begin multi-step conversation loop
max_iterations = 10  # Prevent infinite loops
for iteration in range(max_iterations):
    try:
        completion = client.openai.chat.completions.create(
            model=client.model,
            messages=messages,
            max_tokens=1024 * 32,
            tools=all_tools,
        )
    except openai.AuthenticationError as e:
        print(f"Authentication error: {e}")
        print("Please check if the API key is correct and has the required permissions")
        raise
    except Exception as e:
        print(f"Error while calling the model: {e}")
        raise

    # Get response
    message = completion.choices[0].message

    # Print reasoning process
    if hasattr(message, "reasoning_content"):
        print(f"=============Reasoning round {iteration + 1} starts=============")
        reasoning = getattr(message, "reasoning_content")
        if reasoning:
            print(reasoning[:500] + "..." if len(reasoning) > 500 else reasoning)
        print(f"=============Reasoning round {iteration + 1} ends=============\n")

    # Add assistant message to context (preserve reasoning_content)
    messages.append(message)

    # If the model did not call any tools, conversation is done
    if not message.tool_calls:
        print("=============Final Answer=============")
        print(message.content)
        break

    # Handle tool calls
    print(f"The model decided to call {len(message.tool_calls)} tool(s):\n")
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)
        print(f"Calling tool: {func_name}")
        print(f"Arguments: {json.dumps(args, ensure_ascii=False, indent=2)}")

        # Get corresponding formula_uri
        formula_uri = tool_to_uri.get(func_name)
        if not formula_uri:
            print(f"Error: Could not find Formula URI for tool {func_name}")
            continue

        # Call the tool
        result = client.call_tool(formula_uri, func_name, args)

        # Print result (truncate if too long)
        if len(str(result)) > 200:
            print(f"Tool result: {str(result)[:200]}...\n")
        else:
            print(f"Tool result: {result}\n")

        # Add tool result to message list
        tool_message = {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": func_name,
            "content": result
        }
        messages.append(tool_message)

print("\nConversation completed!")

# Cleanup
client.close()
```
This process demonstrates how thinking models such as kimi-k2.7-code and kimi-k2.6 (with thinking enabled) use deep reasoning to plan and execute complex multi-step tasks, with detailed reasoning steps (reasoning_content) preserved in the context to ensure accurate tool use at every stage.
Preserved Thinking means passing the reasoning_content of previous turns through to the model in a multi-turn conversation, so that the model can continue its prior chain of thought when reasoning in the current turn.

For kimi-k2.6, use the thinking.keep parameter in the request body to control whether historical thinking is preserved:
| Value | Behavior |
| --- | --- |
| null / omitted (default) | Historical reasoning_content is ignored. Shorter context and lower cost. |
| "all" | Historical reasoning_content is fully preserved, enabling Preserved Thinking. |
thinking.keep only affects reasoning_content from historical turns; it does not change whether the model generates/outputs thinking content within the current turn (that is controlled by thinking.type). We recommend using keep: "all" together with type: "enabled".
For kimi-k2.7-code, Preserved Thinking is always on and cannot be turned off: thinking.keep is treated as "all" whether you omit it or pass the only valid value "all" (passing any other invalid value returns an error). When using this model you must therefore (not optionally) keep the reasoning_content of historical assistant messages in messages as-is, exactly as shown in the example below.
When using keep: "all", keep the reasoning_content from every historical assistant message in messages as-is. The simplest way is to append the assistant message returned from the previous API call directly back into messages.
curl

```shell
curl https://api.moonshot.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MOONSHOT_API_KEY" \
  -d '{
    "model": "kimi-k2.6",
    "messages": [
      {"role": "system", "content": "You are Kimi."},
      {"role": "user", "content": "First question..."},
      {
        "role": "assistant",
        "reasoning_content": "<reasoning_content returned by the previous API call>",
        "content": "<final answer returned by the previous API call>"
      },
      {"role": "user", "content": "Please continue the analysis and derive the next step."}
    ],
    "thinking": {
      "type": "enabled",
      "keep": "all"
    }
  }'
```

python

```python
import os
import openai

client = openai.Client(
    base_url="https://api.moonshot.ai/v1",
    api_key=os.getenv("MOONSHOT_API_KEY"),
)

# Keep the assistant message (including reasoning_content) from every previous API call in messages
messages = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "First question..."},
    {
        "role": "assistant",
        "reasoning_content": "<reasoning_content returned by the previous API call>",
        "content": "<final answer returned by the previous API call>",
    },
    {"role": "user", "content": "Please continue the analysis and derive the next step."},
]

response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=messages,
    stream=True,
    extra_body={"thinking": {"type": "enabled", "keep": "all"}},
)
```
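When the response is streamed, there is no single message object to append, so the assistant turn has to be rebuilt from the accumulated deltas before the next request. The sketch below assumes the reasoning and answer text have already been collected into two strings. The helper name `append_assistant_turn` is hypothetical, and the message shape mirrors the assistant entry in the examples above.

```python
def append_assistant_turn(messages: list, reasoning: str, content: str) -> list:
    """Append an assistant turn, passing reasoning_content back unmodified."""
    turn = {"role": "assistant", "content": content}
    if reasoning:
        # Keep the text exactly as returned; do not trim or summarize it.
        turn["reasoning_content"] = reasoning
    messages.append(turn)
    return messages


history = [
    {"role": "system", "content": "You are Kimi."},
    {"role": "user", "content": "First question..."},
]
append_assistant_turn(history, "step 1, step 2", "final answer")
history.append({"role": "user", "content": "Please continue the analysis."})
```

`history` is then ready to be sent as `messages` together with `thinking.keep` set to `"all"`.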
reasoning_content counts toward token consumption. When Preserved Thinking is enabled, historical thinking content keeps occupying the context window and is billed accordingly. Use it wisely.
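To get a rough sense of how much of the history is preserved thinking, the sketch below counts characters per field across a message list. This is only a proxy: billing is by tokens, and the character-to-token ratio depends on the tokenizer, which this example does not model. The helper name `reasoning_footprint` is hypothetical.

```python
def reasoning_footprint(messages: list) -> dict:
    """Count characters of content vs. reasoning_content held in history."""
    totals = {"content": 0, "reasoning_content": 0}
    for msg in messages:
        if not isinstance(msg, dict):
            continue  # SDK message objects are skipped in this sketch
        for key in totals:
            value = msg.get(key)
            if isinstance(value, str):
                totals[key] += len(value)
    return totals


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "reasoning_content": "abcd", "content": "ok"},
]
footprint = reasoning_footprint(history)
```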
A: Keeping the reasoning_content ensures the model maintains reasoning continuity in multi-step reasoning scenarios, especially when calling tools. The server will automatically handle these fields; users do not need to manage them manually.