
Frequently Asked Questions and Solutions

Why are the results from the API different from those from the Kimi large language model?

The API and the Kimi large language model use the same underlying model. If you notice discrepancies in the output, you can try modifying the System Prompt. Additionally, the Kimi large language model includes tools like a calculator, which are not provided by default in the API. Users need to assemble these tools themselves.

Does the Kimi API have the "web surfing" feature of the Kimi large language model?

Previously, no: the Kimi API only provided the interaction functionality of the large language model itself, without the additional "content search" or "web page browsing" features commonly referred to as "internet search" capabilities.

The Kimi API now offers web search functionality. Please refer to our guide:

Using the Web Search Feature of the Kimi API

If you want to implement web search functionality through the Kimi API yourself, you can also refer to our tool_calls guide:

Using the Kimi API for Tool Calls

If you seek assistance from the open-source community, you can refer to the following open-source projects:

If you are looking for services provided by professional vendors, the following options are available:

The content returned by the Kimi API is incomplete or truncated

If you find that the content returned by the Kimi API is incomplete, truncated, or does not meet the expected length, you can first check the value of the choice.finish_reason field in the response. If this value is length, it means that the number of Tokens in the content generated by the current model exceeds the max_tokens parameter in the request. In this case, the Kimi API will only return content within the max_tokens limit, and any excess content will be discarded, resulting in the aforementioned "incomplete content" or "truncated content."
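The check described above can be sketched as a small helper; the response shape below follows the OpenAI-compatible JSON returned by /v1/chat/completions (when using an SDK, the same field is available as `completion.choices[0].finish_reason`):

```python
def is_truncated(completion: dict) -> bool:
    """True when generation stopped because the max_tokens limit was hit."""
    return completion["choices"][0]["finish_reason"] == "length"

# Minimal response fragments, shaped like the API's JSON:
truncated = {"choices": [{"finish_reason": "length", "message": {"content": "partial text"}}]}
finished = {"choices": [{"finish_reason": "stop", "message": {"content": "full text"}}]}

print(is_truncated(truncated))  # -> True
print(is_truncated(finished))   # -> False
```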

When encountering finish_reason=length, if you want the Kimi large language model to continue generating content from where it left off, you can use the Partial Mode provided by the Kimi API. For detailed documentation, please refer to:

Using the Partial Mode Feature of the Kimi API

To avoid finish_reason=length, we recommend increasing the value of max_tokens. Our best-practice suggestion: use the estimate-token-count API to calculate the number of Tokens in the input content, then subtract this number from the maximum number of Tokens supported by the model (for example, 32k Tokens for the moonshot-v1-32k model), and use the result as the max_tokens value for the current request. The maximum value of max_tokens is 32k.
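The arithmetic above can be sketched as follows; the prompt token count itself would come from the estimate-token-count API, and here it is simply passed in as a number:

```python
def safe_max_tokens(prompt_tokens: int, context_window: int = 32 * 1024) -> int:
    """Largest max_tokens that still fits prompt plus completion in the context window."""
    return max(context_window - prompt_tokens, 0)

# e.g. a 1,000-Token prompt against moonshot-v1-32k:
print(safe_max_tokens(1000))            # -> 31768
# and a 5,000-Token prompt against moonshot-v1-8k:
print(safe_max_tokens(5000, 8 * 1024))  # -> 3192
```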

What is the output length of the Kimi large language model?

  • For the moonshot-v1-8k model, the maximum output length is 8*1024 - prompt_tokens;
  • For the moonshot-v1-32k model, the maximum output length is 32*1024 - prompt_tokens;
  • For the moonshot-v1-128k model, the maximum output length is 128*1024 - prompt_tokens;

How many Chinese characters does the Kimi large language model support?

  • The moonshot-v1-8k model supports approximately 15,000 Chinese characters;
  • The moonshot-v1-32k model supports approximately 60,000 Chinese characters;
  • The moonshot-v1-128k model supports approximately 200,000 Chinese characters;

Note: These are estimated values and actual results may vary.

Inaccurate file content extraction or inability to recognize images

We offer file upload and parsing services for various file formats. For text files, we extract the text content; for image files, we use OCR to recognize text in the images; for PDF documents, if the PDF contains only images, we use OCR to extract text from those images; otherwise, we only extract the text content.

Note that for images, we only use OCR to extract text content, so if your image does not contain any text, it will result in a parsing failure error.

For a complete list of supported file formats, please refer to:

File Interface

When using the files interface, I want to reference file content using file_id

We currently do not support referencing file content by file_id.

Error content_filter: The request was rejected because it was considered high risk

The input to the Kimi API or the output from the Kimi large language model contains unsafe or sensitive content. Note: The content generated by the Kimi large language model may also contain unsafe or sensitive content, which can lead to the content_filter error.

Connection-related errors

If you frequently encounter errors such as Connection Error or Connection Time Out while using the Kimi API, please check the following in order:

  1. Whether your program code or the SDK you are using has a default timeout setting;
  2. Whether you are using any type of proxy server and check the network and timeout settings of the proxy server;

Another scenario that may lead to connection-related errors is when the Kimi large language model generates a large number of Tokens and streaming output (stream=True) is not enabled. The wait for the model to finish generating can then exceed the timeout settings of an intermediate gateway. Typically, gateway applications determine whether a request is still alive by checking whether a status_code and header have been received from the server. Without stream=True, the Kimi server waits for the model to finish generating content before sending the header; while waiting for the header, some gateway applications close connections that have been open too long, resulting in connection-related errors.

We recommend enabling streaming output (stream=True) to minimize connection-related errors.
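With stream=True, the response arrives as a sequence of chunks whose content fragments you concatenate yourself. A minimal sketch of that loop; the chunk objects here are simulated with SimpleNamespace to mimic the shape of the OpenAI SDK's streaming deltas:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the content fragments of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # some chunks (e.g. role-only or final ones) carry no content
            parts.append(delta.content)
    return "".join(parts)

# Simulated chunks shaped like the SDK's streaming delta objects:
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hel", "lo", None, "!"]
]
print(collect_stream(chunks))  # -> Hello!
```

In practice you would pass the iterator returned by `client.chat.completions.create(..., stream=True)` directly to `collect_stream`.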

The TPM and RPM limits shown in the error message do not match my account Tier level

If you encounter a rate_limit_reached_error while using the Kimi API, such as:

rate_limit_reached_error: Your account {uid}<{ak-id}> request reached TPM rate limit, current:{current_tpm}, limit:{max_tpm}

and the TPM or RPM limits in the error message do not match the TPM and RPM you see in the backend, please first check whether you are using the correct api_key for your account. In most cases, the reason for the mismatch between TPM and RPM and expectations is the use of an incorrect api_key, such as mistakenly using an api_key provided by another user, or mixing up api_keys when you have multiple accounts.

Error model_not_found: The requested model cannot be found

Make sure you have correctly set base_url="https://api.moonshot.ai/v1" in your SDK. The model_not_found error usually occurs because the base_url value is not set when using the OpenAI SDK. As a result, requests are sent to the OpenAI server, and OpenAI returns the model_not_found error because it has no model with that name.

Numerical Calculation Errors in the Kimi Large Language Model

Due to the uncertainty in the generation process of the Kimi large language model, it may produce calculation errors of varying degrees when performing numerical computations. We recommend using tool calls (tool_calls) to provide the Kimi large language model with calculator functionality. For more information on tool calls (tool_calls), you can refer to our guide on Using the Kimi API for Tool Calls (tool_calls).
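As an illustration of that suggestion, the sketch below defines a hypothetical calculator tool in the OpenAI-style tool_calls schema, plus a safe local evaluator you would run when the model emits a tool call named calculator; the tool name and schema here are our own illustrative choices, not built-ins of the API:

```python
import ast
import operator

# Hypothetical tool definition, passed via the `tools` parameter of a chat completion:
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression exactly.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expression: str) -> float:
    """Safely evaluate +, -, *, /, ** over numbers, without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

# Exact arithmetic that the model itself may get wrong:
print(calculator("12.3 * 45.6"))
```

When the model's response contains a tool call whose function name is "calculator", you would run this function on its arguments and send the result back as a tool message.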

The Kimi Large Language Model Cannot Answer Today's Date

The Kimi large language model cannot access highly time-sensitive information such as the current date. However, you can provide this information to the Kimi large language model through the system prompt. For example:

import os
from datetime import datetime
from openai import OpenAI
 
client = OpenAI(
    api_key=os.environ['MOONSHOT_API_KEY'],
    base_url="https://api.moonshot.ai/v1",
)
 
# We generate the current date using the datetime library and add it to the system prompt
system_prompt = f"""
You are Kimi, and today's date is {datetime.now().strftime('%d.%m.%Y %H:%M:%S')}
"""
 
completion = client.chat.completions.create(
    model="moonshot-v1-128k",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's today's date?"},
    ],
    temperature=0.3,
)
 
print(completion.choices[0].message.content)  # Sample output: Today's date is July 31, 2024.
 

How to Handle Errors Without Using an SDK

In some cases, you might need to directly interface with the Kimi API (instead of using the OpenAI SDK). When interfacing with the Kimi API directly, you need to determine the subsequent processing logic based on the status returned by the API. Typically, we use the HTTP status code 200 to indicate a successful request, while 4xx and 5xx status codes indicate a failed request. We provide error information in JSON format. For specific handling logic based on the request status, please refer to the following code snippets:

import os
import httpx
 
header = {
    "Authorization": f"Bearer {os.environ['MOONSHOT_API_KEY']}",
}
 
messages = [
    {"role": "system", "content": "You are Kimi"},
    {"role": "user", "content": "Hello."},
]
 
r = httpx.post("https://api.moonshot.ai/v1/chat/completions",
               headers=header,
               json={
                   "model": "moonshot-v1-128k",  # <-- If you use a correct model, the code will enter the if status_code==200 branch below
                   # "model": "moonshot-v1-129k",  # <-- If you use an incorrect model name, the code will enter the else branch below
                   "messages": messages,
                   "temperature": 0.3,
               })
 
if r.status_code == 200:  # When a correct model is used for the request, this branch is entered for normal processing
    completion = r.json()
    print(completion["choices"][0]["message"]["content"])
else:  # When an incorrect model name is used for the request, this branch is entered for error handling
    # Here, for demonstration purposes, we simply print the error.
    # In actual code logic, you might need more processing, such as logging the error, interrupting the request, or retrying.
    error = r.json()
    print(f"error: status={r.status_code}, type='{error['error']['type']}', message='{error['error']['message']}'")

Our error messages will follow this format:

{
	"error": {
		"type": "error_type",
		"message": "error_message"
	}
}

For a detailed list of error messages, please refer to the following section:

Error Description

Why Do Some Requests Respond Quickly While Others Respond Slowly When the Prompt Is Similar?

If you find that some requests respond quickly (e.g., in just 3 seconds) while others respond slowly (e.g., taking up to 20 seconds) with similar prompts, it is usually because the Kimi large language model generates a different number of tokens. Generally, the number of tokens generated by the Kimi large language model is directly proportional to the response time of the Kimi API; the more tokens generated, the longer the complete response time.

It is important to note that the number of tokens generated by the Kimi large language model only affects the response time of the complete request (i.e., the time until the last token is generated). You can set stream=True and observe the time to first token (TTFT). Under normal circumstances, when prompt lengths are similar, the time to first token will not vary significantly.
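A generic way to measure TTFT over any chunk iterator, such as the one returned when stream=True; the fake stream below just sleeps to stand in for model and network latency:

```python
import time

def time_to_first_token(stream):
    """Return (seconds_until_first_chunk, first_chunk) for any chunk iterator."""
    start = time.monotonic()
    first = next(iter(stream))
    return time.monotonic() - start, first

def fake_stream():
    time.sleep(0.05)  # stands in for model/network latency before the first chunk
    yield "first-token"
    yield "second-token"

ttft, first = time_to_first_token(fake_stream())
print(f"TTFT: {ttft:.3f}s, first chunk: {first!r}")
```

In practice you would pass the streaming response object from the SDK in place of `fake_stream()`.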

I Set max_tokens=2000 to Have Kimi Output 2000 Characters, but the Output Is Less Than 2000 Characters

The max_tokens parameter means: When calling /v1/chat/completions, it specifies the maximum number of tokens the model is allowed to generate. When the number of tokens already generated by the model exceeds the set max_tokens, the model will stop generating the next token.

The purpose of max_tokens is:

  1. To help the caller determine which model to use (for example, when prompt_tokens + max_tokens ≤ 8 * 1024, you can choose the moonshot-v1-8k model);
  2. To prevent the Kimi model from generating excessive unexpected content in certain unexpected situations, which could lead to additional cost consumption (for example, the Kimi model repeatedly outputs blank characters).
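Rule 1 above can be sketched as a small model selector; the model names and context-window sizes are the ones listed elsewhere on this page:

```python
def pick_model(prompt_tokens: int, max_tokens: int) -> str:
    """Choose the smallest model whose context window fits prompt plus completion."""
    needed = prompt_tokens + max_tokens
    for name, window in [("moonshot-v1-8k", 8 * 1024),
                         ("moonshot-v1-32k", 32 * 1024),
                         ("moonshot-v1-128k", 128 * 1024)]:
        if needed <= window:
            return name
    raise ValueError("prompt + max_tokens exceeds the largest context window")

print(pick_model(3000, 2000))   # -> moonshot-v1-8k
print(pick_model(10000, 5000))  # -> moonshot-v1-32k
```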

max_tokens does not indicate how many tokens the Kimi large language model will output. In other words, max_tokens will not be used as part of the prompt input to the Kimi large language model. If you want the model to output a specific number of characters, you can refer to the following general solutions:

  • For occasions where the output content should be within 1000 characters:
    1. Specify the number of characters in the prompt to the Kimi large language model;
    2. Manually or programmatically check if the output character count meets expectations. If not, in the second round of conversation, indicate to the Kimi large language model that the "character count is too high" or "character count is too low" to generate a new round of content.
  • For occasions where the output content should be more than 1000 characters or even more:
    1. Try to break down the expected output content into several parts by structure or chapter and create a template, using placeholders to mark the positions where you want the Kimi large language model to output content;
    2. Have the Kimi large language model fill in each placeholder of the template one by one, and finally assemble the complete long text.
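The template-and-placeholder approach for long outputs can be sketched as follows, with a stub standing in for the per-section model call; the template text and section names are purely illustrative:

```python
import re

TEMPLATE = """# Report
## Background
{background}
## Analysis
{analysis}
## Conclusion
{conclusion}
"""

def fill_template(template: str, generate) -> str:
    """Fill each {placeholder} one at a time via generate(section_name),
    then assemble the complete long text."""
    sections = re.findall(r"\{(\w+)\}", template)
    return template.format(**{name: generate(name) for name in sections})

# generate() would normally make one chat completion request per section;
# here a stub stands in for it.
print(fill_template(TEMPLATE, lambda name: f"<{name} text>"))
```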

I Made Only One Request in a Minute, but Triggered the Your account reached max request Error

Typically, the SDK provided by OpenAI includes a retry mechanism:

Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.

This retry mechanism will automatically retry 2 times (a total of 3 requests) when encountering an error. Generally speaking, in cases of unstable network conditions or other situations that may cause request errors, using the OpenAI SDK can amplify a single request into 2 to 3 requests, all of which will count towards your RPM (requests per minute) limit.

Note: For users using the OpenAI SDK with a tier0 account level, due to the default retry mechanism, a single erroneous request can exhaust the entire RPM quota.
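If this retry amplification matters for your RPM quota, the OpenAI Python SDK lets you lower or disable its built-in retries when constructing the client; a configuration sketch (the placeholder key is illustrative):

```python
from openai import OpenAI

# max_retries=0 disables the SDK's automatic retries for this client,
# so one logical request costs exactly one request against your RPM quota.
client = OpenAI(
    api_key="sk-placeholder",  # illustrative; use your real MOONSHOT_API_KEY
    base_url="https://api.moonshot.ai/v1",
    max_retries=0,
)

# Retries can also be adjusted for a single call:
# client.with_options(max_retries=1).chat.completions.create(...)
```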

To Facilitate Transmission, I Used base64 Encoding for My Text Content

Please do not do this. Encoding your files with base64 will result in a huge consumption of tokens. If your file type is supported by our /v1/files file interface, you can simply upload the file and extract its content using the file interface.

For binary or other encoded file formats, the Kimi large language model currently cannot parse the content, so please do not add it to the context.

Why Can't I Use the Key Applied on the platform.moonshot.cn Platform on the platform.moonshot.ai Platform?

The Kimi Open Platform officially provides two platforms: platform.moonshot.cn is recommended for mainland China, and platform.moonshot.ai for international users. Accounts and API keys on the two platforms are completely independent and cannot be used interchangeably.

If you use the wrong platform, you will receive a 401 invalid_authentication_error error. If you receive a 401 error, please first check if you are using the wrong platform key.