
Frequently Asked Questions and Solutions

Why are the results from the API different from those from the Kimi large language model?

The API and the Kimi large language model use the same underlying model. If you notice discrepancies in the output, you can try modifying the System Prompt. Additionally, the Kimi large language model includes tools like a calculator, which are not provided by default in the API. Users need to assemble these tools themselves.

Does the Kimi API have the "web surfing" feature of the Kimi large language model?

Previously, no: the Kimi API only provided the interaction functionality of the large language model itself, without the additional "content search" or "web page browsing" features commonly referred to as "internet search" capabilities.

The Kimi API now offers web search functionality. Please refer to our guide:

Using the Web Search Feature of the Kimi API

If you want to implement web search functionality through the Kimi API yourself, you can also refer to our tool_calls guide:

Using the Kimi API for Tool Calls

If you seek assistance from the open-source community, you can refer to the following open-source projects:

If you are looking for services provided by professional vendors, the following options are available:

The content returned by the Kimi API is incomplete or truncated

If you find that the content returned by the Kimi API is incomplete, truncated, or does not meet the expected length, you can first check the value of the choice.finish_reason field in the response. If this value is length, it means that the number of Tokens in the content generated by the current model exceeds the max_tokens parameter in the request. In this case, the Kimi API will only return content within the max_tokens limit, and any excess content will be discarded, resulting in the aforementioned "incomplete content" or "truncated content."
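The check described above can be sketched as a small helper; the response shape below follows the OpenAI-compatible JSON returned by /v1/chat/completions (when using an SDK, the same field is available as `completion.choices[0].finish_reason`):

```python
def is_truncated(completion: dict) -> bool:
    """True when generation stopped because the max_tokens limit was hit."""
    return completion["choices"][0]["finish_reason"] == "length"

# Minimal response fragments, shaped like the API's JSON:
truncated = {"choices": [{"finish_reason": "length", "message": {"content": "partial text"}}]}
finished = {"choices": [{"finish_reason": "stop", "message": {"content": "full text"}}]}

print(is_truncated(truncated))  # -> True
print(is_truncated(finished))   # -> False
```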

When encountering finish_reason=length, if you want the Kimi large language model to continue generating content from where it left off, you can use the Partial Mode provided by the Kimi API. For detailed documentation, please refer to:

Using the Partial Mode Feature of the Kimi API

To avoid finish_reason=length, we recommend increasing the value of max_tokens. Our best-practice suggestion: use the estimate-token-count API to calculate the number of Tokens in the input content, then subtract this number from the maximum number of Tokens supported by the model (for example, 32k Tokens for the moonshot-v1-32k model), and use the result as the max_tokens value for the current request. The maximum value of max_tokens is 32k.
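The arithmetic above can be sketched as follows; the prompt token count itself would come from the estimate-token-count API, and here it is simply passed in as a number:

```python
def safe_max_tokens(prompt_tokens: int, context_window: int = 32 * 1024) -> int:
    """Largest max_tokens that still fits prompt plus completion in the context window."""
    return max(context_window - prompt_tokens, 0)

# e.g. a 1,000-Token prompt against moonshot-v1-32k:
print(safe_max_tokens(1000))            # -> 31768
# and a 5,000-Token prompt against moonshot-v1-8k:
print(safe_max_tokens(5000, 8 * 1024))  # -> 3192
```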

What is the output length of the Kimi large language model?

  • For the moonshot-v1-8k model, the maximum output length is 8*1024 - prompt_tokens;
  • For the moonshot-v1-32k model, the maximum output length is 32*1024 - prompt_tokens;
  • For the moonshot-v1-128k model, the maximum output length is 128*1024 - prompt_tokens;

How many Chinese characters does the Kimi large language model support?

  • The moonshot-v1-8k model supports approximately 15,000 Chinese characters;
  • The moonshot-v1-32k model supports approximately 60,000 Chinese characters;
  • The moonshot-v1-128k model supports approximately 200,000 Chinese characters;

Note: These are estimated values and actual results may vary.

Inaccurate file content extraction or inability to recognize images

We offer file upload and parsing services for various file formats. For text files, we extract the text content; for image files, we use OCR to recognize text in the images; for PDF documents, if the PDF contains only images, we use OCR to extract text from those images; otherwise, we only extract the text content.

Note that for images, we only use OCR to extract text content, so if your image does not contain any text, it will result in a parsing failure error.

For a complete list of supported file formats, please refer to:

File Interface

When using the files interface, I want to reference file content using file_id

We currently do not support referencing file content by file_id.

Error content_filter: The request was rejected because it was considered high risk

The input to the Kimi API or the output from the Kimi large language model contains unsafe or sensitive content. Note: The content generated by the Kimi large language model may also contain unsafe or sensitive content, which can lead to the content_filter error.

Connection-related errors

If you frequently encounter errors such as Connection Error or Connection Time Out while using the Kimi API, please check the following in order:

  1. Whether your program code or the SDK you are using has a default timeout setting;
  2. Whether you are using any type of proxy server and check the network and timeout settings of the proxy server;

Another scenario that may lead to connection-related errors is when the Kimi large language model generates a large number of Tokens and streaming output (stream=True) is not enabled. The wait for the model to finish generating can then exceed the timeout settings of an intermediate gateway. Typically, gateway applications determine whether a request is still alive by checking whether a status_code and header have been received from the server. Without stream=True, the Kimi server waits for the model to finish generating content before sending the header; while waiting for the header, some gateway applications close connections that have been open too long, resulting in connection-related errors.

We recommend enabling streaming output (stream=True) to minimize connection-related errors.
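With stream=True, the response arrives as a sequence of chunks whose content fragments you concatenate yourself. A minimal sketch of that loop; the chunk objects here are simulated with SimpleNamespace to mimic the shape of the OpenAI SDK's streaming deltas:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Concatenate the content fragments of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # some chunks (e.g. role-only or final ones) carry no content
            parts.append(delta.content)
    return "".join(parts)

# Simulated chunks shaped like the SDK's streaming delta objects:
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["Hel", "lo", None, "!"]
]
print(collect_stream(chunks))  # -> Hello!
```

In practice you would pass the iterator returned by `client.chat.completions.create(..., stream=True)` directly to `collect_stream`.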

The TPM and RPM limits shown in the error message do not match my account Tier level

If you encounter a rate_limit_reached_error while using the Kimi API, such as:

rate_limit_reached_error: Your account {uid}<{ak-id}> request reached TPM rate limit, current:{current_tpm}, limit:{max_tpm}

and the TPM or RPM limits in the error message do not match the TPM and RPM you see in the backend, please first check whether you are using the correct api_key for your account. In most cases, the reason for the mismatch between TPM and RPM and expectations is the use of an incorrect api_key, such as mistakenly using an api_key provided by another user, or mixing up api_keys when you have multiple accounts.

Error model_not_found: The requested model cannot be found

Make sure you have correctly set base_url="https://api.moonshot.ai/v1" in your SDK. The model_not_found error usually occurs because the base_url value is not set when using the OpenAI SDK. As a result, requests are sent to the OpenAI server, and OpenAI returns the model_not_found error because it has no model with that name.

Numerical Calculation Errors in the Kimi Large Language Model

Due to the uncertainty in the generation process of the Kimi large language model, it may produce calculation errors of varying degrees when performing numerical computations. We recommend using tool calls (tool_calls) to provide the Kimi large language model with calculator functionality. For more information on tool calls (tool_calls), you can refer to our guide on Using the Kimi API for Tool Calls (tool_calls).
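As an illustration of that suggestion, the sketch below defines a hypothetical calculator tool in the OpenAI-style tool_calls schema, plus a safe local evaluator you would run when the model emits a tool call named calculator; the tool name and schema here are our own illustrative choices, not built-ins of the API:

```python
import ast
import operator

# Hypothetical tool definition, passed via the `tools` parameter of a chat completion:
calculator_tool = {
    "type": "function",
    "function": {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression exactly.",
        "parameters": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expression: str) -> float:
    """Safely evaluate +, -, *, /, ** over numbers, without using eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

# Exact arithmetic that the model itself may get wrong:
print(calculator("12.3 * 45.6"))
```

When the model's response contains a tool call whose function name is "calculator", you would run this function on its arguments and send the result back as a tool message.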

The Kimi Large Language Model Cannot Answer Today's Date

The Kimi large language model cannot access highly time-sensitive information such as the current date. However, you can provide this information to the Kimi large language model through the system prompt. For example:

import os
from datetime import datetime
from openai import OpenAI
 
client = OpenAI(
    api_key=os.environ['MOONSHOT_API_KEY'],
    base_url="https://api.moonshot.ai/v1",
)
 
# We generate the current date using the datetime library and add it to the system prompt
system_prompt = f"""
You are Kimi, and today's date is {datetime.now().strftime('%d.%m.%Y %H:%M:%S')}
"""
 
completion = client.chat.completions.create(
    model="moonshot-v1-128k",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What's today's date?"},
    ],
    temperature=0.3,
)
 
print(completion.choices[0].message.content)  # Sample output: Today's date is July 31, 2024.
 

How to Handle Errors Without Using an SDK

In some cases, you might need to directly interface with the Kimi API (instead of using the OpenAI SDK). When interfacing with the Kimi API directly, you need to determine the subsequent processing logic based on the status returned by the API. Typically, we use the HTTP status code 200 to indicate a successful request, while 4xx and 5xx status codes indicate a failed request. We provide error information in JSON format. For specific handling logic based on the request status, please refer to the following code snippets:

import os
import httpx
 
header = {
    "Authorization": f"Bearer {os.environ['MOONSHOT_API_KEY']}",
}
 
messages = [
    {"role": "system", "content": "You are Kimi"},
    {"role": "user", "content": "Hello."},
]
 
r = httpx.post("https://api.moonshot.ai/v1/chat/completions",
               headers=header,
               json={
                   "model": "moonshot-v1-128k",  # <-- If you use a correct model, the code will enter the if status_code==200 branch below
                   # "model": "moonshot-v1-129k",  # <-- If you use an incorrect model name, the code will enter the else branch below
                   "messages": messages,
                   "temperature": 0.3,
               })
 
if r.status_code == 200:  # When a correct model is used for the request, this branch is entered for normal processing
    completion = r.json()
    print(completion["choices"][0]["message"]["content"])
else:  # When an incorrect model name is used for the request, this branch is entered for error handling
    # Here, for demonstration purposes, we simply print the error.
    # In actual code logic, you might need more processing, such as logging the error, interrupting the request, or retrying.
    error = r.json()
    print(f"error: status={r.status_code}, type='{error['error']['type']}', message='{error['error']['message']}'")

Our error messages will follow this format:

{
	"error": {
		"type": "error_type",
		"message": "error_message"
	}
}

For a detailed list of error messages, please refer to the following section:

Error Description

Why Do Some Requests Respond Quickly While Others Respond Slowly When the Prompt Is Similar?

If you find that some requests respond quickly (e.g., in just 3 seconds) while others respond slowly (e.g., taking up to 20 seconds) with similar prompts, it is usually because the Kimi large language model generates a different number of tokens. Generally, the number of tokens generated by the Kimi large language model is directly proportional to the response time of the Kimi API; the more tokens generated, the longer the complete response time.

It is important to note that the number of tokens generated by the Kimi large language model only affects the response time of the complete request (i.e., the time until the last token is generated). You can set stream=True and observe the time to first token (TTFT). Under normal circumstances, when prompt lengths are similar, the time to first token will not vary significantly.
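A generic way to measure TTFT over any chunk iterator, such as the one returned when stream=True; the fake stream below just sleeps to stand in for model and network latency:

```python
import time

def time_to_first_token(stream):
    """Return (seconds_until_first_chunk, first_chunk) for any chunk iterator."""
    start = time.monotonic()
    first = next(iter(stream))
    return time.monotonic() - start, first

def fake_stream():
    time.sleep(0.05)  # stands in for model/network latency before the first chunk
    yield "first-token"
    yield "second-token"

ttft, first = time_to_first_token(fake_stream())
print(f"TTFT: {ttft:.3f}s, first chunk: {first!r}")
```

In practice you would pass the streaming response object from the SDK in place of `fake_stream()`.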

I Set max_tokens=2000 to Have Kimi Output 2000 Characters, but the Output Is Less Than 2000 Characters

The max_tokens parameter means: When calling /v1/chat/completions, it specifies the maximum number of tokens the model is allowed to generate. When the number of tokens already generated by the model exceeds the set max_tokens, the model will stop generating the next token.

The purpose of max_tokens is:

  1. To help the caller determine which model to use (for example, when prompt_tokens + max_tokens ≤ 8 * 1024, you can choose the moonshot-v1-8k model);
  2. To prevent the Kimi model from generating excessive unexpected content in certain unexpected situations, which could lead to additional cost consumption (for example, the Kimi model repeatedly outputs blank characters).
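Rule 1 above can be sketched as a small model selector; the model names and context-window sizes are the ones listed elsewhere on this page:

```python
def pick_model(prompt_tokens: int, max_tokens: int) -> str:
    """Choose the smallest model whose context window fits prompt plus completion."""
    needed = prompt_tokens + max_tokens
    for name, window in [("moonshot-v1-8k", 8 * 1024),
                         ("moonshot-v1-32k", 32 * 1024),
                         ("moonshot-v1-128k", 128 * 1024)]:
        if needed <= window:
            return name
    raise ValueError("prompt + max_tokens exceeds the largest context window")

print(pick_model(3000, 2000))   # -> moonshot-v1-8k
print(pick_model(10000, 5000))  # -> moonshot-v1-32k
```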

max_tokens does not indicate how many tokens the Kimi large language model will output. In other words, max_tokens will not be used as part of the prompt input to the Kimi large language model. If you want the model to output a specific number of characters, you can refer to the following general solutions:

  • For occasions where the output content should be within 1000 characters:
    1. Specify the number of characters in the prompt to the Kimi large language model;
    2. Manually or programmatically check if the output character count meets expectations. If not, in the second round of conversation, indicate to the Kimi large language model that the "character count is too high" or "character count is too low" to generate a new round of content.
  • For occasions where the output content should be more than 1000 characters or even more:
    1. Try to break down the expected output content into several parts by structure or chapter and create a template, using placeholders to mark the positions where you want the Kimi large language model to output content;
    2. Have the Kimi large language model fill in each placeholder of the template one by one, and finally assemble the complete long text.
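The template-and-placeholder approach for long outputs can be sketched as follows, with a stub standing in for the per-section model call; the template text and section names are purely illustrative:

```python
import re

TEMPLATE = """# Report
## Background
{background}
## Analysis
{analysis}
## Conclusion
{conclusion}
"""

def fill_template(template: str, generate) -> str:
    """Fill each {placeholder} one at a time via generate(section_name),
    then assemble the complete long text."""
    sections = re.findall(r"\{(\w+)\}", template)
    return template.format(**{name: generate(name) for name in sections})

# generate() would normally make one chat completion request per section;
# here a stub stands in for it.
print(fill_template(TEMPLATE, lambda name: f"<{name} text>"))
```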

I Made Only One Request in a Minute, but Triggered the Your account reached max request Error

Typically, the SDK provided by OpenAI includes a retry mechanism:

Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.

This retry mechanism will automatically retry 2 times (a total of 3 requests) when encountering an error. Generally speaking, in cases of unstable network conditions or other situations that may cause request errors, using the OpenAI SDK can amplify a single request into 2 to 3 requests, all of which will count towards your RPM (requests per minute) limit.

Note: For users using the OpenAI SDK with a tier0 account level, due to the default retry mechanism, a single erroneous request can exhaust the entire RPM quota.
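If this retry amplification matters for your RPM quota, the OpenAI Python SDK lets you lower or disable its built-in retries when constructing the client; a configuration sketch (the placeholder key is illustrative):

```python
from openai import OpenAI

# max_retries=0 disables the SDK's automatic retries for this client,
# so one logical request costs exactly one request against your RPM quota.
client = OpenAI(
    api_key="sk-placeholder",  # illustrative; use your real MOONSHOT_API_KEY
    base_url="https://api.moonshot.ai/v1",
    max_retries=0,
)

# Retries can also be adjusted for a single call:
# client.with_options(max_retries=1).chat.completions.create(...)
```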

To Facilitate Transmission, I Used base64 Encoding for My Text Content

Please do not do this. Encoding your files with base64 will result in a huge consumption of tokens. If your file type is supported by our /v1/files file interface, you can simply upload the file and extract its content using the file interface.

For binary or other encoded file formats, the Kimi large language model currently cannot parse the content, so please do not add it to the context.

Why Can't I Use the Key Applied on the platform.moonshot.cn Platform on the platform.moonshot.ai Platform?

The Kimi Open Platform officially provides two platforms: platform.moonshot.cn is recommended for mainland China, and platform.moonshot.ai for international users. Accounts and API keys on the two platforms are completely independent and cannot be used interchangeably.

If you use the wrong platform, you will receive a 401 invalid_authentication_error error. If you receive a 401 error, please first check if you are using the wrong platform key.