
Model Inference Pricing Explanation

Concepts

Billing Unit

Token: A token represents a common sequence of characters. The number of tokens per word may vary. For example, a long word like "antidisestablishmentarianism" might be broken down into several tokens, while a short, common word like "word" might use just one token. Generally speaking, for typical English text, 1 token is roughly equivalent to 3-4 English characters. The exact number of tokens consumed by each call can be obtained through the Token Calculation API.
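The rule of thumb above can be sketched as a quick estimate. This is only an approximation for planning purposes; the function name and the 3.5 chars-per-token ratio are illustrative assumptions, and the authoritative count always comes from the Token Calculation API.

```python
# Rough token estimate using the "1 token ~= 3-4 English characters" rule of
# thumb from the docs. Approximation only; use the Token Calculation API for
# the exact count billed.

def estimate_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Approximate token count for typical English text."""
    return max(1, round(len(text) / chars_per_token))

short = estimate_tokens("word")                           # short word: ~1 token
long_ = estimate_tokens("antidisestablishmentarianism")   # long word: several tokens
```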

Billing Logic

Chat Completion API charges: Both Input and Output are billed based on usage. If you upload a document, extract its content, and then pass the extracted content as Input to the model, that document content is also billed as Input. File-related interfaces (file content extraction / file storage) are temporarily free. In other words, if you only upload and extract a document, that API call by itself will not incur any charges.
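The billing logic above can be sketched as a small cost calculator. The function name and the prices used in the example are hypothetical placeholders; real rates come from the pricing tables below and are quoted per 1M tokens.

```python
# Minimal sketch of the Chat Completion billing logic: Input and Output are
# each billed by usage at their own per-1M-token rate. File upload/extraction
# itself is free; extracted content is only billed once passed in as Input.
# Prices below are HYPOTHETICAL placeholders, not actual rates.

def chat_completion_cost(input_tokens: int, output_tokens: int,
                         input_price_per_1m: float,
                         output_price_per_1m: float) -> float:
    """Cost of one Chat Completion call, given per-1M-token prices."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000

# Example: 12k input tokens (including extracted document content) + 800 output
cost = chat_completion_cost(input_tokens=12_000, output_tokens=800,
                            input_price_per_1m=2.0, output_price_per_1m=8.0)
```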

Product Pricing

Explanation: The prices listed below are all inclusive of tax.

Multi-modal Model kimi-k2.5

  • kimi-k2.5 is Kimi's most versatile model to date, featuring a native multimodal architecture that supports both visual and text input, thinking and non-thinking modes, and dialogue and agent tasks.
  • Context length 256k, supports long thinking and deep reasoning.
  • Supports automatic context caching functionality, ToolCalls, JSON Mode, Partial Mode, and internet search functionality.

Generation Model kimi-k2

  • kimi-k2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models.
  • kimi-k2-0905-preview: Context length 256k. Based on kimi-k2-0711-preview, with enhanced agentic coding abilities, improved frontend code quality and practicality, and better context understanding
  • kimi-k2-turbo-preview: Context length 256k. High-speed version of kimi-k2, always aligned with the latest kimi-k2 (kimi-k2-0905-preview). Same model parameters as kimi-k2, output speed up to 60 tokens/sec (max 100 tokens/sec)
  • kimi-k2-0711-preview: Context length 128k
  • kimi-k2-thinking: Context length 256k. A thinking model with general agentic and reasoning capabilities, specializing in deep reasoning tasks
  • kimi-k2-thinking-turbo: Context length 256k. High-speed version of kimi-k2-thinking, suitable for scenarios requiring both deep reasoning and extremely fast responses
  • Supports ToolCalls, JSON Mode, Partial Mode, and internet search functionality
  • Does not support vision functionality
  • Supports automatic context caching functionality. Cached tokens are charged at the input price (cache hit) rate. You can view "context caching" type cost details in the console
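The context-caching billing described in the last bullet can be sketched as follows. Cached input tokens are charged at a separate cache-hit rate, uncached input at the normal input rate; all prices in this example are hypothetical placeholders, quoted per 1M tokens.

```python
# Sketch of context-caching billing: input tokens that hit the cache are
# charged at the (lower) cache-hit rate, the rest at the normal input rate,
# and output at the output rate. Prices are HYPOTHETICAL, per 1M tokens.

def cost_with_cache(cached_tokens: int, uncached_tokens: int,
                    output_tokens: int,
                    cache_hit_price: float, input_price: float,
                    output_price: float) -> float:
    """Cost of one call when part of the Input is served from the cache."""
    return (cached_tokens * cache_hit_price
            + uncached_tokens * input_price
            + output_tokens * output_price) / 1_000_000

# Example: 10k cached + 2k fresh input tokens, 500 output tokens
cost = cost_with_cache(cached_tokens=10_000, uncached_tokens=2_000,
                       output_tokens=500,
                       cache_hit_price=0.5, input_price=2.0, output_price=8.0)
```

The "context caching" line item visible in the console cost details corresponds to the `cached_tokens * cache_hit_price` term above.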

Generation Model Moonshot-v1

Here, 1M = 1,000,000. The prices in the table represent the cost per 1M tokens consumed.