
Create Chat Completions

POST
/chat/completions

Generate responses from a large language model based on a conversation.

cURL Example

curl 'https://api.hpc-ai.com/inference/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "messages": [{ "role": "user", "content": "Hello, how are you?" }],
    "max_tokens": 100,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "n": 1,
    "stop": null,
    "response_format": { "type": "json_object" },
    "seed": 42
  }'
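The same request can be issued from Python. Below is a minimal sketch using only the standard library; the endpoint URL, model name, and parameters come from the cURL example above, while `build_request` is an illustrative helper, not part of any official SDK.

```python
import json
import urllib.request

API_URL = "https://api.hpc-ai.com/inference/v1/chat/completions"

def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Assemble the POST request with the JSON body and required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

payload = {
    "model": "minimax/minimax-m2.5",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "n": 1,
    "stop": None,
    "seed": 42,
}

req = build_request("<your_token_here>", payload)

# To actually send the request (requires a valid AccessToken):
#   with urllib.request.urlopen(req) as resp:
#       body = json.load(resp)
```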

Authorization

Every REST API request must include your AccessToken in an Authorization header, along with a Content-Type header. Use the following format:

--header 'Authorization: Bearer <your_token_here>'
--header 'Content-Type: application/json'

Note: Replace <your_token_here> with your actual AccessToken, which carries the information the server needs to verify your identity and permissions.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The name of the model to use for generating responses |
| messages | array | Yes | An array of message objects representing the conversation history |
| tools | array | No | An array of tool objects that the model can use to generate responses |
| tool_choice | string | No | The strategy for choosing which tool to use when generating a response (e.g., "auto", "none", "required") |
| stream | boolean, null | No | Whether to stream the response back as it is generated (default: false) |
| max_tokens | integer, null | No | The maximum number of tokens to generate in the response |
| temperature | number, null | No | The sampling temperature to use when generating the response |
| top_k | integer, null | No | The number of highest-probability tokens to keep for top-k sampling, helping to control the randomness of the output. Required range: 0 <= x <= 100 |
| top_p | number, null | No | The nucleus sampling probability, helping balance randomness and coherence in the generated response |
| stop | string, string[] | No | One or more sequences at which the API will stop generating further tokens. The returned text will not contain the stop sequence |
| presence_penalty | number, null | No | A penalty applied to tokens based on whether they appear in the conversation history, helping to reduce repetition. Required range: -2 <= x <= 2 |
| frequency_penalty | number, null | No | A penalty applied to tokens based on their frequency in the conversation history, helping to reduce common tokens. Required range: -2 <= x <= 2 |
| response_format | object | No | The format of the response to generate |
| n | integer | No | The number of response choices to generate for each input message |
| seed | integer, null | No | The random seed to use for generating responses, ensuring reproducibility |
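Several numeric fields above carry documented ranges (top_k in [0, 100], both penalties in [-2, 2]). A client-side check before sending can surface mistakes early; the helper below is an illustrative sketch, not part of the API.

```python
def validate_sampling_params(payload: dict) -> list:
    """Return a list of problems with the sampling fields, per the documented ranges."""
    problems = []
    top_k = payload.get("top_k")
    if top_k is not None and not (0 <= top_k <= 100):
        problems.append("top_k must satisfy 0 <= x <= 100")
    for field in ("presence_penalty", "frequency_penalty"):
        value = payload.get(field)
        if value is not None and not (-2 <= value <= 2):
            problems.append(f"{field} must satisfy -2 <= x <= 2")
    return problems
```

An empty list means the checked fields are within range; absent (null) fields are skipped, since they are all optional.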

Message Configuration

Each object in the messages array should have the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | The role of the message sender (e.g., "user", "assistant", "system") |
| content | string | Yes | The content of the message |
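A conversation is simply this array grown over time: each user turn and each assistant reply is appended before the next request. A minimal sketch (the helper name is illustrative; the guard only covers the roles listed above):

```python
def add_message(messages: list, role: str, content: str) -> list:
    """Append a message object of the shape described above."""
    # Local guard for the documented roles; the API may accept others.
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unsupported role: {role}")
    messages.append({"role": role, "content": content})
    return messages

history = []
add_message(history, "system", "You are a helpful assistant.")
add_message(history, "user", "Hello, how are you?")
```

After each API call, append the assistant's reply to `history` before sending the next user turn.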

Tool Configuration

Each object in the tools array should have the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | The type of the tool; currently only "function" is supported |
| function | object | Yes | The function definition that the model can call, including the function name, description, and parameters |

The function object should have the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | The name of the function to be called |
| description | string, null | No | A description of what the function does, used by the model to choose when and how to call the function |
| parameters | object | Yes | A JSON Schema object that defines the parameters the function accepts. It should include a "type" field (set to "object"), a "required" field (an array of strings naming the required parameters), and a "properties" field that defines the individual parameters. Each entry in "properties" should have its own type and description |
| strict | boolean, null | No | Whether the model should strictly adhere to the function definition when calling it (default: false) |
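Putting the two tables together, a complete entry for the tools array can be assembled as below. The function name and parameters here are made up for illustration; `make_tool` is a convenience sketch, not an SDK function.

```python
def make_tool(name: str, description: str, parameters: dict, strict: bool = False) -> dict:
    """Wrap a function definition in the tool envelope described above."""
    return {
        "type": "function",  # currently the only supported tool type
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
            "strict": strict,
        },
    }

# Hypothetical example tool.
get_weather = make_tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "required": ["city"],
        "properties": {
            "city": {"type": "string", "description": "The city to look up"},
        },
    },
)
```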

Example JSON Schema for parameters:

{
  "type": "object",
  "required": ["param1", "param2"],
  "properties": {
    "param1": {
      "type": "string",
      "description": "The first parameter"
    },
    "param2": {
      "type": "integer",
      "description": "The second parameter"
    }
  }
}

Response Format Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | The type of response format: "json_object", "json_schema", "grammar", or "text" |
| schema | string, object | No | The schema field defines the expected structure of the model's output. It guides the model to generate responses that follow a specified JSON format |
| grammar | string, null | No | The grammar field defines a set of rules that constrain the model's output format. It ensures the generated text follows a specified grammar or structured pattern |
| json_schema | string, object | No | The json_schema field defines a strict JSON Schema that the model's output must follow. It ensures the generated response matches the specified structure and data types |
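Since the type field accepts only the four values above, a small client-side guard when building a response_format object can catch typos before the request is sent. This helper is an illustrative sketch, not part of the API.

```python
VALID_RESPONSE_FORMAT_TYPES = {"json_object", "json_schema", "grammar", "text"}

def make_response_format(fmt_type: str, **extra) -> dict:
    """Build a response_format object, rejecting undocumented type values."""
    if fmt_type not in VALID_RESPONSE_FORMAT_TYPES:
        raise ValueError(f"unknown response_format type: {fmt_type}")
    return {"type": fmt_type, **extra}

rf = make_response_format("json_object")
```

For "json_schema", pass the schema as a keyword argument, e.g. `make_response_format("json_schema", json_schema={...})`, mirroring the Example Request below.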

Example Request

{
  "model": "minimax/minimax-m2.5",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {},
        "strict": true
      }
    }
  ],
  "tool_choice": "auto",
  "stream": false,
  "max_tokens": 100,
  "temperature": 0.7,
  "top_k": 50,
  "top_p": 0.9,
  "n": 1,
  "stop": null,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "response_schema",
      "schema": {
        "type": "object",
        "properties": {
          "response": {
            "type": "string",
            "description": "The assistant response to the user message."
          }
        },
        "required": ["response"],
        "additionalProperties": false
      }
    }
  },
  "seed": 42
}

Example Response

Success Response

| Field | Type | Description |
| --- | --- | --- |
| id | string | The unique identifier for the chat completion response |
| created | integer | The timestamp (in seconds since the Unix epoch) when the chat completion was created |
| model | string | The name of the model used to generate the response |
| choices | array | An array of response choices generated by the model. Each choice includes the generated message, the reason for finishing, and the index of the choice |
| object | string | The type of object returned, which is "chat.completion" for this endpoint |
| usage | object | An object containing token usage information for the request and response, including the number of tokens in the prompt, the number of tokens in the completion, and the total number of tokens used |
| metadata | object | An object containing additional metadata about the response, such as the weight version used for the model |

{
  "id": "e63095aef9bc4d7292b769edb2cb6583",
  "object": "chat.completion",
  "created": 1773651537,
  "model": "minimax/minimax-m2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi there! I'm doing well, thank you for asking. How about you? How's your day going so far? Is there anything I can help you with today?",
        "reasoning_content": null,
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 248046
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 692,
    "completion_tokens": 677,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
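Once the response JSON is parsed, the assistant's text lives at choices[0].message.content, and the usage counters should satisfy prompt_tokens + completion_tokens = total_tokens. A minimal sketch against a trimmed copy of the example response above (`extract_reply` is an illustrative helper):

```python
# Trimmed copy of the example success response above.
response = {
    "id": "e63095aef9bc4d7292b769edb2cb6583",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hi there! I'm doing well, thank you for asking.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "total_tokens": 692, "completion_tokens": 677},
}

def extract_reply(resp: dict) -> str:
    """Return the assistant text of the first choice."""
    return resp["choices"][0]["message"]["content"]

usage = response["usage"]
token_sum_ok = usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
reply = extract_reply(response)
```

When n > 1 was requested, iterate over all entries in choices rather than reading only index 0.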