
Create Chat Completions

POST
/chat/completions

Generate responses from a large language model based on a conversation.

cURL Example

curl 'https://api.hpc-ai.com/inference/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "messages": [{ "role": "user", "content": "Hello, how are you?" }],
    "max_tokens": 100,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "n": 1,
    "stop": null,
    "response_format": { "type": "json_object" },
    "seed": 42
  }'
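The same request can be issued from Python. Below is a minimal sketch using only the standard library; the endpoint URL, model name, and parameters come from the cURL example above, while `build_request` is an illustrative helper, not part of any official SDK.

```python
import json
import urllib.request

API_URL = "https://api.hpc-ai.com/inference/v1/chat/completions"

def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Assemble the POST request with the JSON body and required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

payload = {
    "model": "minimax/minimax-m2.5",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "n": 1,
    "stop": None,
    "seed": 42,
}

req = build_request("<your_token_here>", payload)

# To actually send the request (requires a valid AccessToken):
#   with urllib.request.urlopen(req) as resp:
#       body = json.load(resp)
```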

Authorization

Every REST API request must include your AccessToken in an Authorization header, along with a Content-Type header. Use the following format:

--header 'Authorization: Bearer <your_token_here>'
--header 'Content-Type: application/json'

Note: Replace <your_token_here> with your actual AccessToken, which carries the information the server needs to verify your identity and permissions.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The name of the model to use for generating responses |
| messages | array | Yes | An array of message objects representing the conversation history |
| tools | array | No | An array of tool objects that the model can use to generate responses |
| tool_choice | string | No | The strategy for choosing which tool to use when generating a response (e.g., "auto", "none", "required") |
| stream | boolean, null | No | Whether to stream the response back as it is generated (default: false) |
| max_tokens | integer, null | No | The maximum number of tokens to generate in the response |
| temperature | number, null | No | The sampling temperature to use when generating the response |
| top_k | integer, null | No | The number of highest-probability tokens to keep for top-k sampling, helping to control the randomness of the output. Required range: 0 <= x <= 100 |
| top_p | number, null | No | The nucleus sampling probability, helping balance randomness and coherence in the generated response |
| stop | string, string[] | No | One or more sequences at which the API will stop generating further tokens. The returned text will not contain the stop sequence |
| presence_penalty | number, null | No | A penalty applied to tokens based on whether they appear in the conversation history, helping to reduce repetition. Required range: -2 <= x <= 2 |
| frequency_penalty | number, null | No | A penalty applied to tokens based on their frequency in the conversation history, helping to reduce common tokens. Required range: -2 <= x <= 2 |
| response_format | object | No | The format of the response to generate |
| n | integer | No | The number of response choices to generate for each input message |
| seed | integer, null | No | The random seed to use for generating responses, ensuring reproducibility |
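Several numeric fields above carry documented ranges (top_k in [0, 100], both penalties in [-2, 2]). A client-side check before sending can surface mistakes early; the helper below is an illustrative sketch, not part of the API.

```python
def validate_sampling_params(payload: dict) -> list:
    """Return a list of problems with the sampling fields, per the documented ranges."""
    problems = []
    top_k = payload.get("top_k")
    if top_k is not None and not (0 <= top_k <= 100):
        problems.append("top_k must satisfy 0 <= x <= 100")
    for field in ("presence_penalty", "frequency_penalty"):
        value = payload.get(field)
        if value is not None and not (-2 <= value <= 2):
            problems.append(f"{field} must satisfy -2 <= x <= 2")
    return problems
```

An empty list means the checked fields are within range; absent (null) fields are skipped, since they are all optional.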

Message Configuration

Each object in the messages array should have the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | The role of the message sender (e.g., "user", "assistant", "system") |
| content | string | Yes | The content of the message |
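A conversation is simply this array grown over time: each user turn and each assistant reply is appended before the next request. A minimal sketch (the helper name is illustrative; the guard only covers the roles listed above):

```python
def add_message(messages: list, role: str, content: str) -> list:
    """Append a message object of the shape described above."""
    # Local guard for the documented roles; the API may accept others.
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unsupported role: {role}")
    messages.append({"role": role, "content": content})
    return messages

history = []
add_message(history, "system", "You are a helpful assistant.")
add_message(history, "user", "Hello, how are you?")
```

After each API call, append the assistant's reply to `history` before sending the next user turn.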

Tool Configuration

Each object in the tools array should have the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | The type of the tool; currently only "function" is supported |
| function | object | Yes | The function definition that the model can call, including the function name, description, and parameters |

The function object should have the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | The name of the function to be called |
| description | string, null | No | A description of what the function does, used by the model to choose when and how to call the function |
| parameters | object | Yes | A JSON Schema object that defines the parameters the function accepts. It should include a "type" field (set to "object"), a "required" field (an array of strings naming the required parameters), and a "properties" field that defines the individual parameters. Each entry in "properties" should have its own type and description |
| strict | boolean, null | No | Whether the model should strictly adhere to the function definition when calling it (default: false) |
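Putting the two tables together, a complete entry for the tools array can be assembled as below. The function name and parameters here are made up for illustration; `make_tool` is a convenience sketch, not an SDK function.

```python
def make_tool(name: str, description: str, parameters: dict, strict: bool = False) -> dict:
    """Wrap a function definition in the tool envelope described above."""
    return {
        "type": "function",  # currently the only supported tool type
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
            "strict": strict,
        },
    }

# Hypothetical example tool.
get_weather = make_tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    parameters={
        "type": "object",
        "required": ["city"],
        "properties": {
            "city": {"type": "string", "description": "The city to look up"},
        },
    },
)
```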

Example JSON Schema for parameters:

{
  "type": "object",
  "required": ["param1", "param2"],
  "properties": {
    "param1": {
      "type": "string",
      "description": "The first parameter"
    },
    "param2": {
      "type": "integer",
      "description": "The second parameter"
    }
  }
}

Response Format Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | The type of response format: "json_object", "json_schema", "grammar", or "text" |
| schema | string, object | No | The schema field defines the expected structure of the model's output. It guides the model to generate responses that follow a specified JSON format |
| grammar | string, null | No | The grammar field defines a set of rules that constrain the model's output format. It ensures the generated text follows a specified grammar or structured pattern |
| json_schema | string, object | No | The json_schema field defines a strict JSON Schema that the model's output must follow. It ensures the generated response matches the specified structure and data types |
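Since the type field accepts only the four values above, a small client-side guard when building a response_format object can catch typos before the request is sent. This helper is an illustrative sketch, not part of the API.

```python
VALID_RESPONSE_FORMAT_TYPES = {"json_object", "json_schema", "grammar", "text"}

def make_response_format(fmt_type: str, **extra) -> dict:
    """Build a response_format object, rejecting undocumented type values."""
    if fmt_type not in VALID_RESPONSE_FORMAT_TYPES:
        raise ValueError(f"unknown response_format type: {fmt_type}")
    return {"type": fmt_type, **extra}

rf = make_response_format("json_object")
```

For "json_schema", pass the schema as a keyword argument, e.g. `make_response_format("json_schema", json_schema={...})`, mirroring the Example Request below.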

Example Request

{
  "model": "minimax/minimax-m2.5",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "<string>",
        "description": "<string>",
        "parameters": {},
        "strict": true
      }
    }
  ],
  "tool_choice": "auto",
  "stream": false,
  "max_tokens": 100,
  "temperature": 0.7,
  "top_k": 50,
  "top_p": 0.9,
  "n": 1,
  "stop": null,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "response_schema",
      "schema": {
        "type": "object",
        "properties": {
          "response": {
            "type": "string",
            "description": "The assistant response to the user message."
          }
        },
        "required": ["response"],
        "additionalProperties": false
      }
    }
  },
  "seed": 42
}

Example Response

Success Response

| Field | Type | Description |
| --- | --- | --- |
| id | string | The unique identifier for the chat completion response |
| created | integer | The timestamp (in seconds since the Unix epoch) when the chat completion was created |
| model | string | The name of the model used to generate the response |
| choices | array | An array of response choices generated by the model. Each choice includes the generated message, the reason for finishing, and the index of the choice |
| object | string | The type of object returned, which is "chat.completion" for this endpoint |
| usage | object | An object containing token usage information for the request and response, including the number of tokens in the prompt, the number of tokens in the completion, and the total number of tokens used |
| metadata | object | An object containing additional metadata about the response, such as the weight version used for the model |

{
  "id": "e63095aef9bc4d7292b769edb2cb6583",
  "object": "chat.completion",
  "created": 1773651537,
  "model": "minimax/minimax-m2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi there! I'm doing well, thank you for asking. How about you? How's your day going so far? Is there anything I can help you with today?",
        "reasoning_content": null,
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 248046
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 692,
    "completion_tokens": 677,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
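Once the response JSON is parsed, the assistant's text lives at choices[0].message.content, and the usage counters should satisfy prompt_tokens + completion_tokens = total_tokens. A minimal sketch against a trimmed copy of the example response above (`extract_reply` is an illustrative helper):

```python
# Trimmed copy of the example success response above.
response = {
    "id": "e63095aef9bc4d7292b769edb2cb6583",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hi there! I'm doing well, thank you for asking.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "total_tokens": 692, "completion_tokens": 677},
}

def extract_reply(resp: dict) -> str:
    """Return the assistant text of the first choice."""
    return resp["choices"][0]["message"]["content"]

usage = response["usage"]
token_sum_ok = usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
reply = extract_reply(response)
```

When n > 1 was requested, iterate over all entries in choices rather than reading only index 0.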