
Create Completions

POST
/completions

cURL Example

curl 'https://api.hpc-ai.com/inference/v1/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "prompt": "Explain the concept of a polymer in simple terms.",
    "max_tokens": 100,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "n": 1,
    "stop": null,
    "response_format": { "type": "json_object" },
    "seed": 42
  }'
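The same request can be issued from Python. The sketch below uses only the standard library; it mirrors the cURL example above, and the token is a placeholder you must replace with your own AccessToken.

```python
import json
import urllib.request

API_URL = "https://api.hpc-ai.com/inference/v1/completions"

def build_completion_request(token: str, **payload) -> urllib.request.Request:
    """Build a POST request for the completions endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_completion_request(
    "<your_token_here>",
    model="minimax/minimax-m2.5",
    prompt="Explain the concept of a polymer in simple terms.",
    max_tokens=100,
    temperature=0.7,
)

# With a valid token, send the request and decode the JSON body:
# with urllib.request.urlopen(req) as resp:
#     completion = json.load(resp)
```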

Authorization

When making REST API requests, include your AccessToken in the Authorization header, along with the Content-Type header. Use the following format:

--header 'Authorization: Bearer <your_token_here>'
--header 'Content-Type: application/json'

Note: Replace <your_token_here> with your actual AccessToken. The token contains the information the server needs to verify your identity and permissions.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The name of the model to use for generating responses. |
| prompt | string, string[] | Yes | The prompt to generate responses for; either a single string or an array of strings. |
| stream | boolean, null | No | Whether to stream the response back as it is generated (default: false). |
| max_tokens | integer, null | No | The maximum number of tokens to generate in the response. |
| temperature | number, null | No | The sampling temperature to use when generating the response. |
| top_k | integer, null | No | The number of highest-probability tokens to keep for top-k sampling, controlling the randomness of the output. |
| top_p | number, null | No | The nucleus sampling probability, balancing randomness and coherence in the generated response. |
| stop | string, string[] | No | One or more sequences at which the API stops generating further tokens. The returned text does not contain the stop sequence. |
| presence_penalty | number, null | No | A penalty applied to tokens based on whether they already appear in the conversation history, helping to reduce repetition. |
| frequency_penalty | number, null | No | A penalty applied to tokens based on their frequency in the conversation history, helping to reduce common tokens. |
| response_format | object | No | The format of the response to generate. |
| n | integer | No | The number of response choices to generate for each input. |
| seed | integer, null | No | A random seed for reproducible generation. |
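When stream is true, responses of this kind are typically delivered as server-sent events, one data: line per chunk, terminated by a [DONE] sentinel. The exact wire format is an assumption based on common practice for similar APIs; the helper below is a sketch of how such a stream could be parsed, and should be checked against the actual stream output.

```python
import json

def iter_stream_chunks(lines):
    """Yield parsed JSON chunks from an SSE-style completion stream.

    Assumes the common "data: <json>" framing with a "[DONE]" sentinel;
    verify against the endpoint's actual wire format before relying on this.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Hypothetical two-chunk stream for illustration:
sample = [
    'data: {"choices": [{"text": "Hello"}]}',
    'data: {"choices": [{"text": ", world"}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["text"] for c in iter_stream_chunks(sample))
```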

Response Format Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | The response format type: one of json_object, json_schema, grammar, or text. |
| schema | string, object | No | Defines the expected structure of the model's output, guiding it to generate responses that follow a specified JSON format. |
| grammar | string, null | No | Defines a set of rules that constrain the model's output format, ensuring the generated text follows a specified grammar or structured pattern. |
| json_schema | string, object | No | Defines a strict JSON Schema that the model's output must follow, ensuring the generated response matches the specified structure and data types. |
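With "type": "json_object", the model is steered to emit valid JSON, so the returned content string should parse cleanly on the client side. A minimal sanity check (the content value below is illustrative, not actual model output):

```python
import json

response_format = {"type": "json_object"}

# Hypothetical content string, as it might appear in a returned choice:
content = '{"answer": "A polymer is a long chain of repeating molecules."}'

# Raises json.JSONDecodeError if the model output is not valid JSON.
parsed = json.loads(content)
```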

Example Request

{
  "model": "minimax/minimax-m2.5",
  "prompt": "Hello, how are you?",
  "stream": false,
  "max_tokens": 100,
  "temperature": 0.7,
  "top_k": 50,
  "top_p": 0.9,
  "n": 1,
  "stop": null,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "response_schema",
      "schema": {
        "type": "object",
        "properties": {
          "response": {
            "type": "string",
            "description": "The assistant response to the user message."
          }
        },
        "required": ["response"],
        "additionalProperties": false
      }
    }
  },
  "seed": 42
}

Example Response

Success Response

| Field | Type | Description |
| --- | --- | --- |
| id | string | The unique identifier for the completion response. |
| created | integer | The timestamp (in seconds since the Unix epoch) when the completion was created. |
| model | string | The name of the model used to generate the response. |
| choices | array | The response choices generated by the model. Each choice includes the generated message, the reason for finishing, and the index of the choice. |
| object | string | The type of object returned. |
| usage | object | Token usage for the request and response: the number of tokens in the prompt, the number of tokens in the completion, and the total number of tokens used. |
| metadata | object | Additional metadata about the response, such as the weight version used for the model. |
{
  "id": "e63095aef9bc4d7292b769edb2cb6583",
  "object": "chat.completion",
  "created": 1773651537,
  "model": "minimax/minimax-m2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi there! I'm doing well, thank you for asking. How about you? How's your day going so far? Is there anything I can help you with today?",
        "reasoning_content": null,
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 248046
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 692,
    "completion_tokens": 677,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
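Given a decoded response like the one above, the generated text and token usage can be pulled out with plain dictionary access. The dictionary below is abridged from the example response:

```python
# Abridged copy of the example response above.
response = {
    "id": "e63095aef9bc4d7292b769edb2cb6583",
    "object": "chat.completion",
    "model": "minimax/minimax-m2.5",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there! I'm doing well."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 677, "total_tokens": 692},
}

# The first (and, with n=1, only) choice carries the generated message.
content = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]
total_tokens = response["usage"]["total_tokens"]
```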