Create Completions
POST
/completions

cURL Example
curl 'https://api.hpc-ai.com/inference/v1/completions' \
-H 'content-type: application/json' \
-H 'Authorization: Bearer <your_token_here>' \
--data '
{
"model": "minimax/minimax-m2.5",
"prompt": "Explain the concept of a polymer in simple terms.",
"max_tokens": 100,
"temperature": 0.7,
"top_k": 50,
"top_p": 0.9,
"n": 1,
"stop": null,
"response_format": { "type": "json_object" },
"seed": 42
}'
Authorization
Every REST API request must include your access token in the Authorization header, along with the Content-Type header. Use the following format:
--header 'Authorization: Bearer <your_token_here>'
--header 'Content-Type: application/json'
Note: Replace your_token_here with your actual access token, which allows the server to verify your identity and permissions.
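As a minimal sketch, the headers above can be set from Python using only the standard library (the helper names here are illustrative; any HTTP client works the same way):

```python
import json
import urllib.request

API_URL = "https://api.hpc-ai.com/inference/v1/completions"

def build_headers(token: str) -> dict:
    """The two headers required on every REST API request."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

def complete(token: str, payload: dict) -> dict:
    """POST the request body and return the decoded JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(token),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```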
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The name of the model to use for generating responses. |
| prompt | string, string[] | Yes | The prompt to generate a response for; either a single string or an array of strings. |
| stream | boolean, null | No | Whether to stream the response back as it is generated (default: false). |
| max_tokens | integer, null | No | The maximum number of tokens to generate in the response. |
| temperature | number, null | No | The sampling temperature to use when generating the response. |
| top_k | integer, null | No | The number of highest-probability tokens to keep for top-k sampling, controlling the randomness of the output. |
| top_p | number, null | No | The nucleus sampling probability, balancing randomness and coherence in the generated response. |
| stop | string, string[] | No | One or more sequences at which the API stops generating further tokens. The returned text does not contain the stop sequence. |
| presence_penalty | number, null | No | A penalty applied to tokens that already appear in the conversation history, reducing repetition. |
| frequency_penalty | number, null | No | A penalty applied to tokens based on their frequency in the conversation history, discouraging common tokens. |
| response_format | object | No | The format of the response to generate. |
| n | integer | No | The number of response choices to generate for each input. |
| seed | integer, null | No | The random seed to use for generation, for reproducibility. |
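A small sketch of assembling a request body from the fields above; the helper name is illustrative, and it simply omits any optional field left as None rather than sending an explicit null:

```python
def build_payload(model: str, prompt, **options) -> dict:
    """Assemble a completions request body. Only the optional
    fields that were actually set are included."""
    allowed = {
        "stream", "max_tokens", "temperature", "top_k", "top_p",
        "stop", "presence_penalty", "frequency_penalty",
        "response_format", "n", "seed",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"model": model, "prompt": prompt}
    # Drop fields left as None so they are simply absent from the body.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_payload("minimax/minimax-m2.5", "Hi", temperature=0.7, seed=42)` produces a body with exactly four fields.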
Response Format Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | The type of response format: json_object, json_schema, grammar, or text. |
| schema | string, object | No | Defines the expected structure of the model's output, guiding the model to generate responses that follow a specified JSON format. |
| grammar | string, null | No | Defines a set of rules that constrain the model's output, ensuring the generated text follows a specified grammar or structured pattern. |
| json_schema | string, object | No | Defines a strict JSON Schema that the model's output must follow, ensuring the generated response matches the specified structure and data types. |
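As a sketch, the four type values correspond to response_format objects like the following (the schema name and the grammar rule here are made-up illustrations, not values the API requires):

```python
# Plain text output (no structural constraint).
text_format = {"type": "text"}

# Any syntactically valid JSON object.
json_object_format = {"type": "json_object"}

# Output constrained to a strict JSON Schema.
json_schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "answer",  # illustrative name
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
}

# Output constrained by a grammar; this rule is a made-up illustration.
grammar_format = {
    "type": "grammar",
    "grammar": 'root ::= "yes" | "no"',
}
```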
Example Request
{
"model": "minimax/minimax-m2.5",
"prompt": "Hello, how are you?",
"stream": false,
"max_tokens": 100,
"temperature": 0.7,
"top_k": 50,
"top_p": 0.9,
"n": 1,
"stop": null,
"frequency_penalty": 0,
"presence_penalty": 0,
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "response_schema",
"schema": {
"type": "object",
"properties": {
"response": {
"type": "string",
"description": "The assistant response to the user message."
}
},
"required": ["response"],
"additionalProperties": false
}
}
},
"seed": 42
}
Example Response
Success Response
| Field | Type | Description |
|---|---|---|
| id | string | The unique identifier for the completion response. |
| created | integer | The timestamp (in seconds since the Unix epoch) when the completion was created. |
| model | string | The name of the model used to generate the response. |
| choices | array | An array of response choices generated by the model. Each choice includes the generated message, the finish reason, and the index of the choice. |
| object | string | The type of object returned. |
| usage | object | Token usage for the request and response: the number of prompt tokens, the number of completion tokens, and the total number of tokens used. |
| metadata | object | Additional metadata about the response, such as the weight version used for the model. |
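A minimal sketch of pulling the commonly used pieces out of a success response, assuming the field layout shown in the example response below (note the generated text sits under choices[0].message.content):

```python
def summarize(response: dict) -> dict:
    """Extract the id, model, generated text, finish reason,
    and total token count from a success response."""
    first = response["choices"][0]
    return {
        "id": response["id"],
        "model": response["model"],
        "text": first["message"]["content"],
        "finish_reason": first["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }
```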
Example responses are shown below for status codes 200, 400, 401, 404, 429, 500, and 503.
200 OK
{
"id": "e63095aef9bc4d7292b769edb2cb6583",
"object": "chat.completion",
"created": 1773651537,
"model": "minimax/minimax-m2.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hi there! I'm doing well, thank you for asking. How about you? How's your day going so far? Is there anything I can help you with today?",
"reasoning_content": null,
"tool_calls": null
},
"logprobs": null,
"finish_reason": "stop",
"matched_stop": 248046
}
],
"usage": {
"prompt_tokens": 15,
"total_tokens": 692,
"completion_tokens": 677,
"prompt_tokens_details": null,
"reasoning_tokens": 0
},
"metadata": {
"weight_version": "default"
}
}
400 Bad Request
{
"error": {
"message": "temperature must be between 0 and 2",
"type": "error",
"code": "invalid_request"
}
}
401 Unauthorized
{
"error": {
"code": "invalid_api_key",
"message": "Invalid token provided.",
"param": null,
"type": "invalid_request_error"
}
}
404 Not Found
{
"error": {
"message": "Model not found",
"type": "error",
"code": "model_not_found"
}
}
429 Too Many Requests
{
"error": {
"code": "rate_limit_exceeded",
"details": {
"limit_rpm": 2000,
"limit_tpm": 20000,
"retry_after_ms": 21472
},
"message": "Rate limit exceeded. Please try again later.",
"type": "rate_limit_error"
}
}
500 Internal Server Error
{
"error": {
"message": "Service temporarily unavailable, please try again later",
"type": "error",
"param": null,
"code": "internal_error"
}
}
503 Service Unavailable
{
"error": {
"message": "Capacity exceeded, please try again later",
"type": "error",
"param": null,
"code": "capacity_exceeded"
}
}
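The error payloads above suggest a simple client-side retry policy: honor retry_after_ms on 429 responses and back off on 500/503. The sketch below assumes the 429 payload shape shown above; the exponential-backoff parameters are our own choice, not part of the API:

```python
import random
from typing import Optional

def retry_delay_s(status: int, error: dict, attempt: int) -> Optional[float]:
    """Return seconds to sleep before retrying, or None if the
    error is not retryable."""
    if status == 429:
        # 429 payloads include details.retry_after_ms (default 1s if absent).
        ms = error.get("error", {}).get("details", {}).get("retry_after_ms", 1000)
        return ms / 1000.0
    if status in (500, 503):
        # Jittered exponential backoff, capped at 30 seconds.
        return min(30.0, (2 ** attempt) + random.random())
    return None  # 400/401/404: fix the request instead of retrying
```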