
Create Completions

POST
/completions

cURL Example

curl 'https://api.hpc-ai.com/inference/v1/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "prompt": "Explain the concept of a polymer in simple terms.",
    "max_tokens": 100,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "n": 1,
    "stop": null,
    "response_format": { "type": "json_object" },
    "seed": 42
  }'
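The same request can be issued from Python. The sketch below uses only the standard library; it mirrors the cURL example above, and the token is a placeholder you must replace with your own AccessToken.

```python
import json
import urllib.request

API_URL = "https://api.hpc-ai.com/inference/v1/completions"

def build_completion_request(token: str, **payload) -> urllib.request.Request:
    """Build a POST request for the completions endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

req = build_completion_request(
    "<your_token_here>",
    model="minimax/minimax-m2.5",
    prompt="Explain the concept of a polymer in simple terms.",
    max_tokens=100,
    temperature=0.7,
)

# With a valid token, send the request and decode the JSON body:
# with urllib.request.urlopen(req) as resp:
#     completion = json.load(resp)
```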

Authorization

When making REST API requests, include your AccessToken in the Authorization header, along with the Content-Type header. Use the following format:

--header 'Authorization: Bearer <your_token_here>'
--header 'Content-Type: application/json'

Note: Replace <your_token_here> with your actual AccessToken. The token contains the information the server needs to verify your identity and permissions.

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The name of the model to use for generating responses. |
| prompt | string, string[] | Yes | The prompt to generate responses for; either a single string or an array of strings. |
| stream | boolean, null | No | Whether to stream the response back as it is generated (default: false). |
| max_tokens | integer, null | No | The maximum number of tokens to generate in the response. |
| temperature | number, null | No | The sampling temperature to use when generating the response. |
| top_k | integer, null | No | The number of highest-probability tokens to keep for top-k sampling, controlling the randomness of the output. |
| top_p | number, null | No | The nucleus sampling probability, balancing randomness and coherence in the generated response. |
| stop | string, string[] | No | One or more sequences at which the API stops generating further tokens. The returned text does not contain the stop sequence. |
| presence_penalty | number, null | No | A penalty applied to tokens based on whether they already appear in the conversation history, helping to reduce repetition. |
| frequency_penalty | number, null | No | A penalty applied to tokens based on their frequency in the conversation history, helping to reduce common tokens. |
| response_format | object | No | The format of the response to generate. |
| n | integer | No | The number of response choices to generate for each input. |
| seed | integer, null | No | A random seed for reproducible generation. |
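When stream is true, responses of this kind are typically delivered as server-sent events, one data: line per chunk, terminated by a [DONE] sentinel. The exact wire format is an assumption based on common practice for similar APIs; the helper below is a sketch of how such a stream could be parsed, and should be checked against the actual stream output.

```python
import json

def iter_stream_chunks(lines):
    """Yield parsed JSON chunks from an SSE-style completion stream.

    Assumes the common "data: <json>" framing with a "[DONE]" sentinel;
    verify against the endpoint's actual wire format before relying on this.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Hypothetical two-chunk stream for illustration:
sample = [
    'data: {"choices": [{"text": "Hello"}]}',
    'data: {"choices": [{"text": ", world"}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["text"] for c in iter_stream_chunks(sample))
```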

Response Format Configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | The response format type: one of json_object, json_schema, grammar, or text. |
| schema | string, object | No | Defines the expected structure of the model's output, guiding it to generate responses that follow a specified JSON format. |
| grammar | string, null | No | Defines a set of rules that constrain the model's output format, ensuring the generated text follows a specified grammar or structured pattern. |
| json_schema | string, object | No | Defines a strict JSON Schema that the model's output must follow, ensuring the generated response matches the specified structure and data types. |
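With "type": "json_object", the model is steered to emit valid JSON, so the returned content string should parse cleanly on the client side. A minimal sanity check (the content value below is illustrative, not actual model output):

```python
import json

response_format = {"type": "json_object"}

# Hypothetical content string, as it might appear in a returned choice:
content = '{"answer": "A polymer is a long chain of repeating molecules."}'

# Raises json.JSONDecodeError if the model output is not valid JSON.
parsed = json.loads(content)
```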

Example Request

{
  "model": "minimax/minimax-m2.5",
  "prompt": "Hello, how are you?",
  "stream": false,
  "max_tokens": 100,
  "temperature": 0.7,
  "top_k": 50,
  "top_p": 0.9,
  "n": 1,
  "stop": null,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "response_schema",
      "schema": {
        "type": "object",
        "properties": {
          "response": {
            "type": "string",
            "description": "The assistant response to the user message."
          }
        },
        "required": ["response"],
        "additionalProperties": false
      }
    }
  },
  "seed": 42
}

Example Response

Success Response

| Field | Type | Description |
| --- | --- | --- |
| id | string | The unique identifier for the completion response. |
| created | integer | The timestamp (in seconds since the Unix epoch) when the completion was created. |
| model | string | The name of the model used to generate the response. |
| choices | array | The response choices generated by the model. Each choice includes the generated message, the reason for finishing, and the index of the choice. |
| object | string | The type of object returned. |
| usage | object | Token usage for the request and response: the number of tokens in the prompt, the number of tokens in the completion, and the total number of tokens used. |
| metadata | object | Additional metadata about the response, such as the weight version used for the model. |
{
  "id": "e63095aef9bc4d7292b769edb2cb6583",
  "object": "chat.completion",
  "created": 1773651537,
  "model": "minimax/minimax-m2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi there! I'm doing well, thank you for asking. How about you? How's your day going so far? Is there anything I can help you with today?",
        "reasoning_content": null,
        "tool_calls": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "matched_stop": 248046
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 692,
    "completion_tokens": 677,
    "prompt_tokens_details": null,
    "reasoning_tokens": 0
  },
  "metadata": {
    "weight_version": "default"
  }
}
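Given a decoded response like the one above, the generated text and token usage can be pulled out with plain dictionary access. The dictionary below is abridged from the example response:

```python
# Abridged copy of the example response above.
response = {
    "id": "e63095aef9bc4d7292b769edb2cb6583",
    "object": "chat.completion",
    "model": "minimax/minimax-m2.5",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hi there! I'm doing well."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 15, "completion_tokens": 677, "total_tokens": 692},
}

# The first (and, with n=1, only) choice carries the generated message.
content = response["choices"][0]["message"]["content"]
finish_reason = response["choices"][0]["finish_reason"]
total_tokens = response["usage"]["total_tokens"]
```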