Text Models
Text models generate natural language from text input. You can use them for chatbots, writing assistants, summarization, classification, extraction, drafting, translation, and many other language tasks.
For most new integrations, we recommend using the OpenAI-compatible /chat/completions API. It supports single-turn prompts, multi-turn conversations, streaming, structured output, and advanced features such as tool calling.
What Text Models Can Do
Text models are general-purpose language models that can:
- Answer questions and follow instructions
- Continue or rewrite text in a specific style
- Summarize long passages or documents
- Extract structured information from unstructured text
- Generate code, documentation, or reports
- Support multi-turn conversational experiences
Different models may vary in speed, cost, context length, reasoning ability, and instruction-following quality. Choose a model based on your latency, quality, and budget requirements.
Basic Text Generation
The simplest workflow is to send a list of messages and receive one generated reply.
curl 'https://api.hpc-ai.com/inference/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "messages": [
      {
        "role": "user",
        "content": "Write a short product description for a wireless ergonomic keyboard."
      }
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
The response will contain one or more choices. In most cases, you can read the generated text from:
choices[0].message.content
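For example, assuming a response shaped like the OpenAI chat completion schema, reading the reply looks like this (the payload below is an illustrative sample, not real model output):

```python
import json

# Illustrative sample of a /chat/completions response body (not real model output).
response_body = json.loads("""
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A compact wireless ergonomic keyboard built for all-day comfort."
      },
      "finish_reason": "stop"
    }
  ]
}
""")

# The generated text lives at choices[0].message.content.
reply = response_body["choices"][0]["message"]["content"]
print(reply)
```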
Message Roles
The messages array is the core input format for text generation.
- system: Sets high-level behavior, style, or constraints for the assistant
- user: Contains the end user's request or prompt
- assistant: Includes previous model responses for multi-turn conversations
Example:
[
  {
    "role": "system",
    "content": "You are a concise technical writer. Use clear headings and bullet points."
  },
  {
    "role": "user",
    "content": "Explain vector databases to a beginner."
  }
]
Use the system message for durable instructions such as tone, formatting, safety boundaries, or task-specific behavior.
Common Usage Patterns
Single-Turn Prompting
Use a single user message when you only need one response and do not need to preserve conversation history.
{
  "model": "minimax/minimax-m2.5",
  "messages": [
    {
      "role": "user",
      "content": "Summarize the following paragraph in one sentence: <your_text_here>"
    }
  ]
}
Multi-Turn Conversation
For chat experiences, include previous user and assistant messages so the model can respond with context.
{
  "model": "minimax/minimax-m2.5",
  "messages": [
    { "role": "system", "content": "You are a helpful travel assistant." },
    { "role": "user", "content": "I'm visiting Tokyo for 3 days." },
    { "role": "assistant", "content": "Great choice. What kind of trip are you planning?" },
    { "role": "user", "content": "Food and photography." }
  ]
}
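A minimal sketch of how a client might maintain this history between turns (the helper name here is hypothetical, not part of the API):

```python
def append_turn(messages, role, content):
    """Append one conversation turn and return the updated history."""
    messages.append({"role": role, "content": content})
    return messages

# Build the conversation turn by turn; send `history` as the
# "messages" field of each /chat/completions request.
history = [{"role": "system", "content": "You are a helpful travel assistant."}]
append_turn(history, "user", "I'm visiting Tokyo for 3 days.")
append_turn(history, "assistant", "Great choice. What kind of trip are you planning?")
append_turn(history, "user", "Food and photography.")

print(len(history))  # 4 messages: one system, two user, one assistant
```

After each response arrives, append the assistant's reply to the same list before sending the next user turn.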
Streaming Output
Enable streaming when you want tokens to arrive incrementally. This improves perceived latency and is especially useful for long responses.
from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.hpc-ai.com/inference/v1",
)

# stream=True returns an iterator of chunks instead of a single response.
stream = client.chat.completions.create(
    model="minimax/minimax-m2.5",
    messages=[
        {"role": "user", "content": "Write a step-by-step guide to planting tomatoes."}
    ],
    max_tokens=600,
    stream=True,
)

# Each chunk carries an incremental delta; print tokens as they arrive.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming is recommended for longer outputs to reduce timeout risk and provide a better user experience.
JSON Output
If your application needs machine-readable results, ask the model for structured output with response_format.
curl 'https://api.hpc-ai.com/inference/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that only outputs valid JSON."
      },
      {
        "role": "user",
        "content": "Extract the company name, country, and industry from: OpenAI is an AI company based in the United States."
      }
    ],
    "response_format": {
      "type": "json_object"
    }
  }'
This is useful for extraction, classification, and downstream automation workflows.
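Because JSON mode constrains the format but not the exact fields, it is worth validating the parsed result before using it downstream. A minimal sketch (the sample string stands in for real model output, and the expected keys are assumptions based on the prompt above):

```python
import json

# Stand-in for choices[0].message.content from a JSON-mode response.
raw = '{"company": "OpenAI", "country": "United States", "industry": "AI"}'

data = json.loads(raw)  # raises json.JSONDecodeError on malformed output

# Check that every field the prompt asked for is actually present.
expected_keys = {"company", "country", "industry"}
missing = expected_keys - data.keys()
if missing:
    raise ValueError(f"model output is missing fields: {missing}")

print(data["company"])
```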
Prompting Tips
Prompt quality matters as much as parameter tuning. These practices usually improve output quality:
- Be explicit about the task, format, tone, and constraints
- Put durable instructions in the system message
- Ask for a specific structure such as a list, table, or JSON object
- Provide examples if the task is format-sensitive
- Keep irrelevant context out of the prompt
- For long inputs, summarize or chunk content when possible
Example:
{
"role": "system",
"content": "You are a professional editor. Rewrite text to be clear, concise, and suitable for product documentation."
}
Choosing a Model
When selecting a text model, consider:
- Quality: Better instruction following and stronger reasoning usually improve output quality
- Latency: Smaller models are often faster
- Cost: Higher-capability models may cost more per token
- Context length: Longer context windows help with large documents and multi-turn chats
- Specialization: Some models are stronger for coding, reasoning, or multilingual tasks
If you are unsure where to start, begin with a general-purpose instruct model, then evaluate faster or more capable models based on real prompts from your application.
Best Practices
- Prefer /chat/completions for new text-generation applications
- Use streaming for long outputs
- Set max_tokens intentionally instead of leaving output length unconstrained
- Reserve context space for both input and output
- Validate structured output before using it in production systems
- Retry transient failures such as rate limits with backoff
- Log prompts and responses during development so you can debug quality issues
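The retry advice above can be sketched as a small wrapper with exponential backoff (the exception type and delays are placeholders; match them to your HTTP client's rate-limit errors):

```python
import time

class RateLimitError(Exception):
    """Placeholder for your HTTP client's 429 / transient error type."""

def with_backoff(call, max_attempts=4, base_delay=0.5):
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Example: a fake request that fails twice, then succeeds.
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

result = with_backoff(fake_request, base_delay=0.01)
print(result)  # "ok" after two retries
```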
Common Issues
Output Is Truncated
This usually happens when:
- max_tokens is too small
- The client times out before generation finishes
- Non-streaming requests are used for very long outputs
Try increasing max_tokens, enabling streaming, and reviewing client timeout settings.
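You can also detect truncation programmatically: in the OpenAI-compatible schema, a choice whose finish_reason is "length" stopped because it hit the max_tokens limit. A minimal check (the sample dict stands in for a real parsed response):

```python
# Stand-in for a parsed /chat/completions response.
response_body = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Step 1: Choose a sunny spot. Step 2:"},
            "finish_reason": "length",
        }
    ]
}

choice = response_body["choices"][0]
truncated = choice["finish_reason"] == "length"
if truncated:
    print("Output hit max_tokens; retry with a larger limit or use streaming.")
```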
Output Is Repetitive or Low Quality
Try:
- Lowering temperature
- Adjusting top_p or top_k
- Adding frequency_penalty
- Rewriting the prompt to be more specific
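These sampling adjustments all live in the request body alongside the messages. A sketch of a tightened request (the values are illustrative starting points, not recommendations from the API):

```python
# Illustrative request body with tightened sampling parameters.
payload = {
    "model": "minimax/minimax-m2.5",
    "messages": [{"role": "user", "content": "List three uses for a Raspberry Pi."}],
    "temperature": 0.3,        # lower = more deterministic output
    "top_p": 0.9,              # nucleus sampling cutoff
    "frequency_penalty": 0.5,  # discourage repeated tokens
}
print(payload["temperature"])
```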
The Model Does Not Follow Format Instructions
For stricter formatting:
- Use a stronger system instruction
- Request JSON output with response_format
- Provide an explicit schema or example output