Text Models
Text models generate natural language from text input. You can use them for chatbots, writing assistants, summarization, classification, extraction, drafting, translation, and many other language tasks.
For most new integrations, we recommend using the OpenAI-compatible /chat/completions API. It supports single-turn prompts, multi-turn conversations, streaming, structured output, and advanced features such as tool calling.
What Text Models Can Do
Text models are general-purpose language models that can:
- Answer questions and follow instructions
- Continue or rewrite text in a specific style
- Summarize long passages or documents
- Extract structured information from unstructured text
- Generate code, documentation, or reports
- Support multi-turn conversational experiences
Different models may vary in speed, cost, context length, reasoning ability, and instruction-following quality. Choose a model based on your latency, quality, and budget requirements.
Basic Text Generation
The simplest workflow is to send a list of messages and receive one generated reply.
curl 'https://api.hpc-ai.com/inference/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "messages": [
      {
        "role": "user",
        "content": "Write a short product description for a wireless ergonomic keyboard."
      }
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
The response will contain one or more choices. In most cases, you can read the generated text from:
choices[0].message.content
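For example, assuming a response shaped like the OpenAI chat completion schema, reading the reply looks like this (the payload below is an illustrative sample, not real model output):

```python
import json

# Illustrative sample of a /chat/completions response body (not real model output).
response_body = json.loads("""
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A compact wireless ergonomic keyboard built for all-day comfort."
      },
      "finish_reason": "stop"
    }
  ]
}
""")

# The generated text lives at choices[0].message.content.
reply = response_body["choices"][0]["message"]["content"]
print(reply)
```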
Message Roles
The messages array is the core input format for text generation.
- system: Sets high-level behavior, style, or constraints for the assistant
- user: Contains the end user's request or prompt
- assistant: Includes previous model responses for multi-turn conversations
Example:
[
  {
    "role": "system",
    "content": "You are a concise technical writer. Use clear headings and bullet points."
  },
  {
    "role": "user",
    "content": "Explain vector databases to a beginner."
  }
]
Use the system message for durable instructions such as tone, formatting, safety boundaries, or task-specific behavior.
Common Usage Patterns
Single-Turn Prompting
Use a single user message when you only need one response and do not need to preserve conversation history.
{
  "model": "minimax/minimax-m2.5",
  "messages": [
    {
      "role": "user",
      "content": "Summarize the following paragraph in one sentence: <your_text_here>"
    }
  ]
}
Multi-Turn Conversation
For chat experiences, include previous user and assistant messages so the model can respond with context.
{
  "model": "minimax/minimax-m2.5",
  "messages": [
    { "role": "system", "content": "You are a helpful travel assistant." },
    { "role": "user", "content": "I'm visiting Tokyo for 3 days." },
    { "role": "assistant", "content": "Great choice. What kind of trip are you planning?" },
    { "role": "user", "content": "Food and photography." }
  ]
}
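A minimal sketch of how a client might maintain this history between turns (the helper name here is hypothetical, not part of the API):

```python
def append_turn(messages, role, content):
    """Append one conversation turn and return the updated history."""
    messages.append({"role": role, "content": content})
    return messages

# Build the conversation turn by turn; send `history` as the
# "messages" field of each /chat/completions request.
history = [{"role": "system", "content": "You are a helpful travel assistant."}]
append_turn(history, "user", "I'm visiting Tokyo for 3 days.")
append_turn(history, "assistant", "Great choice. What kind of trip are you planning?")
append_turn(history, "user", "Food and photography.")

print(len(history))  # 4 messages: one system, two user, one assistant
```

After each response arrives, append the assistant's reply to the same list before sending the next user turn.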
Streaming Output
Enable streaming when you want tokens to arrive incrementally. This improves perceived latency and is especially useful for long responses.
from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.hpc-ai.com/inference/v1",
)

# stream=True returns an iterator of chunks instead of a single response.
stream = client.chat.completions.create(
    model="minimax/minimax-m2.5",
    messages=[
        {"role": "user", "content": "Write a step-by-step guide to planting tomatoes."}
    ],
    max_tokens=600,
    stream=True,
)

# Each chunk carries an incremental delta; print tokens as they arrive.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Streaming is recommended for longer outputs to reduce timeout risk and provide a better user experience.
JSON Output
If your application needs machine-readable results, ask the model for structured output with response_format.
curl 'https://api.hpc-ai.com/inference/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <your_token_here>' \
  --data '{
    "model": "minimax/minimax-m2.5",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant that only outputs valid JSON."
      },
      {
        "role": "user",
        "content": "Extract the company name, country, and industry from: OpenAI is an AI company based in the United States."
      }
    ],
    "response_format": {
      "type": "json_object"
    }
  }'
This is useful for extraction, classification, and downstream automation workflows.
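Because JSON mode constrains the format but not the exact fields, it is worth validating the parsed result before using it downstream. A minimal sketch (the sample string stands in for real model output, and the expected keys are assumptions based on the prompt above):

```python
import json

# Stand-in for choices[0].message.content from a JSON-mode response.
raw = '{"company": "OpenAI", "country": "United States", "industry": "AI"}'

data = json.loads(raw)  # raises json.JSONDecodeError on malformed output

# Check that every field the prompt asked for is actually present.
expected_keys = {"company", "country", "industry"}
missing = expected_keys - data.keys()
if missing:
    raise ValueError(f"model output is missing fields: {missing}")

print(data["company"])
```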
Prompting Tips
Prompt quality matters as much as parameter tuning. These practices usually improve output quality:
- Be explicit about the task, format, tone, and constraints
- Put durable instructions in the system message
- Ask for a specific structure such as a list, table, or JSON object
- Provide examples if the task is format-sensitive
- Keep irrelevant context out of the prompt
- For long inputs, summarize or chunk content when possible
Example:
{
"role": "system",
"content": "You are a professional editor. Rewrite text to be clear, concise, and suitable for product documentation."
}
Choosing a Model
When selecting a text model, consider:
- Quality: Better instruction following and stronger reasoning usually improve output quality
- Latency: Smaller models are often faster
- Cost: Higher-capability models may cost more per token
- Context length: Longer context windows help with large documents and multi-turn chats
- Specialization: Some models are stronger for coding, reasoning, or multilingual tasks
If you are unsure where to start, begin with a general-purpose instruct model, then evaluate faster or more capable models based on real prompts from your application.
Best Practices
- Prefer /chat/completions for new text-generation applications
- Use streaming for long outputs
- Set max_tokens intentionally instead of leaving output length unconstrained
- Reserve context space for both input and output
- Validate structured output before using it in production systems
- Retry transient failures such as rate limits with backoff
- Log prompts and responses during development so you can debug quality issues
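The retry advice above can be sketched as a small wrapper with exponential backoff (the exception type and delays are placeholders; match them to your HTTP client's rate-limit errors):

```python
import time

class RateLimitError(Exception):
    """Placeholder for your HTTP client's 429 / transient error type."""

def with_backoff(call, max_attempts=4, base_delay=0.5):
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Example: a fake request that fails twice, then succeeds.
attempts = {"n": 0}
def fake_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

result = with_backoff(fake_request, base_delay=0.01)
print(result)  # "ok" after two retries
```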
Common Issues
Output Is Truncated
This usually happens when:
- max_tokens is too small
- The client times out before generation finishes
- Non-streaming requests are used for very long outputs
Try increasing max_tokens, enabling streaming, and reviewing client timeout settings.
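You can also detect truncation programmatically: in the OpenAI-compatible schema, a choice whose finish_reason is "length" stopped because it hit the max_tokens limit. A minimal check (the sample dict stands in for a real parsed response):

```python
# Stand-in for a parsed /chat/completions response.
response_body = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Step 1: Choose a sunny spot. Step 2:"},
            "finish_reason": "length",
        }
    ]
}

choice = response_body["choices"][0]
truncated = choice["finish_reason"] == "length"
if truncated:
    print("Output hit max_tokens; retry with a larger limit or use streaming.")
```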
Output Is Repetitive or Low Quality
Try:
- Lowering temperature
- Adjusting top_p or top_k
- Adding frequency_penalty
- Rewriting the prompt to be more specific
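These sampling adjustments all live in the request body alongside the messages. A sketch of a tightened request (the values are illustrative starting points, not recommendations from the API):

```python
# Illustrative request body with tightened sampling parameters.
payload = {
    "model": "minimax/minimax-m2.5",
    "messages": [{"role": "user", "content": "List three uses for a Raspberry Pi."}],
    "temperature": 0.3,        # lower = more deterministic output
    "top_p": 0.9,              # nucleus sampling cutoff
    "frequency_penalty": 0.5,  # discourage repeated tokens
}
print(payload["temperature"])
```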
The Model Does Not Follow Format Instructions
For stricter formatting:
- Use a stronger system instruction
- Request JSON output with response_format
- Provide an explicit schema or example output