ServiceClient API Reference

Overview

The ServiceClient is the main entry point for interacting with the HPC-AI Fine-Tune SDK. It provides methods to discover server capabilities, create training clients, and manage REST API operations.

Base URL: https://www.hpc-ai.com/finetunesdk


Table of Contents

  • get_server_capabilities()
  • create_lora_training_client()
  • create_training_client_from_state()
  • create_training_client()
  • create_rest_client()

get_server_capabilities()

Description

Queries the API server to discover what models and capabilities are available. Returns a list of supported models that can be used for training or inference operations.

This is typically the first method called to check server availability and discover available models before creating training clients.

Signature

def get_server_capabilities() -> GetServerCapabilitiesResponse

Parameters

None

Returns

GetServerCapabilitiesResponse: A response object containing:

  • supported_models (List[SupportedModel]): A list of model information objects, where each SupportedModel contains:
    • model_name (Optional[str]): The name/identifier of the model

Example

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
capabilities = service_client.get_server_capabilities()

print(f"Server supports {len(capabilities.supported_models)} models")
for model in capabilities.supported_models:
    if model.model_name:
        print(f" - {model.model_name}")

Additional Notes

  • This API supports an async version: get_server_capabilities_async()
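
If you are already running inside an event loop, the async variant avoids blocking. A minimal sketch, assuming get_server_capabilities_async() mirrors the synchronous signature and returns an awaitable:

import asyncio

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"

async def main() -> None:
    service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
    # Await the async variant instead of blocking the event loop
    capabilities = await service_client.get_server_capabilities_async()
    print(f"Server supports {len(capabilities.supported_models)} models")

asyncio.run(main())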

create_lora_training_client()

Description

Creates a new LoRA-adapted model instance on the server and returns a TrainingClient that can be used to train the model.

The method performs the following operations:

  1. Validates the initialization input
  2. Creates a new model instance on the server with LoRA adapters applied
  3. Returns a TrainingClient bound to the newly created model

Signature

def create_lora_training_client(
    base_model: str,
    rank: int = 32,
    seed: int | None = None,
    train_mlp: bool = True,
    train_attn: bool = True,
    train_unembed: bool = True,
) -> TrainingClient

Parameters

  • base_model (str, required): The name or identifier of the base model to fine-tune. Must be a model supported by the server (check with get_server_capabilities()).
  • rank (int, default: 32): The rank (dimension) of the LoRA adapters. Higher ranks allow more capacity but increase the number of trainable parameters. Common values range from 8 to 128. Typical values:
    • Small models (< 1B): 8-16
    • Medium models (1B-7B): 16-32
    • Large models (> 7B): 32-64
  • seed (int | None, default: None): Optional random seed for reproducible LoRA weight initialization. If provided, the LoRA adapters are initialized deterministically.
  • train_mlp (bool, default: True): Whether to apply LoRA adapters to MLP (multi-layer perceptron) layers, including the gate, up, and down projection layers.
  • train_attn (bool, default: True): Whether to apply LoRA adapters to attention layers, including the query (q_proj), key (k_proj), value (v_proj), and output (o_proj) projection matrices.
  • train_unembed (bool, default: True): Whether to apply LoRA adapters to the unembedding layer (language modeling head, lm_head).

Returns

TrainingClient: A client instance bound to the newly created LoRA model.

Example

Basic usage with default LoRA configuration:

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
training_client = service_client.create_lora_training_client(
base_model="Qwen/Qwen3-8B"
)

print(f"Created model ID {training_client.model_id}")

Custom LoRA configuration for attention-only fine-tuning:

training_client = service_client.create_lora_training_client(
base_model="Qwen/Qwen2.5-7B-Instruct",
rank=64,
seed=42,
train_mlp=False,
train_attn=True, # only fine tune on attention layers
train_unembed=False
)

Additional Notes

  • This API supports an async version: create_lora_training_client_async()
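
The async variant is handy when creating several model instances concurrently. A minimal sketch, assuming create_lora_training_client_async() accepts the same parameters as the synchronous method and returns an awaitable TrainingClient:

import asyncio

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"

async def main() -> None:
    service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
    # Create two LoRA model instances concurrently with different ranks
    client_a, client_b = await asyncio.gather(
        service_client.create_lora_training_client_async(base_model="Qwen/Qwen3-8B", rank=16),
        service_client.create_lora_training_client_async(base_model="Qwen/Qwen3-8B", rank=64),
    )
    print(f"Created models: {client_a.model_id}, {client_b.model_id}")

asyncio.run(main())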

create_training_client_from_state()

Description

Creates a training client from a previously saved checkpoint, allowing you to resume training where it left off. The method automatically retrieves the training run metadata (base model, LoRA rank) from the checkpoint path, creates a new LoRA model instance with a matching configuration, and loads the saved weights into it.

Signature

def create_training_client_from_state(path: str) -> TrainingClient

Parameters

  • path (str): A URI path to the checkpoint. Must follow one of the following formats:
    • Training checkpoint: "hpcai://{training_run_id}/weights/{checkpoint_id}"
    • Sampler checkpoint: "hpcai://{training_run_id}/sampler_weights/{checkpoint_id}"
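
For illustration, given a training run ID and a checkpoint ID (values reused from the examples on this page), the two accepted forms can be built with plain f-strings:

training_run_id = "eb78693b-380d-40f8-a709-ffe0da185718"  # example run ID
checkpoint_id = "step_0010"  # example checkpoint ID

training_path = f"hpcai://{training_run_id}/weights/{checkpoint_id}"
sampler_path = f"hpcai://{training_run_id}/sampler_weights/{checkpoint_id}"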

Returns

TrainingClient: A client instance bound to the restored model with checkpoint weights loaded.

Example

Resuming training from a saved checkpoint:

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"
CHECKPOINT_PATH = "hpcai://{model_id}/weights/step_0010"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
training_client = service_client.create_training_client_from_state(CHECKPOINT_PATH)

print(f"Restored model ID: {training_client.model_id}")

# Continue training
training_data = [...] # Your training data
fwd_bwd = training_client.forward_backward(training_data, loss_fn="cross_entropy")
optim = training_client.optim_step(hpcai.AdamParams(learning_rate=1e-4))

fwd_bwd.result()
optim.result()

# Clean up
training_client.unload_model().result()

Getting checkpoint path from RestClient and restoring:

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
rest_client = service_client.create_rest_client()

# List checkpoints for a training run
training_run_id = "eb78693b-380d-40f8-a709-ffe0da185718"
checkpoints_response = rest_client.list_checkpoints(training_run_id).result()

# Find the latest checkpoint
if checkpoints_response.checkpoints:
    latest_checkpoint = checkpoints_response.checkpoints[-1]
    checkpoint_path = latest_checkpoint.checkpoint_path
    print(f"Restoring from: {checkpoint_path}")

    # Restore training client
    training_client = service_client.create_training_client_from_state(checkpoint_path)
    print(f"Restored model ID: {training_client.model_id}")

Additional Notes

  • This API supports an async version: create_training_client_from_state_async()
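
The same pattern applies as for the other async variants. A minimal sketch, assuming create_training_client_from_state_async() mirrors the synchronous signature (BASE_URL, API_KEY, and CHECKPOINT_PATH as defined in the examples above):

import asyncio

import hpcai

async def main() -> None:
    service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
    # Await the restore instead of blocking on a synchronous call
    training_client = await service_client.create_training_client_from_state_async(CHECKPOINT_PATH)
    print(f"Restored model ID: {training_client.model_id}")

asyncio.run(main())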

create_training_client()

Description

Creates a TrainingClient for an existing model, or an unbound client when no model_id is given. The returned client wraps the model identified by model_id; use this method when you already have a model_id from a previous session.

Signature

def create_training_client(
    model_id: types.ModelID | None = None
) -> TrainingClient

Parameters

  • model_id (types.ModelID | None, default: None): Optional unique identifier of an existing model on the server. If provided, the TrainingClient is bound to this model. If None, the client is created without a bound model.

Returns

TrainingClient: A client instance that can be used for training operations. If model_id is provided, the client is bound to that model. If model_id is None, the client has no bound model and requires loading a checkpoint or creating a model before use.

Example

Creating a client for an existing model from a previous session:

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"

# You have a model_id from a previous training session
existing_model_id = "eb78693b-380d-40f8-a709-ffe0da185718"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
training_client = service_client.create_training_client(model_id=existing_model_id)

print(f"Client bound to model: {training_client.model_id}")

# Use the client for training
training_data = [...] # Your training data
fwd_bwd = training_client.forward_backward(training_data, loss_fn="cross_entropy")
optim = training_client.optim_step(hpcai.AdamParams(learning_rate=1e-4))

fwd_bwd.result()
optim.result()

Creating a client without model_id, then loading a checkpoint:

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"
CHECKPOINT_PATH = "hpcai://eb78693b-380d-40f8-a709-ffe0da185718/weights/step_0010"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)

# Create client without model_id
training_client = service_client.create_training_client(model_id=None)

# Load checkpoint to bind the model
load_future = training_client.load_state(CHECKPOINT_PATH)
load_future.result()

print(f"Model loaded: {training_client.model_id}")

# Now you can use it for training
training_data = [...] # Your training data
result = training_client.forward_backward(training_data, loss_fn="cross_entropy")
result.result()

create_rest_client()

Description

Creates a RestClient instance for REST API operations. The RestClient provides access to REST endpoints for querying training runs, checkpoints, and other metadata.

Signature

def create_rest_client() -> RestClient

Parameters

None

Returns

RestClient: Client for REST API operations like listing checkpoints and metadata.

Example

import hpcai

BASE_URL = "https://www.hpc-ai.com/finetunesdk"
API_KEY = "your-api-key-here"
TRAINING_RUN_ID = "eb78693b-380d-40f8-a709-ffe0da185718"

service_client = hpcai.ServiceClient(base_url=BASE_URL, api_key=API_KEY)
rest_client = service_client.create_rest_client()

# Use RestClient methods
checkpoints_response = rest_client.list_checkpoints(TRAINING_RUN_ID).result()
training_run = rest_client.get_training_run(TRAINING_RUN_ID).result()
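
To inspect the results, you can use the fields shown earlier on this page (checkpoints and checkpoint_path); for example:

# Print each checkpoint path returned for the training run
print(f"Run has {len(checkpoints_response.checkpoints)} checkpoint(s)")
for checkpoint in checkpoints_response.checkpoints:
    print(f" - {checkpoint.checkpoint_path}")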