
Quick Inference via the CLI

You can access inference quickly using the naptha inference CLI command:

# Format: naptha inference "<prompt>" -m "<model_name>"
naptha inference "How can we create scaling laws for multi-agent systems?" -m "hermes3:8b"

Accessing Inference in Modules via the Naptha SDK

Configuring Inference

Inference is configured using the LLMConfig class:

# naptha_sdk/schemas.py
class LLMConfig(BaseModel):
    config_name: Optional[str] = "llm_config"
    client: Optional[LLMClientType] = None
    model: Optional[str] = None
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None
    api_base: Optional[str] = None

Alternatively, you can set these values in the llm_configs.json file in the configs folder of the module:

[
    {
        "config_name": "open",
        "client": "ollama",
        "model": "hermes3:8b",
        "temperature": 0.7,
        "max_tokens": 1000,
        "api_base": "http://localhost:11434"
    },
    {
        "config_name": "closed",
        "client": "openai",
        "model": "gpt-4o-mini",
        "temperature": 0.7,
        "max_tokens": 1000,
        "api_base": "https://api.openai.com/v1"
    }
]
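
A module can then load these configs at runtime and pick one by name. A minimal sketch, assuming the file lives at configs/llm_configs.json relative to the module root:

import json

from naptha_sdk.schemas import LLMConfig

# Path is an assumption; point it at your module's configs folder
with open("configs/llm_configs.json") as f:
    llm_configs = {cfg["config_name"]: LLMConfig(**cfg) for cfg in json.load(f)}

llm_config = llm_configs["open"]  # e.g. the local Ollama config above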

Running Inference

Naptha Modules can import the InferenceClient class from the naptha_sdk.inference module to interact with the inference provider.

import asyncio
from naptha_sdk.schemas import LLMConfig, NodeConfigUser
from naptha_sdk.inference import InferenceClient

# Configure your node connection
node = NodeConfigUser(ip="node2.naptha.ai", http_port=None, server_type="https")
inference_client = InferenceClient(node)

# Choose an LLM config (matching the "open" config defined above)
llm_config = LLMConfig(
    client="ollama",
    model="hermes3:8b",
    temperature=0.7,
    max_tokens=1000,
    api_base="http://localhost:11434"
)

# Prepare your messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
]

# Run inference
response = asyncio.run(inference_client.run_inference({
    "model": llm_config.model,  # the model to use for inference
    "messages": messages,
    "temperature": llm_config.temperature,
    "max_tokens": llm_config.max_tokens
}))

# Extract the response
content = response.choices[0].message.content
print("Output: ", content)
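
If your code is already running inside an async function, await the call directly instead of wrapping it in asyncio.run. A minimal sketch reusing the client, config, and messages from above:

async def chat():
    # Same request as above, but awaited from within an async context
    response = await inference_client.run_inference({
        "model": llm_config.model,
        "messages": messages,
        "temperature": llm_config.temperature,
        "max_tokens": llm_config.max_tokens
    })
    return response.choices[0].message.content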

Use the API directly

You can also write your own code to interact with the inference API; see the FastAPI docs for the available endpoints. If you prefer, you can interact with the LiteLLM API directly.
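
As a rough sketch of what a direct HTTP call might look like, the example below assumes the node exposes an OpenAI-style chat completions endpoint. The exact path, port, and any authentication headers are assumptions; check the node's FastAPI docs to confirm them.

import requests

# Assumed endpoint path; confirm the real route in the node's FastAPI docs
url = "https://node2.naptha.ai/inference/chat/completions"

payload = {
    "model": "hermes3:8b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7,
    "max_tokens": 1000
}

# Add an Authorization header here if your node requires one (assumption)
response = requests.post(url, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])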

Need Help?