
Using LLMBoost CLI

Standard API Integration

You can jump straight to Using the OpenAI API to integrate LLMBoost with any OpenAI-compatible client or tool.

note

Before we begin, make sure you are authenticated with the Hugging Face CLI. Follow the HF Authentication Guide for more details.

The llmboost CLI provides the following features:

1. Simple chatbot (llmboost chat)

Run this command to start an interactive chat session with LLMBoost:

llmboost chat --model_name meta-llama/Llama-3.1-8B-Instruct

2. Single server (llmboost serve)

This will run a single application server.

llmboost serve --model_name meta-llama/Llama-3.1-8B-Instruct
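Because the server speaks the OpenAI-compatible API (see the link above), any HTTP client can query it once it is up. Below is a minimal sketch using only the Python standard library; the host, port (8011, borrowed from the deploy example), and endpoint path are assumptions — check what your server actually reports on startup:

```python
import json
import urllib.request

# Assumed host/port -- adjust to what `llmboost serve` reports on startup.
BASE_URL = "http://localhost:8011/v1"

def build_request(prompt: str,
                  model: str = "meta-llama/Llama-3.1-8B-Instruct") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """POST one chat turn and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("What is the most famous landmark of Seattle?")  # needs a running server
```

Any OpenAI-compatible client library (e.g. the official `openai` package pointed at this base URL) works the same way.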

3. Multi-model deployment (llmboost deploy)

LLMBoost supports the deployment of multiple models on a multi-GPU server. To use this feature, you'll need a configuration file that specifies the deployment details for each model. An example configuration file is shown below:

common:
  kv_cache_dtype: auto
  host: 127.0.0.1
  tp: 1

models:
  - model_path: meta-llama/Llama-3.1-8B-Instruct
    port: 8011
    dp: 2
  - model_path: microsoft/Phi-3-mini-4k-instruct
    port: 8012
    dp: 2
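For capacity planning, each model typically occupies tp × dp GPUs (assuming `tp` and `dp` denote the tensor-parallel and data-parallel degrees, the usual convention — consult the LLMBoost documentation for the authoritative definitions). A quick sanity check, mirroring the config above as a Python dict:

```python
# The example config above, mirrored as a Python dict for illustration.
config = {
    "common": {"kv_cache_dtype": "auto", "host": "127.0.0.1", "tp": 1},
    "models": [
        {"model_path": "meta-llama/Llama-3.1-8B-Instruct", "port": 8011, "dp": 2},
        {"model_path": "microsoft/Phi-3-mini-4k-instruct", "port": 8012, "dp": 2},
    ],
}

tp = config["common"]["tp"]  # tensor-parallel degree shared by all models
total_gpus = sum(tp * m["dp"] for m in config["models"])
print(total_gpus)  # 1*2 + 1*2 = 4
```

With these settings the deployment would need 4 GPUs in total, so make sure the server has at least that many available.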

Finally, you can initiate the deployment by running:

llmboost deploy --config config.yaml

Check deployment status

You can check the status of running LLMBoost instances with llmboost status; you will see output similar to the following:

+------+------------------------+---------+
| Port | Name                   | Status  |
+------+------------------------+---------+
| 8011 | Llama-3.1-8B-Instruct  | running |
| 8012 | Phi-3-mini-4k-instruct | running |
+------+------------------------+---------+

Shut down instances

You can run llmboost shutdown --port XXXX to shut down a specific instance, or llmboost shutdown --all to shut down all instances on the current server.

4. Simple CLI client

Once you have an LLMBoost instance up and running, you can use llmboost client to connect to it.

llmboost client --port 8011

5. Python client

Once you have a model running on a specific IP/port, you can connect to it through our Python client API:

from llmboost.entrypoints.client import send_prompt

# Send a single user prompt to a running LLMBoost instance.
response = send_prompt(
    host="localhost", port=8011,
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    role="user", user_input="What is the most famous landmark of Seattle?"
)
print(response)