
Using LLMBoost CLI

Standard API Integration

You can jump straight to Using the OpenAI API to integrate LLMBoost with any OpenAI-compatible client or tool.

note

Before we begin, make sure you are authenticated with the Hugging Face CLI. Follow the HF Authentication Guide for more details.

The llmboost CLI provides the following features:

1. Simple chatbot (llmboost chat)

Run this command to start an interactive chat session with LLMBoost:

llmboost chat --model_name meta-llama/Llama-3.1-8B-Instruct

2. Single server (llmboost serve)

This will run a single application server.

llmboost serve --model_name meta-llama/Llama-3.1-8B-Instruct
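Because the server speaks the OpenAI-compatible API (see the link above), any HTTP client can query it once it is up. Below is a minimal sketch using only the Python standard library; the host, port (8011, borrowed from the deploy example), and endpoint path are assumptions — check what your server actually reports on startup:

```python
import json
import urllib.request

# Assumed host/port -- adjust to what `llmboost serve` reports on startup.
BASE_URL = "http://localhost:8011/v1"

def build_request(prompt: str,
                  model: str = "meta-llama/Llama-3.1-8B-Instruct") -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """POST one chat turn and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("What is the most famous landmark of Seattle?")  # needs a running server
```

Any OpenAI-compatible client library (e.g. the official `openai` package pointed at this base URL) works the same way.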

3. Multi-model deployment (llmboost deploy)

LLMBoost supports the deployment of multiple models on a multi-GPU server. To use this feature, you'll need a configuration file that specifies the deployment details for each model. An example configuration file is shown below:

common:
  kv_cache_dtype: auto
  host: 127.0.0.1
  tp: 1

models:
  - model_path: meta-llama/Llama-3.1-8B-Instruct
    port: 8011
    dp: 2
  - model_path: microsoft/Phi-3-mini-4k-instruct
    port: 8012
    dp: 2
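For capacity planning, each model typically occupies tp × dp GPUs (assuming `tp` and `dp` denote the tensor-parallel and data-parallel degrees, the usual convention — consult the LLMBoost documentation for the authoritative definitions). A quick sanity check, mirroring the config above as a Python dict:

```python
# The example config above, mirrored as a Python dict for illustration.
config = {
    "common": {"kv_cache_dtype": "auto", "host": "127.0.0.1", "tp": 1},
    "models": [
        {"model_path": "meta-llama/Llama-3.1-8B-Instruct", "port": 8011, "dp": 2},
        {"model_path": "microsoft/Phi-3-mini-4k-instruct", "port": 8012, "dp": 2},
    ],
}

tp = config["common"]["tp"]  # tensor-parallel degree shared by all models
total_gpus = sum(tp * m["dp"] for m in config["models"])
print(total_gpus)  # 1*2 + 1*2 = 4
```

With these settings the deployment would need 4 GPUs in total, so make sure the server has at least that many available.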

Finally, you can initiate the deployment by running:

llmboost deploy --config config.yaml

Check deployment status

You can check the status of running LLMBoost instances with llmboost status; you will see output similar to the following:

+------+------------------------+---------+
| Port | Name                   | Status  |
+------+------------------------+---------+
| 8011 | Llama-3.1-8B-Instruct  | running |
| 8012 | Phi-3-mini-4k-instruct | running |
+------+------------------------+---------+

Shut down instances

You can run llmboost shutdown --port XXXX to shut down a specific instance, or llmboost shutdown --all to shut down all instances on the current server.

4. Simple CLI client

Once you have an LLMBoost instance up and running, you can use llmboost client to connect to it.

llmboost client --port 8011

5. Python client

Once you have a model running on a specific IP/port, you can connect to it through our Python client API:

from llmboost.entrypoints.client import send_prompt

# Send a single user prompt to a running LLMBoost instance.
response = send_prompt(
    host="localhost", port=8011,
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    role="user", user_input="What is the most famous landmark of Seattle?"
)
print(response)