Using LLMBoost CLI
You can jump straight to Using the OpenAI API to integrate LLMBoost with any OpenAI-compatible client or tool.
Before we begin, make sure you are authenticated with the HuggingFace CLI. Follow the HF Authentication Guide for more details.
Here are the features of the llmboost CLI:

1. Simple chatbot (llmboost chat)
Run this command to have a simple chat session with LLMBoost.
llmboost chat --model_name meta-llama/Llama-3.1-8B-Instruct
2. Single server (llmboost serve)
This will run a single application server.
llmboost serve --model_name meta-llama/Llama-3.1-8B-Instruct
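Since the server can be reached by any OpenAI-compatible client (see the link at the top of this page), a request to it is just a standard chat-completion call. The sketch below only constructs the request body; the port (8011) and the /v1/chat/completions path are assumptions here, so match them to the settings of your llmboost serve instance.

```python
import json

# Build an OpenAI-style chat-completion request for the server
# started above. Port 8011 and the /v1/chat/completions path are
# assumptions -- adjust them to your `llmboost serve` settings.
url = "http://localhost:8011/v1/chat/completions"
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello, LLMBoost!"}],
}

# POST `payload` to `url` with any HTTP client (curl, requests,
# or the official openai SDK) once the server is up.
print(json.dumps(payload))
```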
3. Multi-model deployment (llmboost deploy)
LLMBoost supports the deployment of multiple models on a multi-GPU server. To use this feature, you'll need a configuration file that specifies the deployment details for each model. An example configuration file is shown below:
common:
  kv_cache_dtype: auto
  host: 127.0.0.1
  tp: 1

models:
  - model_path: meta-llama/Llama-3.1-8B-Instruct
    port: 8011
    dp: 2
  - model_path: microsoft/Phi-3-mini-4k-instruct
    port: 8012
    dp: 2
Finally, you can initiate the deployment by running:
llmboost deploy --config config.yaml
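As a sanity check before deploying, the sketch below spells out what the example config.yaml above implies. It assumes tp is the tensor-parallel degree and dp the number of replicas per model (so each model occupies tp * dp GPUs); verify these semantics against the LLMBoost documentation for your version.

```python
# Values mirror the example config.yaml above; `tp` comes from the
# `common` section and applies to every model.
tp = 1
models = [
    {"model_path": "meta-llama/Llama-3.1-8B-Instruct", "port": 8011, "dp": 2},
    {"model_path": "microsoft/Phi-3-mini-4k-instruct", "port": 8012, "dp": 2},
]

for m in models:
    # Assumption: dp replicas per model, each replica spanning tp GPUs.
    gpus = tp * m["dp"]
    print(f"{m['model_path']} -> 127.0.0.1:{m['port']} ({gpus} GPUs)")
```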
Check deployment status

You can check the status of running LLMBoost instances with llmboost status, which prints output similar to the following:
+------+------------------------+---------+
| Port | Name | Status |
+------+------------------------+---------+
| 8011 | Llama-3.1-8B-Instruct  | running |
| 8012 | Phi-3-mini-4k-instruct | running |
+------+------------------------+---------+
Shut down instances

You can run llmboost shutdown --port XXXX to shut down a specific instance, or llmboost shutdown --all to shut down all instances on the current server.
4. Simple CLI client

Once you have an LLMBoost instance up and running, you can use llmboost client to connect to it.
llmboost client --port 8011
5. Python client
Once you have a model running on a specific IP and port, you can connect to it through our Python client API.
from llmboost.entrypoints.client import send_prompt

response = send_prompt(
    host="localhost",
    port=8011,
    model_path="meta-llama/Llama-3.1-8B-Instruct",
    role="user",
    user_input="What is the most famous landmark of Seattle?",
)