Using the OpenAI API
LLMBoost exposes a web endpoint that is compatible with the OpenAI Chat Completions API. You can use OpenAI's client libraries, or any third-party library that expects an OpenAI-compatible endpoint. Below are some examples of how to use this compatibility.
Making a request
You can make a request to the OpenAI-compatible endpoint using curl. Here's an example:
curl localhost:8011/v1/chat/completions \
  -X POST \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is deep learning?"
      }
    ],
    "stream": true,
    "max_tokens": 1000
  }' \
  -H 'Content-Type: application/json'
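If you prefer not to use the OpenAI SDK, the streamed response can also be consumed directly as server-sent events. The following is a minimal sketch using the requests library; it assumes the endpoint emits the standard OpenAI-style "data: {...}" lines terminated by "data: [DONE]".
import json
import requests

# Post the same streaming request shown in the curl example above.
response = requests.post(
    "http://localhost:8011/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"}
        ],
        "stream": True,
        "max_tokens": 1000
    },
    stream=True
)

# Each event arrives as a "data: {...}" line; "data: [DONE]" marks the end of the stream.
for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8")
    if payload.startswith("data: "):
        payload = payload[len("data: "):]
    if payload.strip() == "[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)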
Streaming
You can also use OpenAI's official Python library to make streaming requests:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"}
    ],
    stream=True
)

for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
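Note that the final chunk's delta.content can be None (it carries the finish reason rather than text), which is why the example falls back to an empty string.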
Synchronous
You can also make synchronous (non-streaming) requests if that is preferred for your application:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

chat_completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"}
    ],
    stream=False
)

print(chat_completion.choices[0].message.content)
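Since the endpoint follows the OpenAI API, the asynchronous client from the same openai package should also work. Here is a minimal sketch, assuming the same model and endpoint as above:
import asyncio
from openai import AsyncOpenAI  # AsyncOpenAI ships with the official openai package

client = AsyncOpenAI(
    base_url="http://localhost:8011/v1",
    api_key="-"
)

async def main():
    # Await the completion instead of blocking the event loop.
    chat_completion = await client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"}
        ],
        stream=False
    )
    print(chat_completion.choices[0].message.content)

asyncio.run(main())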