add llama.cpp server README.md
This commit is contained in:
144
llama-cpp-logits/README.md
Normal file
144
llama-cpp-logits/README.md
Normal file
@@ -0,0 +1,144 @@
|
||||
## `docker-compose.yml`: llama.cpp CUDA server
|
||||
```yml
|
||||
services:
|
||||
gemma:
|
||||
image: ghcr.io/ggerganov/llama.cpp:server-cuda
|
||||
ports:
|
||||
- "8080:8080"
|
||||
volumes:
|
||||
- llama-cache:/root/.cache/llama.cpp
|
||||
deploy:
|
||||
resources:
|
||||
reservations:
|
||||
devices:
|
||||
- capabilities: [gpu]
|
||||
command: >
|
||||
--host 0.0.0.0
|
||||
--hf-repo bartowski/gemma-2-9b-it-GGUF
|
||||
--hf-file gemma-2-9b-it-IQ4_XS.gguf
|
||||
--gpu-layers 99
|
||||
--main-gpu 0
|
||||
volumes:
|
||||
llama-cache:
|
||||
```
|
||||
|
||||
## cURL test
|
||||
|
||||
```bash
|
||||
$ curl --request POST \
|
||||
--url http://localhost:8080/completion \
|
||||
--header "Content-Type: application/json" \
|
||||
--data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
|
||||
```
|
||||
|
||||
```json
|
||||
{"content":"\n\n**1. Define Your Purpose:**\n\n* What do you want to achieve with your website? (e.g., sell products, share information, build a community)\n* Who is your target audience?\n\n**2. Choose a Domain Name:**\n\n* Select a memorable and relevant name that reflects your website's purpose.\n* Check availability and register your domain name.\n\n**3. Select a Web Hosting Provider:**\n\n* Choose a reliable hosting provider that offers the necessary resources (storage, bandwidth, etc.) for your website.\n* Consider factors like price, uptime, and customer support.\n\n**4.","id_slot":0, ... }
|
||||
```
|
||||
|
||||
## Python Client
|
||||
|
||||
```python
|
||||
import requests
|
||||
response = requests.post(
|
||||
'http://localhost:8080/completion',
|
||||
json={"prompt": "Building a website can be done in 10 simple steps:",
|
||||
"n_predict": 128}
|
||||
).json()
|
||||
|
||||
print(response['content'])
|
||||
````
|
||||
|
||||
```
|
||||
**1. Define Your Purpose:**
|
||||
|
||||
* What do you want to achieve with your website? (e.g., sell products, share information, build a community)
|
||||
* Who is your target audience?
|
||||
|
||||
**2. Choose a Domain Name:**
|
||||
|
||||
* Select a memorable and relevant name that reflects your website's purpose.
|
||||
* Check availability and register your domain name.
|
||||
|
||||
**3. Select a Web Hosting Provider:**
|
||||
|
||||
* Choose a reliable hosting provider that offers the necessary resources (storage, bandwidth, etc.) for your website.
|
||||
* Consider factors like price, uptime, and customer support.
|
||||
|
||||
**4.
|
||||
```
|
||||
|
||||
## Single token prediction with probs
|
||||
|
||||
```sh
|
||||
pip install requests polars
|
||||
```
|
||||
|
||||
```python
|
||||
import requests
|
||||
import polars as pl
|
||||
pl.Config.set_tbl_rows(40)
|
||||
response = requests.post(
|
||||
'http://localhost:8080/completion',
|
||||
json={"prompt": prompt,
|
||||
"temperature": -1,
|
||||
"n_predict": 1,
|
||||
"n_probs":40
|
||||
}
|
||||
).json()
|
||||
|
||||
# print(response['content'])
|
||||
|
||||
|
||||
df = pl.DataFrame(response['completion_probabilities'][0]['probs'])
|
||||
print(df)
|
||||
```
|
||||
|
||||
```
|
||||
shape: (40, 2)
|
||||
┌─────────┬──────────┐
|
||||
│ tok_str ┆ prob │
|
||||
│ --- ┆ --- │
|
||||
│ str ┆ f64 │
|
||||
╞═════════╪══════════╡
|
||||
│ D ┆ 0.996538 │
|
||||
│ ** ┆ 0.002495 │
|
||||
│ ** ┆ 0.000539 │
|
||||
│ C ┆ 0.000053 │
|
||||
│ B ┆ 0.000047 │
|
||||
│ A ┆ 0.000032 │
|
||||
│ D ┆ 0.000029 │
|
||||
│ **( ┆ 0.000024 │
|
||||
│ d ┆ 0.000024 │
|
||||
│ **) ┆ 0.00002 │
|
||||
│ E ┆ 0.000017 │
|
||||
│ Seoul ┆ 0.000015 │
|
||||
│ ㄷ ┆ 0.000008 │
|
||||
│ ㄹ ┆ 0.000007 │
|
||||
│ 주 ┆ 0.000006 │
|
||||
│ Д ┆ 0.000004 │
|
||||
│ **, ┆ 0.000004 │
|
||||
│ 답 ┆ 0.000004 │
|
||||
│ 디 ┆ 0.000004 │
|
||||
│ 도 ┆ 0.000004 │
|
||||
│ ㅁ ┆ 0.000003 │
|
||||
│ D ┆ 0.000003 │
|
||||
│ Answer ┆ 0.000003 │
|
||||
│ 가 ┆ 0.000003 │
|
||||
│ ) ┆ 0.000003 │
|
||||
│ ④ ┆ 0.000003 │
|
||||
│ )** ┆ 0.000003 │
|
||||
│ ד ┆ 0.000003 │
|
||||
│ **** ┆ 0.000003 │
|
||||
│ ㄱ ┆ 0.000002 │
|
||||
│ 다 ┆ 0.000002 │
|
||||
│ 을 ┆ 0.000002 │
|
||||
│ ㅇ ┆ 0.000002 │
|
||||
│ 유 ┆ 0.000002 │
|
||||
│ Korean ┆ 0.000002 │
|
||||
│ 4 ┆ 0.000002 │
|
||||
│ G ┆ 0.000002 │
|
||||
│ 이 ┆ 0.000001 │
|
||||
│ *** ┆ 0.000001 │
|
||||
│ 하 ┆ 0.000001 │
|
||||
└─────────┴──────────┘
|
||||
```
|
||||
Reference in New Issue
Block a user