## `docker-compose.yml`: llama.cpp CUDA server
```yml
services:
  gemma:
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    ports:
      - "8080:8080"
    volumes:
      - llama-cache:/root/.cache/llama.cpp
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: >
      --host 0.0.0.0
      --hf-repo bartowski/gemma-2-9b-it-GGUF
      --hf-file gemma-2-9b-it-IQ4_XS.gguf
      --gpu-layers 99
      --main-gpu 0
volumes:
  llama-cache:
```
## Open http://localhost:8080 in a browser
- llama.cpp serves a minimal web UI at this address.
- On the first run, startup can take a while because the model is downloaded into the `llama-cache` volume.
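
Because the first start can spend a while downloading the model, a client script may want to wait until the server is ready before sending requests. The following is a minimal sketch, assuming the server exposes a `/health` endpoint that answers with HTTP 200 once the model is loaded (check this against the llama.cpp version in your image). `http_probe`, `wait_until_ready`, and the `probe` parameter are hypothetical names introduced here for illustration.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status while the model loads.
        return False


def wait_until_ready(url="http://localhost:8080/health",
                     timeout=600.0, interval=2.0, probe=http_probe) -> bool:
    """Poll `url` until `probe` succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` before the first request returns `True` as soon as the server answers, or `False` if the timeout (assumed here to be 10 minutes) runs out. The `probe` argument exists only so the polling loop can be exercised without a running server.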
## cURL test
```bash
sudo apt-get install -y curl jq
curl -s \
--request POST \
--url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}' \
| jq
# {"content":"\n\n**1. Define Your Purpose:**\n\n* What do you want to achieve with your website?", ... }
```
## Python Client
```python
import requests
response = requests.post(
    'http://localhost:8080/completion',
    json={"prompt": "Building a website can be done in 10 simple steps:",
          "n_predict": 128},
).json()
print(response['content'])
```
```
**1. Define Your Purpose:**
* What do you want to achieve with your website? (e.g., sell products, share information, build a community)
* Who is your target audience?
**2. Choose a Domain Name:**
...
```
## Single token prediction with probs
```sh
pip install requests polars
```
```python
import requests
import polars as pl
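pl.Config.set_tbl_rows(40)  # print all 40 rows instead of a truncated table
# The question in the prompt is Korean for: "Where is the capital of Korea?
# Please choose from the options below." with the options
# (A) Gyeongseong (B) Busan (C) Pyongyang (D) Seoul (E) Jeonju.
# The trailing "정답은 (" means "The answer is (", so the next token should be a letter.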
prompt = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: 한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.
(A) 경성
(B) 부산
(C) 평양
(D) 서울
(E) 전주
Assistant:
정답은 ("""
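response = requests.post(
    'http://localhost:8080/completion',
    json={"prompt": prompt,
          # Negative temperature: greedy pick that still reports probabilities
          # (behaviour of the llama.cpp build used here; verify on newer versions).
          "temperature": -1,
          "n_predict": 1,   # generate a single token
          "n_probs": 40,    # return the 40 most likely candidates for that token
          },
).json()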
df = pl.DataFrame(response['completion_probabilities'][0]['probs'])
print(df)
```
```
shape: (40, 2)
┌─────────┬──────────┐
│ tok_str ┆ prob     │
│ ---     ┆ ---      │
│ str     ┆ f64      │
╞═════════╪══════════╡
│ D       ┆ 0.996538 │
│ **      ┆ 0.002495 │
│ **      ┆ 0.000539 │
│ C       ┆ 0.000053 │
│ B       ┆ 0.000047 │
│ A       ┆ 0.000032 │
│ D       ┆ 0.000029 │
│ **(     ┆ 0.000024 │
│ d       ┆ 0.000024 │
│ **)     ┆ 0.00002  │
│ E       ┆ 0.000017 │
│ Seoul   ┆ 0.000015 │
│ ㄷ      ┆ 0.000008 │
│ ㄹ      ┆ 0.000007 │
│ 주      ┆ 0.000006 │
│ Д       ┆ 0.000004 │
│ **,     ┆ 0.000004 │
│ 답      ┆ 0.000004 │
│ 디      ┆ 0.000004 │
│ 도      ┆ 0.000004 │
│ ㅁ      ┆ 0.000003 │
│         ┆ 0.000003 │
│ Answer  ┆ 0.000003 │
│ 가      ┆ 0.000003 │
│ )       ┆ 0.000003 │
│ ④       ┆ 0.000003 │
│ )**     ┆ 0.000003 │
│ ד       ┆ 0.000003 │
│ ****    ┆ 0.000003 │
│ ㄱ      ┆ 0.000002 │
│ 다      ┆ 0.000002 │
│ 을      ┆ 0.000002 │
│ ㅇ      ┆ 0.000002 │
│ 유      ┆ 0.000002 │
│ Korean  ┆ 0.000002 │
│ 4       ┆ 0.000002 │
│ G       ┆ 0.000002 │
│ 이      ┆ 0.000001 │
│ ***     ┆ 0.000001 │
│ 하      ┆ 0.000001 │
└─────────┴──────────┘
```
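
For a multiple-choice prompt like this one, it can be useful to keep only the tokens that name an answer option and renormalize their probabilities. The following is a minimal sketch, assuming each entry has the `tok_str` and `prob` keys shown in the table (other llama.cpp versions may name these fields differently). `choice_distribution` is a hypothetical helper, and the sample rows are copied from the output above. Rows that display identically, such as the two `D` rows, are assumed to differ only in surrounding whitespace, so tokens are stripped before matching.

```python
def choice_distribution(probs, choices=("A", "B", "C", "D", "E")):
    """Sum the probability mass per answer letter, then renormalize over the choices."""
    mass = {c: 0.0 for c in choices}
    for entry in probs:
        token = entry["tok_str"].strip()  # treat " D" and "D" as the same choice
        if token in mass:
            mass[token] += entry["prob"]
    total = sum(mass.values())
    if total == 0:
        return mass  # none of the choices appeared among the returned tokens
    return {c: p / total for c, p in mass.items()}


# Sample rows copied from the table above (only the letter tokens matter here).
sample = [
    {"tok_str": "D", "prob": 0.996538},
    {"tok_str": "C", "prob": 0.000053},
    {"tok_str": "B", "prob": 0.000047},
    {"tok_str": "A", "prob": 0.000032},
    {"tok_str": "D", "prob": 0.000029},
    {"tok_str": "E", "prob": 0.000017},
]
dist = choice_distribution(sample)
best = max(dist, key=dist.get)
print(best, dist)
```

With a live server, the same helper could be called as `choice_distribution(response['completion_probabilities'][0]['probs'])`. Lowercase or non-Latin variants such as `d` or `ㄷ` are deliberately ignored in this sketch; whether to count them is a judgment call.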