docker-compose.yml: llama.cpp CUDA server
services:
gemma:
image: ghcr.io/ggerganov/llama.cpp:server-cuda
ports:
- "8080:8080"
volumes:
- llama-cache:/root/.cache/llama.cpp
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
command: >
--host 0.0.0.0
--hf-repo bartowski/gemma-2-9b-it-GGUF
--hf-file gemma-2-9b-it-IQ4_XS.gguf
--gpu-layers 99
--temp 0
volumes:
llama-cache:
http://localhost:8080 브라우저 접속
- llama.cpp에서 minimal Web UI를 제공한다.
- 첫 실행시에는
llama-cache볼륨에 모델을 다운로드 받기 위해 시간이 조금 소요될 수 있다.
cURL test
sudo apt-get install -y curl jq
curl -s \
--request POST \
--url http://localhost:8080/completion \
--header "Content-Type: application/json" \
--data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}' \
| jq
# {"content":"\n\n**1. Define Your Purpose:**\n\n* What do you want to achieve with your website?", ... }
Python Client
import requests
response = requests.post(
'http://localhost:8080/completion',
json={"prompt": "Building a website can be done in 10 simple steps:",
"n_predict": 128}
).json()
print(response['content'])
**1. Define Your Purpose:**
* What do you want to achieve with your website? (e.g., sell products, share information, build a community)
* Who is your target audience?
**2. Choose a Domain Name:**
...
Single token prediction with probs
pip install requests polars
import requests
import polars as pl
pl.Config.set_tbl_rows(40)
prompt = """A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: 한국의 수도는 어디인가요? 아래 선택지 중 골라주세요.
(A) 경성
(B) 부산
(C) 평양
(D) 서울
(E) 전주
Assistant:
정답은 ("""
response = requests.post(
'http://localhost:8080/completion',
json={"prompt": prompt,
"temperature": -1,
"n_predict": 1,
"n_probs":40
}
).json()
df = pl.DataFrame(response['completion_probabilities'][0]['probs'])
print(df)
shape: (40, 2)
┌─────────┬──────────┐
│ tok_str ┆ prob │
│ --- ┆ --- │
│ str ┆ f64 │
╞═════════╪══════════╡
│ D ┆ 0.996538 │
│ ** ┆ 0.002495 │
│ ** ┆ 0.000539 │
│ C ┆ 0.000053 │
│ B ┆ 0.000047 │
│ A ┆ 0.000032 │
│ D ┆ 0.000029 │
│ **( ┆ 0.000024 │
│ d ┆ 0.000024 │
│ **) ┆ 0.00002 │
│ E ┆ 0.000017 │
│ Seoul ┆ 0.000015 │
│ ㄷ ┆ 0.000008 │
│ ㄹ ┆ 0.000007 │
│ 주 ┆ 0.000006 │
│ Д ┆ 0.000004 │
│ **, ┆ 0.000004 │
│ 답 ┆ 0.000004 │
│ 디 ┆ 0.000004 │
│ 도 ┆ 0.000004 │
│ ㅁ ┆ 0.000003 │
│ D ┆ 0.000003 │
│ Answer ┆ 0.000003 │
│ 가 ┆ 0.000003 │
│ ) ┆ 0.000003 │
│ ④ ┆ 0.000003 │
│ )** ┆ 0.000003 │
│ ד ┆ 0.000003 │
│ **** ┆ 0.000003 │
│ ㄱ ┆ 0.000002 │
│ 다 ┆ 0.000002 │
│ 을 ┆ 0.000002 │
│ ㅇ ┆ 0.000002 │
│ 유 ┆ 0.000002 │
│ Korean ┆ 0.000002 │
│ 4 ┆ 0.000002 │
│ G ┆ 0.000002 │
│ 이 ┆ 0.000001 │
│ *** ┆ 0.000001 │
│ 하 ┆ 0.000001 │
└─────────┴──────────┘