Replace the Ollama service with a custom ROCm image combining ghcr.io/ggml-org/llama.cpp:server-rocm and llama-swap v199.

Main motivations:
- Unblock qwen35 HF GGUFs (the qwen35 architecture is not supported in Ollama 0.20.4 for HF-imported models)
- Stay current with llama.cpp upstream without waiting for Ollama releases

Changes:
- ollama/Dockerfile: build llama-swap on top of llama.cpp:server-rocm
- ollama/llama-swap.yaml: define 4 models with full sampler config, GPU offload, and mmproj for the two multimodal HF fine-tunes
- ollama/docker-compose.yml: replace the Ollama image with the local build; fix broken volume mount (was /ubuntu/.ollama, now explicit /models)
- ollama/Caddyfile: update upstream port 11434 → 8080 (llama-swap default)
- ai/docker-compose.yml: switch Open WebUI from OLLAMA_BASE_URL to OPENAI_API_BASE_URL pointing at the llama-swap /v1 endpoint
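The ollama/Dockerfile change described above could look roughly like the sketch below. The llama-swap release asset name, the install paths, and the `-config`/`-listen` flags are assumptions for illustration, not verified against the actual v199 release:

```dockerfile
# Sketch only: base image from the commit; llama-swap asset name and
# CLI flags below are assumptions, check the v199 release before use.
FROM ghcr.io/ggml-org/llama.cpp:server-rocm

# Fetch a prebuilt llama-swap binary (hypothetical asset name).
ADD https://github.com/mostlygeek/llama-swap/releases/download/v199/llama-swap_linux_amd64.tar.gz /tmp/llama-swap.tar.gz
RUN tar -xzf /tmp/llama-swap.tar.gz -C /usr/local/bin llama-swap \
    && rm /tmp/llama-swap.tar.gz

# Model definitions; each entry's cmd spawns /app/llama-server on demand.
COPY llama-swap.yaml /app/llama-swap.yaml

# 8080 is llama-swap's default listen port (matches the Caddyfile change).
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/llama-swap", "-config", "/app/llama-swap.yaml", "-listen", ":8080"]
```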
45 lines · 1.5 KiB · YAML
healthCheckTimeout: 180
logLevel: info

models:
  "qwen3.5:9b":
    cmd: |
      /app/llama-server
      --host 0.0.0.0 --port ${PORT}
      --model /models/qwen3.5-9b.gguf
      --alias qwen3.5:9b
      --n-gpu-layers 999
      --ctx-size 8192
      --temp 1 --top-k 20 --top-p 0.95 --presence-penalty 1.5

  "qwen3.5:9bctxSmall":
    cmd: |
      /app/llama-server
      --host 0.0.0.0 --port ${PORT}
      --model /models/qwen3.5-9b.gguf
      --alias qwen3.5:9bctxSmall
      --n-gpu-layers 999
      --ctx-size 131072
      --temp 1 --top-k 20 --top-p 0.95 --presence-penalty 1.5

  "hf.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive:q4_k_m":
    cmd: |
      /app/llama-server
      --host 0.0.0.0 --port ${PORT}
      --model /models/HauhauCS-Qwen3.5-9B-Uncensored-Aggressive.q4_k_m.gguf
      --mmproj /models/HauhauCS-Qwen3.5-9B-Uncensored-Aggressive.mmproj.gguf
      --alias "hf.co/HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive:q4_k_m"
      --n-gpu-layers 999
      --ctx-size 32768

  "hf.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:q4_k_m":
    cmd: |
      /app/llama-server
      --host 0.0.0.0 --port ${PORT}
      --model /models/Jackrong-Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2.q4_k_m.gguf
      --mmproj /models/Jackrong-Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2.mmproj.gguf
      --alias "hf.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:q4_k_m"
      --n-gpu-layers 999
      --ctx-size 32768
      --temp 0.6 --top-k 20 --top-p 0.95 --repeat-penalty 1
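The ai/docker-compose.yml side of the change amounts to pointing Open WebUI at llama-swap's OpenAI-compatible endpoint instead of the Ollama API. A minimal sketch, assuming the llama-swap container is reachable as `ollama` on the compose network (the service name, image tag, and API-key handling are assumptions):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Replaces OLLAMA_BASE_URL; llama-swap serves /v1 on port 8080.
      - OPENAI_API_BASE_URL=http://ollama:8080/v1
      # Dummy key: llama-swap is assumed not to validate it by default.
      - OPENAI_API_KEY=none
```

With this in place, Open WebUI lists the four model aliases from llama-swap.yaml via the standard `/v1/models` endpoint, and llama-swap starts or swaps the matching llama-server instance on first request.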