Replace the Ollama service with a custom ROCm image combining ghcr.io/ggml-org/llama.cpp:server-rocm and llama-swap v199.

Main motivations:
- Unblock qwen35 HF GGUFs (the qwen35 architecture is not supported in Ollama 0.20.4 for HF-imported models)
- Stay current with llama.cpp upstream without waiting for Ollama releases

Changes:
- ollama/Dockerfile: build llama-swap on top of llama.cpp:server-rocm
- ollama/llama-swap.yaml: define 4 models with full sampler config, GPU offload, and mmproj for the two multimodal HF fine-tunes (see the config sketch below)
- ollama/docker-compose.yml: replace the Ollama image with the local build; fix the broken volume mount (was /ubuntu/.ollama, now an explicit /models)
- ollama/Caddyfile: update the upstream port 11434→8080 (llama-swap's default)
- ai/docker-compose.yml: switch Open WebUI from OLLAMA_BASE_URL to OPENAI_API_BASE_URL pointing at the llama-swap /v1 endpoint
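The llama-swap config itself is not part of this view, so here is a rough sketch of what one of the four entries could look like, assuming llama-swap's models/cmd/proxy/ttl schema and its ${PORT} macro; the model name, GGUF paths, sampler values, and the /app/llama-server location are placeholders rather than the committed settings:

# Illustrative sketch of a single ollama/llama-swap.yaml entry — not the committed file.
healthCheckTimeout: 300

models:
  "qwen3-vl-finetune":                      # hypothetical multimodal HF fine-tune
    cmd: >
      /app/llama-server --host 127.0.0.1 --port ${PORT}
      -m /models/qwen3-vl-finetune-Q4_K_M.gguf
      --mmproj /models/qwen3-vl-finetune-mmproj-f16.gguf
      --n-gpu-layers 99 --ctx-size 16384
      --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.05
    proxy: "http://127.0.0.1:${PORT}"
    ttl: 600                                # unload after 10 minutes idle

Text-only entries would simply drop the --mmproj flag; llama-swap starts whichever configured model an incoming request names and swaps it in place of the previous one.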
ollama/Dockerfile (13 lines, 503 B, Docker)
# syntax=docker/dockerfile:1

# Upstream llama.cpp ROCm server image provides llama-server plus the ROCm
# runtime; llama-swap is layered on top and becomes the container entrypoint.
FROM ghcr.io/ggml-org/llama.cpp:server-rocm

ARG LLAMA_SWAP_VERSION=v199

# Note: the release tarball name uses the bare version number, without the "v".
ADD https://github.com/mostlygeek/llama-swap/releases/download/${LLAMA_SWAP_VERSION}/llama-swap_199_linux_amd64.tar.gz /tmp/llama-swap.tar.gz

RUN tar -xzf /tmp/llama-swap.tar.gz -C /usr/local/bin llama-swap \
    && chmod +x /usr/local/bin/llama-swap \
    && rm /tmp/llama-swap.tar.gz

EXPOSE 8080

# llama-swap serves the OpenAI-compatible API on :8080 and launches/swaps
# llama-server processes according to the mounted config file.
ENTRYPOINT ["/usr/local/bin/llama-swap"]
CMD ["-config", "/etc/llama-swap/config.yaml", "-listen", ":8080"]
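To show how the pieces described in the message fit together, here is a sketch of the compose wiring — service names, host paths, device nodes, and the Open WebUI image tag are assumptions, and in the repo the two services live in separate files (ollama/docker-compose.yml and ai/docker-compose.yml):

# Sketch only — combines both compose files for illustration; not the committed config.
services:
  llama-swap:
    build: ./ollama                                    # builds the Dockerfile above
    volumes:
      - ./ollama/llama-swap.yaml:/etc/llama-swap/config.yaml:ro
      - /models:/models                                # explicit GGUF store replacing /ubuntu/.ollama
    devices:
      - /dev/kfd                                       # ROCm device nodes needed for GPU offload
      - /dev/dri
    ports:
      - "8080:8080"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OPENAI_API_BASE_URL=http://llama-swap:8080/v1  # OpenAI-compatible endpoint instead of OLLAMA_BASE_URL

With this in place, the Caddyfile change reduces to pointing its reverse_proxy upstream at port 8080 instead of 11434, and Open WebUI should discover the four models through llama-swap's /v1/models listing.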