Davide Polonio 3034f987d7 feat(ollama): migrate from Ollama to llama.cpp + llama-swap
Replace the Ollama service with a custom ROCm image combining
ghcr.io/ggml-org/llama.cpp:server-rocm and llama-swap v199.

Main motivations:
- Unblock qwen35 HF GGUFs (qwen35 architecture not supported in
  Ollama 0.20.4 for HF-imported models)
- Stay current with llama.cpp upstream without waiting for Ollama releases

Changes:
- ollama/Dockerfile: build llama-swap on top of llama.cpp:server-rocm
- ollama/llama-swap.yaml: define four models with full sampler config,
  GPU offload, and mmproj for the two multimodal HF fine-tunes (an
  illustrative sketch follows the Dockerfile below)
- ollama/docker-compose.yml: replace the Ollama image with the local
  build; fix the broken volume mount (was /ubuntu/.ollama, now an
  explicit /models; sketch below)
- ollama/Caddyfile: update the upstream port 11434→8080, the llama-swap
  default (sketch below)
- ai/docker-compose.yml: switch Open WebUI from OLLAMA_BASE_URL to
  OPENAI_API_BASE_URL pointing at the llama-swap /v1 endpoint (sketch
  below)
2026-04-09 23:14:43 +02:00

ollama/Dockerfile (Docker):

# syntax=docker/dockerfile:1

# Layer llama-swap on top of the upstream llama.cpp ROCm server image.
FROM ghcr.io/ggml-org/llama.cpp:server-rocm

# The release asset name drops the "v" prefix, so the version is spelled
# out twice below; keep both in sync when bumping LLAMA_SWAP_VERSION.
ARG LLAMA_SWAP_VERSION=v199
ADD https://github.com/mostlygeek/llama-swap/releases/download/${LLAMA_SWAP_VERSION}/llama-swap_199_linux_amd64.tar.gz /tmp/llama-swap.tar.gz

# Extract only the llama-swap binary into PATH and remove the tarball.
RUN tar -xzf /tmp/llama-swap.tar.gz -C /usr/local/bin llama-swap \
    && chmod +x /usr/local/bin/llama-swap \
    && rm /tmp/llama-swap.tar.gz

# llama-swap listens on 8080; the config file is mounted in at runtime.
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/llama-swap"]
CMD ["-config", "/etc/llama-swap/config.yaml", "-listen", ":8080"]