- Set HIP_VISIBLE_DEVICES=0 to use only the discrete GPU (gfx1201).
llama.cpp was trying to split layers across the iGPU (gfx1036), which
caused segfaults when loading the multimodal projector.
- Restore --mmproj for both HF models (multimodal works correctly with
single GPU).
- Keep qwen3.5:9b disabled (the Ollama-extracted GGUF uses an old
  mrope_sections key format that is incompatible with this llama.cpp build).
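For reference, pinning the discrete GPU can look like this in the compose
service (service name, image, and device mappings are illustrative, not the
committed file):

```yaml
services:
  llama-swap:
    build: ./ollama
    environment:
      # Expose only ROCm device 0 (the discrete gfx1201) so llama.cpp
      # cannot split layers onto the iGPU (gfx1036).
      - HIP_VISIBLE_DEVICES=0
    devices:
      - /dev/kfd
      - /dev/dri
```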
Replace the Ollama service with a custom ROCm image combining
ghcr.io/ggml-org/llama.cpp:server-rocm and llama-swap v199.
Main motivations:
- Unblock qwen35 HF GGUFs (the qwen35 architecture is not supported in
  Ollama 0.20.4 for HF-imported models)
- Stay current with llama.cpp upstream without waiting for Ollama releases
Changes:
- ollama/Dockerfile: build llama-swap on top of llama.cpp:server-rocm
- ollama/llama-swap.yaml: define 4 models with full sampler config,
GPU offload, and mmproj for the two multimodal HF fine-tunes
- ollama/docker-compose.yml: replace Ollama image with local build;
fix broken volume mount (was /ubuntu/.ollama, now explicit /models)
- ollama/Caddyfile: update upstream port 11434→8080 (llama-swap default)
- ai/docker-compose.yml: switch Open WebUI from OLLAMA_BASE_URL to
OPENAI_API_BASE_URL pointing at llama-swap /v1 endpoint
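The Dockerfile could be sketched roughly as follows (the Go version, repo
layout, tag name, and llama-swap flags are assumptions; the real build may
fetch a release binary instead of compiling from source):

```dockerfile
# Build llama-swap from source in a throwaway stage.
FROM golang:1.23 AS swap
RUN git clone --depth 1 --branch v199 https://github.com/mostlygeek/llama-swap /src \
 && cd /src && go build -o /llama-swap .

# Layer the proxy onto the upstream ROCm llama.cpp server image so the
# bundled llama-server binary and ROCm runtime are reused as-is.
FROM ghcr.io/ggml-org/llama.cpp:server-rocm
COPY --from=swap /llama-swap /app/llama-swap
COPY llama-swap.yaml /app/llama-swap.yaml
ENTRYPOINT ["/app/llama-swap", "--config", "/app/llama-swap.yaml", "--listen", ":8080"]
```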
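A llama-swap model entry along these lines (model name, GGUF paths, sampler
values, and the llama-server location are placeholders, not the committed
config):

```yaml
models:
  "example-vision":
    # llama-swap substitutes ${PORT} with the port it proxies to.
    cmd: |
      /app/llama-server --port ${PORT}
        -m /models/example-vision-Q4_K_M.gguf
        --mmproj /models/example-vision-mmproj.gguf
        -ngl 99
        --temp 0.7 --top-p 0.9
    # Unload the model after 5 minutes idle to free VRAM.
    ttl: 300
```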