Why Ollama?
Ollama lets you run AI models locally. Your resume data never leaves your machine. No API keys, no usage fees, complete privacy.
Prerequisites
- Docker Desktop installed
- 16GB+ RAM (32GB recommended for larger models)
- 10GB+ free disk space
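If you want a quick sanity check of these prerequisites, the commands below cover the basics (output format varies by platform; they assume the `docker` CLI is on your PATH):

```bash
docker --version                      # Docker installed and on PATH?
docker info --format '{{.MemTotal}}'  # memory available to Docker, in bytes
df -h .                               # free disk space on the current volume
```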
Setup Options
Option 1: Ollama on Host (Recommended)
Install Ollama on your machine, then connect Resume Matcher to it.
- Install Ollama: ollama.com
- Pull a model: `ollama pull llama3.2`
- Start Resume Matcher with Docker
- Configure in Settings:
  - Provider: Ollama
  - Model: `llama3.2` (or another model from the list below)
  - Server URL: see the table below
Ollama Server URL by Platform:
| Platform | URL |
|---|---|
| Mac/Windows (Docker Desktop) | http://host.docker.internal:11434 |
| Linux (default) | http://172.17.0.1:11434 |
| Linux (host network) | http://localhost:11434 |
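To confirm the wiring, check Ollama from the host first, then from inside the Resume Matcher container. The container name below is an example (use whatever `docker ps` shows), and the in-container check assumes curl exists in the image:

```bash
# On the host: Ollama should respond and list your pulled models
curl http://localhost:11434/api/tags

# From inside the app container (Mac/Windows example; swap in your platform's URL)
docker exec -it resume-matcher curl http://host.docker.internal:11434/api/tags
```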
Option 2: Ollama in Docker
Run both Resume Matcher and Ollama as containers:
```yaml
# docker-compose.yml
services:
  resume-matcher:
    build: .
    ports:
      - "3000:3000"
      - "8000:8000"
    environment:
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.2
      - LLM_API_BASE=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:
```
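Bring the stack up with the usual Compose workflow (nothing Resume Matcher specific about these flags):

```bash
docker compose up -d --build
```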
After starting, pull a model:
```bash
docker compose exec ollama ollama pull llama3.2
```
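To confirm the model is visible to the app, list what the Ollama container holds and hit its API from the app side. Service names match the compose file above; the curl check assumes curl exists in the resume-matcher image:

```bash
# Models available inside the ollama container
docker compose exec ollama ollama list

# The app reaches Ollama at http://ollama:11434 (the LLM_API_BASE above)
docker compose exec resume-matcher curl http://ollama:11434/api/tags
```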
Recommended Models
| Model | Size | Speed | Quality |
|---|---|---|---|
| `llama3.2` | 3B | Fast | Good for most tasks |
| `llama3.1:8b` | 8B | Medium | Better quality |
| `mistral` | 7B | Medium | Good balance |
| `gemma2` | 9B | Medium | Google’s model |
Start with llama3.2 and upgrade if you need better output.
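A quick way to compare candidates before committing is to run the same short prompt through each one directly in Ollama; the prompt here is only an example:

```bash
# Pull a candidate model and try a one-off prompt
ollama pull mistral
ollama run mistral "Rewrite this resume bullet to emphasize measurable impact: 'Managed the team's deployment process.'"
```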
GPU Acceleration
NVIDIA GPUs: Install NVIDIA Container Toolkit, then add to your docker-compose.yml:
```yaml
ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```
Apple Silicon: Ollama running natively on the host (Option 1) uses the GPU via Metal automatically; no extra config needed. Note that Docker containers on macOS cannot access the GPU, so Option 2 runs on the CPU.
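To check whether the containerized Ollama actually picked up the GPU, its startup logs are the quickest signal. Log wording varies between Ollama versions, so treat the grep pattern as a loose filter:

```bash
docker compose logs ollama | grep -iE 'gpu|cuda'
```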
Troubleshooting
“Connection refused” error?
- Check Ollama is running: `curl http://localhost:11434/api/tags`
- Verify the Server URL matches your platform
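On Linux with Option 1, a common cause is that host Ollama only listens on 127.0.0.1, so containers can't reach it via 172.17.0.1. One fix is to expose it on all interfaces; the systemd override below follows the approach in Ollama's FAQ and assumes Ollama was installed as a systemd service:

```bash
sudo systemctl edit ollama.service
# In the override file, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
curl http://172.17.0.1:11434/api/tags  # should now respond on the Docker bridge address
```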
Slow responses?
- Use a smaller model: `ollama pull llama3.2:1b`
- Check available RAM
- First request is always slower (model loading)
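Since the first request pays the model-loading cost, you can also warm the model ahead of time and keep it resident; the keep_alive value here is arbitrary:

```bash
# Sending a generate request with no prompt just loads the model into memory
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": "30m"}'
```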
Out of memory?
- Increase Docker memory in Desktop settings
- Use a quantized model (q4_0 suffix)
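If it isn't obvious whether the model or the containers are exhausting memory, a one-shot snapshot of live container usage usually settles it:

```bash
docker stats --no-stream
```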
Next Steps
- Features - Explore capabilities
- Contributing - Help improve the project