Why Ollama?
Ollama lets you run AI models locally. Your resume data never leaves your machine. No API keys, no usage fees, complete privacy.
Prerequisites
- Docker Desktop installed
- 16GB+ RAM (32GB recommended for larger models)
- 10GB+ free disk space
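If you want a quick sanity check of these prerequisites, the commands below cover the basics (output format varies by platform; they assume the `docker` CLI is on your PATH):

```bash
docker --version                      # Docker installed and on PATH?
docker info --format '{{.MemTotal}}'  # memory available to Docker, in bytes
df -h .                               # free disk space on the current volume
```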
Setup Options
Option 1: Ollama on Host (Recommended)
Install Ollama on your machine, then connect Resume Matcher to it.
- Install Ollama: ollama.com
- Pull a model: `ollama pull llama3.2`
- Start Resume Matcher with Docker
- Configure in Settings:
  - Provider: Ollama
  - Model: `llama3.2` (or another model from the list below)
  - Server URL: see the table below
Ollama Server URL by Platform:
| Platform | URL |
|---|---|
| Mac/Windows (Docker Desktop) | http://host.docker.internal:11434 |
| Linux (default) | http://172.17.0.1:11434 |
| Linux (host network) | http://localhost:11434 |
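To confirm the wiring, check Ollama from the host first, then from inside the Resume Matcher container. The container name below is an example (use whatever `docker ps` shows), and the in-container check assumes curl exists in the image:

```bash
# On the host: Ollama should respond and list your pulled models
curl http://localhost:11434/api/tags

# From inside the app container (Mac/Windows example; swap in your platform's URL)
docker exec -it resume-matcher curl http://host.docker.internal:11434/api/tags
```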
Option 2: Ollama in Docker
Run both Resume Matcher and Ollama as containers:
```yaml
# docker-compose.yml
services:
  resume-matcher:
    build: .
    ports:
      - "3000:3000"
      - "8000:8000"
    environment:
      - LLM_PROVIDER=ollama
      - LLM_MODEL=llama3.2
      - LLM_API_BASE=http://ollama:11434
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:
```
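Bring the stack up with the usual Compose workflow (nothing Resume Matcher specific about these flags):

```bash
docker compose up -d --build
```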
After starting, pull a model:
```bash
docker compose exec ollama ollama pull llama3.2
```
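To confirm the model is visible to the app, list what the Ollama container holds and hit its API from the app side. Service names match the compose file above; the curl check assumes curl exists in the resume-matcher image:

```bash
# Models available inside the ollama container
docker compose exec ollama ollama list

# The app reaches Ollama at http://ollama:11434 (the LLM_API_BASE above)
docker compose exec resume-matcher curl http://ollama:11434/api/tags
```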
Recommended Models
| Model | Size | Speed | Quality |
|---|---|---|---|
| `llama3.2` | 3B | Fast | Good for most tasks |
| `llama3.1:8b` | 8B | Medium | Better quality |
| `mistral` | 7B | Medium | Good balance |
| `gemma2` | 9B | Medium | Google’s model |
Start with llama3.2 and upgrade if you need better output.
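A quick way to compare candidates before committing is to run the same short prompt through each one directly in Ollama; the prompt here is only an example:

```bash
# Pull a candidate model and try a one-off prompt
ollama pull mistral
ollama run mistral "Rewrite this resume bullet to emphasize measurable impact: 'Managed the team's deployment process.'"
```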
GPU Acceleration
NVIDIA GPUs: Install NVIDIA Container Toolkit, then add to your docker-compose.yml:
```yaml
ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```
Apple Silicon: Ollama running natively on the host (Option 1) uses the GPU via Metal automatically; no extra config needed. Note that Docker containers on macOS cannot access the GPU, so Option 2 runs on the CPU.
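To check whether the containerized Ollama actually picked up the GPU, its startup logs are the quickest signal. Log wording varies between Ollama versions, so treat the grep pattern as a loose filter:

```bash
docker compose logs ollama | grep -iE 'gpu|cuda'
```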
Troubleshooting
“Connection refused” error?
- Check Ollama is running: `curl http://localhost:11434/api/tags`
- Verify the Server URL matches your platform
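On Linux with Option 1, a common cause is that host Ollama only listens on 127.0.0.1, so containers can't reach it via 172.17.0.1. One fix is to expose it on all interfaces; the systemd override below follows the approach in Ollama's FAQ and assumes Ollama was installed as a systemd service:

```bash
sudo systemctl edit ollama.service
# In the override file, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl restart ollama
curl http://172.17.0.1:11434/api/tags  # should now respond on the Docker bridge address
```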
Slow responses?
- Use a smaller model: `ollama pull llama3.2:1b`
- Check available RAM
- First request is always slower (model loading)
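Since the first request pays the model-loading cost, you can also warm the model ahead of time and keep it resident; the keep_alive value here is arbitrary:

```bash
# Sending a generate request with no prompt just loads the model into memory
curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": "30m"}'
```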
Out of memory?
- Increase Docker memory in Desktop settings
- Use a quantized model (q4_0 suffix)
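If it isn't obvious whether the model or the containers are exhausting memory, a one-shot snapshot of live container usage usually settles it:

```bash
docker stats --no-stream
```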
Next Steps
- Features - Explore capabilities
- Contributing - Help improve the project