When you deploy Ollama, the following Kubernetes objects are created:
Client → Istio Gateway → VirtualService → Ollama Service (port 11434) → Ollama Pod → PVC (model storage)
| Object | Purpose |
|--------|---------|
| **Deployment** | Manages the Ollama pod (standalone mode) |
| **PVC** | Persistent storage for downloaded models |
| **Service** | ClusterIP exposing port 11434 (REST) and 8080 (management) |
| **VirtualService** | Istio routing rule for external access |
| **ConfigMap** | Ollama server configuration |
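
To confirm that the objects in the table actually exist after a deployment, a small filter over `kubectl get ... -o name` output can report which kinds are absent. This is a minimal sketch: the function name `missing_kinds` is hypothetical, and the namespace in the usage comment is the one used elsewhere in this guide.

```shell
# Report which expected object kinds are absent from `kubectl get ... -o name` output.
# Input (stdin): lines such as "deployment.apps/ollama" or "service/ollama".
# Output: a space-separated list of missing kinds (an empty line if none are missing).
missing_kinds() {
  local names kind missing=""
  names="$(cat)"
  for kind in deployment persistentvolumeclaim service virtualservice configmap; do
    # Match the kind at the start of a line, followed by an optional API group and a slash.
    if ! printf '%s\n' "$names" | grep -Eq "^${kind}(\.[^/]+)?/"; then
      missing="${missing}${kind} "
    fi
  done
  printf '%s\n' "${missing% }"
}

# Usage against a live cluster:
#   kubectl get deployment,pvc,svc,virtualservice,configmap \
#     -n hyperplane-ollama -o name | missing_kinds
```

An empty result means every kind in the table was found. A result of `virtualservice` points to the missing-VirtualService problem covered later in this guide.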
Before deploying Ollama, ensure you have:
- A StorageClass such as `standard-rwo` or Longhorn
- A node pool for stack components (labeled `hyperplane.dev/nodeType: hyperplane-stack-component-pool`)

Choose your hardware based on the models you plan to run:
| Target Model | Min GPU | Min VRAM | Min System RAM | PVC Size |
|-------------|---------|----------|----------------|----------|
| phi3:mini (3.8B) | None (CPU OK) | — | 8 GB | 20 GB |
| llama3.1:8b | T4 or L4 | 16 GB | 16 GB | 50 GB |
| mistral:7b | T4 or L4 | 16 GB | 16 GB | 50 GB |
| codellama:13b | A10G | 24 GB | 32 GB | 80 GB |
| llama3.1:70b | A100 | 80 GB | 128 GB | 200 GB |
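
The PVC column of the table can be encoded as a lookup and used to sanity-check an existing volume request. This is only a sketch: `min_pvc_gb` and `pvc_is_sufficient` are hypothetical helper names, only whole `Gi` values are handled, and the table's GB figures are compared directly against Gi values as a simplifying assumption.

```shell
# Look up the minimum PVC size (in GB) from the sizing table in this guide.
min_pvc_gb() {
  case "$1" in
    phi3:mini)              echo 20 ;;
    llama3.1:8b|mistral:7b) echo 50 ;;
    codellama:13b)          echo 80 ;;
    llama3.1:70b)           echo 200 ;;
    *) echo "no sizing entry for model: $1" >&2; return 1 ;;
  esac
}

# Succeed when a PVC request such as "50Gi" meets the table minimum for the model.
# Assumption: GB in the table is treated as equivalent to Gi in the PVC spec.
pvc_is_sufficient() {
  local model="$1" requested="${2%Gi}" minimum
  minimum="$(min_pvc_gb "$model")" || return 2
  [ "$requested" -ge "$minimum" ]
}

# Usage against a live cluster:
#   size="$(kubectl get pvc -n hyperplane-ollama \
#     -o jsonpath='{.items[0].spec.resources.requests.storage}')"
#   pvc_is_sufficient llama3.1:8b "$size" || echo "PVC is smaller than recommended"
```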
GPU recommendations by use case:
Key points:
- `ollama.models.clean: false`: Never enable this initially. It can delete models you've already pulled.
- `podLabels.sidecar.istio.io/inject: "true"`: Without this, Istio won't inject the sidecar proxy, and external routing will fail.
- `updateStrategy.type: Recreate`: Ollama runs as a single pod with a PVC. A RollingUpdate can stall because the new pod cannot bind the PVC while the old pod still holds it.

helm upgrade --install ollama <chart-path> \
  -n hyperplane-ollama \
  -f values.yaml \
  --create-namespace \
  --wait \
  --timeout 10m

- `upgrade --install`: Installs the release if it is new, upgrades it if it already exists
- `--wait`: Blocks until all pods are ready
- `--timeout 10m`: Fails if the deployment doesn't complete in 10 minutes

⚠️ A missing VirtualService is a real issue we've hit in production: after a Helm upgrade, the VirtualService was gone and requests returned a 404 error. You may need to create it manually.
Ollama needs an Istio VirtualService to be accessible externally. Check if one exists:
kubectl get virtualservice -n hyperplane-ollama
If missing, create one:
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ollama-vs
  namespace: hyperplane-ollama
  labels:
    app.kubernetes.io/name: ollama
    hyperplane-service-name: ollama
    hyperplane.dev/stack-component: ollama
spec:
  gateways:
    - hyperplane-istio/ingress-gateway  # Your gateway name
  hosts:
    - ollama.staging.canopyhub.io  # Your domain
  http:
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: ollama  # Must match the Service name
            port:
              number: 11434
Apply it:
kubectl apply -f ollama-virtualservice.yaml
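
The check-then-create steps above can be wrapped in one function, which is handy to run after every Helm upgrade. This is a sketch under assumptions: `ensure_virtualservice` is a hypothetical helper name, and the resource name and manifest file are the ones used in this guide. `kubectl apply` is idempotent on its own, so the explicit check mainly makes it visible whether the VirtualService had gone missing.

```shell
# Create the VirtualService from a manifest only if it does not already exist.
# Arguments: namespace, VirtualService name, path to the manifest file.
# Prints "exists" or "created" so the outcome can be logged.
ensure_virtualservice() {
  local ns="$1" name="$2" manifest="$3"
  if kubectl get virtualservice "$name" -n "$ns" >/dev/null 2>&1; then
    echo "exists"
  else
    # `kubectl get` failed, so treat the VirtualService as missing and apply it.
    kubectl apply -f "$manifest" >/dev/null && echo "created"
  fi
}

# Usage:
#   ensure_virtualservice hyperplane-ollama ollama-vs ollama-virtualservice.yaml
```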
Common mistakes:
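- Wrong gateway name (list the available gateways with `kubectl get gateway -n hyperplane-istio`)
- Destination host that does not match the Service name (`ollama`)

After deployment, you need to pull at least one model before you can use Ollama.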
Option A: Pull from within the pod
# Get the pod name
kubectl get pods -n hyperplane-ollama
# Pull a model
kubectl exec -n hyperplane-ollama <pod-name> -- ollama pull llama3.1
Option B: Pull via API
curl http://ollama.hyperplane-ollama.svc.cluster.local:11434/api/pull \
  -d '{"name": "llama3.1"}'
Option C: Auto-pull via Helm values
ollama:
  models:
    pull:
      - llama3.1
      - mistral
Recommended first model: llama3.1:8b — good balance of quality and speed.
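
Whichever option you use, it can help to confirm that the model actually landed. The sketch below filters the JSON returned by Ollama's `/api/tags` endpoint (which lists local models). `model_listed` is a hypothetical helper name, and the pattern match is a lightweight stand-in for a real JSON parser.

```shell
# Succeed if the /api/tags JSON on stdin lists a model named "$1" or "$1:<tag>".
model_listed() {
  local pattern
  # Escape dots so "llama3.1" is matched literally rather than as a regex wildcard.
  pattern="$(printf '%s' "$1" | sed 's/\./\\./g')"
  grep -Eq "\"name\"[[:space:]]*:[[:space:]]*\"${pattern}(:[^\"]*)?\""
}

# Usage from inside the cluster:
#   curl -s http://ollama.hyperplane-ollama.svc.cluster.local:11434/api/tags \
#     | model_listed llama3.1 && echo "llama3.1 is available"
```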
Run these checks to confirm everything is working:
# 1. Pod is running and ready
kubectl get pods -n hyperplane-ollama
# 2. Service exists and points to the pod
kubectl get svc -n hyperplane-ollama
# 3. API responds
kubectl exec -n hyperplane-ollama <pod-name> -- curl -s http://localhost:11434/api/version
# 4. Model inference works
kubectl exec -n hyperplane-ollama <pod-name> -- \
  curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "Hello", "stream": false}'
# 5. External access works (if VirtualService configured)
curl -s https://ollama.<your-domain>/api/version
# 6. Istio sidecar is injected
kubectl get pods -n hyperplane-ollama -o jsonpath='{.items[*].spec.containers[*].name}'
# Should show both "ollama" and "istio-proxy"
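
Check 6 can be made scriptable by testing the jsonpath output for both container names instead of reading it by eye. A minimal sketch, assuming the container names `ollama` and `istio-proxy` stated above; `has_sidecar` is a hypothetical helper name.

```shell
# Succeed only if the space-separated container list in "$1" contains
# both "ollama" and "istio-proxy" as whole names.
has_sidecar() {
  local names=" $1 "
  case "$names" in *" ollama "*) ;; *) return 1 ;; esac
  case "$names" in *" istio-proxy "*) ;; *) return 1 ;; esac
}

# Usage:
#   names="$(kubectl get pods -n hyperplane-ollama \
#     -o jsonpath='{.items[*].spec.containers[*].name}')"
#   has_sidecar "$names" || echo "istio-proxy sidecar is missing"
```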
Success criteria:
For GPU-accelerated inference, add these to your values.yaml:
ollama:
  gpu:
    type: nvidia
# Resource requests for GPU
resources:
  limits:
    nvidia.com/gpu: 1
# Tolerations for GPU nodes
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
GPU setup checklist:
- nvidia-device-plugin DaemonSet running on GPU nodes
- A suitable StorageClass available (`standard-rwo` if Longhorn is not present)
- Runtime class set to `nvidia` if required by your cluster

DRA (Dynamic Resource Allocation): The Helm chart supports DRA for GPU allocation. This is a newer Kubernetes feature, so leave it disabled unless your cluster explicitly supports it.
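
For the first checklist item, one way to confirm that the device plugin is doing its job is to see which nodes advertise an allocatable `nvidia.com/gpu` resource. The sketch below filters `kubectl get nodes` custom-columns output. `gpu_nodes` is a hypothetical helper name, and nodes without GPUs are assumed to show `<none>` in the GPU column.

```shell
# Print the names of nodes that advertise at least one allocatable GPU.
# Input (stdin): two-column output with a header row, e.g. "NAME GPU".
gpu_nodes() {
  # Skip the header; keep rows whose second column is a positive integer.
  awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Usage against a live cluster:
#   kubectl get nodes \
#     -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | gpu_nodes
```

If this prints nothing, the pod's `nvidia.com/gpu: 1` limit cannot be satisfied and the pod will stay Pending.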
Based on our real upgrade from chart 1.18.0 (app 0.11.3) to chart 1.50.0 (app 0.17.7):
helm get values ollama -n hyperplane-ollama -o yaml > ollama-values-backup.yaml
kubectl exec -n hyperplane-ollama <pod-name> -- ollama list > ollama-models-backup.txt
kubectl get deployment ollama -n hyperplane-ollama -o yaml > ollama-deployment-backup.yaml
# Dry run first
helm upgrade ollama <new-chart-path> \
  -n hyperplane-ollama \
  -f values.yaml \
  --dry-run --debug
# If dry run looks good, execute
helm upgrade ollama <new-chart-path> \
  -n hyperplane-ollama \
  -f values.yaml \
  --wait --timeout 15m
- Run `ollama --version` to confirm the new version
- Run `ollama list` to confirm models survived the upgrade
- Roll back if needed: `helm rollback ollama -n hyperplane-ollama`

- The Deployment uses the Recreate update strategy, so there's a brief outage during the upgrade
- Keep `models.clean: false`: don't enable model cleanup during upgrades
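
The "models survived the upgrade" check can be automated by comparing the `ollama list` backup taken before the upgrade with a fresh listing taken afterwards. A minimal sketch, assuming `ollama list` prints a header row followed by one model per line with the name in the first column; `lost_models` and the post-upgrade file name are hypothetical.

```shell
# Print model names present in the pre-upgrade listing ("$1")
# but absent from the post-upgrade listing ("$2").
lost_models() {
  local before="$1" after="$2" model
  # Skip the header row and blank lines; the model name is the first column.
  awk 'NR > 1 && NF { print $1 }' "$before" | while read -r model; do
    awk 'NR > 1 && NF { print $1 }' "$after" | grep -Fxq -- "$model" || echo "$model"
  done
}

# Usage:
#   kubectl exec -n hyperplane-ollama <pod-name> -- ollama list > ollama-models-after.txt
#   lost_models ollama-models-backup.txt ollama-models-after.txt
```

Empty output means every model from the backup is still present. Any names printed need to be pulled again.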