Hello all, this guide outlines the steps to set up a K3s cluster on an NVIDIA DGX Spark, using Docker as the container runtime with NVIDIA GPU support, to serve the Qwen3-4B model with vLLM.
Before installing K3s, make sure Docker is configured to use the NVIDIA Container Runtime, so that your Kubernetes pods can access the GPU hardware on the DGX Spark.
Edit /etc/docker/daemon.json and add the following configuration:
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "args": []
    }
  },
  "default-runtime": "nvidia"
}
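If daemon.json already exists with other settings, merging the snippet above by hand is easy to get wrong. As a quick sanity check, a small script along these lines (the helper name is just for illustration) can confirm the file declares the NVIDIA runtime and makes it the default:

```python
import json

def nvidia_runtime_configured(daemon_json_text: str) -> bool:
    """Return True if the Docker daemon config declares the NVIDIA
    runtime and sets it as the default."""
    cfg = json.loads(daemon_json_text)
    runtimes = cfg.get("runtimes", {})
    return (
        "nvidia" in runtimes
        and runtimes["nvidia"].get("path") == "nvidia-container-runtime"
        and cfg.get("default-runtime") == "nvidia"
    )

# Check the config shown above (on a real system, read /etc/docker/daemon.json).
sample = """
{
  "runtimes": {
    "nvidia": {"path": "nvidia-container-runtime", "args": []}
  },
  "default-runtime": "nvidia"
}
"""
print(nvidia_runtime_configured(sample))  # True
```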
Restart the Docker service to apply the new runtime settings:
sudo systemctl restart docker
By default, K3s uses containerd. Since we have configured Docker with the NVIDIA runtime, we must explicitly tell K3s to use Docker.
Standard Installation:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--docker --write-kubeconfig-mode 644 --disable=traefik" sh -
Installation with Custom DNS:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--docker --write-kubeconfig-mode 644 --disable=traefik --resolv-conf /etc/k3s-dns.conf" sh -
Grant your user permission to access the cluster configuration and set the environment variable.
sudo chmod 644 /etc/rancher/k3s/k3s.yaml
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bashrc
Check that your node is ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
gx10-868a Ready control-plane,master 3m53s v1.33.6+k3s1
Then, create a PersistentVolumeClaim for the model cache:
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qwen3-4b-storage
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50G
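Note that Kubernetes distinguishes decimal suffixes (G, 10^9 bytes) from binary ones (Gi, 2^30 bytes), so the 50G requested above is slightly less than 50Gi. A quick illustration of the difference:

```python
# Kubernetes resource quantities: "G" is decimal (10^9), "Gi" is binary (2^30).
decimal_50g = 50 * 10**9   # what "storage: 50G" requests
binary_50gi = 50 * 2**30   # what "storage: 50Gi" would request

print(decimal_50g)                # 50000000000
print(binary_50gi)                # 53687091200
print(binary_50gi - decimal_50g)  # 3687091200 bytes, roughly 3.4 GiB
```

Either spelling works; just be aware of which one you are asking for when sizing the volume.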
Create a Secret to store your Hugging Face token for model access:
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
type: Opaque
stringData:
  token: "YOUR_ACTUAL_TOKEN"
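Using stringData lets you supply the token as plaintext; the API server base64-encodes it into the data field for you. Keep in mind that base64 is an encoding, not encryption, as this small round-trip (placeholder token, as in secret.yaml) shows:

```python
import base64

# Kubernetes stores Secret values base64-encoded under `data`;
# `stringData` just saves you from encoding them yourself.
token = "YOUR_ACTUAL_TOKEN"  # placeholder, as in secret.yaml
encoded = base64.b64encode(token.encode()).decode()
decoded = base64.b64decode(encoded).decode()

print(encoded)           # the value you would see under data.token
assert decoded == token  # base64 is reversible: encoding, not encryption
```

Anyone who can read the Secret can recover the token, so restrict access with RBAC rather than relying on the encoding.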
Create the Deployment and Service YAML file:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen3-4b
  labels:
    app: qwen3-4b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qwen3-4b
  template:
    metadata:
      labels:
        app: qwen3-4b
    spec:
      volumes:
        - name: cache-volume
          persistentVolumeClaim:
            claimName: qwen3-4b-storage
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: "2Gi"
      containers:
        - name: qwen3-4b
          image: nvcr.io/nvidia/vllm:25.12-py3
          command: ["/bin/sh", "-c"]
          args:
            - "vllm serve Qwen/Qwen3-4B --host 0.0.0.0 --trust-remote-code --max-model-len 32768 --gpu-memory-utilization 0.45"
          env:
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret
                  key: token
          ports:
            - containerPort: 8000
          resources:
            limits:
              cpu: "8"
              memory: 16Gi
              nvidia.com/gpu: 1 # Ensures the pod is scheduled on a GPU node
            requests:
              cpu: "4"
              memory: 8Gi
              nvidia.com/gpu: 1
          volumeMounts:
            - name: cache-volume
              mountPath: /root/.cache/huggingface
            - name: shm
              mountPath: /dev/shm
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: qwen3-4b
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 8000
      targetPort: 8000
  selector:
    app: qwen3-4b
Apply these files:
kubectl apply -f pvc.yaml
kubectl apply -f secret.yaml
kubectl apply -f deployment.yaml
You can run nvidia-smi to confirm the GPU is in use (output omitted here).
Check the pod status and logs with the commands below:
kubectl get pods
NAME READY STATUS RESTARTS AGE
qwen3-4b-7fd7d4485d-j42fn 1/1 Running 0 38m
Then run:
kubectl logs qwen3-4b-7fd7d4485d-j42fn
Once the pod status is Running, find the Service IP:
kubectl get svc qwen3-4b
Replace the IP address below with the CLUSTER-IP retrieved from the command above:
curl http://<SERVICE_IP>:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-4B",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain the Grace Blackwell architecture on DGX Spark."}
],
"max_tokens": 100
}'
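The same request can be sent from Python. This is a minimal stdlib-only sketch (the build_chat_request helper is just for illustration); it assumes the service IP is reachable from wherever you run it, so substitute the CLUSTER-IP as with the curl example:

```python
import json
import urllib.request

def build_chat_request(service_ip: str) -> urllib.request.Request:
    """Build the same chat-completions request as the curl example above."""
    payload = {
        "model": "Qwen/Qwen3-4B",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain the Grace Blackwell architecture on DGX Spark."},
        ],
        "max_tokens": 100,
    }
    return urllib.request.Request(
        f"http://{service_ip}:8000/v1/chat/completions",
        data=json.dumps(payload).encode(),          # POST body
        headers={"Content-Type": "application/json"},
    )

# Sending it only works with the service running and reachable:
# with urllib.request.urlopen(build_chat_request("<SERVICE_IP>")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Since vLLM exposes an OpenAI-compatible API, any OpenAI-style client pointed at http://SERVICE_IP:8000/v1 should also work.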
Hope it helps.
