NemoClaw sandbox: missing PVC (volumeClaimTemplates: []) causes total data loss on container OOM/restart — contradicts docs

Running NemoClaw v0.1.0 (cluster image openshell/cluster, sandbox image openshell/sandbox-from:1775835188) on macOS + Docker Desktop (Apple Silicon, arm64). Two bugs hit within a single day of moderate use. Details and evidence below.

─────────────────────────────────────────

BUG 1 — No PVC provisioned; all data lost on container restart

─────────────────────────────────────────

The Workspace Files docs (Workspace Files — NVIDIA NemoClaw Developer Guide) explicitly state:

"The sandbox uses a Persistent Volume Claim (PVC) that outlives individual container restarts."                 

But on my install, the Sandbox CRD was created without any PVC:

  $ kubectl get sandbox aimee-assistan2 -n openshell -o json | jq '.spec.volumeClaimTemplates'                  

  []

                                                                                                                

  $ kubectl get sandbox aimee-assistan2 -n openshell -o yaml | grep -A 6 "^      volumes:"                      

  volumes:

  - name: openshell-client-tls                                                                                  

    secret: {secretName: openshell-client-tls}

  - hostPath: {path: /opt/openshell/bin, type: DirectoryOrCreate}                                               

    name: openshell-supervisor-bin                                                                              

No PVC, no emptyDir, no hostPath for /sandbox/ or /sandbox/.openclaw-data/. Everything at those paths (skills,

workspace .md files, agents/, device pairing) lived in the container’s writable overlay layer.

When the kubelet restarted the agent container, every supposedly-persistent path was reset to the image defaults. Lost:

  • /sandbox/.openclaw/workspace/SOUL.md, USER.md, IDENTITY.md, AGENTS.md, memory/

  • /sandbox/.openclaw-data/skills/google-analyzer/ (148MB custom skill)

  • Device pairing state (had to re-pair TUI and dashboard)

Note: /sandbox/.openclaw/skills is a symlink → /sandbox/.openclaw-data/skills, so without a PVC on .openclaw-data,

skills are inherently ephemeral even if workspace/ were somehow protected.
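For reference, this is roughly the claim template I would expect the setup flow to generate, given what the docs promise. This is a sketch only — I'm assuming the Sandbox CRD's volumeClaimTemplates field mirrors the StatefulSet schema, which may not be accurate:

```yaml
# Sketch — field names assume the Sandbox CRD mirrors the
# StatefulSet volumeClaimTemplates schema.
spec:
  volumeClaimTemplates:
    - metadata:
        name: openclaw-data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: local-path
        resources:
          requests:
            storage: 10Gi
```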

─────────────────────────────────────────

BUG 2 — No default memory limit on the agent container

─────────────────────────────────────────

  $ kubectl get pod aimee-assistan2 -o yaml | grep -A 1 "resources:"                                            

  resources: {}

With no memory limit, a batch running CPU-based faster-whisper transcription (analyzing Google Drive videos via a

skill) grew unbounded until the kernel OOM-killer terminated the container:

  lastState:                                                                                                    

    terminated:

      exitCode: 137                                                                                             

      reason: OOMKilled

      startedAt: "2026-04-23T05:46:46Z"

      finishedAt: "2026-04-23T20:16:01Z"                                                                        

Because no per-pod limit is set, memory pressure affected the whole Docker Desktop VM, not just this pod — other

pods could have been evicted as collateral damage.
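A quick way to confirm an OOM kill without reading the full pod YAML is to grep the terminated state. Shown here against a saved copy of the status stanza; on a live cluster, substitute the output of `kubectl get pod aimee-assistan2 -n openshell -o yaml`:

```shell
# Stand-in for `kubectl get pod aimee-assistan2 -n openshell -o yaml`;
# this is the stanza quoted above, saved locally.
cat > /tmp/pod-status.yaml <<'EOF'
lastState:
  terminated:
    exitCode: 137
    reason: OOMKilled
EOF
# exitCode 137 = 128 + SIGKILL(9); the reason field confirms it was the OOM killer.
grep -q 'reason: OOMKilled' /tmp/pod-status.yaml && echo "container was OOM-killed"
```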

─────────────────────────────────────────

Related existing issues I also encountered

─────────────────────────────────────────

• Sandbox initialization does not re-run after a pod restart (nemoclaw-start.sh only runs at sandbox creation, not on pod recreation). Hit this ~5 times today.

• My OPENSHELL_SSH_HANDSHAKE_SECRET env var in the Sandbox CRD had a stale value after a helm chart upgrade rotated the server-side value; I had to manually patch the CRD to match before the TUI could connect.

─────────────────────────────────────────

Workaround I hand-applied

─────────────────────────────────────────

1. Created a 10Gi PVC with local-path StorageClass:

 kubectl apply -f - <<EOF                                                                                       

 apiVersion: v1                                                                                                 

 kind: PersistentVolumeClaim

 metadata:                                                                                                      

   name: aimee-assistan2-openclaw-data

   namespace: openshell                                                                                         

 spec:
   accessModes: [ReadWriteOnce]
   resources: {requests: {storage: 10Gi}}
   storageClassName: local-path

 EOF                                                                                                            

2. Patched the Sandbox CRD to add the volume + mount:

 kubectl patch sandbox aimee-assistan2 -n openshell --type=json -p='[
   {"op":"add","path":"/spec/podTemplate/spec/volumes/-","value":{"name":"openclaw-data","persistentVolumeClaim":{"claimName":"aimee-assistan2-openclaw-data"}}},
   {"op":"add","path":"/spec/podTemplate/spec/containers/0/volumeMounts/-","value":{"mountPath":"/sandbox/.openclaw-data","name":"openclaw-data"}}
 ]'

3. Added resource requests/limits (prevent whole-node memory pressure):

 {"requests":{"memory":"512Mi","cpu":"250m"},"limits":{"memory":"4Gi"}}                                         

4. Force pod recreation + restore data from local backup (~/.nemoclaw/backups/).
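For step 4, the restore was a plain tar extract from ~/.nemoclaw/backups/. A sketch of the archive convention I use, with a local stand-in directory (on the live cluster the source would come from `kubectl cp <pod>:/sandbox/.openclaw-data ...` — the pod path is from this report, the archive naming is my own):

```shell
# Local stand-in for /sandbox/.openclaw-data, to demonstrate the
# timestamped-archive naming; real backups would kubectl cp from the pod.
mkdir -p /tmp/openclaw-data/workspace
echo "demo" > /tmp/openclaw-data/workspace/SOUL.md
backup="/tmp/openclaw-data-$(date +%Y%m%d-%H%M%S).tar.gz"
tar -czf "$backup" -C /tmp openclaw-data
# Verify the archive actually contains the workspace before trusting it.
tar -tzf "$backup" | grep -q 'workspace/SOUL.md' && echo "backup contains workspace"
```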

Data now survives container restarts, and OOM kills are scoped to the container rather than the whole VM.

─────────────────────────────────────────

Questions for the NemoClaw team

─────────────────────────────────────────

1. Is the missing PVC a setup-flow bug, or is my install corrupted? Should a fresh `nemoclaw onboard` always

produce a Sandbox CRD with a PVC?

2. Are there plans to ship default `resources.requests`/`resources.limits` in the generated Sandbox podTemplate?

Running without any memory cap is a footgun on Docker-Desktop-hosted clusters.

3. Is there a supported path to migrate an existing sandbox from ephemeral-overlay to PVC-backed storage without

losing data? `nemoclaw rebuild` is mentioned in docs — does it add a PVC if one is missing?

4. Should `/sandbox/.openclaw/skills` be a symlink at all? It makes skill persistence depend on persistence of

`.openclaw-data/`, which has different guarantees from `workspace/`.

Happy to run additional diagnostics or provide a `nemoclaw debug` tarball if useful.