NemoClaw sandbox: missing PVC (volumeClaimTemplates: []) causes total data loss on container OOM/restart — contradicts docs

Running NemoClaw v0.1.0 (cluster image openshell/cluster, sandbox image openshell/sandbox-from:1775835188) on macOS + Docker Desktop (Apple Silicon, arm64). Two bugs hit within a single day of moderate use. Details and evidence below.

─────────────────────────────────────────

BUG 1 — No PVC provisioned; all data lost on container restart

─────────────────────────────────────────

The Workspace Files docs (Workspace Files — NVIDIA NemoClaw Developer Guide) explicitly state:

"The sandbox uses a Persistent Volume Claim (PVC) that outlives individual container restarts."                 

But on my install, the Sandbox CRD was created without any PVC:

  $ kubectl get sandbox aimee-assistan2 -n openshell -o json | jq '.spec.volumeClaimTemplates'                  

  []

                                                                                                                

  $ kubectl get sandbox aimee-assistan2 -n openshell -o yaml | grep -A 6 "^      volumes:"                      

  volumes:

  - name: openshell-client-tls                                                                                  

    secret: {secretName: openshell-client-tls}

  - hostPath: {path: /opt/openshell/bin, type: DirectoryOrCreate}                                               

    name: openshell-supervisor-bin                                                                              

No PVC, no emptyDir, no hostPath for /sandbox/ or /sandbox/.openclaw-data/. Everything at those paths (skills,

workspace .md files, agents/, device pairing) lived in the container’s writable overlay layer.

When the kubelet restarted the agent container, every supposedly-persistent path was reset to the image defaults. Lost:

  • /sandbox/.openclaw/workspace/SOUL.md, USER.md, IDENTITY.md, AGENTS.md, memory/

  • /sandbox/.openclaw-data/skills/google-analyzer/ (148MB custom skill)

  • Device pairing state (had to re-pair TUI and dashboard)

Note: /sandbox/.openclaw/skills is a symlink → /sandbox/.openclaw-data/skills, so without a PVC on .openclaw-data,

skills are inherently ephemeral even if workspace/ were somehow protected.
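For reference, this is roughly the claim template I would expect the setup flow to generate, given what the docs promise. This is a sketch only — I'm assuming the Sandbox CRD's volumeClaimTemplates field mirrors the StatefulSet schema, which may not be accurate:

```yaml
# Sketch — field names assume the Sandbox CRD mirrors the
# StatefulSet volumeClaimTemplates schema.
spec:
  volumeClaimTemplates:
    - metadata:
        name: openclaw-data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: local-path
        resources:
          requests:
            storage: 10Gi
```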

─────────────────────────────────────────

BUG 2 — No default memory limit on the agent container

─────────────────────────────────────────

  $ kubectl get pod aimee-assistan2 -o yaml | grep -A 1 "resources:"                                            

  resources: {}

With no memory limit, a batch running CPU-based faster-whisper transcription (analyzing Google Drive videos via a

skill) grew unbounded until the kernel OOM-killer terminated the container:

  lastState:                                                                                                    

    terminated:

      exitCode: 137                                                                                             

      reason: OOMKilled

      startedAt: "2026-04-23T05:46:46Z"

      finishedAt: "2026-04-23T20:16:01Z"                                                                        

Because no per-pod limit is set, memory pressure affected the whole Docker Desktop VM, not just this pod — other

pods could have been evicted as collateral damage.
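A quick way to confirm an OOM kill without reading the full pod YAML is to grep the terminated state. Shown here against a saved copy of the status stanza; on a live cluster, substitute the output of `kubectl get pod aimee-assistan2 -n openshell -o yaml`:

```shell
# Stand-in for `kubectl get pod aimee-assistan2 -n openshell -o yaml`;
# this is the stanza quoted above, saved locally.
cat > /tmp/pod-status.yaml <<'EOF'
lastState:
  terminated:
    exitCode: 137
    reason: OOMKilled
EOF
# exitCode 137 = 128 + SIGKILL(9); the reason field confirms it was the OOM killer.
grep -q 'reason: OOMKilled' /tmp/pod-status.yaml && echo "container was OOM-killed"
```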

─────────────────────────────────────────

Related existing issues I also encountered

─────────────────────────────────────────

• Sandbox initialization does not re-run after a pod restart (nemoclaw-start.sh only runs at sandbox creation, not on pod recreation). Hit this ~5 times today.

• My OPENSHELL_SSH_HANDSHAKE_SECRET env var in the Sandbox CRD had a stale value after a helm chart upgrade rotated the server-side value; I had to manually patch the CRD to match before the TUI could connect.

─────────────────────────────────────────

Workaround I hand-applied

─────────────────────────────────────────

1. Created a 10Gi PVC with local-path StorageClass:

 kubectl apply -f - <<EOF                                                                                       

 apiVersion: v1                                                                                                 

 kind: PersistentVolumeClaim

 metadata:                                                                                                      

   name: aimee-assistan2-openclaw-data

   namespace: openshell                                                                                         

 spec:
   accessModes: [ReadWriteOnce]
   resources: {requests: {storage: 10Gi}}
   storageClassName: local-path

 EOF                                                                                                            

2. Patched the Sandbox CRD to add the volume + mount:

 kubectl patch sandbox aimee-assistan2 -n openshell --type=json -p='[
   {"op":"add","path":"/spec/podTemplate/spec/volumes/-","value":{"name":"openclaw-data","persistentVolumeClaim":{"claimName":"aimee-assistan2-openclaw-data"}}},
   {"op":"add","path":"/spec/podTemplate/spec/containers/0/volumeMounts/-","value":{"mountPath":"/sandbox/.openclaw-data","name":"openclaw-data"}}
 ]'

3. Added resource requests/limits (prevent whole-node memory pressure):

 {"requests":{"memory":"512Mi","cpu":"250m"},"limits":{"memory":"4Gi"}}                                         

4. Force pod recreation + restore data from local backup (~/.nemoclaw/backups/).
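For step 4, the restore was a plain tar extract from ~/.nemoclaw/backups/. A sketch of the archive convention I use, with a local stand-in directory (on the live cluster the source would come from `kubectl cp <pod>:/sandbox/.openclaw-data ...` — the pod path is from this report, the archive naming is my own):

```shell
# Local stand-in for /sandbox/.openclaw-data, to demonstrate the
# timestamped-archive naming; real backups would kubectl cp from the pod.
mkdir -p /tmp/openclaw-data/workspace
echo "demo" > /tmp/openclaw-data/workspace/SOUL.md
backup="/tmp/openclaw-data-$(date +%Y%m%d-%H%M%S).tar.gz"
tar -czf "$backup" -C /tmp openclaw-data
# Verify the archive actually contains the workspace before trusting it.
tar -tzf "$backup" | grep -q 'workspace/SOUL.md' && echo "backup contains workspace"
```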

Data now survives container restarts, and OOM kills are scoped to the container rather than the whole VM.

─────────────────────────────────────────

Questions for the NemoClaw team

─────────────────────────────────────────

1. Is the missing PVC a setup-flow bug, or is my install corrupted? Should a fresh `nemoclaw onboard` always

produce a Sandbox CRD with a PVC?

2. Are there plans to ship default `resources.requests`/`resources.limits` in the generated Sandbox podTemplate?

Running without any memory cap is a footgun on Docker-Desktop-hosted clusters.

3. Is there a supported path to migrate an existing sandbox from ephemeral-overlay to PVC-backed storage without

losing data? `nemoclaw rebuild` is mentioned in docs — does it add a PVC if one is missing?

4. Should `/sandbox/.openclaw/skills` be a symlink at all? It makes skill persistence depend on persistence of

`.openclaw-data/`, which has different guarantees from `workspace/`.

Happy to run additional diagnostics or provide a `nemoclaw debug` tarball if useful.