Running NemoClaw v0.1.0 (cluster image openshell/cluster, sandbox image
openshell/sandbox-from:1775835188) on macOS + Docker Desktop (Apple Silicon, arm64). Two bugs hit within a single
day of moderate use. Details and evidence below.
─────────────────────────────────────────
BUG 1 — No PVC provisioned; all data lost on container restart
─────────────────────────────────────────
The Workspace Files page of the NVIDIA NemoClaw Developer Guide explicitly states:
"The sandbox uses a Persistent Volume Claim (PVC) that outlives individual container restarts."
But on my install, the Sandbox CRD was created without any PVC:
$ kubectl get sandbox aimee-assistan2 -n openshell -o json | jq '.spec.volumeClaimTemplates'
[]
$ kubectl get sandbox aimee-assistan2 -n openshell -o yaml | grep -A 6 "^ volumes:"
volumes:
- name: openshell-client-tls
  secret: {secretName: openshell-client-tls}
- hostPath: {path: /opt/openshell/bin, type: DirectoryOrCreate}
  name: openshell-supervisor-bin
No PVC, no emptyDir, no hostPath for /sandbox/ or /sandbox/.openclaw-data/. Everything at those paths (skills,
workspace .md files, agents/, device pairing) lived in the container’s writable overlay layer.
When the kubelet restarted the agent container, every persistent path was reset to the image defaults. Lost:
- /sandbox/.openclaw/workspace/SOUL.md, USER.md, IDENTITY.md, AGENTS.md, memory/
- /sandbox/.openclaw-data/skills/google-analyzer/ (148MB custom skill)
- Device pairing state (had to re-pair TUI and dashboard)
Note: /sandbox/.openclaw/skills is a symlink → /sandbox/.openclaw-data/skills, so without a PVC on .openclaw-data,
skills are inherently ephemeral even if workspace/ were somehow protected.
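For anyone reproducing this, the absence of a backing mount is quick to confirm from inside the pod. A sketch (assumes the agent is the pod's default container and that findmnt is present in the sandbox image; findmnt with a bare path matches an exact mountpoint and exits non-zero when the path is not one):

```shell
# Prints a mount line if .openclaw-data is backed by a PVC/volume mount;
# on the affected install it exits 1 because the path lives in the
# container's writable overlay layer.
kubectl exec -n openshell aimee-assistan2 -- findmnt /sandbox/.openclaw-data
```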
─────────────────────────────────────────
BUG 2 — No default memory limit on the agent container
─────────────────────────────────────────
$ kubectl get pod aimee-assistan2 -o yaml | grep -A 1 "resources:"
resources: {}
With no memory limit, a batch running CPU-based faster-whisper transcription (analyzing Google Drive videos via a
skill) grew unbounded until the kernel OOM-killer terminated the container:
lastState:
  terminated:
    exitCode: 137
    reason: OOMKilled
    startedAt: "2026-04-23T05:46:46Z"
    finishedAt: "2026-04-23T20:16:01Z"
Because no per-pod limit is set, memory pressure affected the whole Docker Desktop VM, not just this pod — other
pods could have been evicted as collateral damage.
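The termination reason above can be pulled directly with a jsonpath query instead of paging through the full pod YAML (container index 0 assumes the agent is the first container):

```shell
# Emits the last termination reason of the first container,
# e.g. "OOMKilled" on the affected pod.
kubectl get pod aimee-assistan2 -n openshell \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```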
─────────────────────────────────────────
Related existing issues I also encountered
─────────────────────────────────────────
- https://github.com/NVIDIA/NemoClaw/issues/2042 — openclaw gateway + dashboard port-forward dead after pod
restart (nemoclaw-start.sh only runs at sandbox creation, not on pod recreation). Hit this ~5 times today.
- https://github.com/NVIDIA/NemoClaw/issues/768 — [macOS] SSH handshake verification fails after Docker container restart; sandbox requires destroy/recreate.
My OPENSHELL_SSH_HANDSHAKE_SECRET env var in the Sandbox CRD had a stale value after a helm chart upgrade rotated
the server-side value; I had to manually patch the CRD to match before the TUI could connect.
─────────────────────────────────────────
Workaround I hand-applied
─────────────────────────────────────────
1. Created a 10Gi PVC with the local-path StorageClass:
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: aimee-assistan2-openclaw-data
  namespace: openshell
spec:
  accessModes: [ReadWriteOnce]
  resources: {requests: {storage: 10Gi}}
  storageClassName: local-path
EOF
2. Patched the Sandbox CRD to add the volume + mount:
kubectl patch sandbox aimee-assistan2 -n openshell --type=json -p='[
  {"op":"add","path":"/spec/podTemplate/spec/volumes/-","value":{"name":"openclaw-data","persistentVolumeClaim":{"claimName":"aimee-assistan2-openclaw-data"}}},
  {"op":"add","path":"/spec/podTemplate/spec/containers/0/volumeMounts/-","value":{"mountPath":"/sandbox/.openclaw-data","name":"openclaw-data"}}
]'
3. Added resource requests/limits (prevent whole-node memory pressure):
{"requests":{"memory":"512Mi","cpu":"250m"},"limits":{"memory":"4Gi"}}
4. Force pod recreation + restore data from local backup (~/.nemoclaw/backups/).
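A sketch of step 4, assuming the controller recreates the pod under the same name and that the backup directory layout under ~/.nemoclaw/backups/ (the `latest/workspace` path here is my local convention, not a NemoClaw guarantee):

```shell
# Delete the pod so the controller recreates it with the patched spec.
kubectl delete pod aimee-assistan2 -n openshell
kubectl wait pod aimee-assistan2 -n openshell --for=condition=Ready --timeout=120s
# Restore the workspace from the local backup into the new pod.
kubectl cp ~/.nemoclaw/backups/latest/workspace \
  openshell/aimee-assistan2:/sandbox/.openclaw/workspace
```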
Data now survives container restarts, and OOM kills are scoped to the container rather than the whole VM.
─────────────────────────────────────────
Questions for the NemoClaw team
─────────────────────────────────────────
1. Is the missing PVC a setup-flow bug, or is my install corrupted? Should a fresh `nemoclaw onboard` always
produce a Sandbox CRD with a PVC?
2. Are there plans to ship default `resources.requests`/`resources.limits` in the generated Sandbox podTemplate?
Running without any memory cap is a footgun on Docker-Desktop-hosted clusters.
3. Is there a supported path to migrate an existing sandbox from ephemeral-overlay to PVC-backed storage without
losing data? `nemoclaw rebuild` is mentioned in docs — does it add a PVC if one is missing?
4. Should `/sandbox/.openclaw/skills` be a symlink at all? It makes skill persistence depend on persistence of
`.openclaw-data/`, which has different guarantees from `workspace/`.
Happy to run additional diagnostics or provide a `nemoclaw debug` tarball if useful.