Problem with CUDA Memory

Isaac Sim Version

4.0.0

Operating System

Ubuntu 22.04

Topic Description

Detailed Description

I tried to rerun my program, but I can't: it was working before and now it isn't anymore. I don't remember everything I did before my program stopped working, but here is a list:

  • I followed the tutorial to use curobo with moveit2
  • I downloaded and installed Isaac Sim v4.5.0; maybe there is a conflict between these two versions (see the sketch just after this list).
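
To rule out a mix-up between the two installations, one quick check is to print which interpreter and PyTorch build the script actually runs under (a minimal sketch, not taken from my actual script; the paths in the comments are only illustrative):

import sys
import torch

# Plain Python: shows which Isaac Sim installation's interpreter is running the script,
# e.g. something under .../isaac-sim-4.0.0/kit/python/ vs .../isaac-sim-4.5.0/...
print("Python interpreter:", sys.executable)
print("PyTorch:", torch.__version__, "CUDA:", torch.version.cuda)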

Error Messages

When I run my program after a restart, everything works fine, but when I stop it this message is shown:

2025-02-04 08:53:57 [1,270,888ms] [Warning] [omni.usd] Unexpected reference count of 3 for UsdStage 'anon:0x28d05cc0:World0.usd' while being closed in UsdContext (this may indicate it is still resident in memory).
[363.478s] Simulation App Shutting Down
2025-02-04T09:19:06.516473Z  INFO hub-002 ThreadId(13) hub::rpc::server_msgpack: hub/src/rpc/server_msgpack.rs:172: Connection from 127.0.0.1:46744 closed
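
For context, my script is a standalone SimulationApp program with a main() entry point. Below is a stripped-down sketch of the shutdown pattern (an assumed, simplified version, not my exact code; the explicit close_stage() call is an assumption about how to avoid the UsdStage warning above):

from isaacsim import SimulationApp  # Isaac Sim 4.x standalone entry point

simulation_app = SimulationApp({"headless": False})

# Imports that need the Kit app running come after SimulationApp is created.
import omni.usd

def main():
    # ... set up the World, the robot and the cuRobo motion generator, then run the loop ...
    pass

if __name__ == "__main__":
    try:
        main()
    finally:
        # Close the stage explicitly so the UsdStage is no longer referenced
        # when the UsdContext shuts down, then close the app to release GPU resources.
        omni.usd.get_context().close_stage()
        simulation_app.close()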

and then when I restart the program:

2025-02-04 08:54:12 [5,316ms] [Warning] [carb.omniclient.plugin]  16194: OmniHub: ThreadId(03) Hub encountered error. Trying to reconnect to Hub. retry_reason="Hub failed to launch: Io(\"waited for file \\\"/tmp/hub-theobloesch-32343ABF.config.json\\\"\")"
2025-02-04 08:54:17 [10,358ms] [Warning] [carb.omniclient.plugin]  16194: OmniHub: ThreadId(03) Hub encountered error. Trying to reconnect to Hub. retry_reason="Hub failed to launch: Io(\"waited for file \\\"/tmp/hub-theobloesch-32343ABF.config.json\\\"\")"
2025-02-04 08:54:22 [15,407ms] [Warning] [carb.omniclient.plugin]  16194: OmniHub: ThreadId(03) Hub encountered error. Trying to reconnect to Hub. retry_reason="Hub failed to launch: Io(\"waited for file \\\"/tmp/hub-theobloesch-32343ABF.config.json\\\"\")"
2025-02-04 08:54:27 [20,465ms] [Warning] [carb.omniclient.plugin]  16194: OmniHub: ThreadId(03) Hub encountered error. Trying to reconnect to Hub. retry_reason="Hub failed to launch: Io(\"waited for file \\\"/tmp/hub-theobloesch-32343ABF.config.json\\\"\")"
2025-02-04 08:54:32 [25,532ms] [Warning] [carb.omniclient.plugin]  16194: OmniHub: ThreadId(03) Hub encountered error. Trying to reconnect to Hub. retry_reason="Hub failed to launch: Io(\"waited for file \\\"/tmp/hub-theobloesch-32343ABF.config.json\\\"\")"
2025-02-04T08:54:37.702598Z  INFO main ThreadId(01) hub: hub/src/hub.rs:54: return=Err(error sending request for url (http://127.0.0.1:14090/api/1.0/service/config)

Caused by:
    operation timed out)
Error: error sending request for url (http://127.0.0.1:14090/api/1.0/service/config)

Caused by:
    operation timed out
2025-02-04 08:54:37 [30,610ms] [Warning] [carb.omniclient.plugin]  16194: OmniHub: ThreadId(03) Hub encountered error. Trying to reconnect to Hub. retry_reason="Hub failed to launch: Io(\"waited for file \\\"/tmp/hub-theobloesch-32343ABF.config.json\\\"\")"
2025-02-04T08:54:42.751291Z  INFO main ThreadId(01) hub: hub/src/hub.rs:54: return=Err(error sending request for url (http://127.0.0.1:14090/api/1.0/service/config)

Or the program restarts, but very slowly:

2025-02-04T09:09:02.236100Z  INFO hub-002 ThreadId(13) lru_task:lru_update: hub_cache::block::cache: hub-cache/src/block/cache.rs:1320: return=Ok(())
2025-02-04T09:09:02.236141Z  INFO hub-002 ThreadId(13) lru_task:lru_update: hub_cache::block::cache: hub-cache/src/block/cache.rs:1320: return=Ok(())
2025-02-04T09:09:02.236168Z  INFO hub-002 ThreadId(13) lru_task:lru_update: hub_cache::block::cache: hub-cache/src/block/cache.rs:1320: return=Ok(())
warming up...
/home/theobloesch/curobo/src/curobo/rollout/dynamics_model/tensor_step.py:538: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
  return filter_signal_jit(signal, self._sma_kernel)
Curobo is Ready
2025-02-04 09:09:10 [18,868ms] [Warning] [omni.hydra] Mesh '/World/obstacles/limit_3_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:10 [18,868ms] [Warning] [omni.hydra] Mesh '/World/obstacles/rack_sp_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:10 [18,868ms] [Warning] [omni.hydra] Mesh '/World/obstacles/rack_rp_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:10 [18,869ms] [Warning] [omni.hydra] Mesh '/World/obstacles/rack_tp_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:10 [18,869ms] [Warning] [omni.hydra] Mesh '/World/obstacles/rack_stp_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:10 [18,869ms] [Warning] [omni.hydra] Mesh '/World/obstacles/rack_lp_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:10 [18,869ms] [Warning] [omni.hydra] Mesh '/World/obstacles/table_mesh' has corrupted data in primvar 'displayColor': buffer size 8 doesn't match expected size 36 in faceVarying primvars
2025-02-04 09:09:12 [21,152ms] [Warning] [omni.physx.tensors.plugin] Cannot assign transform to non-root articulation link at '/World/UF_ROBOT/root_joint/xarm6link_tcp'
2025-02-04 09:09:12 [21,152ms] [Warning] [omni.physx.tensors.plugin] Cannot assign velocities to rigid body at '/World/UF_ROBOT/root_joint/xarm6link_tcp'
2025-02-04 09:09:12 [21,152ms] [Warning] [omni.physx.tensors.plugin] Cannot assign transform to non-root articulation link at '/World/UF_ROBOT/root_joint/xarm6link_tcp'
2025-02-04 09:09:12 [21,152ms] [Warning] [omni.physx.tensors.plugin] Cannot assign velocities to rigid body at '/World/UF_ROBOT/root_joint/xarm6link_tcp'
Updating world, reading w.r.t. /World/UF_ROBOT
Obstacles read from stage 19
Updated World
[0.59958845 0.07874151]
[0.59958845 0.07874151]
[0.02631525 0.02381778]
[0.00159558 0.00159486]

Additional Information

I also had a warning message that the CUDA memory had run out, but it doesn't appear anymore.
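
To keep an eye on this, the free GPU memory can be checked from inside the script (a minimal sketch, assuming the PyTorch bundled with Isaac Sim; log_gpu_memory is just a hypothetical helper name):

import torch

def log_gpu_memory(tag: str) -> None:
    # mem_get_info() returns (free_bytes, total_bytes) for the current CUDA device.
    free_b, total_b = torch.cuda.mem_get_info()
    print(f"[{tag}] free {free_b / 1024**2:.0f} MiB of {total_b / 1024**2:.0f} MiB")
    # Summary of what this process has allocated/reserved through PyTorch.
    print(torch.cuda.memory_summary(abbreviated=True))

# For example, call log_gpu_memory("before warmup") right before motion_gen.warmup(...).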

What I’ve Tried

I have tried restarting my PC and running older programs that used to work, as well as the examples from cuRobo, but nothing works.

Thank you in advance for any help you can provide.
Best regards

Please move on to the latest release (Isaac Sim 4.5)

Sorry, after rereading Curobo's documentation, I realized that it is only compatible with Isaac Sim 4.0.0; that's where my issue comes from. Do you know if Curobo will be available soon for the latest Isaac Sim release?

Hi,

After retesting my code with the correct version of Isaac Sim (4.0.0) compatible with Curobo, I’m still facing the same issue. Could you provide some guidance to troubleshoot this problem?

Thank you in advance for your help—I really appreciate it!

Error message:

Traceback (most recent call last):
  File "/home/theobloesch/xArm6_Pick_Place/with_motion_gen/Pick_Place_demo_real.py", line 780, in <module>
    main()
  File "/home/theobloesch/xArm6_Pick_Place/with_motion_gen/Pick_Place_demo_real.py", line 625, in main
    curobo.config_motion_gen()
  File "/home/theobloesch/xArm6_Pick_Place/with_motion_gen/Pick_Place_demo_real.py", line 269, in config_motion_gen
    self.motion_gen.warmup(enable_graph=True, warmup_js_trajopt=False)
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1892, in warmup
    self.plan_single(
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/motion_gen.py", line 1533, in plan_single
    result = self._plan_attempts(
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/motion_gen.py", line 2997, in _plan_attempts
    result = self._plan_from_solve_state(
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/motion_gen.py", line 3450, in _plan_from_solve_state
    traj_result = self._solve_trajopt_from_solve_state(
  File "/home/theobloesch/.local/share/ov/pkg/isaac-sim-4.0.0/kit/python/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/motion_gen.py", line 2819, in _solve_trajopt_from_solve_state
    traj_result = trajopt_instance.solve_any(
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/trajopt.py", line 805, in solve_any
    return self.solve_single(
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/trajopt.py", line 978, in solve_single
    return self._solve_from_solve_state(
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/trajopt.py", line 920, in _solve_from_solve_state
    traj_result = self._get_result(
  File "/home/theobloesch/.local/share/ov/pkg/isaac-sim-4.0.0/kit/python/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/theobloesch/curobo/src/curobo/wrap/reacher/trajopt.py", line 1337, in _get_result
    metrics = self.interpolate_rollout.get_metrics_cuda_graph(interpolated_trajs)
  File "/home/theobloesch/curobo/src/curobo/rollout/arm_base.py", line 461, in get_metrics_cuda_graph
    self._cu_out_metrics = self.get_metrics(self._cu_metrics_state_in)
  File "/home/theobloesch/curobo/src/curobo/rollout/arm_base.py", line 436, in get_metrics
    out_metrics = self.constraint_fn(state)
  File "/home/theobloesch/curobo/src/curobo/rollout/arm_base.py", line 396, in constraint_fn
    coll_constraint = self.primitive_collision_constraint.forward(
  File "/home/theobloesch/curobo/src/curobo/rollout/cost/primitive_collision_cost.py", line 173, in discrete_fn
    self._collision_query_buffer.update_buffer_shape(
  File "/home/theobloesch/curobo/src/curobo/geom/sdf/world.py", line 270, in update_buffer_shape
    self.create_from_shape(shape, tensor_args, collision_types)
  File "/home/theobloesch/curobo/src/curobo/geom/sdf/world.py", line 243, in create_from_shape
    self.primitive_collision_buffer = CollisionBuffer.initialize_from_shape(
  File "/home/theobloesch/curobo/src/curobo/geom/sdf/world.py", line 67, in initialize_from_shape
    distance_buffer = torch.zeros(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 28.00 MiB. GPU 0 has a total capacity of 5.77 GiB of which 17.94 MiB is free. Process 6314 has 2.89 GiB memory in use. Including non-PyTorch memory, this process has 2.81 GiB memory in use. Of the allocated memory 385.36 MiB is allocated by PyTorch, and 96.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
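
The traceback itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation; a minimal sketch of how I would try that (assuming it has to be set before the first CUDA allocation):

import os

# Must be set before PyTorch initializes CUDA, so it goes at the very top of the
# script, before importing torch or any Isaac Sim / cuRobo modules.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# Equivalent from the shell before launching the script:
#   export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Since the GPU only has 5.77 GiB in total and only 17.94 MiB is free when warmup() runs, I would also check nvidia-smi after a run ends to see whether a previous Isaac Sim process is still holding memory.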

Best regards,
Théo Bloesch