Can't complete isaac_ros_foundationpose quickstart

Hi, I’m trying to complete this Quickstart.
I’m on x86 (an i9 with 32 GB of RAM) with an RTX 4070 Laptop GPU with 8 GB of VRAM.
Is there some way to reduce the batch size, or is more GPU memory needed?

Here’s the error (just the last part):

[component_container_mt-1] [INFO] [1719138216.477395281] [foundationpose_node]: [NitrosSubscriber] Use the negotiated data format: "nitros_image_rgb8"
[component_container_mt-1] [INFO] [1719138216.477399532] [foundationpose_node]: [NitrosSubscriber] Negotiation ended with no results
[component_container_mt-1] [INFO] [1719138216.477402940] [foundationpose_node]: [NitrosSubscriber] Use the compatible subscriber: topic_name="/rgb/camera_info", data_format="nitros_camera_info"
[component_container_mt-1] [INFO] [1719138216.477406854] [foundationpose_node]: [NitrosSubscriber] Use the negotiated data format: "nitros_image_mono8"
[component_container_mt-1] [INFO] [1719138216.477410454] [foundationpose_node]: [NitrosSubscriber] Negotiation ended with no results
[component_container_mt-1] [INFO] [1719138216.477413557] [foundationpose_node]: [NitrosSubscriber] Use the compatible subscriber: topic_name="/depth_image", data_format="nitros_image_32FC1"
[component_container_mt-1] [INFO] [1719138216.477502035] [foundationpose_node]: [NitrosNode] Exporting the final graph based on the negotiation results
[component_container_mt-1] [INFO] [1719138218.149944263] [foundationpose_node]: [NitrosNode] Wrote the final top level YAML graph to "/tmp/isaac_ros_nitros/graphs/GVUOUAGZWE/GVUOUAGZWE.yaml"
[component_container_mt-1] [INFO] [1719138218.149985756] [foundationpose_node]: [NitrosNode] Loading application
[component_container_mt-1] 2024-06-23 12:23:38.156 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dummy_rx' in component ''.
[component_container_mt-1] 2024-06-23 12:23:38.156 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dummy_rx' in component ''.
[component_container_mt-1] 2024-06-23 12:23:38.157 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dev_id' in component 'stream'.
[component_container_mt-1] [INFO] [1719138218.157802817] [foundationpose_node]: [NitrosNode] Initializing and running GXF graph
[component_container_mt-1] 2024-06-23 12:23:38.160 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-06-23 12:23:38.160 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-06-23 12:23:38.160 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-06-23 12:23:38.160 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-06-23 12:23:38.161 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] [INFO] [1719138218.162818484] [foundationpose_node]: [NitrosNode] Node was started
[component_container_mt-1] Could not open file 
[component_container_mt-1] Could not open file 
[component_container_mt-1] [ERROR] [1719138219.342796304] [TRT]: TRT ERROR: ModelImporter.cpp:733: Failed to parse ONNX model from file: 
[component_container_mt-1] [ERROR] [1719138219.445128976] [tensor_rt]: Unable to read tensor shape info from TRT Model Engine or from ONNX file.
[component_container_mt-1] [WARN] [1719138219.445165894] [tensor_rt]: Failed to get block size from model, set to the default size: 67108864.
[component_container_mt-1] [INFO] [1719138219.445226724] [tensor_rt]: Tensors 67108864 bytes, num outputs 40 x tensors per output 3 = 120 blocks
[component_container_mt-1] [INFO] [1719138219.445312650] [tensor_rt]: [NitrosNode] Initializing and running GXF graph
[component_container_mt-1] 2024-06-23 12:23:39.448 ERROR gxf/std/block_memory_pool.cpp@77: Failure in cudaMalloc. cuda_error: cudaErrorMemoryAllocation, error_str: out of memory
[component_container_mt-1] 2024-06-23 12:23:39.448 ERROR gxf/std/entity_warden.cpp@437: Failed to initialize component 00157 (pool)
[component_container_mt-1] 2024-06-23 12:23:39.448 ERROR gxf/core/runtime.cpp@702: Could not initialize entity 'XRYGZEGHUG_inference' (E152): GXF_OUT_OF_MEMORY
[component_container_mt-1] 2024-06-23 12:23:39.448 ERROR gxf/std/program.cpp@283: Failed to activate entity 00152 named XRYGZEGHUG_inference: GXF_OUT_OF_MEMORY
[component_container_mt-1] 2024-06-23 12:23:39.448 ERROR gxf/std/program.cpp@285: Deactivating...
[component_container_mt-1] 2024-06-23 12:23:39.448 ERROR gxf/core/runtime.cpp@1452: Graph activation failed with error: GXF_OUT_OF_MEMORY
[component_container_mt-1] [ERROR] [1719138219.448839191] [tensor_rt]: [NitrosContext] GxfGraphActivate Error: GXF_OUT_OF_MEMORY
[component_container_mt-1] [ERROR] [1719138219.448860417] [tensor_rt]: [NitrosNode] runGraphAsync Error: GXF_OUT_OF_MEMORY
[component_container_mt-1] terminate called after throwing an instance of 'std::runtime_error'
[component_container_mt-1]   what():  [NitrosNode] runGraphAsync Error: GXF_OUT_OF_MEMORY
[ERROR] [component_container_mt-1]: process has died [pid 576, exit code -6, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=container -r __ns:=/isaac_ros_examples'].

Any other suggestions for getting this example to run?

I don’t know if this helps to understand the problem, but I ran nvidia-smi in another terminal inside the container: when the error messages appear, GPU memory usage is at about 5000 MB out of the 8188 MB total.
I was able to run the quickstarts for isaac_ros_dope and isaac_ros_centerpose successfully; during the model conversion for those examples, GPU memory usage went above 7 GB.
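For reference, this is roughly how I was watching the GPU memory from a second terminal inside the container (the 1-second interval is just what I picked):

# Refresh the full nvidia-smi view every second while the launch file runs
watch -n 1 nvidia-smi

# Or log only the memory figures, once per second, to compare against the crash time
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1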

Hi @marco.pastorio

The Isaac ROS FoundationPose demo requires more memory to run than your GPU has available. We are working to make the next release lighter.

The FoundationPose Tracking — isaac_ros_docs documentation demo uses less memory and can be run on your device.

Let me know if my response was helpful to you.

Best,
Raffaello

Hi,
I just tried FoundationPose Tracking but it still goes out of memory.
I’ll see if I can get my hands on a machine with more GPU RAM, and try again on future releases.

Since I was also evaluating a Jetson board, should I expect the same limitation on an Orin Nano or NX with less than 16 GB?

Thank you!

I tried on another machine with an RTX 3060 with 12 GB of VRAM. Unfortunately I could only test on Windows 11 with WSL, which is not ideal. Once the Docker container is up and running, about 1.25 GB of GPU memory is already taken, so roughly 11 GB are free.
Both FoundationPose examples still crash, but this time not because of running out of memory; they fail with these errors instead (a quick check of the model files follows the logs):

FoundationPose Quickstart:

[component_container_mt-1] 2024-06-27 10:23:19.142 WARN  ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@281: Rebuilding CUDA engine /workspaces/isaac_ros-dev/isaac_ros_assets/models/synthetica_detr/sdetr_grasp.plan (forced by config). Note: this process may take up to several minutes.
[component_container_mt-1] 2024-06-27 10:23:30.690 WARN  ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@155: TRT WARNING: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[component_container_mt-1] Could not open file model.onnx
[component_container_mt-1] Could not open file model.onnx
[component_container_mt-1] 2024-06-27 10:23:30.690 ERROR ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@151: TRT ERROR: ModelImporter.cpp:733: Failed to parse ONNX model from file: model.onnx
[component_container_mt-1] 2024-06-27 10:23:30.690 ERROR ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@472: Failed to parse ONNX file model.onnx
[component_container_mt-1] 2024-06-27 10:23:30.913 ERROR ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@287: Failed to create engine plan for model model.onnx.
[component_container_mt-1] 2024-06-27 10:23:30.913 WARN  gxf/std/entity_executor.cpp@495: Failed to start entity [ELLAEPFEDX_inference]
[component_container_mt-1] 2024-06-27 10:23:30.913 WARN  gxf/std/multi_thread_scheduler.cpp@342: Error while executing entity E140 named 'ELLAEPFEDX_inference': GXF_FAILURE
[component_container_mt-1] 2024-06-27 10:23:30.932 ERROR gxf/std/entity_executor.cpp@586: Entity [ELLAEPFEDX_inference] must be in Started, Tick Pending, Ticking or Idle stage before stopping. Current state is StartPending
[component_container_mt-1] 2024-06-27 10:23:31.142 ERROR gxf/std/entity_executor.cpp@210: Entity with eid 140 not found!
[component_container_mt-1] [WARN] [1719476611.142723551] [tensor_rt]: [NitrosNode] The heartbeat entity (eid=140) was stopped. The graph may have been terminated.

FoundationPose Tracking:

[component_container_mt-1] [INFO] [1719476145.322448832] [foundationpose_node]: [NitrosNode] Node was started
[component_container_mt-1] 2024-06-27 10:15:45.972 WARN  ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@155: TRT WARNING: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[component_container_mt-1] 2024-06-27 10:15:46.856 WARN  ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@155: TRT WARNING: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[component_container_mt-1] Could not open file
[component_container_mt-1] Could not open file
[component_container_mt-1] [ERROR] [1719476153.389427181] [TRT]: TRT ERROR: ModelImporter.cpp:733: Failed to parse ONNX model from file:
[component_container_mt-1] [ERROR] [1719476153.716013953] [tensor_rt]: Unable to read tensor shape info from TRT Model Engine or from ONNX file.
[component_container_mt-1] [WARN] [1719476153.716387685] [tensor_rt]: Failed to get block size from model, set to the default size: 67108864.
[component_container_mt-1] [INFO] [1719476153.717159740] [tensor_rt]: Tensors 67108864 bytes, num outputs 40 x tensors per output 3 = 120 blocks
[component_container_mt-1] [INFO] [1719476153.717556836] [tensor_rt]: [NitrosNode] Initializing and running GXF graph
[component_container_mt-1] [INFO] [1719476154.757134892] [tensor_rt]: [NitrosNode] Node was started
[component_container_mt-1] 2024-06-27 10:15:54.757 WARN  ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@281: Rebuilding CUDA engine /workspaces/isaac_ros-dev/isaac_ros_assets/models/synthetica_detr/sdetr_grasp.plan (forced by config). Note: this process may take up to several minutes.
[component_container_mt-1] 2024-06-27 10:16:16.914 WARN  ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@155: TRT WARNING: CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[component_container_mt-1] Could not open file model.onnx
[component_container_mt-1] Could not open file model.onnx
[component_container_mt-1] 2024-06-27 10:16:17.176 ERROR ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@151: TRT ERROR: ModelImporter.cpp:733: Failed to parse ONNX model from file: model.onnx
[component_container_mt-1] 2024-06-27 10:16:17.177 ERROR ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@472: Failed to parse ONNX file model.onnx
[component_container_mt-1] 2024-06-27 10:16:22.118 ERROR ./gxf/extensions/tensor_rt/tensor_rt_inference.cpp@287: Failed to create engine plan for model model.onnx.
[component_container_mt-1] 2024-06-27 10:16:22.119 WARN  gxf/std/entity_executor.cpp@495: Failed to start entity [GCATEVJKJU_inference]
[component_container_mt-1] 2024-06-27 10:16:22.119 WARN  gxf/std/multi_thread_scheduler.cpp@342: Error while executing entity E204 named 'GCATEVJKJU_inference': GXF_FAILURE
[component_container_mt-1] 2024-06-27 10:16:22.125 ERROR gxf/std/entity_executor.cpp@586: Entity [GCATEVJKJU_inference] must be in Started, Tick Pending, Ticking or Idle stage before stopping. Current state is StartPending
[component_container_mt-1] 2024-06-27 10:16:22.757 ERROR gxf/std/entity_executor.cpp@210: Entity with eid 204 not found!
[component_container_mt-1] [WARN] [1719476182.757892225] [tensor_rt]: [NitrosNode] The heartbeat entity (eid=204) was stopped. The graph may have been terminated.
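In both runs the failure starts with TensorRT not being able to open model.onnx, so before blaming the GPU I want to verify that the model assets are actually present inside the container. The directory names below are only guesses based on the sdetr_grasp.plan path in the logs, so the exact layout may differ:

# Check that the model assets referenced by the launch files exist and have a sane size
ls -lh /workspaces/isaac_ros-dev/isaac_ros_assets/models/
ls -lh /workspaces/isaac_ros-dev/isaac_ros_assets/models/synthetica_detr/
# Hypothetical path for the FoundationPose networks; adjust to wherever the quickstart put them
ls -lh /workspaces/isaac_ros-dev/isaac_ros_assets/models/foundationpose/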

PS: I think this may be related to package/file version issues, since I now can’t run the commands to install ros-humble-isaac-ros-foundationpose and ros-humble-isaac-ros-rtdetr without getting a file size mismatch error.
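A file size mismatch from apt is often just a stale package index, so my plan is to try something like this first (just a sketch; the package names are the ones from the quickstart):

# Clear any partially downloaded .deb files and refresh the package lists
sudo apt-get clean
sudo apt-get update
# Then retry the installation of the quickstart packages
sudo apt-get install --reinstall ros-humble-isaac-ros-foundationpose ros-humble-isaac-ros-rtdetr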

@Raffaello, I am trying to run FoundationPose with a RealSense D435i camera and an RTX A6000 GPU. My foundationpose node starts, but it is not detecting the mac and cheese box, even though I have the same box used in the example. In RViz, if I add the RealSense camera topic I can see the image, but the default camera frame that is supposed to show the pose does not display anything. My goal is to get Isaac ROS FoundationPose working on a custom object with the RealSense D435i; I have started with the provided mac and cheese box and will then move on to my custom object. Please feel free to ask for more information if you need it. Thanks.
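If it helps with the debugging, this is the kind of check I can run; the topic and node names below are assumptions on my side (taken from the logs above and the default RealSense driver names), so they may need adjusting:

# Confirm which topics exist and whether the camera stream is actually publishing
ros2 topic list
ros2 topic hz /camera/color/image_raw   # default RealSense color topic; may differ on my setup
# Inspect the FoundationPose node to see its real subscription/publication topic names
ros2 node info /foundationpose_node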