I am deploying a set of robots utilising the Isaac ROS packages, including isaac_ros_dnn_image_encoder and isaac_ros_tensor_rt.
I am using an NVIDIA Jetson AGX Orin Developer Kit (64GB) with an M.2 installed.
Model: Jetson AGX Orin Developer Kit - JetPack 5.1.2 [L4T 35.4.1]
Libraries:
CUDA: 11.4.315
CUDNN: 8.6.0.166
TensorRT: 8.5.2.2
VPI: 2.3.9
Vulkan: 1.3.204
OpenCV: 4.5.4 with CUDA:NO
On launch, two sets of AI image processing/inference containers spin up.
Roughly 30% of the time, one of the components crashes on launch, for example:
1709515396.9951015 [component_container_mt-11] NvMMLiteOpen : Block : BlockType = 261
1709515397.0986693 [component_container_mt-11] NvMMLiteBlockCreate : Block : BlockType = 261
1709515397.1022320 [component_container_mt-11] [INFO] [1709515397.101109772] [abc.panorama_server.video_h264_decoder]: [NitrosContext] Running application...
1709515397.1090574 [component_container_mt-11] [INFO] [1709515397.104371090] [abc.panorama_server.video_h264_decoder]: [NitrosNode] Starting a heartbeat timer (eid=17)
1709515397.1104555 [component_container_mt-11] [INFO] [1709515397.104604756] [abc.panorama_server.video_resize_node]: [NitrosContext] Loading application: '/tmp/isaac_ros_nitros/graphs/RUKDNOJEZN/RUKDNOJEZN.yaml'
1709515397.1112237 [component_container_mt-11] [INFO] [1709515397.104717110] [abc.panorama_server.video_dnn_encoder]: [NitrosNode] Initializing NitrosNode
1709515397.1119342 [component_container_mt-11] [INFO] [1709515397.105246908] [abc.panorama_server.video_h264_decoder]: Negotiating
1709515397.1126776 [component_container_mt-11] [INFO] [1709515397.106614124] [abc.panorama_server.video_dnn_encoder]: [NitrosNode] Starting NitrosNode
1709515397.1133666 [component_container_mt-11] [INFO] [1709515397.106669132] [abc.panorama_server.video_dnn_encoder]: [NitrosNode] Loading built-in preset extension specs
1709515397.1140604 [component_container_mt-11] 2024-03-04 14:23:17.108 ERROR gxf/std/type_registry.cpp@48: Unknown type: nvidia::gxf::TensorRtInference
1709515397.1147683 [component_container_mt-11] 2024-03-04 14:23:17.108 ERROR gxf/std/yaml_file_loader.cpp@399: Could not add component of type 'nvidia::gxf::TensorRtInference' to entity.
1709515397.1154776 [component_container_mt-11] [ERROR] [1709515397.108336480] [abc.panorama_server.video_resize_node]: [NitrosNode] LoadApplication Error: GXF_FACTORY_UNKNOWN_CLASS_NAME
1709515397.1166997 [component_container_mt-11] terminate called after throwing an instance of 'std::runtime_error'
1709515397.1174448 [component_container_mt-11] what(): [NitrosNode] LoadApplication Error: GXF_FACTORY_UNKNOWN_CLASS_NAME
1709515397.2644863 [foxglove_bridge-1] [INFO] [1709515397.261071500] [abc.foxglove_bridge]: Subscribing to topic "/abc/detection_server/rts_image/apriltag_image_annotations" (foxglove_msgs/msg/ImageAnnotations) on channel 36
1709515397.2692885 [foxglove_bridge-1] [INFO] [1709515397.268544739] [abc.foxglove_bridge]: Subscribing to topic "/abc/detection_server/rts_image/bbox_image_annotations" (foxglove_msgs/msg/ImageAnnotations) on channel 35
1709515397.6883087 [detection_server-5] [INFO] [1709515397.687620387] [abc.detection_server]: Initialising Detection Server.
1709515397.6904640 [detection_server-5] [INFO] [1709515397.690162688] [abc.detection_server]: Detection Service Initialised.
1709515397.9373837 [ERROR] [component_container_mt-11]: process has died [pid 29954, exit code -6, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=tensor_rt_container -r __ns:=/abc/panorama_server'].
Another example, this time from the detection server container:
abc-ai-run | [component_container_mt-7] [INFO] [1709676265.593694817] [abc.detection_server.rts_image_dnn_encoder]: [NitrosContext] Running application...
abc-ai-run | [component_container_mt-7] [INFO] [1709676265.595963651] [abc.detection_server.tensor_rt]: [NitrosContext] Loading application: '/tmp/isaac_ros_nitros/graphs/NYTVHSFZKR/NYTVHSFZKR.yaml'
abc-ai-run | [component_container_mt-7] [INFO] [1709676265.606229519] [abc.detection_server.tensor_rt]: [NitrosNode] Linking Nitros pub/sub to the loaded application
abc-ai-run | [component_container_mt-7] [ERROR] [1709676265.608126001] [abc.detection_server.tensor_rt]: [NitrosContext] GXFEntityFind Error: GXF_ENTITY_NOT_FOUND
abc-ai-run | [component_container_mt-7] [ERROR] [1709676265.608557010] [abc.detection_server.tensor_rt]: [NitrosContext] getCid Error: GXF_ENTITY_NOT_FOUND
abc-ai-run | [component_container_mt-7] [ERROR] [1709676265.608599794] [abc.detection_server.tensor_rt]: [NitrosNode] Failed to get the pointer of nvidia::gxf::DoubleBufferReceiver (inference/rx) for linking a NitrosSubscriber: GXF_ENTITY_NOT_FOUND
abc-ai-run | [component_container_mt-7] terminate called after throwing an instance of 'std::runtime_error'
abc-ai-run | [component_container_mt-7] what(): [NitrosNode] Failed to get the pointer of nvidia::gxf::DoubleBufferReceiver (inference/rx) for linking a NitrosSubscriber: GXF_ENTITY_NOT_FOUND
abc-ai-run | [detection_server-5] [INFO] [1709676266.443703563] [abc.detection_server]: Initialising Detection Server.
abc-ai-run | [detection_server-5] [INFO] [1709676266.516133822] [abc.detection_server]: Detection Service Initialised.
abc-ai-run | [ERROR] [component_container_mt-7]: process has died [pid 29149, exit code -6, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=tensor_rt_container -r __ns:=/abc/detection_server'].
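The two crashes above leave different signatures in the log: GXF_FACTORY_UNKNOWN_CLASS_NAME in the first and GXF_ENTITY_NOT_FOUND in the second, each followed by a "process has died" line with exit code -6. As a rough sketch (plain Python, no ROS dependency), saved launch logs could be tallied by signature to put a firmer number on the ~30% figure. The function name `classify_launch_log` and the label names are made up for illustration; only the matched strings come from the logs.

```python
import re

# Substrings copied from the two logs above; the dictionary keys are illustrative labels.
FAILURE_SIGNATURES = {
    "unknown_class": "GXF_FACTORY_UNKNOWN_CLASS_NAME",
    "entity_not_found": "GXF_ENTITY_NOT_FOUND",
}

# Matches the launch system's "process has died" line and captures
# the process name and its exit code.
DIED_RE = re.compile(
    r"\[ERROR\] \[(?P<proc>[\w-]+)\]: process has died "
    r"\[pid \d+, exit code (?P<code>-?\d+)"
)


def classify_launch_log(text):
    """Return (set of failure labels, list of (process, exit_code)) for one launch log."""
    labels = {name for name, needle in FAILURE_SIGNATURES.items() if needle in text}
    deaths = [(m.group("proc"), int(m.group("code"))) for m in DIED_RE.finditer(text)]
    return labels, deaths
```

Running this over the output of repeated launches would show whether the two errors occur equally often or whether one dominates.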
Simply relaunching the container will eventually allow it to run without errors.
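The manual relaunch could be automated while the root cause is unknown. The following is only a workaround sketch under assumptions: `cmd` stands for whatever command starts the launch (hypothetical here), the failure markers are the strings from the logs above, and a launch that shows none of them within `window_s` seconds is treated as healthy. `start_and_watch` and `relaunch_until_clean` are illustrative names, not part of ROS 2 or Isaac ROS.

```python
import subprocess
import sys
import threading

# Strings taken from the crash logs above.
FAILURE_MARKERS = (
    "GXF_FACTORY_UNKNOWN_CLASS_NAME",
    "GXF_ENTITY_NOT_FOUND",
    "process has died",
)


def start_and_watch(cmd, window_s=30.0, markers=FAILURE_MARKERS):
    """Start cmd and watch its output for window_s seconds.

    Returns the running Popen object if no failure marker appeared in that
    window, or None after terminating the process if one did.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    failed = threading.Event()

    def pump():
        # Echo every line so the log stays visible, and flag any failure marker.
        for line in proc.stdout:
            sys.stdout.write(line)
            if any(marker in line for marker in markers):
                failed.set()

    # Daemon thread: keeps echoing output after this function returns.
    threading.Thread(target=pump, daemon=True).start()

    if failed.wait(timeout=window_s):
        proc.terminate()
        proc.wait()
        return None
    return proc


def relaunch_until_clean(cmd, max_attempts=5, window_s=30.0):
    """Return (attempt_number, process) for the first clean start, else (None, None)."""
    for attempt in range(1, max_attempts + 1):
        proc = start_and_watch(cmd, window_s)
        if proc is not None:
            return attempt, proc
    return None, None
```

With Docker Compose the restart would more likely be done on the service rather than on a child process, so the terminate step here is a stand-in. This only hides the crash; it does not explain why the GXF graph intermittently fails to load.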
I am launching via a Docker Compose file, using an image built on top of nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble:
FROM nvcr.io/nvidia/isaac/ros:aarch64-ros2_humble_b7e1ed6c02a6fa3c1c7392479291c035
...
...
RUN apt-get update && apt-get install -y \
    ros-humble-isaac-ros-common \
    ros-humble-isaac-ros-dnn-image-encoder \
    ros-humble-isaac-ros-tensor-rt \
    ros-humble-isaac-ros-h264-decoder \
    ros-humble-isaac-ros-image-pipeline \
    ros-humble-isaac-ros-nitros \
The following volumes are mounted into the container:
volumes:
- ${HOME}/.Xauthority:/home/admin/.Xauthority:rw
- /dev/*:/dev/*
- /etc/localtime:/etc/localtime:ro
- /usr/bin/tegrastats:/usr/bin/tegrastats
- /tmp/argus_socket:/tmp/argus_socket
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcusolver.so.11:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libcusolver.so.11
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcusparse.so.11:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libcusparse.so.11
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcurand.so.10:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libcurand.so.10
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libnvToolsExt.so:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libnvToolsExt.so
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcupti.so.11.4:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libcupti.so.11.4
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudla.so.1:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libcudla.so.1
- /usr/local/cuda-11.4/targets/aarch64-linux/include/nvToolsExt.h:/usr/local/cuda-11.4/targets/aarch64-linux/include/nvToolsExt.h
- /usr/local/cuda-11.4/targets/aarch64-linux/lib/libcufft.so.10:/usr/local/cuda-11.4/targets/aarch64-linux/lib/libcufft.so.10
- /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra
- /usr/src/jetson_multimedia_api:/usr/src/jetson_multimedia_api
- /opt/nvidia/nsight-systems-cli:/opt/nvidia/nsight-systems-cli
- /opt/nvidia/vpi2:/opt/nvidia/vpi2
- /usr/share/vpi2:/usr/share/vpi2
A snippet from the launch file:
# Imports needed by this snippet
from launch.substitutions import LaunchConfiguration
from launch_ros.actions import ComposableNodeContainer
from launch_ros.descriptions import ComposableNode

h264_decoder = ComposableNode(
    name="video_h264_decoder",
    package="isaac_ros_h264_decoder",
    namespace=[LaunchConfiguration("ns"), "/panorama_server"],
    plugin="nvidia::isaac_ros::h264_decoder::DecoderNode",
    parameters=[
        {
            "input_height": 1080,
            "input_width": 1920,
        }
    ],
    remappings=[
        (
            "image_compressed",
            ["/", LaunchConfiguration("ns"), "/", VIDEO_INPUT_TOPIC, "/", "h264"],
        ),
        (
            "image_uncompressed",
            ["/", LaunchConfiguration("ns"), "/", VIDEO_INPUT_TOPIC],
        ),
    ],
)
image_encoder_node = ComposableNode(
    name="video_dnn_encoder",
    namespace=[LaunchConfiguration("ns"), "/panorama_server"],
    package="isaac_ros_dnn_image_encoder",
    plugin="nvidia::isaac_ros::dnn_inference::DnnImageEncoderNode",
    parameters=[
        {
            "input_image_width": model_dimension_width,
            "input_image_height": model_dimension_height,
            "network_image_width": model_dimension_width,
            "network_image_height": model_dimension_height,
            "image_mean": [0.0, 0.0, 0.0],
            "image_stddev": [
                PIXEL_SCALE_INVERSE,
                PIXEL_SCALE_INVERSE,
                PIXEL_SCALE_INVERSE,
            ],
        }
    ],
    remappings=[
        ("encoded_tensor", "tensor_pub"),
        ("image", ["/", LaunchConfiguration("ns"), "/", VIDEO_INPUT_TOPIC]),
    ],
)
image_resize_node = ComposableNode(
    name="video_resize_node",
    namespace=[LaunchConfiguration("ns"), "/panorama_server"],
    package="isaac_ros_image_proc",
    plugin="nvidia::isaac_ros::image_proc::ResizeNode",
    parameters=[
        {
            "output_height": model_dimension_height,
            "output_width": model_dimension_width,
            "keep_aspect_ratio": False,
        }
    ],
    remappings=[
        ("image", ["/", LaunchConfiguration("ns"), "/", VIDEO_INPUT_TOPIC]),
        ("camera_info", ["/", LaunchConfiguration("ns"), "/camera_info"]),
        ("resize/image", ["/", LaunchConfiguration("ns"), "/", VIDEO_INPUT_TOPIC + "_resized"]),
        ("resize/camera_info", ["/", LaunchConfiguration("ns"), "/camera_info" + "_resized"]),
    ],
)
tensorrt_inference_node = ComposableNode(
    name="tensor_rt",
    namespace=[LaunchConfiguration("ns"), "/panorama_server"],
    package="isaac_ros_tensor_rt",
    plugin="nvidia::isaac_ros::dnn_inference::TensorRTNode",
    parameters=[
        {
            "engine_file_path": model_engine_file_path,
            "output_binding_names": [
                "num_detections",
                "detection_boxes",
                "detection_scores",
                "detection_classes",
            ],
            "output_tensor_names": [
                "num_detections",
                "detection_boxes",
                "detection_scores",
                "detection_classes",
            ],
            "input_tensor_names": ["input_tensor"],
            "input_binding_names": ["input"],
            "force_engine_update": False,
        }
    ],
)
video_inference_container = ComposableNodeContainer(
    name="tensor_rt_container",
    namespace=[LaunchConfiguration("ns"), "/panorama_server"],
    package="rclcpp_components",
    executable="component_container_mt",
    # The h264 image is received. It is then h264 decoded, resized, tensor encoded, and then inferenced.
    composable_node_descriptions=[
        h264_decoder,
        image_resize_node,
        image_encoder_node,
        tensorrt_inference_node,
    ],
)