Unknown embedded device detected and slow first inference

I’m new to TensorRT, and on a Jetson Orin the first inference is very slow. I have two custom models and it happens with both of them (the first inference takes about 6 minutes for one and 2 minutes for the other).

Some of the warnings I get when I run a simple script:

2023-08-28 03:26:10.501539: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:10.538141: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:10.538410: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:10.540753: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:10.541093: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:10.541218: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:11.131744: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:11.132048: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:11.132123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1708] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-08-28 03:26:11.132239: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:11.132361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1621] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 44820 MB memory: -> device: 0, name: Orin, pci bus id: 0000:00:00.0, compute capability: 8.7
2023-08-28 03:26:27.488727: I tensorflow/compiler/tf2tensorrt/common/utils.cc:104] Linked TensorRT version: 8.5.2
2023-08-28 03:26:27.488989: I tensorflow/compiler/tf2tensorrt/common/utils.cc:106] Loaded TensorRT version: 8.5.2
2023-08-28 03:26:31.737074: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1344] [TF-TRT] Sparse compute capability is enabled.
2023-08-28 03:26:33.351818: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:26:33.354820: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.

2023-08-28 03:28:40.624897: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.627248: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.629653: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.632364: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.635026: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.637646: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.640360: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:32:07.526010: W tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:6003] TF-TRT Warning: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [1,0,2] is an empty tensor, which is not supported by TRT
2023-08-28 03:32:07.730802: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1103] TF-TRT Warning: Engine creation for PartitionedCall/TRTEngineOp_000_000 failed. The native segment will be used instead. Reason: UNIMPLEMENTED: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [1,0,2] is an empty tensor, which is not supported by TRT
2023-08-28 03:32:07.731073: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[1,0,2], [1,0,2]] failed. Running native segment for PartitionedCall/TRTEngineOp_000_000
2023-08-28 03:32:07.758491: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[1,0,2], [1,0,2]] failed. Running native segment for PartitionedCall/TRTEngineOp_000_000
{'tf_op_layer_concat_18': <tf.Tensor: shape=(1, 0, 12), dtype=float32, numpy=array([], shape=(1, 0, 12), dtype=float32)>}
Inference time: 0:05:44.262719
Inference time: 0:00:00.021712
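Incidentally, the "Input tensor with shape [1,0,2] is an empty tensor" warnings above come from an input with a zero-length dimension: such a tensor holds no elements at all, which TensorRT rejects, so TF-TRT falls back to the native TensorFlow segment for that call. A quick NumPy check (shapes taken from the log, the rest is illustrative):

```python
import numpy as np

# A tensor of shape (1, 0, 2) has a zero-length dimension, so it
# contains no data -- this is what TRT calls an "empty tensor"
# and refuses to build an engine for.
x = np.zeros((1, 0, 2), dtype=np.float32)
print(x.shape)  # (1, 0, 2)
print(x.size)   # 0
```

This also means a warm-up call made with an empty input never exercises the TRT engine path; use a dummy input with non-zero dimensions if you want the engine to actually build.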

Are these warnings related to the slow first inference?
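For context on the timing gap between the two inference times: TF-TRT builds its TensorRT engines lazily on the first call for a given input shape, so a common mitigation is to run one or two throwaway inferences at startup before serving real traffic. A minimal sketch of that warm-up pattern, where `FakeModel` is only a stand-in for a real loaded model:

```python
import time
from datetime import timedelta

def warmed_up(infer_fn, dummy_input, runs=2):
    """Run a few throwaway inferences so that one-time costs
    (engine building, lazy initialization) are paid up front."""
    for _ in range(runs):
        start = time.perf_counter()
        infer_fn(dummy_input)
        print("warm-up took", timedelta(seconds=time.perf_counter() - start))
    return infer_fn

# Stand-in for a real model: the first call would normally trigger
# the expensive engine build; later calls reuse the built engine.
class FakeModel:
    def __init__(self):
        self._built = False
    def __call__(self, x):
        if not self._built:
            self._built = True  # pretend to build the engine once
        return [v * 2 for v in x]

model = warmed_up(FakeModel(), [1.0, 2.0])
print(model([3.0]))  # serving-time calls are now fast
```

With TF-TRT specifically, the build can also be done offline before deployment (e.g. `TrtGraphConverterV2.build()` with a representative input function), which moves the minutes-long cost out of the serving process entirely.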

Hi,

Could you share the details about your board and JetPack version?
Do you use Orin or Orin Nano?

Thanks.

Hi,

If you are using Orin 64GB and JetPack 5.1.2, the warning is expected and the fix will be available in TensorRT 8.6.

It is a harmless warning; TensorRT should work without issue.
Please let us know if TensorRT is not working in your environment.

Thanks.


Hi,

I’m using the NVIDIA® Jetson AGX Orin™ 64GB Developer Kit.

Jetpack version (output of sudo apt-cache show nvidia-jetpack):

Package: nvidia-jetpack
Version: 5.1-b147
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1-b147), nvidia-jetpack-dev (= 5.1-b147)

So I guess everything is ok then?

Hi,

Yes, the warning will be fixed in JetPack 6/TensorRT 8.6, but everything works normally with JetPack 5 + TensorRT 8.5.
Sorry for the inconvenience.

Thanks.

Hello @AastaLLL ,

When I run the command “python3 deepstream_test_1.py <sample_video.h264>” on my Orin device, I receive a TRT warning that repeats continuously, and output only appears after a couple of minutes.
I’m asking here because this issue doesn’t occur on another Orin device (same TRT version), where the output is displayed immediately without the warning below. Please let me know whether this is a problem with TRT 8.5 or something else.

WARNING: [TRT]: Unknown embedded device detected. Using 59660MiB as the allocation cap for memory on embedded devices.
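One likely cause of the minutes-long delay (separate from the harmless "Unknown embedded device" warning): if DeepStream's nvinfer plugin does not find a serialized engine file, it builds the TensorRT engine from the model at startup, which can take several minutes, while the device that starts immediately probably already has a cached engine on disk. A sketch of the relevant nvinfer config entry (the engine filename below is illustrative; the sample's own config file may already contain a similar line):

```
[property]
# nvinfer serializes the engine it builds on the first run;
# pointing model-engine-file at that file skips the rebuild
# on subsequent runs.
model-engine-file=model_b1_gpu0_fp16.engine
```

Checking whether the fast device has a .engine file next to its model, and whether the slow device can write one (file permissions), is a quick way to confirm this.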