I'm new to TensorRT, and on a Jetson Orin the first inference is very slow. I have two custom models, and it happens with both of them (the first inference takes about 6 minutes for one model and about 2 minutes for the other).
Here are some of the warnings I get when I run a simple inference script:
2023-08-28 03:26:10.501539: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
[the same NUMA warning is repeated several more times]
2023-08-28 03:26:11.132123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1708] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-08-28 03:26:11.132239: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:984] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-08-28 03:26:11.132361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1621] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 44820 MB memory: → device: 0, name: Orin, pci bus id: 0000:00:00.0, compute capability: 8.7
2023-08-28 03:26:27.488727: I tensorflow/compiler/tf2tensorrt/common/utils.cc:104] Linked TensorRT version: 8.5.2
2023-08-28 03:26:27.488989: I tensorflow/compiler/tf2tensorrt/common/utils.cc:106] Loaded TensorRT version: 8.5.2
2023-08-28 03:26:31.737074: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1344] [TF-TRT] Sparse compute capability is enabled.
2023-08-28 03:26:33.351818: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:26:33.354820: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
2023-08-28 03:28:40.624897: W tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:83] TF-TRT Warning: DefaultLogger Unknown embedded device detected. Using 59656MiB as the allocation cap for memory on embedded devices.
[the same "Unknown embedded device" warning is repeated several more times at 03:28:40]
2023-08-28 03:32:07.526010: W tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:6003] TF-TRT Warning: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [1,0,2] is an empty tensor, which is not supported by TRT
2023-08-28 03:32:07.730802: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1103] TF-TRT Warning: Engine creation for PartitionedCall/TRTEngineOp_000_000 failed. The native segment will be used instead. Reason: UNIMPLEMENTED: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [1,0,2] is an empty tensor, which is not supported by TRT
2023-08-28 03:32:07.731073: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[1,0,2], [1,0,2]] failed. Running native segment for PartitionedCall/TRTEngineOp_000_000
2023-08-28 03:32:07.758491: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:936] TF-TRT Warning: Engine retrieval for input shapes: [[1,0,2], [1,0,2]] failed. Running native segment for PartitionedCall/TRTEngineOp_000_000
{'tf_op_layer_concat_18': <tf.Tensor: shape=(1, 0, 12), dtype=float32, numpy=array([], shape=(1, 0, 12), dtype=float32)>}
Inference time (first run): 0:05:44.262719
Inference time (second run): 0:00:00.021712
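From the log timestamps it looks like the slow first call includes the TF-TRT engine build (note the jump from 03:26:33 to 03:28:40 before the first result appears). To separate that one-time warm-up cost from steady-state latency, I time each call individually. This is a minimal sketch with a hypothetical stand-in model instead of my real TF-TRT model (the expensive first call is simulated with a sleep), so the timing pattern itself is what matters here:

```python
import time
from datetime import timedelta

def time_runs(infer, batch, runs=3):
    """Time each call separately; with TF-TRT the first call typically
    pays the engine-build cost, so it should be reported apart."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(batch)
        timings.append(timedelta(seconds=time.perf_counter() - start))
    return timings

class StandInModel:
    """Hypothetical stand-in for the real model: the first call simulates
    the one-time engine build, later calls return immediately."""
    def __init__(self):
        self.built = False

    def __call__(self, batch):
        if not self.built:
            time.sleep(0.2)  # placeholder for the engine-build cost
            self.built = True
        return batch

timings = time_runs(StandInModel(), [[0.0, 0.0]])
print("warm-up run:", timings[0])
print("steady-state runs:", timings[1:])
```

With the real model, running one throwaway warm-up inference at startup (before serving traffic) hides the build cost from users, but it does not remove it.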
Are these warnings related to the slow first inference?
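Side note on the "empty tensor" warnings: an input with shape [1, 0, 2] contains zero elements (the product of its dimensions is 0), which is why TRT refuses to build an engine for it and TF-TRT falls back to the native segment. A quick sanity check of that element count:

```python
import math

shape = (1, 0, 2)  # shape reported in the TF-TRT warning
num_elements = math.prod(shape)
print(num_elements)  # 0 -> empty tensor, which TRT does not support
```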