I exported the model to ONNX with the target opset option set to 12 and deployed it to TensorRT in FP32, then reran inference. I will post the inference results soon.
In the meantime, here is the output log of the ONNX-to-TensorRT FP32 conversion:
2024-03-20 15:32:05,979 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-03-20 15:32:06,061 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.2.0-deploy
2024-03-20 15:32:06,113 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-03-20 15:32:32,870 [TAO Toolkit] [INFO] matplotlib.font_manager: generated new fontManager
Loading uff directly from the package source code
Loading uff directly from the package source code
2024-03-20 15:32:34,091 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.common.logging.status_logging 198: Log file already exists at /workspace/tao-experiments-birds/faster_rcnn_20231218_BestExperiment1_ac42/status.json
2024-03-20 15:32:34,091 [TAO Toolkit] [INFO] root 174: Starting faster_rcnn gen_trt_engine.
[03/20/2024-15:32:34] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 32, GPU 204 (MiB)
[03/20/2024-15:33:19] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +889, GPU +174, now: CPU 997, GPU 378 (MiB)
2024-03-20 15:33:19,264 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 137: Parsing ONNX model
2024-03-20 15:33:19,341 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 96: ONNX model inputs:
2024-03-20 15:33:19,342 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 97: Input 0: input_image.
2024-03-20 15:33:19,342 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 98: [0, 3, 613, 418].
[03/20/2024-15:33:19] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:372: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/20/2024-15:33:19] [TRT] [W] parsers/onnx/onnx2trt_utils.cpp:400: One or more weights outside the range of INT32 was clamped
[03/20/2024-15:33:19] [TRT] [I] No importer registered for op: ProposalDynamic. Attempting to import as plugin.
[03/20/2024-15:33:19] [TRT] [I] Searching for plugin: ProposalDynamic, plugin_version: 1, plugin_namespace:
[03/20/2024-15:33:19] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[03/20/2024-15:33:19] [TRT] [E] std::exception
[03/20/2024-15:33:19] [TRT] [I] Successfully created plugin: ProposalDynamic
[03/20/2024-15:33:19] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[03/20/2024-15:33:19] [TRT] [E] std::exception
[03/20/2024-15:33:19] [TRT] [I] No importer registered for op: CropAndResizeDynamic. Attempting to import as plugin.
[03/20/2024-15:33:19] [TRT] [I] Searching for plugin: CropAndResizeDynamic, plugin_version: 1, plugin_namespace:
[03/20/2024-15:33:19] [TRT] [I] Successfully created plugin: CropAndResizeDynamic
[03/20/2024-15:33:19] [TRT] [I] No importer registered for op: NMSDynamic_TRT. Attempting to import as plugin.
[03/20/2024-15:33:19] [TRT] [I] Searching for plugin: NMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[03/20/2024-15:33:19] [TRT] [W] parsers/onnx/builtin_op_importers.cpp:5219: Attribute isBatchAgnostic not found in plugin node! Ensure that the plugin creator has a default value defined or the engine may fail to build.
[03/20/2024-15:33:19] [TRT] [I] Successfully created plugin: NMSDynamic_TRT
2024-03-20 15:33:19,754 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 154: Network Description
2024-03-20 15:33:19,754 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 156: Input 'input_image' with shape (-1, 3, 613, 418) and dtype DataType.FLOAT
2024-03-20 15:33:19,754 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 158: Output 'nms_out' with shape (-1, 1, 100, 7) and dtype DataType.FLOAT
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 158: Output 'nms_out_1' with shape (-1, 1, 1, 1) and dtype DataType.FLOAT
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.cv.faster_rcnn.engine_builder 160: dynamic batch size handling
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 150: TensorRT engine build configurations:
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 163:
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 179: BuilderFlag.TF32
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 195:
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 197: Note: max representabile value is 2,147,483,648 bytes or 2GB.
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 199: MemoryPoolType.WORKSPACE = 2147483648 bytes
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 201: MemoryPoolType.DLA_MANAGED_SRAM = 0 bytes
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 203: MemoryPoolType.DLA_LOCAL_DRAM = 1073741824 bytes
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 205: MemoryPoolType.DLA_GLOBAL_DRAM = 536870912 bytes
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 207:
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 209: PreviewFeature.FASTER_DYNAMIC_SHAPES_0805
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 211: PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805
2024-03-20 15:33:19,755 [TAO Toolkit] [INFO] nvidia_tao_deploy.engine.builder 215: Tactic Sources = 31
[03/20/2024-15:33:19] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[03/20/2024-15:33:19] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[03/20/2024-15:33:19] [TRT] [E] std::exception
[03/20/2024-15:33:22] [TRT] [I] Graph optimization time: 2.31754 seconds.
[03/20/2024-15:33:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 1258, GPU 388 (MiB)
[03/20/2024-15:33:22] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 1260, GPU 398 (MiB)
[03/20/2024-15:33:22] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[03/20/2024-15:33:22] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/20/2024-15:33:22] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[03/20/2024-15:33:22] [TRT] [E] std::exception
[03/20/2024-15:36:12] [TRT] [I] Detected 1 inputs and 2 output network tensors.
[03/20/2024-15:36:12] [TRT] [F] Validation failed: libNamespace == nullptr
/workspace/trt_oss_src/TensorRT/plugin/proposalPlugin/proposalPlugin.cpp:528
[03/20/2024-15:36:12] [TRT] [E] std::exception
[03/20/2024-15:36:12] [TRT] [I] Total Host Persistent Memory: 282912
[03/20/2024-15:36:12] [TRT] [I] Total Device Persistent Memory: 1062912
[03/20/2024-15:36:12] [TRT] [I] Total Scratch Memory: 5332224
[03/20/2024-15:36:12] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 139 MiB, GPU 354 MiB
[03/20/2024-15:36:12] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 169 steps to complete.
[03/20/2024-15:36:12] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 7.63265ms to assign 8 blocks to 169 nodes requiring 391385088 bytes.
[03/20/2024-15:36:12] [TRT] [I] Total Activation Memory: 391384064
[03/20/2024-15:36:12] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1492, GPU 604 (MiB)
[03/20/2024-15:36:12] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1492, GPU 614 (MiB)
[03/20/2024-15:36:12] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +89, GPU +191, now: CPU 89, GPU 191 (MiB)
Export finished successfully.
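As a quick sanity check on the memory figures reported in the log above, here is a small stdlib-only sketch that converts the byte counts to GiB/MiB. The byte values are copied verbatim from the log lines; nothing else is assumed.

```python
# Byte counts copied from the conversion log above.
WORKSPACE_BYTES = 2_147_483_648   # MemoryPoolType.WORKSPACE
ACTIVATION_BYTES = 391_384_064    # Total Activation Memory

def to_gib(n: int) -> float:
    """Convert bytes to binary gibibytes."""
    return n / 2**30

def to_mib(n: int) -> float:
    """Convert bytes to binary mebibytes."""
    return n / 2**20

print(to_gib(WORKSPACE_BYTES))          # 2.0, matching the "2GB" note in the log
print(round(to_mib(ACTIVATION_BYTES)))  # 373
```

So the workspace pool is exactly the 2 GB limit the builder mentions, and the activation memory is roughly 373 MiB.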
What particularly worries me is this line:
[03/20/2024-15:33:22] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
Does this mean that the engine was not actually built in FP32?
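For context on what that flag would change: TF32 is NVIDIA's reduced-precision mode that keeps FP32's 8-bit exponent but only 10 mantissa bits (FP32 keeps 23). A stdlib-only sketch of what TF32 truncation does to a value follows; this illustrates the number format only (it truncates rather than reproducing TensorRT's actual hardware rounding), and the helper name `to_tf32` is just for this example:

```python
import struct

def to_tf32(x: float) -> float:
    # Reinterpret the value's float32 bit pattern and zero the low 13
    # mantissa bits, leaving the 10 mantissa bits TF32 keeps.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & ~0x1FFF))[0]

print(to_tf32(1.0000001))  # 1.0 — mantissa bits below 2**-10 are dropped
print(to_tf32(1.5))        # 1.5 — short mantissas are unaffected
```

In other words, TF32 is the *lower*-precision option; when the builder disables it, computation falls back to ordinary full-precision FP32 kernels.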