TAO Converter Provide three optimization profiles for pointpillar

Hello, I am trying to follow the instructions for using the PointPillars model with ROS2 here: GitHub - NVIDIA-AI-IOT/ros2_tao_pointpillars: ROS2 node for 3D object detection using TAO-PointPillars.

As part of the instructions I have to use tao-converter to generate a TensorRT engine from the model.
I am using the deployable PointPillars model from PointPillarNet | NVIDIA NGC, and this is the command I am running to generate the engine:

./tao-converter  -k tlt_encode  \
               -e ~/trt.fp16.engine \
               -p points,1x204800x4,1x204800x4,1x204800x4 \
               -p num_points,1,1,1 \
               -t fp16 \
               ~/pointpillars_deployable.etlt

The output of which is:

Please provide three optimization profiles via -p <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has `x` as delimiter, e.g., NxC, NxCxHxW, NxCxDxHxW, etc.
Aborted (core dumped)

• Hardware (T4/V100/Xavier/Nano/etc): Jetson Nano
• TensorRT version: 8.2
• TAO Converter version: v3.21.11_trt8.0_aarch64

Thank you

Could you try this version? I can run your command successfully on a Jetson NX device.
wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/org/nvidia/team/tao/tao-converter/v4.0.0_trt8.5.2.2_aarch64/files?redirect=true&path=tao-converter' -O tao-converter

Refer to TAO Converter | NVIDIA NGC.
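As a minimal sketch (assuming the wget above saved the binary as tao-converter in the current directory, and reusing the same arguments as before), you would make the new binary executable and rerun the conversion:

# Make the downloaded converter executable
chmod +x tao-converter

# Rerun the same conversion with the newer binary
./tao-converter -k tlt_encode \
                -e ~/trt.fp16.engine \
                -p points,1x204800x4,1x204800x4,1x204800x4 \
                -p num_points,1,1,1 \
                -t fp16 \
                ~/pointpillars_deployable.etlt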

BTW, if you are using the latest TAO, exporting now generates an .onnx file instead of an .etlt file. In that case, please use trtexec instead. Refer to TRTEXEC with PointPillars - NVIDIA Docs.
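For reference, a sketch of the corresponding trtexec invocation (the file name pointpillars_deployable.onnx is an assumption; the input names and shapes are taken from the command above, and the PointPillars custom plugins must already be registered in your TensorRT plugin library):

# Build an FP16 engine from the exported ONNX model
trtexec --onnx=pointpillars_deployable.onnx \
        --saveEngine=trt.fp16.engine \
        --fp16 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --maxShapes=points:1x204800x4,num_points:1
# If the plugins live in a separate shared library, they can be
# loaded with --plugins=<path to the plugin .so>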

The command now runs with that tao-converter version, but it ends up failing with the error “Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_382 + BatchNormalization_383 + Relu_384.)”

I already tried increasing the workspace size but still have the same error.

Full logs:

jetson@nano:~/Downloads$ ./tao-converter-forum -k tlt_encode -e ~/trt.fp16forum.engine -p points,1x204800x4,1x204800x4,1x204800x4 -p num_points,1,1,1 -t fp16 -w 1073741824 pointpillars_deployable.etlt 
[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3501 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3530 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 3560 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileCJNmAu
[INFO] ONNX IR version:  0.0.8
[INFO] Opset version:    11
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] No importer registered for op: VoxelGeneratorPlugin. Attempting to import as plugin.
[INFO] Searching for plugin: VoxelGeneratorPlugin, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: VoxelGeneratorPlugin
[INFO] No importer registered for op: PillarScatterPlugin. Attempting to import as plugin.
[INFO] Searching for plugin: PillarScatterPlugin, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: PillarScatterPlugin
[INFO] No importer registered for op: DecodeBbox3DPlugin. Attempting to import as plugin.
[INFO] Searching for plugin: DecodeBbox3DPlugin, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: DecodeBbox3DPlugin
[WARNING] Output type must be INT32 for shape outputs
[INFO] Detected input dimensions from the model: (-1, 204800, 4)
[INFO] Detected input dimensions from the model: (-1)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 204800, 4) for input: points
[INFO] Using optimization profile opt shape: (1, 204800, 4) for input: points
[INFO] Using optimization profile max shape: (1, 204800, 4) for input: points
[INFO] Using optimization profile min shape: (1) for input: num_points
[INFO] Using optimization profile opt shape: (1) for input: num_points
[INFO] Using optimization profile max shape: (1) for input: num_points
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] 678 + (Unnamed Layer* 9) [Shuffle]
[INFO] [GpuLayer] VoxelGeneratorPlugin_0
[INFO] [GpuLayer] Reshape_292
[INFO] [GpuLayer] MatMul_293
[INFO] [GpuLayer] Transpose_294 + (Unnamed Layer* 17) [Shuffle]
[INFO] [GpuLayer] BatchNormalization_295 + Relu_296
[INFO] [GpuLayer] MaxPool_297
[INFO] [GpuLayer] (Unnamed Layer* 32) [Shuffle] + Transpose_298
[INFO] [GpuLayer] Reshape_300
[INFO] [GpuLayer] PillarScatterPlugin_0
[INFO] [GpuLayer] Conv_374 + Relu_375
[INFO] [GpuLayer] Conv_376 + Relu_377
[INFO] [GpuLayer] Conv_378 + Relu_379
[INFO] [GpuLayer] Conv_380 + Relu_381
[INFO] [GpuLayer] ConvTranspose_382 + BatchNormalization_383 + Relu_384
[INFO] [GpuLayer] Conv_400 + Relu_401
[INFO] [GpuLayer] Conv_402 + Relu_403
[INFO] [GpuLayer] Conv_404 + Relu_405
[INFO] [GpuLayer] Conv_406 + Relu_407
[INFO] [GpuLayer] Conv_408 + Relu_409
[INFO] [GpuLayer] Conv_410 + Relu_411
[INFO] [GpuLayer] ConvTranspose_412 + BatchNormalization_413 + Relu_414
[INFO] [GpuLayer] Conv_430 + Relu_431
[INFO] [GpuLayer] Conv_432 + Relu_433
[INFO] [GpuLayer] Conv_434 + Relu_435
[INFO] [GpuLayer] Conv_436 + Relu_437
[INFO] [GpuLayer] Conv_438 + Relu_439
[INFO] [GpuLayer] Conv_440 + Relu_441
[INFO] [GpuLayer] ConvTranspose_442 + BatchNormalization_443 + Relu_444
[INFO] [GpuLayer] 532 copy
[INFO] [GpuLayer] 577 copy
[INFO] [GpuLayer] 622 copy
[INFO] [GpuLayer] Conv_450 || Conv_446 || Conv_447
[INFO] [GpuLayer] Transpose_448
[INFO] [GpuLayer] Transpose_449
[INFO] [GpuLayer] Transpose_451
[INFO] [GpuLayer] DecodeBbox3DPlugin_0
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +231, now: CPU 442, GPU 3808 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU +81, now: CPU 683, GPU 3889 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[ERROR] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_382 + BatchNormalization_383 + Relu_384.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Thanks for the help

Could you please try a larger workspace size?

Still getting the same error with -w 2793741824.

Could you try a larger value? You can check the available memory with free -h.
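For example (a sketch; the -w value below is a placeholder and should be sized against what free -h reports as available):

# Check how much memory is actually free on the device
free -h

# Rerun with a larger workspace, e.g. 3 GiB (3221225472 bytes)
./tao-converter-forum -k tlt_encode -e ~/trt.fp16forum.engine \
                      -p points,1x204800x4,1x204800x4,1x204800x4 \
                      -p num_points,1,1,1 -t fp16 \
                      -w 3221225472 pointpillars_deployable.etlt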

I’ve tried it with the total available memory and still get the same error.

To narrow this down, please refer to tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub to convert the .etlt file to an .onnx file.
Then refer to TRTEXEC with PointPillars - NVIDIA Docs to generate the TensorRT engine with trtexec.

Please note that this Docker container needs to run on a dGPU machine instead of the Jetson Nano.

BTW, you can also run trtexec with the --verbose flag.
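For example, to capture the full verbose build log to a file (same assumed .onnx file name as above):

# Same build as before, with verbose logging saved for inspection
trtexec --onnx=pointpillars_deployable.onnx --fp16 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --maxShapes=points:1x204800x4,num_points:1 \
        --verbose 2>&1 | tee trtexec_verbose.log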

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Also, please upgrade to TensorRT 8.6 or the latest JetPack.
If that does not work, we can consider splitting the model into two TensorRT engines or skipping the layer merging.
