TAO Converter Provide three optimization profiles for pointpillar

Hello, I am trying to follow the instructions for using the PointPillars model with ROS2 here: GitHub - NVIDIA-AI-IOT/ros2_tao_pointpillars: ROS2 node for 3D object detection using TAO-PointPillars.

As part of the instructions I have to use tao-converter to generate a TensorRT engine from the model.
I am using the deployable PointPillars model from PointPillarNet | NVIDIA NGC, and this is the command I am running to generate the engine:

./tao-converter  -k tlt_encode  \
               -e ~/trt.fp16.engine \
               -p points,1x204800x4,1x204800x4,1x204800x4 \
               -p num_points,1,1,1 \
               -t fp16 \
               ~/pointpillars_deployable.etlt

The output of which is:

Please provide three optimization profiles via -p <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has `x` as delimiter, e.g., NxC, NxCxHxW, NxCxDxHxW, etc.
Aborted (core dumped)

• Hardware (T4/V100/Xavier/Nano/etc): Jetson Nano
• TensorRT version: 8.2
• TAO Converter version: v3.21.11_trt8.0_aarch64

Thank you

Could you try this version? I can run your command successfully on a Jetson NX device.
wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/org/nvidia/team/tao/tao-converter/v4.0.0_trt8.5.2.2_aarch64/files?redirect=true&path=tao-converter' -O tao-converter

Refer to TAO Converter | NVIDIA NGC.
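As a minimal sketch (assuming the wget above saved the binary as tao-converter in the current directory, and reusing the same arguments as before), you would make the new binary executable and rerun the conversion:

# Make the downloaded converter executable
chmod +x tao-converter

# Rerun the same conversion with the newer binary
./tao-converter -k tlt_encode \
                -e ~/trt.fp16.engine \
                -p points,1x204800x4,1x204800x4,1x204800x4 \
                -p num_points,1,1,1 \
                -t fp16 \
                ~/pointpillars_deployable.etlt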

BTW, if you are using the latest TAO, exporting now generates an .onnx file instead of an .etlt file. In that case, please use trtexec instead. Refer to TRTEXEC with PointPillars - NVIDIA Docs.
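For reference, a sketch of the corresponding trtexec invocation (the file name pointpillars_deployable.onnx is an assumption; the input names and shapes are taken from the command above, and the PointPillars custom plugins must already be registered in your TensorRT plugin library):

# Build an FP16 engine from the exported ONNX model
trtexec --onnx=pointpillars_deployable.onnx \
        --saveEngine=trt.fp16.engine \
        --fp16 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --maxShapes=points:1x204800x4,num_points:1
# If the plugins live in a separate shared library, they can be
# loaded with --plugins=<path to the plugin .so>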

The command now runs with that tao-converter version, but it ends up failing with the error “Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_382 + BatchNormalization_383 + Relu_384.)”

I already tried increasing the workspace size but still have the same error.

Full logs:

jetson@nano:~/Downloads$ ./tao-converter-forum -k tlt_encode -e ~/trt.fp16forum.engine -p points,1x204800x4,1x204800x4,1x204800x4 -p num_points,1,1,1 -t fp16 -w 1073741824 pointpillars_deployable.etlt 
[INFO] [MemUsageChange] Init CUDA: CPU +230, GPU +0, now: CPU 248, GPU 3501 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 248 MiB, GPU 3530 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 277 MiB, GPU 3560 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename:   /tmp/fileCJNmAu
[INFO] ONNX IR version:  0.0.8
[INFO] Opset version:    11
[INFO] Producer name:    
[INFO] Producer version: 
[INFO] Domain:           
[INFO] Model version:    0
[INFO] Doc string:       
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] No importer registered for op: VoxelGeneratorPlugin. Attempting to import as plugin.
[INFO] Searching for plugin: VoxelGeneratorPlugin, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: VoxelGeneratorPlugin
[INFO] No importer registered for op: PillarScatterPlugin. Attempting to import as plugin.
[INFO] Searching for plugin: PillarScatterPlugin, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: PillarScatterPlugin
[INFO] No importer registered for op: DecodeBbox3DPlugin. Attempting to import as plugin.
[INFO] Searching for plugin: DecodeBbox3DPlugin, plugin_version: 1, plugin_namespace: 
[INFO] Successfully created plugin: DecodeBbox3DPlugin
[WARNING] Output type must be INT32 for shape outputs
[INFO] Detected input dimensions from the model: (-1, 204800, 4)
[INFO] Detected input dimensions from the model: (-1)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 204800, 4) for input: points
[INFO] Using optimization profile opt shape: (1, 204800, 4) for input: points
[INFO] Using optimization profile max shape: (1, 204800, 4) for input: points
[INFO] Using optimization profile min shape: (1) for input: num_points
[INFO] Using optimization profile opt shape: (1) for input: num_points
[INFO] Using optimization profile max shape: (1) for input: num_points
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] 678 + (Unnamed Layer* 9) [Shuffle]
[INFO] [GpuLayer] VoxelGeneratorPlugin_0
[INFO] [GpuLayer] Reshape_292
[INFO] [GpuLayer] MatMul_293
[INFO] [GpuLayer] Transpose_294 + (Unnamed Layer* 17) [Shuffle]
[INFO] [GpuLayer] BatchNormalization_295 + Relu_296
[INFO] [GpuLayer] MaxPool_297
[INFO] [GpuLayer] (Unnamed Layer* 32) [Shuffle] + Transpose_298
[INFO] [GpuLayer] Reshape_300
[INFO] [GpuLayer] PillarScatterPlugin_0
[INFO] [GpuLayer] Conv_374 + Relu_375
[INFO] [GpuLayer] Conv_376 + Relu_377
[INFO] [GpuLayer] Conv_378 + Relu_379
[INFO] [GpuLayer] Conv_380 + Relu_381
[INFO] [GpuLayer] ConvTranspose_382 + BatchNormalization_383 + Relu_384
[INFO] [GpuLayer] Conv_400 + Relu_401
[INFO] [GpuLayer] Conv_402 + Relu_403
[INFO] [GpuLayer] Conv_404 + Relu_405
[INFO] [GpuLayer] Conv_406 + Relu_407
[INFO] [GpuLayer] Conv_408 + Relu_409
[INFO] [GpuLayer] Conv_410 + Relu_411
[INFO] [GpuLayer] ConvTranspose_412 + BatchNormalization_413 + Relu_414
[INFO] [GpuLayer] Conv_430 + Relu_431
[INFO] [GpuLayer] Conv_432 + Relu_433
[INFO] [GpuLayer] Conv_434 + Relu_435
[INFO] [GpuLayer] Conv_436 + Relu_437
[INFO] [GpuLayer] Conv_438 + Relu_439
[INFO] [GpuLayer] Conv_440 + Relu_441
[INFO] [GpuLayer] ConvTranspose_442 + BatchNormalization_443 + Relu_444
[INFO] [GpuLayer] 532 copy
[INFO] [GpuLayer] 577 copy
[INFO] [GpuLayer] 622 copy
[INFO] [GpuLayer] Conv_450 || Conv_446 || Conv_447
[INFO] [GpuLayer] Transpose_448
[INFO] [GpuLayer] Transpose_449
[INFO] [GpuLayer] Transpose_451
[INFO] [GpuLayer] DecodeBbox3DPlugin_0
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +231, now: CPU 442, GPU 3808 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +241, GPU +81, now: CPU 683, GPU 3889 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[ERROR] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_382 + BatchNormalization_383 + Relu_384.)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Thanks for the help

Could you please try a larger workspace size?

Still getting the same error with -w 2793741824.

Could you try a larger value? You can check the available memory with free -h.
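For example (a sketch; the -w value below is a placeholder and should be sized against what free -h reports as available):

# Check how much memory is actually free on the device
free -h

# Rerun with a larger workspace, e.g. 3 GiB (3221225472 bytes)
./tao-converter-forum -k tlt_encode -e ~/trt.fp16forum.engine \
                      -p points,1x204800x4,1x204800x4,1x204800x4 \
                      -p num_points,1,1,1 -t fp16 \
                      -w 3221225472 pointpillars_deployable.etlt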

I’ve tried it with the total available memory and still get the same error.

To narrow this down, please refer to tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub to convert the .etlt file to an .onnx file.
Then refer to TRTEXEC with PointPillars - NVIDIA Docs to generate the TensorRT engine with trtexec.

Please note that this Docker container needs to run on a dGPU machine instead of the Jetson Nano.

BTW, you can also run trtexec with the --verbose flag.
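For example, to capture the full verbose build log to a file (same assumed .onnx file name as above):

# Same build as before, with verbose logging saved for inspection
trtexec --onnx=pointpillars_deployable.onnx --fp16 \
        --minShapes=points:1x204800x4,num_points:1 \
        --optShapes=points:1x204800x4,num_points:1 \
        --maxShapes=points:1x204800x4,num_points:1 \
        --verbose 2>&1 | tee trtexec_verbose.log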

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

Also, please upgrade to TensorRT 8.6 or the latest JetPack.
If that does not work, we can consider splitting the model into two TensorRT engines or skipping the layer merging.
