Incorrect pointpillar inference results

This is only indirectly related to TAO: I am not training my own PointPillarNet model with the TAO Toolkit here. Instead, I downloaded the pretrained model from NGC and used it to produce a .engine file for inference.

Description:
First, I downloaded the pretrained model, pointpillars_deployable.etlt, from NGC.

PointPillarNet

After that, I started the following NVIDIA docker container as suggested:

docker run --runtime=nvidia -it --rm -v /home/averai:/averai nvcr.io/nvidia/tao/tao-toolkit:5.1.0-pyt /bin/bash

In the docker, the desired .engine file for inference was produced using tao-converter:

./tao-converter -k tlt_encode -e output.engine -p points,1x204800x4,1x204800x4,1x204800x4 -p num_points,1,1,1 -t fp16 pointpillars_deployable.etlt

I then used git clone to fetch the following two repositories:

viz_3Dbbox_ros2_pointpillars

tao_toolkit_recipes

The two repositories are related: run_all_pcs.py calls the ./pointpillars binary built in tao_toolkit_recipes.

I followed the instructions here and ran the following commands:

cd tao_pointpillars/tensorrt_sample/test
mkdir build
cd build
cmake .. -DCUDA_VERSION=$CUDA_VERSION
make -j8


After make succeeded, I copied the .engine file and a .bin file to the build folder.
The .bin file is one I downloaded 2 months ago, when I first tried to train a PointPillars model with the TAO Toolkit. It had been processed to keep only the FOV LIDAR points out of the 360-degree LIDAR points, as described in the comments of gen_lidar_points.py.
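
For reference, a minimal sketch of how such a .bin file could be sanity-checked against the engine's `points` input (1x204800x4 in the tao-converter command above). It assumes the file is a flat array of native-endian float32 values with 4 features per point (x, y, z, intensity), which is my assumption about the layout and not something confirmed in this thread. The helper names `parse_points` and `pad_points` are made up for illustration.

```python
import array

FLOATS_PER_POINT = 4  # assumed layout: x, y, z, intensity (float32 each)


def parse_points(raw):
    """Decode raw .bin bytes into a list of (x, y, z, intensity) tuples."""
    values = array.array("f")
    if len(raw) % (values.itemsize * FLOATS_PER_POINT) != 0:
        raise ValueError("byte count is not a whole number of 4-float points")
    values.frombytes(raw)
    return [
        tuple(values[i:i + FLOATS_PER_POINT])
        for i in range(0, len(values), FLOATS_PER_POINT)
    ]


def pad_points(points, max_points):
    """Zero-pad to max_points rows; return (padded_points, real_point_count)."""
    if len(points) > max_points:
        raise ValueError("point cloud has more points than the engine accepts")
    padding = [(0.0,) * FLOATS_PER_POINT] * (max_points - len(points))
    return points + padding, len(points)
```

Usage would be something like `points = parse_points(open("000000.bin", "rb").read())` followed by `pad_points(points, 204800)`. If the byte count check fails, the file probably does not have the layout assumed here.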

I then ran the following command to do inference.

./pointpillars -e output.engine -l ./input_bin/000000.bin -t 0.01 -c Vehicle,Pedestrain,Cyclist -n 4096 -p -d fp16

It worked and saved the detection result in a .txt file, but the result looked strange.
I drew the bounding boxes using viewer.py from the viz_3Dbbox_ros2_pointpillars folder, with both the .bin and .txt files as input. The drawn bboxes are obviously incorrect.

Both visualized images were made by running viewer.py on the same .bin file.
The left one shows the annotated bboxes of the pedestrians, while the right one shows the bboxes
obtained by inference. One of the two boxes was even classified as a vehicle.

To sum up, all I did was:

  1. .bin and .etlt model preparation
  2. .engine model conversion
  3. Building the existing code and running the commands according to the instructions.

I didn't modify any of the content related to running PointPillars inference.

Update

I used tao-converter to generate the .engine file once again with the same command:

./tao-converter -k tlt_encode -e output.engine -p points,1x204800x4,1x204800x4,1x204800x4 -p num_points,1,1,1 -t fp16 pointpillars_deployable.etlt

This time I noticed that there were some warning messages.

On seeing this, I converted the file using FP32 instead of FP16 and the warning messages were gone.

Then I ran the ./pointpillars inference again. The visualized result looks much better. Some bboxes are still strange, but I guess that's because all the bboxes, including those with very low confidence, are drawn.
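
To test that guess, the low-confidence boxes could be dropped before visualizing (the inference above was run with `-t 0.01`, so almost everything is kept). Below is a rough sketch only. It assumes each line of the detection .txt is whitespace-separated with the confidence score in the last column, which I have not verified against the sample's actual output format, so `score_column` is left adjustable. `filter_detections` is a made-up helper name.

```python
def filter_detections(lines, min_score, score_column=-1):
    """Keep only detection lines whose score is at least min_score.

    score_column is an assumption about the file layout (last field by
    default); adjust it if the score sits elsewhere.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[score_column]) >= min_score:
            kept.append(line.rstrip("\n"))
    return kept
```

For example, `filter_detections(open("000000.txt"), 0.3)` would return only the lines scoring 0.3 or higher, which can then be written to a new .txt file for viewer.py. Raising the `-t` value on the ./pointpillars command line should have a similar effect.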

Hi @silentjcr
Thanks for the info. This should be related to the TensorRT version. FP16 should work with TensorRT 8.6.1.6. Please refer to "The effect is very poor when converted to trt - #62 by Morganh".

You can also use the latest TAO PyTorch 5.2.0 docker, nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0.
Refer to tao_pytorch_backend/docker/Dockerfile at main · NVIDIA/tao_pytorch_backend · GitHub and TAO Toolkit | NVIDIA NGC
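
A small sketch for checking a TensorRT version string against the 8.6.1.6 mentioned above. Treating any version at or above 8.6.1.6 as fine for FP16 is an assumption on top of what was said here, and both helper names are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as "8.6.1.6" into a tuple of ints.

    Parsing stops at the first non-numeric part (e.g. a "post1" suffix).
    """
    parts = []
    for part in text.strip().split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def at_least(version, minimum="8.6.1.6"):
    """Return True if version is the same as or newer than minimum.

    A shorter string such as "8.6.1" compares as older than "8.6.1.6",
    so pass the full four-part version where possible.
    """
    return parse_version(version) >= parse_version(minimum)
```

The version string itself can be read inside the docker with the TensorRT Python package (`tensorrt.__version__`) or from the banner trtexec prints at startup.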


Okay, I may try this docker out later.

By the way, is there any tool that converts the input .tlt model into .etlt format or should I just use trtexec in the docker?

Aside from the .tlt file generated during training, I only have an .onnx and a .trt file at the moment, and it seems that neither can be used as input for tao-converter to generate a .engine file.

Instead I used trtexec and ran the following command:

trtexec --onnx=/Jefferson/tao_env_trt_8531/checkpoint_epoch_60.onnx \
--maxShapes=points:1x25000x4,num_points:1 \
--minShapes=points:1x25000x4,num_points:1 \
--optShapes=points:1x25000x4,num_points:1 \
--fp16 \
--saveEngine=/Jefferson/tao_env_trt_8531/checkpoint_epoch_60.engine

I had to modify the shapes, which were supposed to be 204800 instead of 25000, so that it could work. Not sure if it has something to do with the fact that I modified the point_cloud_range in my pointpillars.yaml for training on TAO toolkit 5.2.0.

There were no error messages when running the above command in the 5.1.0 docker.

Since TAO 5.0, training produces a .pth model instead of a .tlt model, and export produces an .onnx file instead of an .etlt model.
So you no longer need the tao-converter tool. Just use trtexec instead.

Then please refer to TRTEXEC with PointPillars - NVIDIA Docs to use trtexec to generate tensorrt engine.

For the older .etlt models on NGC, you can convert the .etlt file to an .onnx file as follows.
See tao_toolkit_recipes/tao_forum_faq/FAQ.md at main · NVIDIA-AI-IOT/tao_toolkit_recipes · GitHub


So in terms of converting my own trained models into .etlt, I no longer need tao-converter for this?

The version of TAO toolkit I’ve been using since last December is 5.2.0.
The generated pointpillar models during training are in .tlt format, and I get one .onnx and one .trt file from export.

Yep, this is exactly what I saw, and I ran trtexec afterwards. It's just that I couldn't directly use the value 204800 shown in the example, so I guess the modified point_cloud_range may be the reason, as I didn't really change other values in pointpillars.yaml iirc.

But I failed to use it to run pointpillars successfully.

Could not open file
Could not open file
trt_infer: ModelImporter.cpp:688: Failed to parse ONNX model from file:
: Failed to parse onnx model file, please check the onnx version and trt support op!

Got it.

For .etlt model, you can still use tao-converter to generate tensorrt engine.

The value 204800 is mentioned in TRTEXEC with PointPillars - NVIDIA Docs. Was it changed to 25000 on your side?

Okay.

I couldn't convert the model using 204800.
I had to change it to 25000 to make it work, but currently the .engine file produced by trtexec can't be used for PointPillars inference in the 5.1.0 docker.

Is this model from ngc?

Nope, I referred to my own trained model.

You can try to re-export a new onnx model after changing max_points_num in export yaml file. Then check again.
Refer to tao_pytorch_backend/nvidia_tao_pytorch/pointcloud/pointpillars/tools/export/simplifier_onnx.py at 6f802c93a32aad69a155c86d941c1ea84fe2fc4e · NVIDIA/tao_pytorch_backend · GitHub and tao_pytorch_backend/nvidia_tao_pytorch/pointcloud/pointpillars/tools/cfgs/pointpillar_general.yaml at 6f802c93a32aad69a155c86d941c1ea84fe2fc4e · NVIDIA/tao_pytorch_backend · GitHub.

So that means it has nothing to do with point_cloud_range in .yaml.

I went back to check the .yaml file, and max_points_num, located under the "inference" section, was set to 25000 by default.

Do you mean to change it to 204800 and re-export the model?

Yes.
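
To keep the trtexec shapes in step with max_points_num from the export yaml, the command could be generated from that one value. This is only a sketch: the input names `points` and `num_points` and the trtexec flags are the ones used in the commands earlier in this thread, while `build_trtexec_command` and its parameters are made up for illustration.

```python
def build_trtexec_command(onnx_path, engine_path, max_points_num,
                          num_features=4, fp16=True):
    """Build a trtexec argument list whose shapes match max_points_num."""
    # The same shape is used for min, opt and max, as in the commands above.
    shapes = f"points:1x{max_points_num}x{num_features},num_points:1"
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--minShapes={shapes}",
        f"--optShapes={shapes}",
        f"--maxShapes={shapes}",
    ]
    if fp16:
        cmd.append("--fp16")
    cmd.append(f"--saveEngine={engine_path}")
    return cmd
```

With max_points_num set to 204800 and the model re-exported, `build_trtexec_command("model.onnx", "model.engine", 204800)` gives the argument list to pass to `subprocess.run`. The two file names here are placeholders.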

I re-ran trtexec and then the pointpillar inference, but still got the same result.

I checked the log messages that popped up when trtexec was running.

I trained and exported the model using TAO Toolkit 5.2.0, but until now I've still been working in the following docker: nvcr.io/nvidia/tao/tao-toolkit:5.1.0-pyt


Are you able to open this onnx file with Netron?

It can be opened.

Could you run trtexec in nvcr.io/nvidia/tao/tao-toolkit:5.2.0-pyt2.1.0 ?

Yes.

OK, glad to know it works in this docker.

Yep, I can run trtexec in both dockers and get .engine files from them.

Update
I ran pointpillar inference using the newly generated .engine from my own trained model under docker 5.2.0 and it worked.