Benchmarking TLT YOLOv3

Update: I realized that I need to build TensorRT OSS to handle the custom layers in YOLO; however, the build fails inside the Docker container. See my next post for the error.


I’m trying to build and benchmark a YOLO model with TLT and I’m running into some issues. When I simply follow yolo.ipynb and export a QAT-trained, pruned model, the second-to-last cell (the one that calls tlt-infer yolo) runs at ~2.55 it/s, which is REALLY slow (this is on an RTX 2080 Ti).
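At ~2.55 it/s each iteration takes about 1 / 2.55 ≈ 392 ms, so one way to test whether image IO dominates is to time loading and inference separately. The sketch below is a generic stdlib-only timing harness, not part of TLT; the `load` and `infer` callables are hypothetical stand-ins for your own image decoding and engine execution code.

```python
import time
from typing import Callable, Iterable, Tuple


def time_stages(
    items: Iterable[str],
    load: Callable[[str], object],
    infer: Callable[[object], object],
    warmup: int = 2,
) -> Tuple[float, float, int]:
    """Return (total load seconds, total infer seconds, timed iterations).

    The first `warmup` items are processed but not timed, so one-off costs
    such as engine initialisation do not skew the averages.
    """
    load_s = infer_s = 0.0
    timed = 0
    for i, item in enumerate(items):
        t0 = time.perf_counter()
        data = load(item)          # e.g. read and preprocess one image
        t1 = time.perf_counter()
        infer(data)                # e.g. run the TensorRT engine
        t2 = time.perf_counter()
        if i >= warmup:
            load_s += t1 - t0
            infer_s += t2 - t1
            timed += 1
    return load_s, infer_s, timed


def report(load_s: float, infer_s: float, n: int) -> str:
    """Format throughput plus the per-iteration cost of each stage."""
    total = load_s + infer_s
    if n == 0 or total <= 0:
        return "not enough timing data"
    return (
        f"{n / total:.2f} it/s overall, "
        f"load {1000 * load_s / n:.1f} ms/it, "
        f"infer {1000 * infer_s / n:.1f} ms/it"
    )
```

If the load time accounts for most of the ~392 ms, the slowdown is in the input pipeline rather than in the engine itself.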

Assuming this might be caused by miscellaneous image IO, I opened a shell in the Docker container to benchmark with trtexec directly. However, it fails with:
INVALID_ARGUMENT: getPluginCreator could not find plugin BatchTilePlugin_TRT version 1

So I have three questions:

  1. Is there a way I can get a reliable benchmark of the TLT Yolo within the TLT docker?
  2. Is there a Docker image or release of TensorRT that has BatchTilePlugin registered and can run the TRT 7.0.0-1 engine exported by TLT?
  3. If the answer to 2 is no, where can I find how to register the BatchTilePlugin?


Update: Trying to install TensorRT OSS inside the container by calling /opt/tensorrt/ fails with the following linker error:
/usr/bin/g++ -Wno-deprecated-declarations -DBUILD_SYSTEM=cmake_oss -DCMAKE_HAVE_LIBC_PTHREAD CMakeFiles/cmTC_627a1.dir/src.cxx.o -o cmTC_627a1
CMakeFiles/cmTC_627a1.dir/src.cxx.o: In function 'main':
src.cxx:(.text+0x3e): undefined reference to 'pthread_create'
src.cxx:(.text+0x4a): undefined reference to 'pthread_detach'
src.cxx:(.text+0x56): undefined reference to 'pthread_cancel'
src.cxx:(.text+0x67): undefined reference to 'pthread_join'
src.cxx:(.text+0x7b): undefined reference to 'pthread_atfork'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_627a1.dir/build.make:105: recipe for target 'cmTC_627a1' failed
make[1]: *** [cmTC_627a1] Error 1
make[1]: Leaving directory '/opt/tensorrt/TensorRT/build/CMakeFiles/CMakeTmp'
Makefile:140: recipe for target 'cmTC_627a1/fast' failed
make: *** [cmTC_627a1/fast] Error 2
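One detail worth noting in this log: the failing target is built under CMakeFiles/CMakeTmp and compiled with -DCMAKE_HAVE_LIBC_PTHREAD, which is the pattern of a CMake configure-time probe (checking whether libc provides pthread without extra flags) rather than of the TensorRT sources themselves. On older glibc versions that probe can fail as part of normal configuration before CMake retries with a pthread flag, so it may be worth checking whether the real error is elsewhere in the output. The hypothetical helper below is a small sketch that pulls the undefined symbols out of such a log and flags whether the failure happened in a probe directory.

```python
import re
from typing import List, Tuple

# GNU ld may open the quote with a backtick, an apostrophe or a typographic
# quote depending on version and locale, so accept all of them.
UNDEFINED_REF = re.compile(r"undefined reference to [`'\u2018]([A-Za-z_]\w*)['\u2019]")
LEAVING_DIR = re.compile(r"Leaving directory [`'\u2018]([^'\u2019]+)['\u2019]")


def summarize_link_log(log: str) -> Tuple[List[str], bool]:
    """Return (sorted unique undefined symbols, failed inside CMakeTmp?)."""
    symbols = sorted(set(UNDEFINED_REF.findall(log)))
    dirs = LEAVING_DIR.findall(log)
    in_probe = any(d.rstrip("/").endswith("CMakeFiles/CMakeTmp") for d in dirs)
    return symbols, in_probe
```

Run on the log above, this reports the five pthread_* symbols and a failure inside CMakeTmp.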

Please build the plugin library from TRT OSS and replace the existing one. Refer to Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation and GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream.

YOLOv3 requires batchTilePlugin, resizeNearestPlugin and batchedNMSPlugin. These plugins are available in the TensorRT open-source repo, but not in TensorRT 7.0. Detailed instructions for building TensorRT OSS can be found in TensorRT Open Source Software (OSS).
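A quick way to see whether a given libnvinfer_plugin build even contains these plugins is to look for their creator names inside the shared library, much like running strings and grep on it. This is a heuristic sketch, not a registration check: finding the name only suggests the plugin was compiled in. BatchTilePlugin_TRT comes from the error above; ResizeNearest_TRT and BatchedNMS_TRT are assumed creator names that should be checked against the OSS sources, and the library path is whatever your install uses.

```python
from pathlib import Path
from typing import Dict, Sequence

# BatchTilePlugin_TRT is taken from the trtexec error; the other two names
# are assumptions to verify against the TensorRT OSS plugin sources.
YOLOV3_PLUGINS = ("BatchTilePlugin_TRT", "ResizeNearest_TRT", "BatchedNMS_TRT")


def scan_plugin_library(path: str, names: Sequence[str] = YOLOV3_PLUGINS) -> Dict[str, bool]:
    """Report which plugin creator names appear as byte strings in `path`."""
    data = Path(path).read_bytes()
    return {name: name.encode("ascii") in data for name in names}
```

For example, scan_plugin_library("libnvinfer_plugin.so.7") on a stock TensorRT 7.0 library would be expected to report BatchTilePlugin_TRT as missing, which matches the error above; after replacing it with the OSS build, all three should show up.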

For benchmarking, please run trtexec outside the Docker container.
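When scripting that benchmark, it can help to build the trtexec invocation in one place. The sketch below assumes trtexec is on PATH and uses the --loadEngine, --batch, --iterations and --plugins options as listed by trtexec --help in TensorRT 7.x; verify them against your own build. The engine still has to be loaded by the same TensorRT version that produced it, with the rebuilt plugin library available.

```python
import shlex
from typing import List, Optional, Sequence


def trtexec_command(
    engine_path: str,
    batch: Optional[int] = None,
    iterations: Optional[int] = None,
    plugin_libs: Sequence[str] = (),
    trtexec: str = "trtexec",
) -> List[str]:
    """Build an argv list for benchmarking a serialized engine with trtexec."""
    cmd = [trtexec, f"--loadEngine={engine_path}"]
    if batch is not None:
        cmd.append(f"--batch={batch}")            # implicit-batch engines
    if iterations is not None:
        cmd.append(f"--iterations={iterations}")  # minimum timed iterations
    # Extra plugin libraries to load, if not replacing the system library.
    cmd.extend(f"--plugins={lib}" for lib in plugin_libs)
    return cmd


def as_shell(cmd: Sequence[str]) -> str:
    """Quote an argv list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

Passing the list to subprocess.run(cmd, check=True) avoids shell-quoting issues; as_shell is only for logging or copy-pasting the command.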

In my update post, you’ll see that I tried to follow your suggestion using the TensorRT Docker image. However, the build fails when linking against pthread.

I am not sure what steps you took to reproduce the build failure.
Please follow the steps in deepstream_tao_apps/TRT-OSS/Jetson at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.

I’m following TensorRT | NVIDIA NGC

docker pull
docker run --gpus all -it --rm -v local_dir:container_dir
Install make 3.18 (since it’s missing)
Then call
as the Docker container instructs on startup, in order to install OSS

I’ll try following the page you linked instead.

I’m running into the same build issue following the manual instructions for building TensorRT OSS here: deepstream_tao_apps/TRT-OSS/x86 at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.

I’m still building inside a Docker container.

For some reason, it keeps failing to link against pthread with the same error as in my first reply.

Further update: I have no problems building TensorRT OSS with I don’t even have to install make. It’s too bad I need TRT v7.0.0-1 to read the trt.engine exported by TLT.

Hi @relativequanta1
From deepstream_tao_apps/TRT-OSS/x86 at master · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub, it is not necessary to build the TensorRT OSS plugin inside a TensorRT Docker container.

Please just follow these steps:

1. Install CMake (>= 3.13)

2. Build TensorRT OSS Plugin

3. Replace “*”
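Step 1 is easy to get wrong when the distribution ships an older CMake. As a small sanity check before building, the sketch below parses the first line of cmake --version (for example "cmake version 3.18.0") and compares it with the 3.13 minimum from step 1. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CMAKE = (3, 13)  # minimum version from step 1


def parse_cmake_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    m = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def cmake_ok(text: str, minimum: Tuple[int, int] = MIN_CMAKE) -> bool:
    """True if the reported version is at least `minimum` (major, minor)."""
    version = parse_cmake_version(text)
    return version is not None and version[:2] >= minimum


def installed_cmake_output() -> Optional[str]:
    """Run `cmake --version` if cmake is on PATH, else return None."""
    exe = shutil.which("cmake")
    if exe is None:
        return None
    return subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
```

Calling cmake_ok(installed_cmake_output() or "") before step 2 fails fast when the CMake on PATH is too old.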