Inference slow using nvInfer and TensorRT directly on PX2

Hi,

I have integrated jetson-inference on the Drive PX2 (GitHub: GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson). I have modified a sample to detect multiple objects using the jetson-inference library.

I call a function Detect() which is defined in detectNet.cpp. Inside Detect(), 'execute' is used (see line 513 of https://github.com/dusty-nv/jetson-inference/blob/master/detectNet.cpp).

I used std::chrono::high_resolution_clock to measure the time taken by the 'execute' call, which in turn gives me the time my network takes to detect the objects in a picture.
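For reference, the measurement wraps the execute() call roughly like this (a minimal sketch, not my exact code; 'context' and 'buffers' stand for the TensorRT execution context and the CUDA buffers that tensorNet.cpp sets up):

#include <chrono>
#include <iostream>
#include <NvInfer.h>   // nvinfer1::IExecutionContext (TensorRT 3.x API)

// Time a single synchronous execute() call.
// 'context' and 'buffers' are assumed to be created elsewhere
// (engine deserialization and CUDA buffer allocation as in tensorNet.cpp).
void timedExecute(nvinfer1::IExecutionContext* context, void** buffers, int batchSize)
{
    auto start = std::chrono::high_resolution_clock::now();

    // Synchronous inference on the bound input/output buffers.
    context->execute(batchSize, buffers);

    auto end = std::chrono::high_resolution_clock::now();
    auto ms  = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << "execute() took " << ms << " ms" << std::endl;
}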

My initial results on the host PC with FP32 precision were between 50-60 ms per image.

After a few weeks, I ran the same time measurement on the same 'execute' call again and saw that the detection now takes between 150-170 ms, roughly three times slower than the original measurement on my host PC.

After some digging, I found that the 'execute' call comes from NvInfer.h. I used a Caffe model and prototxt file to create a TensorRT FP32 engine. detectNet::Create builds the engine using tensorNet.cpp, which is provided in the GitHub repository.
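As I understand tensorNet.cpp, the FP32 engine is built through the TensorRT 3.x Caffe-parser path, roughly like the sketch below (the output blob names, batch size and workspace size here are placeholders, not the exact values from the repository):

#include <NvInfer.h>
#include <NvCaffeParser.h>

// Sketch of an FP32 engine build from a Caffe prototxt + caffemodel
// with the TensorRT 3.x API, similar to what tensorNet.cpp does.
nvinfer1::ICudaEngine* buildEngine(nvinfer1::ILogger& logger,
                                   const char* prototxt, const char* model)
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();

    nvcaffeparser1::ICaffeParser* parser = nvcaffeparser1::createCaffeParser();
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser->parse(prototxt, model, *network, nvinfer1::DataType::kFLOAT);

    // Mark the detection outputs (names depend on the network).
    network->markOutput(*blobs->find("coverage"));
    network->markOutput(*blobs->find("bboxes"));

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(16 << 20);   // 16 MB workspace (placeholder)

    nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

    network->destroy();
    parser->destroy();
    builder->destroy();
    return engine;
}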

I came across 'nvprof', but it gives me an error on execution.

I can't figure out why the detection became three times slower.

Any help would be appreciated.

Thanks

Dear mayank,
Very strange. Has the time increased on another machine (your colleague's) too? Please double-check the input image size and the TensorRT/cuDNN versions.
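To rule out a mismatch between the installed packages and what the binary actually uses, a small check of the compile-time and runtime versions can help (a sketch, assuming the version macros in NvInfer.h and cudnn.h are available in your releases):

#include <cstdio>
#include <NvInfer.h>   // NV_TENSORRT_MAJOR / MINOR / PATCH
#include <cudnn.h>     // CUDNN_MAJOR / MINOR / PATCHLEVEL

int main()
{
    // Compile-time versions of the headers the app was built against.
    std::printf("TensorRT (NvInfer) headers: %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    std::printf("cuDNN headers: %d.%d.%d\n",
                CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL);

    // Runtime version of the cuDNN library actually loaded.
    std::printf("cuDNN runtime: %zu\n", cudnnGetVersion());
    return 0;
}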

Hi Siva,

At the moment only I have the right libraries installed, so my colleagues cannot check it.

We checked it on the PX2, and it gave the same result as my host PC. That was the reason I checked it again.

The input size I use is 1920x1208, with TensorRT 3.0 and NvInfer 4.0.

cuDNN should be 7.1 for CUDA 9.0.

Regards
Mayank

Dear mayank.mahajan,
Please check dpkg -l | grep TensorRT on your machine. It should report something like the following:

ii  libnvinfer-dev                                           4.0.2-1+cuda9.0                              amd64        TensorRT development libraries and headers
ii  libnvinfer-samples                                       4.0.2-1+cuda9.0                              amd64        TensorRT samples and documentation
ii  libnvinfer4                                              4.0.2-1+cuda9.0                              amd64        TensorRT runtime libraries
ii  python3-libnvinfer                                       4.0.2-1+cuda9.0                              amd64        Python 3 bindings for TensorRT
ii  python3-libnvinfer-dev                                   4.0.2-1+cuda9.0                              amd64        Python 3 development package for TensorRT
ii  python3-libnvinfer-doc                                   4.0.2-1+cuda9.0                              amd64        Documention and samples of python bindings for TensorRT
ii  tensorrt                                                 3.0.2-1+cuda9.0                              amd64        Meta package of TensorRT
ii  uff-converter-tf                                         4.0.2-1+cuda9.0                              amd64        UFF converter for TensorRT pack

Source: https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt_302/tensorrt-install-guide/index.html

It could be an issue with the state of your host machine. You may have to reinstall and set up your machine appropriately again. Do you notice a change in timings on the Drive PX2 as well?

Hi Siva,

I have cross-checked the versions. They show exactly the same output as in your comment.

We checked the time on the Drive PX2 and noticed that it takes 150 ms. Then I cross-checked on the host machine again, and it also showed 150 ms.

I found it very strange.

In my opinion, the time per detection should be around 50 ms, as measured before and as also seen with the DriveWorks SDK.
INT8 inference takes around 110 ms, which is also a lot.

Any ideas where it can go wrong?

Regards
Mayank

Dear Mayank,
Does that mean the time on the Drive PX2 has also increased? If your host PC's GPU and the Drive PX2 dGPU have similar computational power, we can expect them to have similar inference timings. Please run deviceQuery from the CUDA samples to check the GPU configuration on both the host and the Drive PX2.
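If running the full deviceQuery sample is inconvenient, a minimal equivalent with the CUDA runtime API prints the same key figures (a sketch, not the sample itself):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int i = 0; i < count; ++i)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);

        // Name, compute capability, SM count and clock rate give a
        // rough idea of each GPU's compute power.
        std::printf("GPU %d: %s, SM %d.%d, %d multiprocessors, %.0f MHz\n",
                    i, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount, prop.clockRate / 1000.0);
    }
    return 0;
}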

Hi Siva,

This is the first time we have tested our object detection project on the Drive PX2, so I cannot comment on whether the time increased or decreased there.

Based on your comment, I guess that if my host PC showed an increase, it makes sense that the PX2 also showed 160 ms.

I can run deviceQuery on the Drive PX2 and on my host PC.

I will update you as soon as possible.

Regards
Mayank