TensorRT detection model takes seconds to run on TX2

I’m using a detection model that has been converted to TensorRT on a TX2.
A few days ago, it started running extremely slowly:

  • original speed ~40[msec]
  • now over 2 [sec] or slower

I tried to restart the TX2 - that didn’t help.

I tried using a different model, and also the same model under a different name.
At first that helped, but now inference runs extremely slowly every time.

Any help is appreciated.
Alex

Details:
* Jetpack: 4.2.2 [L4T 32.2.1]
* CUDA: 10.0.326
* cuDNN: 7.5.0.56-1+cuda10.0
* TensorRT: 5.1.6.1-1+cuda10.0
* OpenCV: 3.4.6 compiled CUDA: YES

Hello,
Moving this to Jetson team so that the team can have a look and help you better.
Thanks!

Thanks @AakankshaS

Hi,

Did you install any packages or adjust the clocks recently?
You should get similar speed if the software stays the same.

You can also try maximizing the system performance to see if it helps.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.

thanks @AastaLLL - I have done that.

As part of the initialisation step, I run a single inference before really activating the system.
After this inference, the used RAM is sometimes much higher than it should be.
In most cases it is around:
RAM = 6720/7860 (from jopt() stat)
but at times I measure:
RAM = 7614/7860 (from jopt() stat)
or higher.
When this happens, I get inference times of 1[sec], 7[sec] or more.

  • Any idea why I get such differences in RAM usage after the first inference? This is a unit test, so the image, data and code are the same each time.
  • Would there be something else to measure / restrict?
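One way to narrow this down is to log used RAM and latency around every inference, so the slow runs can be matched to the memory spikes. Below is a minimal sketch, assuming a Linux `/proc/meminfo` is available (as on L4T). The helper names are hypothetical, and "used" is computed as `MemTotal - MemAvailable`, which may differ somewhat from the figure other monitoring tools report.

```python
import time


def read_meminfo_mb(text):
    """Parse /proc/meminfo-style text into {field_name: megabytes}."""
    info = {}
    for line in text.splitlines():
        parts = line.split()
        # Lines look like "MemTotal:  8048000 kB"; skip entries without a kB value.
        if len(parts) >= 3 and parts[2] == "kB":
            info[parts[0].rstrip(":")] = int(parts[1]) // 1024
    return info


def used_ram_mb(path="/proc/meminfo"):
    """Return an estimate of used RAM in MB (MemTotal - MemAvailable)."""
    with open(path) as f:
        info = read_meminfo_mb(f.read())
    return info["MemTotal"] - info["MemAvailable"]


def timed_call(fn, *args):
    """Call fn(*args) and return (result, elapsed_milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0


# Usage sketch (run_inference and image are placeholders for your own code):
# before = used_ram_mb()
# _, ms = timed_call(run_inference, image)
# print("RAM %d -> %d MB, inference %.1f ms" % (before, used_ram_mb(), ms))
```

Logging both numbers on every run of the unit test should show whether the slow runs always coincide with the high RAM readings.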

Thanks

@AastaLLL

I can understand that if the memory is full, inference would take a long time.

There are 3 models I have been using. All have been converted to TensorRT using the TF-TRT approach (as pointed out in "Should pruning a model prior to converting it to tensorRT make inference faster?").
For each of the models, I have set the workspace size as
max_workspace_size_bytes = 1 << 26
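
For scale, here is a quick worked calculation of what that setting amounts to. It assumes, as far as I understand the TF-TRT parameter, that `max_workspace_size_bytes` only bounds TensorRT's scratch workspace for each converted model and does not cap TensorFlow's own GPU allocator.

```python
# Workspace limit from the post: 1 << 26 bytes per converted model.
max_workspace_size_bytes = 1 << 26

# Convert to MiB (1 MiB = 1024 * 1024 bytes).
workspace_mib = max_workspace_size_bytes / (1024 * 1024)

# Upper bound on TensorRT workspace if all three models use their full limit.
total_for_three_models_mib = 3 * workspace_mib

print(max_workspace_size_bytes, workspace_mib, total_for_three_models_mib)
# prints: 67108864 64.0 192.0
```

So the workspace limit accounts for at most about 192 MiB across the three models, which is far less than a jump from roughly 2462 to 7000 in the RAM readings.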

these are the RAM usages:

  • after loading model 1: RAM = 2224/7860, swap = 600/4096
  • after loading model 2: RAM = 2322/7860, swap = 600/4096
  • after loading model 3: RAM = 2462/7860, swap = 600/4096

After running each of them for the first time, the RAM usage jumps to ~7000/7860.
Should it be that high?

Hi,

We recommend using pure TensorRT for inference instead.
TF-TRT uses the TensorFlow interface, so by default it occupies most of the GPU memory to allow fast algorithms.
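
If staying on TF-TRT for now, TensorFlow's default allocation can be limited through the session config. This is a minimal sketch, assuming a TensorFlow 1.x build that provides `tf.compat.v1` (older 1.x builds expose the same options as `tf.ConfigProto`). The helper name, the 0.5 fraction and `trt_graph` are illustrative assumptions, not values from this thread.

```python
def make_session_config(memory_fraction=0.5, allow_growth=True):
    """Build a TF 1.x session config that limits GPU memory pre-allocation."""
    if not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")

    # Imported lazily so the argument check works without TensorFlow installed.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto()
    # Allocate GPU memory on demand instead of grabbing most of it up front.
    config.gpu_options.allow_growth = allow_growth
    # Cap the share of GPU memory this process may use.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


# Usage sketch (trt_graph is a placeholder for the converted TF-TRT graph):
# import tensorflow as tf
# with tf.compat.v1.Session(graph=trt_graph, config=make_session_config(0.5)) as sess:
#     ...
```

On the TX2 the CPU and GPU share the same physical memory, so whatever TensorFlow reserves for the GPU shows up in the RAM figure.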

You can try exporting the model .pb->.uff(.onnx)->.trt to get better performance with pure TensorRT.
Here is an example for your reference:

/usr/src/tensorrt/samples/sampleUffSSD/

Thanks.


Thanks @AastaLLL
will try it shortly