Issues with the TensorRT UFF SSD INT8 sample

Hi,

I have been working to get the UFF SSD INT8 TensorRT sample running. I ran into a lot of bugs in the sample. I captured the issues and the fixes in the post below but didn’t get any responses on the TensorRT board.

https://devtalk.nvidia.com/default/topic/1049003/tensorrt/segfault-with-tensorrt-uff-ssd-int8/

I was finally able to get the int8 mode to work, but after doing so, I see that it reports an inference time of around 29 ms in 15W mode (~34 frames per second).
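As a quick sanity check on that conversion, here is a minimal sketch. It assumes the reported time covers a single image; if the time actually covers a whole batch, the per-image rate scales with the batch size. Both `latency_ms_to_fps` and its `batch_size` parameter are hypothetical names for illustration, not part of the sample.

```python
def latency_ms_to_fps(latency_ms: float, batch_size: int = 1) -> float:
    """Convert an inference latency in milliseconds to frames per second.

    batch_size is a hypothetical knob: set it to the number of images
    covered by one timed inference call (assumed to be 1 here).
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return batch_size * 1000.0 / latency_ms


# 29 ms per inference, one image per inference (assumption)
print(f"{latency_ms_to_fps(29.0):.1f} fps")  # 34.5 fps
```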

Wondering if anyone else can confirm they see similar performance of the sample_uff_ssd tensorrt sample. I’m hoping I’m doing something very wrong. :)

  • Josh.

Hi,

So, are you using the Jetson Xavier? Or a desktop GPU?

If Xavier, please remember to maximize the system performance before benchmarking.

sudo jetson_clocks

Since there was a performance issue in cuDNN, we also highly recommend using our latest JetPack installer.
https://developer.nvidia.com/embedded/jetpack

Thanks.

Yep, it’s a Jetson AGX Xavier dev board, which is why I cross-posted in this forum. I couldn’t find a way to move the original post over from the TensorRT forum, so the details are buried over there. The summary is:

Got the sampleUffSSD example working in FP32 mode within a docker container without any real problems.

Moved to the AGX Xavier board, got that running in FP32 mode as well. Performance was poor, as expected, since it is full 32-bit floating point.

Tried using the --int8 option. Ran into lots of bugs in the sample app itself (captured in that thread). There are still some open issues in that thread that seem a little serious, but they’re isolated to that sample application and primarily are just oversights or limitations of the PPM loader in that codebase. (There is one uninitialized memory question that’s probably especially important.) I was able to work around or fix the issues in my local copy of the code.

In any case, after getting it to build and calibrate, the inference time reported by sampleUffSSD in --int8 mode is around 29 ms.

Referencing here:

I was using sudo nvpmodel -q --verbose to query for the power mode, which reported 15W mode.

Looking at jetson_clocks output, I see this.

nvidia@jetson-0423418010368:~$ sudo ./jetson_clocks.sh --show
[sudo] password for nvidia: 
SOC family:tegra194  Machine:jetson-xavier
Online CPUs: 0-3
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu1: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu2: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu3: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu4: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu5: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu6: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
cpu7: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
GPU MinFreq=675750000 MaxFreq=675750000 CurrentFreq=675750000
EMC MinFreq=204000000 MaxFreq=1331200000 CurrentFreq=1331200000 FreqOverride=1
Fan: speed=255

After running sudo ./jetson_clocks.sh, another --show gives the same output, so I believe I was already running with max clocks?
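One way to check this without eyeballing the output is to parse the --show text and compare CurrentFreq against MaxFreq for each entry. The sketch below is only that, a sketch: the helper names `parse_clocks` and `not_at_max` are made up for illustration, and the regular expression is written against the line format pasted above, so it may need adjusting for other versions of the script. Note that MaxFreq here is the ceiling for the active nvpmodel power mode, not the board's absolute maximum, so "all clocks at max" in 15W mode can still be well below what MAXN allows.

```python
import re

# Matches lines such as
#   cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
#   GPU MinFreq=675750000 MaxFreq=675750000 CurrentFreq=675750000
FREQ_RE = re.compile(
    r"^(?P<name>\w+):?\s.*?"
    r"MinFreq=(?P<min>\d+)\s+MaxFreq=(?P<max>\d+)\s+CurrentFreq=(?P<cur>\d+)"
)


def parse_clocks(show_output: str) -> dict:
    """Return {name: (min_freq, max_freq, current_freq)} for each clock line."""
    clocks = {}
    for line in show_output.splitlines():
        m = FREQ_RE.match(line.strip())
        if m:
            clocks[m["name"]] = (int(m["min"]), int(m["max"]), int(m["cur"]))
    return clocks


def not_at_max(clocks: dict) -> list:
    """Names of clocks whose current frequency is below the mode's maximum."""
    return [name for name, (_, max_f, cur_f) in clocks.items() if cur_f < max_f]


# A few lines copied from the 15W-mode output in this thread
SAMPLE = """\
SOC family:tegra194  Machine:jetson-xavier
cpu0: Gonvernor=schedutil MinFreq=1190400 MaxFreq=1190400 CurrentFreq=1190400
GPU MinFreq=675750000 MaxFreq=675750000 CurrentFreq=675750000
EMC MinFreq=204000000 MaxFreq=1331200000 CurrentFreq=1331200000 FreqOverride=1
Fan: speed=255
"""

clocks = parse_clocks(SAMPLE)
print(not_at_max(clocks))  # [] means every parsed clock sits at its mode maximum
```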

I’ll rerun the sample in MAXN mode and MODE_15W, running jetson_clocks each time, and see what the numbers look like just to be sure. I don’t see the samples reported in the official benchmarks, so I’m reaching out here to ask what I should expect to see for the TensorRT samples running on the AGX Xavier.

Reran in MODE_15W to confirm performance and got 27.7 ms.

nvidia@jetson-0423418010368:~/tensorrt/bin$ ./sample_uff_ssd --int8
../data/ssd/sample_ssd_relu6.uff
Begin parsing model...
End parsing model...
Begin building engine...


End building engine...
 Num batches  2
PPM C=3, H=300, W=300, buffer size: 270000, c=3, h=300, w=300, copy size: 270000
flush...
PPM C=3, H=300, W=300, buffer size: 270000, c=3, h=300, w=300, copy size: 270000
flush...
 Data Size  540000
*** deserializing
Time taken for inference is 27.7309 ms.
 KeepCount 100
Detected dog in the image 0 (../data/ssd/dog.ppm) with confidence 90.811172% and coordinates (81.976357,22.518414),(295.958710,299.032471).
Result stored in dog-0.908112.ppm.
Detected dog in the image 0 (../data/ssd/dog.ppm) with confidence 87.930664% and coordinates (0.816657,0.000000),(117.601402,235.993164).
Result stored in dog-0.879307.ppm.
Detected truck in the image 1 (../data/ssd/bus.ppm) with confidence 81.315407% and coordinates (1.525819,122.979218),(250.242004,246.153259).
Result stored in truck-0.813154.ppm.
Detected car in the image 1 (../data/ssd/bus.ppm) with confidence 69.900932% and coordinates (173.104691,83.396996),(193.124130,95.357376).
Result stored in car-0.699009.ppm.
Detected person in the image 1 (../data/ssd/bus.ppm) with confidence 52.655731% and coordinates (278.384308,161.034729),(298.044464,255.471451).
Result stored in person-0.526557.ppm.

Reran in MAXN, seeing around 16.4 ms.

nvidia@jetson-0423418010368:~/tensorrt/bin$ sudo nvpmodel -m 0
nvidia@jetson-0423418010368:~/tensorrt/bin$ sudo ~/jetson_clocks.sh 
nvidia@jetson-0423418010368:~/tensorrt/bin$ sudo ~/jetson_clocks.sh --show
SOC family:tegra194  Machine:jetson-xavier
Online CPUs: 0-7
CPU Cluster Switching: Disabled
cpu0: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu1: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu2: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu3: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu4: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu5: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu6: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
cpu7: Gonvernor=schedutil MinFreq=2265600 MaxFreq=2265600 CurrentFreq=2265600
GPU MinFreq=1377000000 MaxFreq=1377000000 CurrentFreq=1377000000
EMC MinFreq=204000000 MaxFreq=2133000000 CurrentFreq=2133000000 FreqOverride=1
Fan: speed=255
nvidia@jetson-0423418010368:~/tensorrt/bin$ ./sample_uff_ssd --int8
../data/ssd/sample_ssd_relu6.uff
Begin parsing model...
End parsing model...
Begin building engine...
End building engine...
 Num batches  2
PPM C=3, H=300, W=300, buffer size: 270000, c=3, h=300, w=300, copy size: 270000
flush...
PPM C=3, H=300, W=300, buffer size: 270000, c=3, h=300, w=300, copy size: 270000
flush...
 Data Size  540000
*** deserializing
Time taken for inference is 16.4324 ms.
 KeepCount 100
Detected dog in the image 0 (../data/ssd/dog.ppm) with confidence 90.811172% and coordinates (81.976357,22.518414),(295.958710,299.032471).
Result stored in dog-0.908112.ppm.
Detected dog in the image 0 (../data/ssd/dog.ppm) with confidence 87.930664% and coordinates (0.816657,0.000000),(117.601402,235.993164).
Result stored in dog-0.879307.ppm.
Detected truck in the image 1 (../data/ssd/bus.ppm) with confidence 81.315407% and coordinates (1.525819,122.979218),(250.242004,246.153259).
Result stored in truck-0.813154.ppm.
Detected car in the image 1 (../data/ssd/bus.ppm) with confidence 69.900932% and coordinates (173.104691,83.396996),(193.124130,95.357376).
Result stored in car-0.699009.ppm.
Detected person in the image 1 (../data/ssd/bus.ppm) with confidence 52.655731% and coordinates (278.384308,161.034729),(298.044464,255.471451).
Result stored in person-0.526557.ppm.
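To put the two runs side by side, here is a small sketch that compares the latency improvement with the change in GPU clock, using only the numbers from the logs above. The `ratio` helper is a made-up name for illustration. Each figure comes from a single timed run, so treat the result as a rough indication rather than a benchmark.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio with a guard against division by zero."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# Values copied from the two runs in this thread
latency_15w_ms = 27.7309
latency_maxn_ms = 16.4324
gpu_15w_hz = 675_750_000
gpu_maxn_hz = 1_377_000_000

speedup = ratio(latency_15w_ms, latency_maxn_ms)  # about 1.69x faster in MAXN
clock_ratio = ratio(gpu_maxn_hz, gpu_15w_hz)      # about 2.04x higher GPU clock

print(f"latency speedup: {speedup:.2f}x, GPU clock ratio: {clock_ratio:.2f}x")
```

The latency improves by less than the GPU clock increases, which may mean part of the measured time does not scale with GPU frequency, though one run per mode is too little to say for sure.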

This gives me something to play with for my problem. Thanks.

Hi,

Does setting nvpmodel resolve your issue?
Thanks.