Getting maximum performance from NVDLA

Hello,
I am trying to get maximum performance (frames per second) on my Xavier. I have flashed my Xavier with the latest JetPack 4.3 Developer Preview.
I used the bundled trtexec utility with the following parameters for MobileNetV1:

--avgRuns=100
--iterations=1000
--int8
--useDLACore=0
--allowGPUFallback
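
For reference, the full command line looked roughly like the sketch below (the trtexec location and the Caffe deploy file name are illustrative; adjust them for your setup):

/usr/src/tensorrt/bin/trtexec --deploy=mobilenet_v1.prototxt --output=prob \
    --batch=1 --int8 --useDLACore=0 --allowGPUFallback \
    --avgRuns=100 --iterations=1000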

I am getting a latency of 2.169 ms, i.e. 461 FPS, but the published numbers at NVDLA Deep Learning Inference Compiler is Now Open Source | NVIDIA Technical Blog show a latency of 1.9 ms and 527 FPS.
May I know with which setup or configuration the numbers published by NVIDIA were generated?

Hi mostafiz.h,

Please set max performance mode before running the test.

sudo nvpmodel -m 0
sudo jetson_clocks
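
You can verify the settings afterwards with the standard query options:

sudo nvpmodel -q
sudo jetson_clocks --show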

Hello Carolyuu, the numbers posted above were already generated with 'MAXN' mode and 'jetson_clocks' enabled, so that was not the issue. Is there any other way?

Hi again,
I did a few more experiments and compared JetPack versions.

JetPack 4.2.2, INT8 MobileNetV1 on DLA0, with MAXN and jetson_clocks active:
Batch 1: latency 2.5 ms = 400 FPS
Batch 8: latency 12.6 ms = 634 FPS

JetPack 4.3 DP, INT8 MobileNetV1 on DLA0, with MAXN and jetson_clocks active:
Batch 1: latency 2.1 ms = 462 FPS
Batch 8: latency 14.7 ms = 540 FPS
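(FPS here is simply batch size divided by average latency, e.g. 1 / 2.5 ms = 400 FPS for the JetPack 4.2.2 batch-1 run.)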

Meanwhile, the numbers reported on the NVIDIA website, NVDLA Deep Learning Inference Compiler is Now Open Source | NVIDIA Technical Blog, are:
Batch 1: latency 1.9 ms = 527 FPS
Batch 8: latency 13.4 ms = 599 FPS

May I get some help on how to reproduce the NVIDIA-published numbers?
Thanks.

Hi mostafiz.h,

How did you get the profiling results?
Using a script or with trtexec?

I am using trtexec.
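
For the batch-8 results above, the only change is the batch flag; a sketch, with the same assumed paths as before:

/usr/src/tensorrt/bin/trtexec --deploy=mobilenet_v1.prototxt --output=prob \
    --batch=8 --int8 --useDLACore=0 --allowGPUFallback \
    --avgRuns=100 --iterations=1000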

Hi,

A short help request, if you do not mind sharing:

Where did you get the MobileNetV1 (and ResNet-50) models that are compatible with the DLA (i.e., that do not fall back to the GPU)?

Are the weights available as well?

Thanks for the help!

Hello dannykario, you can get the models from the Jetson Zoo (Jetson Zoo - eLinux.org). In the ResNet-50 model, all layers except 'prob' run on the DLA, but in MobileNetV1 many layers fall back to the GPU; you can use the '--allowGPUFallback' flag to run the model.
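
If you want to check which layers actually land on the DLA, one option (my suggestion, not something stated above) is to rebuild with trtexec's --verbose flag; during engine construction TensorRT then logs which layers are not supported on the DLA and are moved to the GPU (exact log wording varies by version):

/usr/src/tensorrt/bin/trtexec --deploy=mobilenet_v1.prototxt --output=prob \
    --int8 --useDLACore=0 --allowGPUFallback --verbose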

Hi,

Thanks. Well, I am not sure if it helps, but see my question in:

https://devtalk.nvidia.com/default/topic/1068174/tensorrt/allowgpufallback-is-per-layer-or-per-model-/post/5410575/#5410575

If I understand correctly, in the MobileNetV1 case, you are running ALL of the model on the GPU, not the DLA …