Performance without DLA

Hi,

Is there a way to disable DLA1 and DLA2 when running jetson_benchmarks?

Our customer would like to know the NX performance without DLA1 and DLA2 enabled.

Thank you,

Hi,

The simplest way is to set the DLA batch size in nx-benchmarks.csv to zero.

Thanks.

Hi AastaLLL,

Thank you for your support.

May I double-check: is the batch size you mentioned the BatchSizeDLA column, as shown below?
[image]

From the column, only three models (inception_v4, ResNet50_224x224, and ssd-mobilenet-v1) have DLA enabled.
So I can only disable DLA for these three models?
Please correct me if I have mixed anything up.

Thank you,

Hi AastaLLL,

We still have the following NX performance questions.
Thank you for any advice.

  1. From the column, only three models (inception_v4, ResNet50_224x224, and ssd-mobilenet-v1) have DLA enabled.

So I can only disable DLA for these three models?

  2. After I disabled DLA for the three models, only inception_v4 and ResNet50_224x224 worked. Is the performance shown below reasonable?

  3. Can I run the benchmark with DLA only? (I guess that applies only to the three models. Can I get results for those three? Or can I assume the DLA performance is the total performance minus the DLA 0 performance?)

  4. What are the NX GPU and DLA TOPS? (NX TOPS is 21.)

Here is my test on NX (NX eMMC module + Nano B01 EVB board).

Model Name                 FPS NX EMMC    FPS --H NX EMMC    DLA 0
                           (Nano_b01)     (Nano_b01)
inception_v4               311.73         240.93             93.3
vgg19_N2                   66.43          58.05              57.64
super_resolution_bsd500    150.46         112.23             112.23
unet-segmentation          145.42         101.197            100.76
pose_estimation            237.1          171.26             174.26
yolov3-tiny-416            546.69         413.74             414.74
ResNet50_224x224           824.02         609.29             245.15
ssd-mobilenet-v1           887.6          625.2              no mobilenet-v1-bs0.onnx

Thank you,

Hi,

Sorry, some information was missing from my previous comment.

Setting the batch size to zero means the reported FPS won’t take that processor (GPU or DLA) into account.
However, the processor still runs in the background, which might have some impact on the benchmark result.

1.
For better results, please set Devices to 1 in nx-benchmarks.csv.

  • 3 if GPU+2DLA, 1 if GPU only
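
As a minimal sketch of that change, the script below rewrites a copy of the CSV with Devices set to 1 on every row. It assumes the file has a header row containing a column named exactly `Devices`; the helper name `set_gpu_only` and the output file name in the note below are hypothetical, so please check them against your copy of nx-benchmarks.csv before relying on it.

```python
import csv


def set_gpu_only(src_path, dst_path, devices_col="Devices"):
    """Copy a benchmark CSV, setting the Devices column to 1 (GPU only) on every row.

    Assumption: the CSV has a header row that includes `devices_col`.
    Returns the number of data rows written.
    """
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        if fieldnames is None or devices_col not in fieldnames:
            raise KeyError(f"column {devices_col!r} not found in {src_path}")
        rows = list(reader)

    # 1 means GPU only; 3 would mean GPU + 2 DLA.
    for row in rows:
        row[devices_col] = "1"

    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `set_gpu_only("nx-benchmarks.csv", "nx-benchmarks-gpu-only.csv")` would leave the original file untouched and write the GPU-only variant next to it, so the two configurations can be benchmarked back to back.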

3.
This requires some changes to the source.
You can add some code to skip the GPU inference here:

4. TOPS 21 = 12.3 (GPU) + 2*4.5 (each DLA)

Thanks.

Hi AastaLLL,

Thank you for your support.
After setting Devices to 1, I got the results below.

                    NX EMMC           NX EMMC
                    (NX EVB board)    (NX EVB board)
                                      DLA 0
inception_v4        317.45            193.49
ResNet50_224x224    879.33            621.26
ssd-mobilenet-v1    892.74            770.18

Are these results normal?
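
To put the comparison in numbers, here is a small sketch that computes how much lower the FPS is in the DLA 0 run. It assumes the first column of the results above is the default run (GPU + 2 DLA) and the second is the run with DLA disabled; the helper name `fps_drop_percent` is made up for illustration.

```python
def fps_drop_percent(fps_with_dla, fps_gpu_only):
    """Return the percentage of throughput lost when the DLAs are excluded."""
    return (fps_with_dla - fps_gpu_only) / fps_with_dla * 100.0


# FPS values copied from the results above: (default run, DLA 0 run).
# Assumption: the default run uses GPU + 2 DLA.
results = {
    "inception_v4": (317.45, 193.49),
    "ResNet50_224x224": (879.33, 621.26),
    "ssd-mobilenet-v1": (892.74, 770.18),
}

for name, (with_dla, gpu_only) in results.items():
    drop = fps_drop_percent(with_dla, gpu_only)
    print(f"{name}: {drop:.1f}% lower without DLA")
```

Under those assumptions the drop is roughly 39%, 29%, and 14% for the three models, respectively.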

Also, the other models already have Devices set to 1.
Does DLA support the other models (vgg19_N2, yolov3-tiny-416, …)?
If so, why are those models not set to use DLA?

Thank you,

Hi,

Could you add some logging here to check whether you are using the correct process for benchmarking?

DLA is a hardware-based inference engine, so not all TensorRT operations are supported on it.
Since this is a benchmark script, we turn off DLA for the models that cannot run on it without GPU fallback.

Thanks.