Running NN models on GPU & DLA

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): AGX Orin
• DeepStream Version: 6.2

I’m trying to compare running the FaceDetect and FaceNet models on the GPU versus the DLA. When I run FaceDetect on the DLA, power consumption drops by around 100 mW, but there is a small increase in processing time compared to running it on the GPU.
For FaceNet, when it runs on the DLA, the power consumption increases and there is a huge drop in FPS and an increase in processing time.
I have attached the log files from enabling both models on the DLA. According to the logs, around 227 layers of FaceDetect are not supported, versus around 77 for FaceNet. That suggests the back-and-forth communication between the DLA and GPU is higher for FaceDetect than for FaceNet, so I can’t understand why running FaceNet on the DLA causes the higher processing time and power consumption.
FaceDetect_DLAEngine_unsupportedLayers.txt (49.2 KB)
FaceNet_DLAEngine_unsupportedLayers.txt (9.8 KB)

Which models are you talking about? Where and how did you get the models?

FaceDetect for face detection; this is a pretrained model from NVIDIA.
FaceNet for face recognition, which I converted from PyTorch to ONNX.

Can you provide your FaceNet onnx file?

Even though 227 layers of FaceDetect fall back to the GPU, that does not necessarily mean the DLA-to-GPU back-and-forth increases. It depends on the network structure. Maybe many of FaceDetect’s 227 fallback layers are connected directly together, while FaceNet’s 77 are scattered.

I have attached the facenet.onnx file.
Is there a way to check the names of the unsupported layers? FaceDetect and FaceNet consist of convolutional and pooling layers with different dimensions,
but the files attached above, generated when enabling DLA in the configuration file, don’t give the actual layer names.
FaceNet’s unsupported layers consist only of unnamed layers (Shuffle and Constant), which do not appear in the model itself.

facenet.onnx.zip (83.1 MB)

I have noticed that for FaceDetect the lines below are repeated around 10 times to give a total of 227 layers, and the message is **“expect fall back to non-int8 implementation for any layer consuming or producing given tensor”**. Does this mean these layers will not run on the DLA?

WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_conv_2/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_conv_2/bias, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/moving_variance, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape_1/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/batchnorm/add/y, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/gamma, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape_3/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/beta, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape_2/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/moving_mean, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
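To count how many distinct tensors trigger this warning (rather than eyeballing the repeated blocks), the build log can be filtered with a short script. This is a sketch that only assumes the warning format shown above; the inline sample reuses two of those exact lines.

```python
import re
from collections import Counter

# Matches TensorRT build-log warnings of the form:
# WARNING: [TRT]: Missing scale and zero-point for tensor <name>, expect fall back ...
PATTERN = re.compile(r"Missing scale and zero-point for tensor (\S+),")

def missing_scale_tensors(log_text):
    """Return a Counter mapping tensor names to how often the warning appears."""
    return Counter(PATTERN.findall(log_text))

# Example on two of the warning lines above:
sample = (
    "WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_conv_2/kernel, "
    "expect fall back to non-int8 implementation for any layer consuming or producing given tensor\n"
    "WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/gamma, "
    "expect fall back to non-int8 implementation for any layer consuming or producing given tensor\n"
)
counts = missing_scale_tensors(sample)
print(sorted(counts))  # ['block_1b_bn_2/gamma', 'block_1b_conv_2/kernel']
```

Running it over the full saved log gives the unique tensor names, which can then be matched against the model’s layers.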

As per the documentation, the Shuffle layer is supported by the DLA, so I’m wondering why it falls back to the GPU in this model.

How did you get the log? By “trtexec”?

No, this log is generated when I enable DLA in the detection/classification configuration files, while running the pipeline to build the engine file.
I’m using JetPack 5.1 and TensorRT 8.5.1.
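For reference, enabling DLA in a DeepStream nvinfer configuration is done with the `enable-dla` and `use-dla-core` properties in the `[property]` group. A minimal sketch (the file names here are placeholders, not the actual paths used):

```
[property]
onnx-file=facenet.onnx           # hypothetical model path
model-engine-file=facenet_dla.engine
network-mode=1                   # 0=FP32, 1=INT8, 2=FP16
int8-calib-file=calibration.bin  # GPU-generated cache; see discussion below
enable-dla=1                     # build the engine for the DLA
use-dla-core=0                   # AGX Orin has DLA cores 0 and 1
```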

No. It just means the layer does not support INT8; it can still run on the DLA.

What about the Shuffle layers that should be supported on the DLA but, in my case, fell back to the GPU?

But is a reduction in FPS expected when the whole model runs on the DLA?

Compared to running all the layers in INT8, yes, the FPS declines.

You mean that with the GPU these layers will run in INT8, but when we enable the DLA they won’t? Is there any justification for this?
And I still need clarification about the Shuffle layers, please.

For this model, FaceDetect | NVIDIA NGC, it is true. The INT8 calibration file attached there does not cover all the layers on the DLA, while it covers more layers on the GPU.

Even though the DLA supports Shuffle, there are still some limitations. Please check whether your layers meet them: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Do you mean that the INT8 calibration file contains entries for layers, and some of those layers are not supported by the DLA? But as shown in the log, these operations didn’t fall back to the GPU; they only fell back to a non-INT8 implementation, which I assume is FP16, since that is the only precision other than INT8 supported on the DLA.

If you are talking about the calibration file on FaceDetect | NVIDIA NGC, it was generated with the GPU, not the DLA. So if you use that calibration file with the DLA, some layers lack calibration parameters and fall back to FP16. These layers still run on the DLA, not the GPU.
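One way to see which tensors the calibration cache actually covers is to list its entries and compare them against the tensors named in the “Missing scale and zero-point” warnings. The sketch below assumes the usual TensorRT calibration-cache text layout (a header line such as `TRT-8501-EntropyCalibration2`, then one `tensor_name: <hex scale>` entry per line); the file name in the usage comment is a placeholder.

```python
def calib_cache_tensors(lines):
    """Return the set of tensor names that have an INT8 scale entry in a
    TensorRT calibration cache.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumed layout: a header line without a colon, then
    'tensor_name: <hex scale>' entries.
    """
    names = set()
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:  # skip the header and blank lines
            continue
        name, _, _hex_scale = line.rpartition(":")
        if name:
            names.add(name.strip())
    return names

# Usage (file name is hypothetical):
# with open("calibration.bin") as f:
#     covered = calib_cache_tensors(f)
```

Tensors reported in the warnings should be exactly the ones absent from this set.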

If you want a DLA calibration file, please ask on the TensorRT forum: Latest Deep Learning (Training & Inference)/TensorRT topics - NVIDIA Developer Forums

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.