Running NN models on GPU & DLA

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): AGX Orin
• DeepStream Version: 6.2

I’m trying to compare running the FaceDetect and FaceNet models on the GPU versus the DLA. When I run FaceDetect on the DLA, power consumption drops by around 100 mW, but there is a small increase in processing time compared to running it on the GPU.
For FaceNet, when it runs on the DLA, the power consumption increases and there is a huge drop in FPS and an increase in processing time.
I have attached the log files from enabling both models on the DLA. According to the logs, around 227 layers of FaceDetect are not supported, versus around 77 for FaceNet. That suggests the back-and-forth communication between the DLA and GPU is higher for FaceDetect than for FaceNet, so I can’t understand why running FaceNet on the DLA causes the higher processing time and power consumption.
FaceDetect_DLAEngine_unsupportedLayers.txt (49.2 KB)
FaceNet_DLAEngine_unsupportedLayers.txt (9.8 KB)

Which models are you talking about? Where and how did you get the models?

FaceDetect for face detection; this is a pretrained model from NVIDIA.
FaceNet for face recognition, which I converted from PyTorch to ONNX.

Can you provide your FaceNet onnx file?

Even though 227 layers of FaceDetect fall back to the GPU, that does not necessarily mean the DLA-to-GPU back-and-forth increases. It depends on the network structure. Maybe many of FaceDetect’s 227 fallback layers are connected directly together, while FaceNet’s 77 are scattered.

I have attached the facenet.onnx file.
Is there a way to check the names of the unsupported layers? FaceDetect and FaceNet consist of convolutional and pooling layers with different dimensions,
but the files attached above, generated when enabling DLA in the configuration file, don’t give the actual layer names.
FaceNet’s unsupported layers consist only of unnamed layers (Shuffle and Constant), which do not appear in the model itself.

facenet.onnx.zip (83.1 MB)

I have noticed that for FaceDetect the lines below are repeated around 10 times to give a total of 227 layers, and the message is **“expect fall back to non-int8 implementation for any layer consuming or producing given tensor”**. Does this mean these layers will not run on the DLA?

WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_conv_2/kernel, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_conv_2/bias, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/moving_variance, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape_1/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/batchnorm/add/y, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/gamma, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape_3/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/beta, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape_2/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/moving_mean, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/Reshape/shape, expect fall back to non-int8 implementation for any layer consuming or producing given tensor
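To count how many distinct tensors trigger this warning (rather than eyeballing the repeated blocks), the build log can be filtered with a short script. This is a sketch that only assumes the warning format shown above; the inline sample reuses two of those exact lines.

```python
import re
from collections import Counter

# Matches TensorRT build-log warnings of the form:
# WARNING: [TRT]: Missing scale and zero-point for tensor <name>, expect fall back ...
PATTERN = re.compile(r"Missing scale and zero-point for tensor (\S+),")

def missing_scale_tensors(log_text):
    """Return a Counter mapping tensor names to how often the warning appears."""
    return Counter(PATTERN.findall(log_text))

# Example on two of the warning lines above:
sample = (
    "WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_conv_2/kernel, "
    "expect fall back to non-int8 implementation for any layer consuming or producing given tensor\n"
    "WARNING: [TRT]: Missing scale and zero-point for tensor block_1b_bn_2/gamma, "
    "expect fall back to non-int8 implementation for any layer consuming or producing given tensor\n"
)
counts = missing_scale_tensors(sample)
print(sorted(counts))  # ['block_1b_bn_2/gamma', 'block_1b_conv_2/kernel']
```

Running it over the full saved log gives the unique tensor names, which can then be matched against the model’s layers.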

As per the documentation, the Shuffle layer is supported by the DLA, so I’m wondering why it falls back to the GPU in this model.

How did you get the log? By “trtexec”?

No, this log is generated when I enable DLA in the detection/classification configuration files, while running the pipeline to build the engine file.
I’m using JetPack 5.1 and TensorRT 8.5.1.
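For reference, enabling DLA in a DeepStream nvinfer configuration is done with the `enable-dla` and `use-dla-core` properties in the `[property]` group. A minimal sketch (the file names here are placeholders, not the actual paths used):

```
[property]
onnx-file=facenet.onnx           # hypothetical model path
model-engine-file=facenet_dla.engine
network-mode=1                   # 0=FP32, 1=INT8, 2=FP16
int8-calib-file=calibration.bin  # GPU-generated cache; see discussion below
enable-dla=1                     # build the engine for the DLA
use-dla-core=0                   # AGX Orin has DLA cores 0 and 1
```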

No. It just means the layer does not support INT8; it can still run on the DLA.

What about the Shuffle layers that should be supported on the DLA but, in my case, fell back to the GPU?

But is a reduction in FPS expected when the whole model runs on the DLA?

Compared to running all the layers in INT8, yes, the FPS declines.

You mean that with the GPU these layers will run in INT8, but when we enable the DLA they won’t? Is there any justification for this?
And I still need clarification about the Shuffle layers, please.

For this model, FaceDetect | NVIDIA NGC, it is true. The INT8 calibration file attached there does not cover all the layers on the DLA, while it covers more layers on the GPU.

Even though the DLA supports Shuffle, there are still some limitations. Please check whether your layers meet them: Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

Do you mean that the INT8 calibration file contains entries for layers, and some of those layers are not supported by the DLA? But as shown in the log, these operations didn’t fall back to the GPU; they only fell back to a non-INT8 implementation, which I assume is FP16, since that is the only precision other than INT8 supported on the DLA.

If you are talking about the calibration file on FaceDetect | NVIDIA NGC, it was generated with the GPU, not the DLA. So if you use that calibration file with the DLA, some layers lack calibration parameters and fall back to FP16. These layers still run on the DLA, not the GPU.
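One way to see which tensors the calibration cache actually covers is to list its entries and compare them against the tensors named in the “Missing scale and zero-point” warnings. The sketch below assumes the usual TensorRT calibration-cache text layout (a header line such as `TRT-8501-EntropyCalibration2`, then one `tensor_name: <hex scale>` entry per line); the file name in the usage comment is a placeholder.

```python
def calib_cache_tensors(lines):
    """Return the set of tensor names that have an INT8 scale entry in a
    TensorRT calibration cache.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumed layout: a header line without a colon, then
    'tensor_name: <hex scale>' entries.
    """
    names = set()
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:  # skip the header and blank lines
            continue
        name, _, _hex_scale = line.rpartition(":")
        if name:
            names.add(name.strip())
    return names

# Usage (file name is hypothetical):
# with open("calibration.bin") as f:
#     covered = calib_cache_tensors(f)
```

Tensors reported in the warnings should be exactly the ones absent from this set.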

If you want a DLA calibration file, please ask on the TensorRT forum: Latest Deep Learning (Training & Inference)/TensorRT topics - NVIDIA Developer Forums

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.