Trtexec layer optimization

Hi,

I'm using a Jetson AGX Xavier with JetPack 4.3. I converted a modified SqueezeNet caffemodel to a TensorRT engine with trtexec, using the following command:

./trtexec --deploy=deploy.prototxt --model=squeezenet.caffemodel --output=out1,out2 --fp16 --useDLACore=0 --allowGPUFallback --verbose

Among other things, I got the following output:

[02/31/2021-14:11:25] [V] [TRT] Applying generic optimizations to the graph for inference.
[02/31/2021-14:11:25] [V] [TRT] Original: 169 layers
[02/31/2021-14:11:25] [V] [TRT] After dead-layer removal: 169 layers
[02/31/2021-14:11:26] [W] [TRT] Internal DLA error for layer conv7_3. Switching to GPU fallback.
[02/31/2021-14:11:26] [W] [TRT] Internal DLA error for layer conv7_3. Switching to GPU fallback.
[02/31/2021-14:11:26] [W] [TRT] Internal DLA error for layer conv7_3. Switching to GPU fallback.
[02/31/2021-14:11:26] [W] [TRT] Internal DLA error for layer conv1_6. Switching to GPU fallback.
[02/31/2021-14:11:26] [W] [TRT] Internal DLA error for layer conv1_6. Switching to GPU fallback.
[02/31/2021-14:11:26] [W] [TRT] Internal DLA error for layer conv1_6. Switching to GPU fallback.
[02/31/2021-14:11:26] [V] [TRT] After DLA optimization: 11 layers
[02/31/2021-14:11:26] [V] [TRT] After scale fusion: 11 layers
[02/31/2021-14:11:26] [V] [TRT] After vertical fusions: 11 layers
[02/31/2021-14:11:26] [V] [TRT] After final dead-layer removal: 11 layers
[02/31/2021-14:11:26] [V] [TRT] After tensor merging: 11 layers
[02/31/2021-14:11:26] [V] [TRT] Eliminating concatenation concat_stage6
[02/31/2021-14:11:26] [V] [TRT] Generating copy for conv7_3 to concat6
[02/31/2021-14:11:26] [V] [TRT] Generating copy for conv7_3 to concat6
[02/31/2021-14:11:26] [V] [TRT] Generating copy for conv4_4 to concat6
[02/31/2021-14:11:26] [V] [TRT] After concat removal: 13 layers
.
.
.
[02/31/2021-14:11:48] [V] [TRT] After reformat layers: 25 layers

My questions are the following:

  1. Is it possible for the network to be reduced this dramatically, or could it be a hint that something went wrong?
  2. What explains the different layer counts?
  3. Does the resulting network have 25 layers?

Thanks in advance.

Hi,

1. It should be okay. TensorRT tends to fuse layers to accelerate inference, which reduces the reported layer count (see the sketch below).

2. You can check this tutorial to learn more about how TensorRT works internally.

3. Yes.
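
To see what the built engine actually contains, one option is to let trtexec print a per-layer profile of the serialized engine; fused layers then show up as a single entry with a combined name, and the part offloaded to DLA appears as one node. A rough sketch, assuming your trtexec build supports --saveEngine, --loadEngine and --dumpProfile (the engine file name is just a placeholder):

# Build as before, but also serialize the engine to disk
./trtexec --deploy=deploy.prototxt --model=squeezenet.caffemodel --output=out1,out2 --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=squeezenet_dla.engine

# Reload the engine and print a per-layer timing profile
./trtexec --loadEngine=squeezenet_dla.engine --useDLACore=0 --dumpProfile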

Thanks.

Hi, thanks for your answer.
But if I generate the TensorRT engine with the GPU only, I get 105 layers after the reformat step.

The GPU-only TensorRT engine also works and shows the desired output, but the DLA engine doesn't work and doesn't produce any output.
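
One way to narrow this down is to dump the raw outputs of both builds and check whether the DLA engine produces anything sensible at all. A rough sketch, assuming this trtexec build supports --dumpOutput (trtexec feeds random input by default, so the values will not match between the two runs; the check is only whether the DLA run returns non-zero, non-NaN outputs):

# GPU-only build: run inference and print the raw output tensors
./trtexec --deploy=deploy.prototxt --model=squeezenet.caffemodel --output=out1,out2 --fp16 --dumpOutput > gpu_out.log

# DLA build with GPU fallback: same network, same flags plus DLA
./trtexec --deploy=deploy.prototxt --model=squeezenet.caffemodel --output=out1,out2 --fp16 --useDLACore=0 --allowGPUFallback --dumpOutput > dla_out.log

# Compare the two logs; all-zero or NaN outputs from the DLA run point at
# the DLA/fallback path rather than at the application code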