[Xavier NX + DLA] does not support dynamic shapes, and CBUF size requirement

Hi,
My English isn’t very good, so please feel free to ask if anything is unclear.

Thank you, as always, for your assistance.

I am using TensorRT (trtexec) in a [Jetson Xavier NX + DLA] environment.

When a Convolution layer is connected after a Resize layer, the following two messages are output and the layer is executed via GPU fallback.

DLA Layer Conv_1 does not support dynamic shapes in any dimension.
DLA LAYER: CBUF size requirement for layer Conv_1 is 131072banks, which exceeds the limit (16).

I would appreciate it if you could answer the following questions about these messages.

・Is it expected behavior (by specification) that these messages are output when a Convolution layer is connected after a Resize layer? If so, what is the cause?

・Is there any way to work around this issue?

I’ve uploaded the model I used for testing.
Please use it if needed.
dla_resize_test.onnx (218.1 KB)

Thank you in advance.

Regards,

Hi,

1. The message indicates that DLA doesn’t support dynamic shape input.
“Dynamic shape” means that TensorRT chooses the size of a tensor at runtime.
If you have a fixed input size, please create the network with an implicit batch rather than an explicit one.

2. Please first try whether you can avoid the error by using an implicit batch (a minimal sketch follows below).
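
Here is a minimal sketch of the two network-creation modes in the TensorRT Python API (assuming the TensorRT 7.x bindings shipped with JetPack; note that the ONNX parser itself requires explicit batch in TensorRT 7, so implicit batch mainly applies when you build the network through the API):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)

# Implicit batch: no creation flags; the batch dimension is not part of
# the tensor shapes and is set separately via builder.max_batch_size.
implicit_net = builder.create_network(0)
builder.max_batch_size = 1

# Explicit batch: the batch is part of every tensor shape.
# trtexec uses this mode when parsing ONNX models.
explicit_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
explicit_net = builder.create_network(explicit_flag)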

Thanks.

Hi,
I really appreciate your reply.

I tried running it with an implicit batch size, but I get the same error.
The following is the runtime log.

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/work/dla_resize_test_N.onnx --workspace=2048 --fp16 --useDLACore=0 --allowGPUFallback --shapes=input:0:1x32x32x32
[01/06/2021-16:45:45] [I] === Model Options ===
[01/06/2021-16:45:45] [I] Format: ONNX
[01/06/2021-16:45:45] [I] Model: /home/jetson/work/dla_resize_test_N.onnx
[01/06/2021-16:45:45] [I] Output:
[01/06/2021-16:45:45] [I] === Build Options ===
[01/06/2021-16:45:45] [I] Max batch: explicit
[01/06/2021-16:45:45] [I] Workspace: 2048 MB
[01/06/2021-16:45:45] [I] minTiming: 1
[01/06/2021-16:45:45] [I] avgTiming: 8
[01/06/2021-16:45:45] [I] Precision: FP32+FP16
[01/06/2021-16:45:45] [I] Calibration: 
[01/06/2021-16:45:45] [I] Safe mode: Disabled
[01/06/2021-16:45:45] [I] Save engine: 
[01/06/2021-16:45:45] [I] Load engine: 
[01/06/2021-16:45:45] [I] Builder Cache: Enabled
[01/06/2021-16:45:45] [I] NVTX verbosity: 0
[01/06/2021-16:45:45] [I] Inputs format: fp32:CHW
[01/06/2021-16:45:45] [I] Outputs format: fp32:CHW
[01/06/2021-16:45:45] [I] Input build shape: input:0=1x32x32x32+1x32x32x32+1x32x32x32
[01/06/2021-16:45:45] [I] Input calibration shapes: model
[01/06/2021-16:45:45] [I] === System Options ===
[01/06/2021-16:45:45] [I] Device: 0
[01/06/2021-16:45:45] [I] DLACore: 0(With GPU fallback)
[01/06/2021-16:45:45] [I] Plugins:
[01/06/2021-16:45:45] [I] === Inference Options ===
[01/06/2021-16:45:45] [I] Batch: Explicit
[01/06/2021-16:45:45] [I] Input inference shape: input:0=1x32x32x32
[01/06/2021-16:45:45] [I] Iterations: 10
[01/06/2021-16:45:45] [I] Duration: 3s (+ 200ms warm up)
[01/06/2021-16:45:45] [I] Sleep time: 0ms
[01/06/2021-16:45:45] [I] Streams: 1
[01/06/2021-16:45:45] [I] ExposeDMA: Disabled
[01/06/2021-16:45:45] [I] Spin-wait: Disabled
[01/06/2021-16:45:45] [I] Multithreading: Disabled
[01/06/2021-16:45:45] [I] CUDA Graph: Disabled
[01/06/2021-16:45:45] [I] Skip inference: Disabled
[01/06/2021-16:45:45] [I] Inputs:
[01/06/2021-16:45:45] [I] === Reporting Options ===
[01/06/2021-16:45:45] [I] Verbose: Disabled
[01/06/2021-16:45:45] [I] Averages: 10 inferences
[01/06/2021-16:45:45] [I] Percentile: 99
[01/06/2021-16:45:45] [I] Dump output: Disabled
[01/06/2021-16:45:45] [I] Profile: Disabled
[01/06/2021-16:45:45] [I] Export timing to JSON file: 
[01/06/2021-16:45:45] [I] Export output to JSON file: 
[01/06/2021-16:45:45] [I] Export profile to JSON file: 
[01/06/2021-16:45:45] [I] 
----------------------------------------------------------------
Input filename:   /home/jetson/work/dla_resize_test_N.onnx
ONNX IR version:  0.0.7
Opset version:    12
Producer name:    tf2onnx
Producer version: 1.7.2
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[01/06/2021-16:45:47] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/test_conv1/Conv2D__5 is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching Shape__12 device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching Slice__13 device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 5) [Constant] device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] Concat__15: DLA only supports concatenation on the C dimension.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching Concat__15 device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer Resize__16 is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA Layer StatefulPartitionedCall/test/last_conv/Conv2D does not support dynamic shapes in any dimension.
[01/06/2021-16:45:47] [W] [TRT] DLA LAYER: CBUF size requirement for layer StatefulPartitionedCall/test/last_conv/Conv2D is 131072banks, which exceeds the limit (16).
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/last_conv/Conv2D is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA Layer StatefulPartitionedCall/test/last_relu/Relu does not support dynamic shapes in any dimension.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/last_relu/Relu is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/last_conv/Conv2D__18 is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [I] [TRT] 
[01/06/2021-16:45:47] [I] [TRT] --------------- Layers running on DLA: 
[01/06/2021-16:45:47] [I] [TRT] {StatefulPartitionedCall/test/test_conv1/Conv2D,StatefulPartitionedCall/test/test_relu1/Relu}, 
[01/06/2021-16:45:47] [I] [TRT] --------------- Layers running on GPU: 
[01/06/2021-16:45:47] [I] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D__5, Resize__16, StatefulPartitionedCall/test/last_conv/Conv2D + StatefulPartitionedCall/test/last_relu/Relu, StatefulPartitionedCall/test/last_conv/Conv2D__18, 
[01/06/2021-16:46:02] [W] [TRT] No implementation obeys reformatting-free rules, at least 2 reformatting nodes are needed, now picking the fastest path instead.
[01/06/2021-16:46:05] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[01/06/2021-16:46:05] [I] Starting inference threads
[01/06/2021-16:46:09] [I] Warmup completed 0 queries over 200 ms
[01/06/2021-16:46:09] [I] Timing trace has 0 queries over 3.00147 s
[01/06/2021-16:46:09] [I] Trace averages of 10 runs:
~~---------------
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/work/dla_resize_test_N.onnx --workspace=2048 --fp16 --useDLACore=0 --allowGPUFallback --shapes=input:0:1x32x32x32

I’m afraid my explanation wasn’t very good.

I’m sorry; what I meant to ask is this: when using DLA, is it expected behavior (by specification) that the following errors are output when a Convolution layer is executed immediately after a Resize layer?

DLA Layer /last_conv/Conv2D does not support dynamic shapes in any dimension.
DLA LAYER: CBUF size requirement for layer /last_conv/Conv2D is 131072banks, which exceeds the limit (16).

I appreciate your efforts. Thank you for your cooperation.

Best regards,

Hi,

No. The resize layer does not cause this issue.

There is an issue in our TensorRT checker, which runs before a layer is sent to DLA.
We are discussing whether it is possible to remove or update the checker.

We will keep you updated on this.
Thanks.

Hi,

Thank you for your continuous support.
Please let me know if there is anything you would like to ask.

Hi,

This is similar to the topic “I get Internal DLA error and it runs on GPU FallBack”.
We double-checked this error with our internal branch, and the fallback is still present.

This indicates that the layer really exceeds the CBUF capacity of DLA.
So it falls back due to a hardware limitation.

Thanks.

Hi,
Thank you very much for your reply.

I apologize; I meant to mention this earlier but forgot.

This phenomenon also occurs with small Convolution layers.
It happens only with the Convolution layer that is executed immediately after the Resize layer.

I have uploaded an ONNX model for testing.
resize_conv_op.onnx (1.7 KB)

The following is the log from running the uploaded ONNX model on DLA.
The two Convolution layers have identical settings.
A Convolution on a 1x32x32 (CHW) input can usually run on DLA, but when it runs after the Resize layer, it falls back to the GPU.

[W] [TRT] DLA Layer StatefulPartitionedCall/test/conv_2/Conv2D does not support dynamic shapes in any dimension.
[W] [TRT] DLA LAYER: CBUF size requirement for layer StatefulPartitionedCall/test/conv_2/Conv2D is 131072banks, which exceeds the limit (16).
[W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/conv_2/Conv2D is not supported on DLA, falling back to GPU.
[I] [TRT] 
[I] [TRT] --------------- Layers running on DLA: 
[I] [TRT] {StatefulPartitionedCall/test/conv_1/Conv2D}, 
[I] [TRT] --------------- Layers running on GPU: 
[I] [TRT] StatefulPartitionedCall/test/conv_1/Conv2D__5, Resize__16, StatefulPartitionedCall/test/conv_2/Conv2D, StatefulPartitionedCall/test/conv_2/Conv2D__18, 

From this, I suspect that the Resize layer is the cause.
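
For reference, the same Conv -> Resize -> Conv pattern can also be sketched directly with the TensorRT Python API (assuming TensorRT 7.x; the shapes and weights below are placeholders, not the exact values from resize_conv_op.onnx):

import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Conv -> Resize -> Conv with placeholder 3x3 weights.
inp = network.add_input("input", trt.float32, (1, 1, 32, 32))
kernel = trt.Weights(np.ones((1, 1, 3, 3), dtype=np.float32))
conv1 = network.add_convolution(inp, 1, (3, 3), kernel, trt.Weights())
conv1.padding = (1, 1)

resize = network.add_resize(conv1.get_output(0))
resize.scales = [1.0, 1.0, 2.0, 2.0]  # 2x nearest-neighbor upsample

conv2 = network.add_convolution(resize.get_output(0), 1, (3, 3),
                                kernel, trt.Weights())
conv2.padding = (1, 1)
network.mark_output(conv2.get_output(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
engine = builder.build_engine(network, config)

If the same problem is present, the build warnings should show conv2 falling back to the GPU while conv1 stays on DLA.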

Do you happen to have any knowledge about this phenomenon?

I appreciate your efforts. Thank you for your cooperation.

Hi,

Sorry, my previous statement may not have been clear enough.

There are two possible issues that might cause the CBUF failure:
one is an incorrect formula in the TensorRT checker, and the other is the CBUF genuinely running out of memory.

The resize_conv_op.onnx model may be rejected due to the checker bug.
It looks the same on the current JetPack since the fix is not yet available to Jetson users.
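
For intuition, here is a rough back-of-the-envelope check of the CBUF budget (an approximation only, assuming the openly documented NVDLA convolution buffer of 16 banks x 32 KiB; the exact formula used by the TensorRT checker is internal):

BANK_BYTES = 32 * 1024  # one NVDLA CBUF bank; 16 banks total

def banks_needed(num_bytes):
    return -(-num_bytes // BANK_BYTES)  # ceiling division

# Placeholder numbers for a small conv like conv_2: an fp16 input of
# 1x64x64 (assuming a 2x Resize of a 1x32x32 tensor) plus 3x3x1x1
# fp16 weights.
input_banks = banks_needed(1 * 64 * 64 * 2)
weight_banks = banks_needed(3 * 3 * 1 * 1 * 2)
print(input_banks + weight_banks)  # 2 banks, far below the 16-bank limit

A convolution this small needs only a couple of banks, so the reported 131072 banks is consistent with the checker bug rather than a real capacity limit.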

Thanks.

Hi,
Oh, I understand. Thank you for the clear explanation.

Thanks.