[Xavier NX + DLA] does not support dynamic shapes, and CBUF size requirement

Hi,
My English isn’t very good, so please feel free to ask if anything is unclear.

Thank you, as always, for your assistance.

I am using TensorRT (trtexec) in a [Jetson Xavier NX + DLA] environment.

When a Convolution layer is connected after a Resize layer, the following two messages are output and the layer is executed via GPU fallback.

DLA Layer Conv_1 does not support dynamic shapes in any dimension.
DLA LAYER: CBUF size requirement for layer Conv_1 is 131072banks, which exceeds the limit (16).

I would appreciate it if you could answer the following questions about these messages.

・Is it expected behavior (by specification) that these messages are output when a Convolution layer is connected after a Resize layer? If so, what is the cause?

・Is there any way to work around this issue?

I’ve uploaded the model I used for testing.
Please use it if needed.
dla_resize_test.onnx (218.1 KB)

Thank you in advance.

Regards,

Hi,

1. The message indicates that DLA doesn’t support dynamic shape input.
“Dynamic shape” means that TensorRT chooses the size of a tensor at runtime.
If you have a fixed input size, please create the network with an implicit batch rather than an explicit one.

2. Please first try whether you can avoid the error by using an implicit batch (a minimal sketch follows below).
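
Here is a minimal sketch of the two network-creation modes in the TensorRT Python API (assuming the TensorRT 7.x bindings shipped with JetPack; note that the ONNX parser itself requires explicit batch in TensorRT 7, so implicit batch mainly applies when you build the network through the API):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)

# Implicit batch: no creation flags; the batch dimension is not part of
# the tensor shapes and is set separately via builder.max_batch_size.
implicit_net = builder.create_network(0)
builder.max_batch_size = 1

# Explicit batch: the batch is part of every tensor shape.
# trtexec uses this mode when parsing ONNX models.
explicit_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
explicit_net = builder.create_network(explicit_flag)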

Thanks.

Hi,
I really appreciate your reply.

I tried running it with an implicit batch size, but I get the same error.
The following is the runtime log.

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/work/dla_resize_test_N.onnx --workspace=2048 --fp16 --useDLACore=0 --allowGPUFallback --shapes=input:0:1x32x32x32
[01/06/2021-16:45:45] [I] === Model Options ===
[01/06/2021-16:45:45] [I] Format: ONNX
[01/06/2021-16:45:45] [I] Model: /home/jetson/work/dla_resize_test_N.onnx
[01/06/2021-16:45:45] [I] Output:
[01/06/2021-16:45:45] [I] === Build Options ===
[01/06/2021-16:45:45] [I] Max batch: explicit
[01/06/2021-16:45:45] [I] Workspace: 2048 MB
[01/06/2021-16:45:45] [I] minTiming: 1
[01/06/2021-16:45:45] [I] avgTiming: 8
[01/06/2021-16:45:45] [I] Precision: FP32+FP16
[01/06/2021-16:45:45] [I] Calibration: 
[01/06/2021-16:45:45] [I] Safe mode: Disabled
[01/06/2021-16:45:45] [I] Save engine: 
[01/06/2021-16:45:45] [I] Load engine: 
[01/06/2021-16:45:45] [I] Builder Cache: Enabled
[01/06/2021-16:45:45] [I] NVTX verbosity: 0
[01/06/2021-16:45:45] [I] Inputs format: fp32:CHW
[01/06/2021-16:45:45] [I] Outputs format: fp32:CHW
[01/06/2021-16:45:45] [I] Input build shape: input:0=1x32x32x32+1x32x32x32+1x32x32x32
[01/06/2021-16:45:45] [I] Input calibration shapes: model
[01/06/2021-16:45:45] [I] === System Options ===
[01/06/2021-16:45:45] [I] Device: 0
[01/06/2021-16:45:45] [I] DLACore: 0(With GPU fallback)
[01/06/2021-16:45:45] [I] Plugins:
[01/06/2021-16:45:45] [I] === Inference Options ===
[01/06/2021-16:45:45] [I] Batch: Explicit
[01/06/2021-16:45:45] [I] Input inference shape: input:0=1x32x32x32
[01/06/2021-16:45:45] [I] Iterations: 10
[01/06/2021-16:45:45] [I] Duration: 3s (+ 200ms warm up)
[01/06/2021-16:45:45] [I] Sleep time: 0ms
[01/06/2021-16:45:45] [I] Streams: 1
[01/06/2021-16:45:45] [I] ExposeDMA: Disabled
[01/06/2021-16:45:45] [I] Spin-wait: Disabled
[01/06/2021-16:45:45] [I] Multithreading: Disabled
[01/06/2021-16:45:45] [I] CUDA Graph: Disabled
[01/06/2021-16:45:45] [I] Skip inference: Disabled
[01/06/2021-16:45:45] [I] Inputs:
[01/06/2021-16:45:45] [I] === Reporting Options ===
[01/06/2021-16:45:45] [I] Verbose: Disabled
[01/06/2021-16:45:45] [I] Averages: 10 inferences
[01/06/2021-16:45:45] [I] Percentile: 99
[01/06/2021-16:45:45] [I] Dump output: Disabled
[01/06/2021-16:45:45] [I] Profile: Disabled
[01/06/2021-16:45:45] [I] Export timing to JSON file: 
[01/06/2021-16:45:45] [I] Export output to JSON file: 
[01/06/2021-16:45:45] [I] Export profile to JSON file: 
[01/06/2021-16:45:45] [I] 
----------------------------------------------------------------
Input filename:   /home/jetson/work/dla_resize_test_N.onnx
ONNX IR version:  0.0.7
Opset version:    12
Producer name:    tf2onnx
Producer version: 1.7.2
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[01/06/2021-16:45:47] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/test_conv1/Conv2D__5 is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching Shape__12 device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching Slice__13 device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 5) [Constant] device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] Concat__15: DLA only supports concatenation on the C dimension.
[01/06/2021-16:45:47] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching Concat__15 device type to GPU.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer Resize__16 is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA Layer StatefulPartitionedCall/test/last_conv/Conv2D does not support dynamic shapes in any dimension.
[01/06/2021-16:45:47] [W] [TRT] DLA LAYER: CBUF size requirement for layer StatefulPartitionedCall/test/last_conv/Conv2D is 131072banks, which exceeds the limit (16).
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/last_conv/Conv2D is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] DLA Layer StatefulPartitionedCall/test/last_relu/Relu does not support dynamic shapes in any dimension.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/last_relu/Relu is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/last_conv/Conv2D__18 is not supported on DLA, falling back to GPU.
[01/06/2021-16:45:47] [I] [TRT] 
[01/06/2021-16:45:47] [I] [TRT] --------------- Layers running on DLA: 
[01/06/2021-16:45:47] [I] [TRT] {StatefulPartitionedCall/test/test_conv1/Conv2D,StatefulPartitionedCall/test/test_relu1/Relu}, 
[01/06/2021-16:45:47] [I] [TRT] --------------- Layers running on GPU: 
[01/06/2021-16:45:47] [I] [TRT] StatefulPartitionedCall/test/test_conv1/Conv2D__5, Resize__16, StatefulPartitionedCall/test/last_conv/Conv2D + StatefulPartitionedCall/test/last_relu/Relu, StatefulPartitionedCall/test/last_conv/Conv2D__18, 
[01/06/2021-16:46:02] [W] [TRT] No implementation obeys reformatting-free rules, at least 2 reformatting nodes are needed, now picking the fastest path instead.
[01/06/2021-16:46:05] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[01/06/2021-16:46:05] [I] Starting inference threads
[01/06/2021-16:46:09] [I] Warmup completed 0 queries over 200 ms
[01/06/2021-16:46:09] [I] Timing trace has 0 queries over 3.00147 s
[01/06/2021-16:46:09] [I] Trace averages of 10 runs:
~~---------------
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/work/dla_resize_test_N.onnx --workspace=2048 --fp16 --useDLACore=0 --allowGPUFallback --shapes=input:0:1x32x32x32

I’m afraid my explanation wasn’t very good.

I’m sorry; what I meant to ask is this: when using DLA, is it expected behavior (by specification) that the following errors are output when a Convolution layer is executed immediately after a Resize layer?

DLA Layer /last_conv/Conv2D does not support dynamic shapes in any dimension.
DLA LAYER: CBUF size requirement for layer /last_conv/Conv2D is 131072banks, which exceeds the limit (16).

I appreciate your efforts. Thank you for your cooperation.

Best regards,

Hi,

No. The resize layer does not cause this issue.

There is an issue in our TensorRT checker, which runs before a layer is sent to DLA.
We are discussing whether it is possible to remove or update the checker.

We will keep you updated on this.
Thanks.

Hi,

Thank you for your continuous support.
Please let me know if there is anything you would like to ask.

Hi,

This is similar to the topic “I get Internal DLA error and it runs on GPU FallBack”.
We double-checked this error with our internal branch, and the fallback is still present.

This indicates that the layer really exceeds the CBUF capacity of DLA.
So it falls back due to a hardware limitation.

Thanks.

Hi,
Thank you very much for your reply.

I apologize; I meant to mention this earlier but forgot.

This phenomenon also occurs with small Convolution layers.
It happens only with the Convolution layer that is executed immediately after the Resize layer.

I have uploaded an ONNX model for testing.
resize_conv_op.onnx (1.7 KB)

The following is the log from running the uploaded ONNX model on DLA.
The two Convolution layers have identical settings.
A Convolution on a 1x32x32 (CHW) input can usually run on DLA, but when it runs after the Resize layer, it falls back to the GPU.

[W] [TRT] DLA Layer StatefulPartitionedCall/test/conv_2/Conv2D does not support dynamic shapes in any dimension.
[W] [TRT] DLA LAYER: CBUF size requirement for layer StatefulPartitionedCall/test/conv_2/Conv2D is 131072banks, which exceeds the limit (16).
[W] [TRT] Default DLA is enabled but layer StatefulPartitionedCall/test/conv_2/Conv2D is not supported on DLA, falling back to GPU.
[I] [TRT] 
[I] [TRT] --------------- Layers running on DLA: 
[I] [TRT] {StatefulPartitionedCall/test/conv_1/Conv2D}, 
[I] [TRT] --------------- Layers running on GPU: 
[I] [TRT] StatefulPartitionedCall/test/conv_1/Conv2D__5, Resize__16, StatefulPartitionedCall/test/conv_2/Conv2D, StatefulPartitionedCall/test/conv_2/Conv2D__18, 

From this, I suspect that the Resize layer is the cause.
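
For reference, the same Conv -> Resize -> Conv pattern can also be sketched directly with the TensorRT Python API (assuming TensorRT 7.x; the shapes and weights below are placeholders, not the exact values from resize_conv_op.onnx):

import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Conv -> Resize -> Conv with placeholder 3x3 weights.
inp = network.add_input("input", trt.float32, (1, 1, 32, 32))
kernel = trt.Weights(np.ones((1, 1, 3, 3), dtype=np.float32))
conv1 = network.add_convolution(inp, 1, (3, 3), kernel, trt.Weights())
conv1.padding = (1, 1)

resize = network.add_resize(conv1.get_output(0))
resize.scales = [1.0, 1.0, 2.0, 2.0]  # 2x nearest-neighbor upsample

conv2 = network.add_convolution(resize.get_output(0), 1, (3, 3),
                                kernel, trt.Weights())
conv2.padding = (1, 1)
network.mark_output(conv2.get_output(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 28
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
engine = builder.build_engine(network, config)

If the same problem is present, the build warnings should show conv2 falling back to the GPU while conv1 stays on DLA.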

Do you happen to have any knowledge about this phenomenon?

I appreciate your efforts. Thank you for your cooperation.

Hi,

Sorry, my previous statement may not have been clear enough.

There are two possible issues that might cause the CBUF failure:
one is an incorrect formula in the TensorRT checker, and the other is the CBUF genuinely running out of memory.

The resize_conv_op.onnx model may be rejected due to the checker bug.
It looks the same on the current JetPack since the fix is not yet available to Jetson users.
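
For intuition, here is a rough back-of-the-envelope check of the CBUF budget (an approximation only, assuming the openly documented NVDLA convolution buffer of 16 banks x 32 KiB; the exact formula used by the TensorRT checker is internal):

BANK_BYTES = 32 * 1024  # one NVDLA CBUF bank; 16 banks total

def banks_needed(num_bytes):
    return -(-num_bytes // BANK_BYTES)  # ceiling division

# Placeholder numbers for a small conv like conv_2: an fp16 input of
# 1x64x64 (assuming a 2x Resize of a 1x32x32 tensor) plus 3x3x1x1
# fp16 weights.
input_banks = banks_needed(1 * 64 * 64 * 2)
weight_banks = banks_needed(3 * 3 * 1 * 1 * 2)
print(input_banks + weight_banks)  # 2 banks, far below the 16-bank limit

A convolution this small needs only a couple of banks, so the reported 131072 banks is consistent with the checker bug rather than a real capacity limit.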

Thanks.

Hi,
Oh, I understand. Thank you for the clear explanation.

Thanks.