AnaisG
August 29, 2023, 7:08am
Hi,
I’m trying to compile a ResNet50 ONNX model to TensorRT for the DLA on a Xavier NX, but the Adaptive Average Pooling layer falls back to the GPU. I tried:
changing the graph of my ONNX model to set count_include_pad=1 (inclusive pooling)
changing the JetPack version (4.5.1 to 4.6.4)
Can you tell me whether Adaptive Average Pooling is supported on the DLA? If so, how should I proceed?
Thanks
Hi,
We hope the following document may help you.
We are moving this post to the Jetson Xavier NX forum, where you can get better help if you need further assistance.
Thank you.
Hi,
Please check the below links, as they might answer your concerns.
Thanks!
Hi,
Based on the document, here are the constraints of the DLA pooling layer:
Pooling layer
Only two spatial dimension operations are supported.
Both FP16 and INT8 are supported.
Operations supported: kMAX, kAVERAGE.
Dimensions of the window must be in the range [1, 8].
Dimensions of padding must be in the range [0, 7].
Dimensions of stride must be in the range [1, 16].
With INT8 mode, input and output tensor scales must be the same.
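As an illustration, the listed window, padding, stride and operation limits can be checked mechanically for a given pooling configuration before building an engine. This is a hypothetical sketch (`Pool2dParams` and `dla_pooling_violations` are made-up names, not a TensorRT API); the INT8 scale requirement is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Limits copied from the DLA pooling-layer constraints listed above.
SUPPORTED_OPS = {"max", "average"}  # kMAX, kAVERAGE
LIMITS = {
    "window": (1, 8),
    "padding": (0, 7),
    "stride": (1, 16),
}


@dataclass
class Pool2dParams:
    op: str                   # "max" or "average"
    window: Tuple[int, ...]   # (kernel_h, kernel_w)
    padding: Tuple[int, ...]  # (pad_h, pad_w)
    stride: Tuple[int, ...]   # (stride_h, stride_w)


def dla_pooling_violations(p: Pool2dParams) -> List[str]:
    """Return the constraint violations for a pooling layer (empty list if none)."""
    problems = []
    if p.op not in SUPPORTED_OPS:
        problems.append(f"unsupported op {p.op!r}")
    for name, (lo, hi) in LIMITS.items():
        values = getattr(p, name)
        if len(values) != 2:  # only two spatial dimensions are supported
            problems.append(f"{name} must have 2 spatial dims, got {len(values)}")
            continue
        for v in values:
            if not lo <= v <= hi:
                problems.append(f"{name} value {v} outside [{lo}, {hi}]")
    return problems


# A 7x7 average pooling with stride 7 and no padding, like the one at the
# end of ResNet50 for a 224x224 input, stays within the limits.
print(dla_pooling_violations(Pool2dParams("average", (7, 7), (0, 0), (7, 7))))  # []
```

A 14x14 window, for example, would be reported twice (once per spatial dimension), since it exceeds the [1, 8] window range.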
Thanks.
AnaisG
September 6, 2023, 8:39am
Sorry for the delay and thanks all for your answers.
I replaced my Adaptive Pooling with an Average Pooling. Now the pooling runs on the DLA.
However, my ResNet50 still doesn’t run fully on the DLA, and I can’t use the GPU. Since I changed the pooling layer, an Identity layer and a Shuffle layer have been added during the TRT conversion, but I can’t see these layers in my ONNX graph. In addition, I read in the documentation:
"For both the ElementWise equal layer and the subsequent IIdentityLayer mentioned above, explicitly set your device types to DLA and their precisions to INT8. Otherwise, these layers will run on the GPU. "
So I tried to convert my model using the following command:
/usr/src/tensorrt/bin/trtexec --onnx=resnet50_new_pool.onnx --useDLACore=0 --best --allowGPUFallback
to allow INT8, FP16, and FP32 precisions, but I still get GPU fallbacks:
[09/06/2023-10:28:31] [I] [TRT] ---------- Layers Running on DLA ----------
[09/06/2023-10:28:31] [I] [TRT] [DlaLayer] {ForeignNode[/conv1/Conv.../layer4/layer4.2/relu_2/Relu]}
[09/06/2023-10:28:31] [I] [TRT] [DlaLayer] {ForeignNode[/avgpool/AveragePool.../fc/Gemm]}
[09/06/2023-10:28:31] [I] [TRT] ---------- Layers Running on GPU ----------
[09/06/2023-10:28:31] [I] [TRT] [GpuLayer] (Unnamed Layer* 119) [Identity]
[09/06/2023-10:28:31] [I] [TRT] [GpuLayer] (Unnamed Layer* 124) [Shuffle]
I also tried ResNet34 and EfficientNet-B0, but I still have the problem.
Do you have any idea how to solve it?
Best regards.
Hi,
These layers are added automatically to convert the data into a DLA-compatible format.
You can avoid them by feeding the data in the required format directly.
For example:
/usr/src/tensorrt/bin/trtexec --inputIOFormats=fp16:hwc8 --outputIOFormats=fp16:hwc8 ...
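For reference, HWC8 stores FP16 data channels-last with the channel count padded to a multiple of 8. Assuming the layout described for `kHWC8` in the TensorRT `TensorFormat` documentation ([N][H][W][(C+7)/8*8]), packing an NCHW image into that layout on the host could look like the NumPy sketch below; `nchw_to_hwc8` is a hypothetical helper, not part of TensorRT.

```python
import numpy as np


def nchw_to_hwc8(x: np.ndarray) -> np.ndarray:
    """Pack a 4-D NCHW array into an HWC8-style FP16 buffer of shape [N, H, W, C8].

    Assumes the kHWC8 layout [N][H][W][(C+7)//8*8]: channels are moved last
    and zero-padded up to the next multiple of 8.
    """
    if x.ndim != 4:
        raise ValueError(f"expected a 4-D NCHW array, got shape {x.shape}")
    n, c, h, w = x.shape
    c8 = (c + 7) // 8 * 8
    out = np.zeros((n, h, w, c8), dtype=np.float16)
    out[..., :c] = x.transpose(0, 2, 3, 1)  # channels-last copy, cast to FP16
    return out


# Example: a 3-channel 224x224 image becomes [1, 224, 224, 8] with 5 zero channels.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
packed = nchw_to_hwc8(image)
print(packed.shape, packed.dtype)  # (1, 224, 224, 8) float16
```

Note that the layout only makes sense for tensors with at least 3 dimensions (C, H, W); a 2-D tensor has no spatial axes to put in front of the channels.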
Thanks.
AnaisG
September 14, 2023, 6:12am
Hello,
Thank you very much for your answer. However, it does not work on my DLA: I tried several configurations, but I always get a segmentation fault.
For example, I tried:
/usr/src/tensorrt/bin/trtexec --inputIOFormats=fp16:hwc8 --outputIOFormats=fp16:hwc8 --onnx=resnet50_new_pool.onnx --useDLACore=0 --allowGPUFallback
/usr/src/tensorrt/bin/trtexec --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --onnx=resnet50_new_pool.onnx --useDLACore=0 --allowGPUFallback
/usr/src/tensorrt/bin/trtexec --inputIOFormats=fp32:chw32 --outputIOFormats=fp32:chw32 --onnx=resnet50_new_pool.onnx --useDLACore=0 --allowGPUFallback
And I always get:
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --inputIOFormats=fp32:chw32 --outputIOFormats=fp32:chw32 --onnx=resnet50_new_pool.onnx --useDLACore=0 --allowGPUFallback
[09/14/2023-07:42:06] [I] === Model Options ===
[09/14/2023-07:42:06] [I] Format: ONNX
[09/14/2023-07:42:06] [I] Model: resnet50_new_pool.onnx
[09/14/2023-07:42:06] [I] Output:
[09/14/2023-07:42:06] [I] === Build Options ===
[09/14/2023-07:42:06] [I] Max batch: explicit
[09/14/2023-07:42:06] [I] Workspace: 16 MiB
[09/14/2023-07:42:06] [I] minTiming: 1
[09/14/2023-07:42:06] [I] avgTiming: 8
[09/14/2023-07:42:06] [I] Precision: FP32
[09/14/2023-07:42:06] [I] Calibration:
[09/14/2023-07:42:06] [I] Refit: Disabled
[09/14/2023-07:42:06] [I] Sparsity: Disabled
[09/14/2023-07:42:06] [I] Safe mode: Disabled
[09/14/2023-07:42:06] [I] Restricted mode: Disabled
[09/14/2023-07:42:06] [I] Save engine:
[09/14/2023-07:42:06] [I] Load engine:
[09/14/2023-07:42:06] [I] NVTX verbosity: 0
[09/14/2023-07:42:06] [I] Tactic sources: Using default tactic sources
[09/14/2023-07:42:06] [I] timingCacheMode: local
[09/14/2023-07:42:06] [I] timingCacheFile:
[09/14/2023-07:42:06] [I] Input(s): fp32:+chw32
[09/14/2023-07:42:06] [I] Output(s): fp32:+chw32
[09/14/2023-07:42:06] [I] Input build shapes: model
[09/14/2023-07:42:06] [I] Input calibration shapes: model
[09/14/2023-07:42:06] [I] === System Options ===
[09/14/2023-07:42:06] [I] Device: 0
[09/14/2023-07:42:06] [I] DLACore: 0(With GPU fallback)
[09/14/2023-07:42:06] [I] Plugins:
[09/14/2023-07:42:06] [I] === Inference Options ===
[09/14/2023-07:42:06] [I] Batch: Explicit
[09/14/2023-07:42:06] [I] Input inference shapes: model
[09/14/2023-07:42:06] [I] Iterations: 10
[09/14/2023-07:42:06] [I] Duration: 3s (+ 200ms warm up)
[09/14/2023-07:42:06] [I] Sleep time: 0ms
[09/14/2023-07:42:06] [I] Streams: 1
[09/14/2023-07:42:06] [I] ExposeDMA: Disabled
[09/14/2023-07:42:06] [I] Data transfers: Enabled
[09/14/2023-07:42:06] [I] Spin-wait: Disabled
[09/14/2023-07:42:06] [I] Multithreading: Disabled
[09/14/2023-07:42:06] [I] CUDA Graph: Disabled
[09/14/2023-07:42:06] [I] Separate profiling: Disabled
[09/14/2023-07:42:06] [I] Time Deserialize: Disabled
[09/14/2023-07:42:06] [I] Time Refit: Disabled
[09/14/2023-07:42:06] [I] Skip inference: Disabled
[09/14/2023-07:42:06] [I] Inputs:
[09/14/2023-07:42:06] [I] === Reporting Options ===
[09/14/2023-07:42:06] [I] Verbose: Disabled
[09/14/2023-07:42:06] [I] Averages: 10 inferences
[09/14/2023-07:42:06] [I] Percentile: 99
[09/14/2023-07:42:06] [I] Dump refittable layers:Disabled
[09/14/2023-07:42:06] [I] Dump output: Disabled
[09/14/2023-07:42:06] [I] Profile: Disabled
[09/14/2023-07:42:06] [I] Export timing to JSON file:
[09/14/2023-07:42:06] [I] Export output to JSON file:
[09/14/2023-07:42:06] [I] Export profile to JSON file:
[09/14/2023-07:42:06] [I]
[09/14/2023-07:42:06] [I] === Device Information ===
[09/14/2023-07:42:06] [I] Selected Device: Xavier
[09/14/2023-07:42:06] [I] Compute Capability: 7.2
[09/14/2023-07:42:06] [I] SMs: 6
[09/14/2023-07:42:06] [I] Compute Clock Rate: 1.109 GHz
[09/14/2023-07:42:06] [I] Device Global Memory: 7765 MiB
[09/14/2023-07:42:06] [I] Shared Memory per SM: 96 KiB
[09/14/2023-07:42:06] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/14/2023-07:42:06] [I] Memory Clock Rate: 1.109 GHz
[09/14/2023-07:42:06] [I]
[09/14/2023-07:42:06] [I] TensorRT version: 8001
[09/14/2023-07:42:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 5556 (MiB)
[09/14/2023-07:42:08] [I] Start parsing network model
[09/14/2023-07:42:08] [I] [TRT] ----------------------------------------------------------------
[09/14/2023-07:42:08] [I] [TRT] Input filename: resnet50_new_pool.onnx
[09/14/2023-07:42:08] [I] [TRT] ONNX IR version: 0.0.7
[09/14/2023-07:42:08] [I] [TRT] Opset version: 14
[09/14/2023-07:42:08] [I] [TRT] Producer name: pytorch
[09/14/2023-07:42:08] [I] [TRT] Producer version: 2.0.0
[09/14/2023-07:42:08] [I] [TRT] Domain:
[09/14/2023-07:42:08] [I] [TRT] Model version: 0
[09/14/2023-07:42:08] [I] [TRT] Doc string:
[09/14/2023-07:42:08] [I] [TRT] ----------------------------------------------------------------
[09/14/2023-07:42:08] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/14/2023-07:42:08] [I] Finish parsing network model
[09/14/2023-07:42:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 470, GPU 5753 (MiB)
[09/14/2023-07:42:08] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 119) [Identity] is not supported on DLA, falling back to GPU.
[09/14/2023-07:42:08] [W] [TRT] Default DLA is enabled but layer /Flatten is not supported on DLA, falling back to GPU.
[09/14/2023-07:42:08] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 122) [Shuffle] is not supported on DLA, falling back to GPU.
[09/14/2023-07:42:08] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 124) [Shuffle] is not supported on DLA, falling back to GPU.
[09/14/2023-07:42:08] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 470 MiB, GPU 5753 MiB
[09/14/2023-07:42:08] [W] [TRT] output: formats with vectorized dimension require at least 3 dimensions, but dimensions are [1,1000]. Ignoring format CHW32 for type Float.
[09/14/2023-07:42:08] [E] Error[4]: [graphNodes.cpp::checkUserIOFormatsViableHelper::697] Error Code 4: Internal Error (output: no formats available.)
[09/14/2023-07:42:08] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)
Maybe I didn’t choose the right input or output formats? Do you have an idea?
I’m using JetPack 4.6.4 and TensorRT 8.0.1.6-1+cuda10.2 with CUDA 10.2.460-1.
Thanks,
Hi,
You can find the supported DLA input formats below:
Could you share with us the output of running with --inputIOFormats=fp16:hwc8 --outputIOFormats=fp16:hwc8 --fp16?
Thanks.
AnaisG
September 18, 2023, 12:04pm
Hello,
Thank you for your reply.
I get the same error:
/usr/src/tensorrt/bin/trtexec --inputIOFormats=fp16:hwc8 --outputIOFormats=fp16:hwc8 --onnx=resnet50_new_pool.onnx --useDLACore=0 --allowGPUFallback --fp16
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --inputIOFormats=fp16:hwc8 --outputIOFormats=fp16:hwc8 --onnx=resnet50_new_pool.onnx --useDLACore=0 --allowGPUFallback --fp16
[09/18/2023-13:50:11] [I] === Model Options ===
[09/18/2023-13:50:11] [I] Format: ONNX
[09/18/2023-13:50:11] [I] Model: resnet50_new_pool.onnx
[09/18/2023-13:50:11] [I] Output:
[09/18/2023-13:50:11] [I] === Build Options ===
[09/18/2023-13:50:11] [I] Max batch: explicit
[09/18/2023-13:50:11] [I] Workspace: 16 MiB
[09/18/2023-13:50:11] [I] minTiming: 1
[09/18/2023-13:50:11] [I] avgTiming: 8
[09/18/2023-13:50:11] [I] Precision: FP32+FP16
[09/18/2023-13:50:11] [I] Calibration:
[09/18/2023-13:50:11] [I] Refit: Disabled
[09/18/2023-13:50:11] [I] Sparsity: Disabled
[09/18/2023-13:50:11] [I] Safe mode: Disabled
[09/18/2023-13:50:11] [I] Restricted mode: Disabled
[09/18/2023-13:50:11] [I] Save engine:
[09/18/2023-13:50:11] [I] Load engine:
[09/18/2023-13:50:11] [I] NVTX verbosity: 0
[09/18/2023-13:50:11] [I] Tactic sources: Using default tactic sources
[09/18/2023-13:50:11] [I] timingCacheMode: local
[09/18/2023-13:50:11] [I] timingCacheFile:
[09/18/2023-13:50:11] [I] Input(s): fp16:+hwc8
[09/18/2023-13:50:11] [I] Output(s): fp16:+hwc8
[09/18/2023-13:50:11] [I] Input build shapes: model
[09/18/2023-13:50:11] [I] Input calibration shapes: model
[09/18/2023-13:50:11] [I] === System Options ===
[09/18/2023-13:50:11] [I] Device: 0
[09/18/2023-13:50:11] [I] DLACore: 0(With GPU fallback)
[09/18/2023-13:50:11] [I] Plugins:
[09/18/2023-13:50:11] [I] === Inference Options ===
[09/18/2023-13:50:11] [I] Batch: Explicit
[09/18/2023-13:50:11] [I] Input inference shapes: model
[09/18/2023-13:50:11] [I] Iterations: 10
[09/18/2023-13:50:11] [I] Duration: 3s (+ 200ms warm up)
[09/18/2023-13:50:11] [I] Sleep time: 0ms
[09/18/2023-13:50:11] [I] Streams: 1
[09/18/2023-13:50:11] [I] ExposeDMA: Disabled
[09/18/2023-13:50:11] [I] Data transfers: Enabled
[09/18/2023-13:50:11] [I] Spin-wait: Disabled
[09/18/2023-13:50:11] [I] Multithreading: Disabled
[09/18/2023-13:50:11] [I] CUDA Graph: Disabled
[09/18/2023-13:50:11] [I] Separate profiling: Disabled
[09/18/2023-13:50:11] [I] Time Deserialize: Disabled
[09/18/2023-13:50:11] [I] Time Refit: Disabled
[09/18/2023-13:50:11] [I] Skip inference: Disabled
[09/18/2023-13:50:11] [I] Inputs:
[09/18/2023-13:50:11] [I] === Reporting Options ===
[09/18/2023-13:50:11] [I] Verbose: Disabled
[09/18/2023-13:50:11] [I] Averages: 10 inferences
[09/18/2023-13:50:11] [I] Percentile: 99
[09/18/2023-13:50:11] [I] Dump refittable layers:Disabled
[09/18/2023-13:50:11] [I] Dump output: Disabled
[09/18/2023-13:50:11] [I] Profile: Disabled
[09/18/2023-13:50:11] [I] Export timing to JSON file:
[09/18/2023-13:50:11] [I] Export output to JSON file:
[09/18/2023-13:50:11] [I] Export profile to JSON file:
[09/18/2023-13:50:11] [I]
[09/18/2023-13:50:11] [I] === Device Information ===
[09/18/2023-13:50:11] [I] Selected Device: Xavier
[09/18/2023-13:50:11] [I] Compute Capability: 7.2
[09/18/2023-13:50:11] [I] SMs: 6
[09/18/2023-13:50:11] [I] Compute Clock Rate: 1.109 GHz
[09/18/2023-13:50:11] [I] Device Global Memory: 7765 MiB
[09/18/2023-13:50:11] [I] Shared Memory per SM: 96 KiB
[09/18/2023-13:50:11] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/18/2023-13:50:11] [I] Memory Clock Rate: 1.109 GHz
[09/18/2023-13:50:11] [I]
[09/18/2023-13:50:11] [I] TensorRT version: 8001
[09/18/2023-13:50:14] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 3303 (MiB)
[09/18/2023-13:50:14] [I] Start parsing network model
[09/18/2023-13:50:15] [I] [TRT] ----------------------------------------------------------------
[09/18/2023-13:50:15] [I] [TRT] Input filename: resnet50_new_pool.onnx
[09/18/2023-13:50:15] [I] [TRT] ONNX IR version: 0.0.7
[09/18/2023-13:50:15] [I] [TRT] Opset version: 14
[09/18/2023-13:50:15] [I] [TRT] Producer name: pytorch
[09/18/2023-13:50:15] [I] [TRT] Producer version: 2.0.0
[09/18/2023-13:50:15] [I] [TRT] Domain:
[09/18/2023-13:50:15] [I] [TRT] Model version: 0
[09/18/2023-13:50:15] [I] [TRT] Doc string:
[09/18/2023-13:50:15] [I] [TRT] ----------------------------------------------------------------
[09/18/2023-13:50:15] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/18/2023-13:50:15] [I] Finish parsing network model
[09/18/2023-13:50:15] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 470, GPU 3621 (MiB)
[09/18/2023-13:50:15] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 119) [Identity] is not supported on DLA, falling back to GPU.
[09/18/2023-13:50:15] [W] [TRT] Default DLA is enabled but layer /Flatten is not supported on DLA, falling back to GPU.
[09/18/2023-13:50:15] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 122) [Shuffle] is not supported on DLA, falling back to GPU.
[09/18/2023-13:50:15] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 124) [Shuffle] is not supported on DLA, falling back to GPU.
[09/18/2023-13:50:15] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 470 MiB, GPU 3622 MiB
[09/18/2023-13:50:15] [W] [TRT] output: formats with vectorized dimension require at least 3 dimensions, but dimensions are [1,1000]. Ignoring format HWC8 for type Half.
[09/18/2023-13:50:15] [E] Error[4]: [graphNodes.cpp::checkUserIOFormatsViableHelper::697] Error Code 4: Internal Error (output: no formats available.)
[09/18/2023-13:50:15] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)
If I understand correctly, there is a problem with the output, but I didn’t change anything about the output of the original ResNet50.
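The two messages just before the error point at the cause: the requested output format is applied to the tensor `output`, whose shape is [1,1000], and the log states that formats with a vectorized dimension (HWC8, CHW32 here) require at least 3 dimensions, so TensorRT is left with no format for that tensor ("output: no formats available"). The sketch below only restates that rule from the log; `VECTORIZED_FORMATS` (limited to the formats tried in this thread) and the helper names are illustrative, not part of TensorRT or trtexec.

```python
from typing import Dict, List, Tuple

# Assumption: the vectorized formats tried in this thread. The rank rule comes
# from the warning "formats with vectorized dimension require at least 3 dimensions".
VECTORIZED_FORMATS = {"hwc8", "chw16", "chw32"}


def parse_io_format(spec: str) -> Tuple[str, str]:
    """Split a trtexec-style 'type:format' spec, e.g. 'fp16:hwc8'."""
    dtype, _, fmt = spec.partition(":")
    return dtype.lower(), fmt.lower()


def incompatible_tensors(shapes: Dict[str, Tuple[int, ...]], spec: str) -> List[str]:
    """Names of tensors whose rank is too small for the requested format."""
    _, fmt = parse_io_format(spec)
    if fmt not in VECTORIZED_FORMATS:
        return []
    return [name for name, shape in shapes.items() if len(shape) < 3]


# The classifier output from the log is 2-D, so every vectorized format is rejected.
print(incompatible_tensors({"output": (1, 1000)}, "fp16:hwc8"))        # ['output']
# A 4-D output of the same size would not trip this particular check.
print(incompatible_tensors({"output": (1, 1000, 1, 1)}, "fp16:hwc8"))  # []
```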
Thanks.
Hi,
We tried to reproduce this issue with TensorRT’s sample model (/usr/src/tensorrt/data/resnet50/ResNet50.onnx),
but it fails at an unsupported layer, which does not match your observation.
...
[09/20/2023-02:15:45] [I] Finish parsing network model
[09/20/2023-02:15:45] [E] Error[2]: [network.cpp::operator()::2682] Error Code 2: Internal Error (Assertion allowGPUFallback failed. Layer 'node_of_OC2_DUMMY_0' is not supported on DLA but GPU fallback is not enabled.)
[09/20/2023-02:15:45] [E] Error[4]: [network.cpp::validate::2789] Error Code 4: Internal Error (DLA validation failed)
[09/20/2023-02:15:45] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[09/20/2023-02:15:45] [E] Engine could not be created from network
[09/20/2023-02:15:45] [E] Building engine failed
[09/20/2023-02:15:45] [E] Failed to create engine from model or file.
[09/20/2023-02:15:45] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/resnet50/ResNet50.onnx --useDLACore=0
Do you use a custom model? If yes, could you share the model with us?
Thanks.
AnaisG
September 20, 2023, 5:40am
Hello,
Thanks for your answer.
I don’t use any custom layers; I used the ResNet50 model from torchvision. I only replaced AdaptiveAvgPool2d() with AvgPool2d() and then converted the model to ONNX. As you can see in my message of September 6, the average pooling runs fine on the DLA. My only problem is the Identity and Shuffle layers added during the TensorRT conversion.
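import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

# Original torchvision model (ends with AdaptiveAvgPool2d((1, 1)) before fc)
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)

# For a 224x224 input, the last stage outputs a 7x7 feature map
input_size = 7
output_size = 1

# Fixed pooling parameters equivalent to the adaptive pooling
# when input_size is a multiple of output_size
stride = input_size // output_size
kernel_size = input_size - (output_size - 1) * stride
padding = 0

# Same network with the adaptive pooling replaced by a fixed 7x7 AvgPool2d
model_new = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
model_new.avgpool = nn.AvgPool2d(kernel_size,
                                 stride=stride,
                                 padding=padding,
                                 count_include_pad=True)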
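The stride and kernel_size formulas above turn the adaptive pooling into a fixed AvgPool2d. The NumPy sketch below (independent of PyTorch; all helper names are hypothetical) implements adaptive pooling with PyTorch’s bin boundaries, floor(i*L/out) to ceil((i+1)*L/out), and checks that it matches the fixed pooling whenever input_size is a multiple of output_size, which covers the 7x7 to 1x1 case here.

```python
import numpy as np


def adaptive_avg_pool2d(x: np.ndarray, out_size) -> np.ndarray:
    """Adaptive average pooling over the last two axes with PyTorch-style bins."""
    h, w = x.shape[-2:]
    oh, ow = out_size
    out = np.empty(x.shape[:-2] + (oh, ow), dtype=x.dtype)
    for i in range(oh):
        h0, h1 = (i * h) // oh, ((i + 1) * h + oh - 1) // oh  # floor / ceil
        for j in range(ow):
            w0, w1 = (j * w) // ow, ((j + 1) * w + ow - 1) // ow
            out[..., i, j] = x[..., h0:h1, w0:w1].mean(axis=(-2, -1))
    return out


def avg_pool2d(x: np.ndarray, kernel: int, stride: int) -> np.ndarray:
    """Fixed-window average pooling without padding over the last two axes."""
    h, w = x.shape[-2:]
    oh, ow = (h - kernel) // stride + 1, (w - kernel) // stride + 1
    out = np.empty(x.shape[:-2] + (oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            win = x[..., i * stride:i * stride + kernel, j * stride:j * stride + kernel]
            out[..., i, j] = win.mean(axis=(-2, -1))
    return out


def fixed_pool_params(input_size: int, output_size: int):
    """Same formulas as in the post above."""
    stride = input_size // output_size
    kernel = input_size - (output_size - 1) * stride
    return kernel, stride


rng = np.random.default_rng(0)
features = rng.standard_normal((2, 2048, 7, 7))  # ResNet50 last-stage output shape
k, s = fixed_pool_params(7, 1)
print(k, s)  # 7 7
print(np.allclose(avg_pool2d(features, k, s), adaptive_avg_pool2d(features, (1, 1))))  # True
```

The resulting 7x7 window is within the DLA window range [1, 8] quoted earlier. Also, with padding=0 there are no padded elements, so count_include_pad=True does not change the result.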
Thanks
Hi,
Would you mind also sharing the ONNX model with us?
Thanks.
AnaisG
September 20, 2023, 9:13am
Hello,
You will find the ONNX model attached.
Thanks,
resnet50_newpool.onnx (97.4 MB)
Hi,
We tested your model, and the output differs from your log.
In our experiment, the DLA engine fails to build because of an unsupported layer (Identity):
[09/21/2023-15:21:32] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/21/2023-15:21:32] [I] Finish parsing network model
[09/21/2023-15:21:32] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[09/21/2023-15:21:32] [E] Error[2]: [network.cpp::operator()::2682] Error Code 2: Internal Error (Assertion allowGPUFallback failed. Layer '(Unnamed Layer* 119) [Identity]' is not supported on DLA but GPU fallback is not enabled.)
[09/21/2023-15:21:32] [E] Error[4]: [network.cpp::validate::2789] Error Code 4: Internal Error (DLA validation failed)
[09/21/2023-15:21:32] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[09/21/2023-15:21:32] [E] Engine could not be created from network
[09/21/2023-15:21:32] [E] Building engine failed
[09/21/2023-15:21:32] [E] Failed to create engine from model or file.
[09/21/2023-15:21:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=resnet50_newpool.onnx --int8 --useDLACore=0
Checking with the Polygraphy tool, we found that the layer is added between the activation and the average pooling layer.
$ git clone -b release/8.5 https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/Polygraphy/
$ sudo make install
$ polygraphy convert resnet50_newpool.onnx --convert-to=onnx-like-trt-network --fp16 --tensor-formats=input.1:[hwc8] --tensor-formats=output:[hwc8] -o resnet50_newpool.pb
Is it possible to remove it? It looks related to the padding.
Thanks.
AnaisG
September 21, 2023, 9:17am
Hello,
Thanks for your reply. I managed to remove the Identity layer, and the segmentation fault no longer appears. However, I still have the Shuffle layer problem described in my previous messages.
Here is what I get:
[09/21/2023-11:07:16] [I] [TRT] ---------- Layers Running on DLA ----------
[09/21/2023-11:07:16] [I] [TRT] [DlaLayer] {ForeignNode[/conv1/Conv.../fc/Gemm]}
[09/21/2023-11:07:16] [I] [TRT] ---------- Layers Running on GPU ----------
[09/21/2023-11:07:16] [I] [TRT] [GpuLayer] (Unnamed Layer* 123) [Shuffle]
Do you see this problem too? And do you know how to solve it?
Thanks.
Hi,
Based on the TensorRT log, the Shuffle layer is added because of the Flatten layer.
...
[09/22/2023-11:08:14] [I] Finish parsing network model
[09/22/2023-11:08:14] [W] [TRT] Layer '(Unnamed Layer* 119) [Identity]' (CAST): Unsupported on DLA. Switching this layer's device type to GPU.
[09/22/2023-11:08:14] [W] [TRT] Layer '/Flatten' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
[09/22/2023-11:08:14] [W] [TRT] Layer 'fc.weight' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[09/22/2023-11:08:14] [W] [TRT] Layer 'fc.bias' (CONSTANT): Unsupported on DLA. Switching this layer's device type to GPU.
[09/22/2023-11:08:14] [W] [TRT] Layer '(Unnamed Layer* 125) [Shuffle]' (SHUFFLE): Unsupported on DLA. Switching this layer's device type to GPU.
...
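The Shuffle comes from /Flatten, which reshapes the pooled [N, 2048, 1, 1] tensor to [N, 2048] before fc/Gemm. One possible way to remove the need for that reshape (an assumption, not necessarily what the truncated NVIDIA script below does) is to express the classifier as a 1x1 convolution on the 4-D tensor, reusing the fc weights reshaped to [1000, 2048, 1, 1]. The NumPy sketch checks the equivalence on small stand-in shapes; the helper names are hypothetical.

```python
import numpy as np


def linear(x_flat: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Fully connected layer as in fc/Gemm: y = x @ W.T + b, with x of shape [N, C]."""
    return x_flat @ weight.T + bias


def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Stride-1, unpadded 1x1 convolution: x is [N, C, H, W], weight is [K, C, 1, 1]."""
    w = weight[:, :, 0, 0]                # [K, C]
    y = np.einsum("nchw,kc->nkhw", x, w)  # [N, K, H, W]
    return y + bias[None, :, None, None]


def fc_as_conv_weight(fc_weight: np.ndarray) -> np.ndarray:
    """Reshape a [K, C] fully connected weight into a [K, C, 1, 1] conv kernel."""
    k, c = fc_weight.shape
    return fc_weight.reshape(k, c, 1, 1)


# Small stand-ins for ResNet50's pooled features [N, 2048, 1, 1] and fc [1000, 2048].
rng = np.random.default_rng(0)
pooled = rng.standard_normal((2, 64, 1, 1))
fc_w = rng.standard_normal((10, 64))
fc_b = rng.standard_normal(10)

# Current graph: Flatten (the Shuffle) followed by Gemm gives [N, K]
y_gemm = linear(pooled.reshape(pooled.shape[0], -1), fc_w, fc_b)
# Alternative: 1x1 conv directly on the 4-D tensor gives [N, K, 1, 1], no Flatten
y_conv = conv1x1(pooled, fc_as_conv_weight(fc_w), fc_b)

print(y_conv.shape)                             # (2, 10, 1, 1)
print(np.allclose(y_gemm, y_conv[:, :, 0, 0]))  # True
```

In PyTorch this would mean replacing model.fc with an nn.Conv2d(2048, 1000, kernel_size=1) holding the copied weights and skipping the torch.flatten call that torchvision’s ResNet forward applies before fc, so a custom forward would be needed. The network output would then stay 4-D ([N, 1000, 1, 1]); whether the whole network then runs on the DLA still has to be confirmed with trtexec.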
Thanks
AnaisG
September 22, 2023, 5:28am
Hi,
So, can you confirm that it’s not possible to run an entire ResNet50 on the DLA only?
Hi,
You can try to modify the model so it won’t need a Shuffle layer.
We have a script that can modify the model to be DLA-compatible.
However, it will need some modification for TorchVision’s ResNet50.
Could you give it a try?
Install our ONNX GraphSurgeon first, following its installation steps.
Modify the model with the below script:
#
# SPDX-FileCopyrightText: Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: MIT
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.
#
"""ONNX preparation for ResNet-50."""
import os
import onnx
import numpy as np
import onnx_graphsurgeon as gs
from onnx import shape_inference
import common
(This file has been truncated.)
Thanks.
AnaisG
September 22, 2023, 7:11am
Thanks for your help. I will try it on my ResNet50!
I will get back to you as soon as possible.