Choosing the --useDLACore=1 option dumps the core

Hi all, I tried running trtexec with the --useDLACore=1 option but it crashed. It also crashed when I specified --useDLACore=0.

When this option was not used at all, it worked. Below are the execution logs from the crash:

./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=1 --loadInputs=~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat
[11/24/2023-04:06:54] [I] === Model Options ===
[11/24/2023-04:06:54] [I] Format: ONNX
[11/24/2023-04:06:54] [I] Model: …/data/resnet50/ResNet50.onnx
[11/24/2023-04:06:54] [I] Output:
[11/24/2023-04:06:54] [I] === Build Options ===
[11/24/2023-04:06:54] [I] Max batch: explicit
[11/24/2023-04:06:54] [I] Workspace: 16 MiB
[11/24/2023-04:06:54] [I] minTiming: 1
[11/24/2023-04:06:54] [I] avgTiming: 8
[11/24/2023-04:06:54] [I] Precision: FP32+INT8
[11/24/2023-04:06:54] [I] Calibration: Dynamic
[11/24/2023-04:06:54] [I] Refit: Disabled
[11/24/2023-04:06:54] [I] Sparsity: Disabled
[11/24/2023-04:06:54] [I] Safe mode: Disabled
[11/24/2023-04:06:54] [I] Restricted mode: Disabled
[11/24/2023-04:06:54] [I] Save engine:
[11/24/2023-04:06:54] [I] Load engine:
[11/24/2023-04:06:54] [I] NVTX verbosity: 0
[11/24/2023-04:06:54] [I] Tactic sources: Using default tactic sources
[11/24/2023-04:06:54] [I] timingCacheMode: local
[11/24/2023-04:06:54] [I] timingCacheFile:
[11/24/2023-04:06:54] [I] Input(s)s format: fp32:CHW
[11/24/2023-04:06:54] [I] Output(s)s format: fp32:CHW
[11/24/2023-04:06:54] [I] Input build shapes: model
[11/24/2023-04:06:54] [I] Input calibration shapes: model
[11/24/2023-04:06:54] [I] === System Options ===
[11/24/2023-04:06:54] [I] Device: 0
[11/24/2023-04:06:54] [I] DLACore: 1
[11/24/2023-04:06:54] [I] Plugins:
[11/24/2023-04:06:54] [I] === Inference Options ===
[11/24/2023-04:06:54] [I] Batch: Explicit
[11/24/2023-04:06:54] [I] Input inference shapes: model
[11/24/2023-04:06:54] [I] Iterations: 10
[11/24/2023-04:06:54] [I] Duration: 3s (+ 200ms warm up)
[11/24/2023-04:06:54] [I] Sleep time: 0ms
[11/24/2023-04:06:54] [I] Streams: 1
[11/24/2023-04:06:54] [I] ExposeDMA: Disabled
[11/24/2023-04:06:54] [I] Data transfers: Enabled
[11/24/2023-04:06:54] [I] Spin-wait: Disabled
[11/24/2023-04:06:54] [I] Multithreading: Disabled
[11/24/2023-04:06:54] [I] CUDA Graph: Disabled
[11/24/2023-04:06:54] [I] Separate profiling: Disabled
[11/24/2023-04:06:54] [I] Time Deserialize: Disabled
[11/24/2023-04:06:54] [I] Time Refit: Disabled
[11/24/2023-04:06:54] [I] Skip inference: Disabled
[11/24/2023-04:06:54] [I] Inputs:
[11/24/2023-04:06:54] [I] ~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat<-~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat
[11/24/2023-04:06:54] [I] === Reporting Options ===
[11/24/2023-04:06:54] [I] Verbose: Disabled
[11/24/2023-04:06:54] [I] Averages: 10 inferences
[11/24/2023-04:06:54] [I] Percentile: 99
[11/24/2023-04:06:54] [I] Dump refittable layers:Disabled
[11/24/2023-04:06:54] [I] Dump output: Disabled
[11/24/2023-04:06:54] [I] Profile: Disabled
[11/24/2023-04:06:54] [I] Export timing to JSON file:
[11/24/2023-04:06:54] [I] Export output to JSON file:
[11/24/2023-04:06:54] [I] Export profile to JSON file:
[11/24/2023-04:06:54] [I]
[11/24/2023-04:06:54] [I] === Device Information ===
[11/24/2023-04:06:54] [I] Selected Device: Xavier
[11/24/2023-04:06:54] [I] Compute Capability: 7.2
[11/24/2023-04:06:54] [I] SMs: 6
[11/24/2023-04:06:54] [I] Compute Clock Rate: 1.109 GHz
[11/24/2023-04:06:54] [I] Device Global Memory: 7773 MiB
[11/24/2023-04:06:54] [I] Shared Memory per SM: 96 KiB
[11/24/2023-04:06:54] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/24/2023-04:06:54] [I] Memory Clock Rate: 1.109 GHz
[11/24/2023-04:06:54] [I]
[11/24/2023-04:06:54] [I] TensorRT version: 8001
[11/24/2023-04:06:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 4527 (MiB)
[11/24/2023-04:06:55] [I] Start parsing network model
[11/24/2023-04:06:55] [I] [TRT] ----------------------------------------------------------------
[11/24/2023-04:06:55] [I] [TRT] Input filename: …/data/resnet50/ResNet50.onnx
[11/24/2023-04:06:55] [I] [TRT] ONNX IR version: 0.0.3
[11/24/2023-04:06:55] [I] [TRT] Opset version: 9
[11/24/2023-04:06:55] [I] [TRT] Producer name: onnx-caffe2
[11/24/2023-04:06:55] [I] [TRT] Producer version:
[11/24/2023-04:06:55] [I] [TRT] Domain:
[11/24/2023-04:06:55] [I] [TRT] Model version: 0
[11/24/2023-04:06:55] [I] [TRT] Doc string:
[11/24/2023-04:06:55] [I] [TRT] ----------------------------------------------------------------
[11/24/2023-04:06:55] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/24/2023-04:06:55] [I] Finish parsing network model
[11/24/2023-04:06:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 471, GPU 4725 (MiB)
[11/24/2023-04:06:55] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[11/24/2023-04:06:55] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 471 MiB, GPU 4725 MiB
[11/24/2023-04:06:55] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[11/24/2023-04:06:57] [E] Error[9]: [standardEngineBuilder.cpp::isValidDLAConfig::2189] Error Code 9: Internal Error (Default DLA is enabled but layer (Unnamed Layer* 176) [Shuffle] + (Unnamed Layer* 177) [Shuffle] is not supported on DLA and falling back to GPU is not enabled.)
[11/24/2023-04:06:57] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)

Thanks and Regards

Nagaraj Trivedi

Hi,

Default DLA is enabled but layer ... is not supported on DLA and falling back to GPU is not enabled.

Based on the log message, the model cannot fully run on DLA.
Please enable the GPU fallback (--allowGPUFallback) to allow TensorRT to place the non-supported layers back to the GPU.
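For example, the command from the original post could be re-run as below (a sketch that reuses your model and input paths, with only --allowGPUFallback added):

./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=1 --allowGPUFallback --loadInputs=~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat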

You can find the DLA support matrix in the TensorRT documentation:

Thanks.

Hi, thank you for providing this information. It worked, but I found one more issue.
With --useDLACore the QPS (queries per second) is much lower than when inferencing without --useDLACore.
For your reference I have attached two log files, one with the --useDLACore option and the other without it:

  1. resnet50_withdla.txt (inference with --useDLACore)
  2. resnet50_without_dla.txt (inference without --useDLACore)

Also, from your experience resolving many such queries about the --useDLACore option, please let me know what kind of significant changes we can expect to see during inference.

Please provide the information I have asked for above.

Thanks and Regards

Nagaraj Trivedi
resnet50_without_dla.txt (20.2 KB)
resnet50_withdla.txt (12.9 KB)

Hi,

If the inference switches between DLA and GPU frequently, then the data transfer overhead might slow down the task.
For example: DLA -> GPU -> DLA -> GPU -> …
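To check how the layers were actually placed, the same command can be run with --verbose; the verbose build log normally reports which layers were assigned to DLA and which fell back to the GPU (a sketch reusing the command from your post):

./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=1 --allowGPUFallback --verbose --loadInputs=~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat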

Thanks.

Hi, thank you for the response. It would be helpful if you could clarify why the --useDLACore option should be used. What benefit do we get from executing on DLA compared to the GPU? Does it increase inference speed, accuracy, or both?
Please clarify.

Thanks and Regards

Nagaraj Trivedi

Hi,

You can find more info in our document:

Q: Why does my network run slower when using DLA compared to without DLA?

A: DLA was designed to maximize energy efficiency. Depending on the features supported by DLA and the features supported by the GPU, either implementation can be more performant. Which implementation to use depends on your latency or throughput requirements and your power budget. Since all DLA engines are independent of the GPU and each other, you could also use both implementations at the same time to further increase the throughput of your network.
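As a rough sketch of what "use both implementations at the same time" could look like with trtexec (hypothetical; it assumes the same model and that both DLA cores on the Xavier module are free), one process can be launched per DLA core alongside one on the GPU:

./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=0 --allowGPUFallback &
./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=1 --allowGPUFallback &
./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 &
wait

Since each process builds its own engine, in practice you would usually build and save the engines once with --saveEngine and then benchmark with --loadEngine.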

Thanks.
