Hi all, I tried running trtexec with the --useDLACore=1 option, but it crashed. It also crashed with --useDLACore=0.
When the option was omitted entirely, it worked. Below are the execution logs from the crash:
./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=1 --loadInputs=~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat
[11/24/2023-04:06:54] [I] === Model Options ===
[11/24/2023-04:06:54] [I] Format: ONNX
[11/24/2023-04:06:54] [I] Model: …/data/resnet50/ResNet50.onnx
[11/24/2023-04:06:54] [I] Output:
[11/24/2023-04:06:54] [I] === Build Options ===
[11/24/2023-04:06:54] [I] Max batch: explicit
[11/24/2023-04:06:54] [I] Workspace: 16 MiB
[11/24/2023-04:06:54] [I] minTiming: 1
[11/24/2023-04:06:54] [I] avgTiming: 8
[11/24/2023-04:06:54] [I] Precision: FP32+INT8
[11/24/2023-04:06:54] [I] Calibration: Dynamic
[11/24/2023-04:06:54] [I] Refit: Disabled
[11/24/2023-04:06:54] [I] Sparsity: Disabled
[11/24/2023-04:06:54] [I] Safe mode: Disabled
[11/24/2023-04:06:54] [I] Restricted mode: Disabled
[11/24/2023-04:06:54] [I] Save engine:
[11/24/2023-04:06:54] [I] Load engine:
[11/24/2023-04:06:54] [I] NVTX verbosity: 0
[11/24/2023-04:06:54] [I] Tactic sources: Using default tactic sources
[11/24/2023-04:06:54] [I] timingCacheMode: local
[11/24/2023-04:06:54] [I] timingCacheFile:
[11/24/2023-04:06:54] [I] Input(s)s format: fp32:CHW
[11/24/2023-04:06:54] [I] Output(s)s format: fp32:CHW
[11/24/2023-04:06:54] [I] Input build shapes: model
[11/24/2023-04:06:54] [I] Input calibration shapes: model
[11/24/2023-04:06:54] [I] === System Options ===
[11/24/2023-04:06:54] [I] Device: 0
[11/24/2023-04:06:54] [I] DLACore: 1
[11/24/2023-04:06:54] [I] Plugins:
[11/24/2023-04:06:54] [I] === Inference Options ===
[11/24/2023-04:06:54] [I] Batch: Explicit
[11/24/2023-04:06:54] [I] Input inference shapes: model
[11/24/2023-04:06:54] [I] Iterations: 10
[11/24/2023-04:06:54] [I] Duration: 3s (+ 200ms warm up)
[11/24/2023-04:06:54] [I] Sleep time: 0ms
[11/24/2023-04:06:54] [I] Streams: 1
[11/24/2023-04:06:54] [I] ExposeDMA: Disabled
[11/24/2023-04:06:54] [I] Data transfers: Enabled
[11/24/2023-04:06:54] [I] Spin-wait: Disabled
[11/24/2023-04:06:54] [I] Multithreading: Disabled
[11/24/2023-04:06:54] [I] CUDA Graph: Disabled
[11/24/2023-04:06:54] [I] Separate profiling: Disabled
[11/24/2023-04:06:54] [I] Time Deserialize: Disabled
[11/24/2023-04:06:54] [I] Time Refit: Disabled
[11/24/2023-04:06:54] [I] Skip inference: Disabled
[11/24/2023-04:06:54] [I] Inputs:
[11/24/2023-04:06:54] [I] ~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat<-~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat
[11/24/2023-04:06:54] [I] === Reporting Options ===
[11/24/2023-04:06:54] [I] Verbose: Disabled
[11/24/2023-04:06:54] [I] Averages: 10 inferences
[11/24/2023-04:06:54] [I] Percentile: 99
[11/24/2023-04:06:54] [I] Dump refittable layers:Disabled
[11/24/2023-04:06:54] [I] Dump output: Disabled
[11/24/2023-04:06:54] [I] Profile: Disabled
[11/24/2023-04:06:54] [I] Export timing to JSON file:
[11/24/2023-04:06:54] [I] Export output to JSON file:
[11/24/2023-04:06:54] [I] Export profile to JSON file:
[11/24/2023-04:06:54] [I]
[11/24/2023-04:06:54] [I] === Device Information ===
[11/24/2023-04:06:54] [I] Selected Device: Xavier
[11/24/2023-04:06:54] [I] Compute Capability: 7.2
[11/24/2023-04:06:54] [I] SMs: 6
[11/24/2023-04:06:54] [I] Compute Clock Rate: 1.109 GHz
[11/24/2023-04:06:54] [I] Device Global Memory: 7773 MiB
[11/24/2023-04:06:54] [I] Shared Memory per SM: 96 KiB
[11/24/2023-04:06:54] [I] Memory Bus Width: 256 bits (ECC disabled)
[11/24/2023-04:06:54] [I] Memory Clock Rate: 1.109 GHz
[11/24/2023-04:06:54] [I]
[11/24/2023-04:06:54] [I] TensorRT version: 8001
[11/24/2023-04:06:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 371, GPU 4527 (MiB)
[11/24/2023-04:06:55] [I] Start parsing network model
[11/24/2023-04:06:55] [I] [TRT] ----------------------------------------------------------------
[11/24/2023-04:06:55] [I] [TRT] Input filename: …/data/resnet50/ResNet50.onnx
[11/24/2023-04:06:55] [I] [TRT] ONNX IR version: 0.0.3
[11/24/2023-04:06:55] [I] [TRT] Opset version: 9
[11/24/2023-04:06:55] [I] [TRT] Producer name: onnx-caffe2
[11/24/2023-04:06:55] [I] [TRT] Producer version:
[11/24/2023-04:06:55] [I] [TRT] Domain:
[11/24/2023-04:06:55] [I] [TRT] Model version: 0
[11/24/2023-04:06:55] [I] [TRT] Doc string:
[11/24/2023-04:06:55] [I] [TRT] ----------------------------------------------------------------
[11/24/2023-04:06:55] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/24/2023-04:06:55] [I] Finish parsing network model
[11/24/2023-04:06:55] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 471, GPU 4725 (MiB)
[11/24/2023-04:06:55] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[11/24/2023-04:06:55] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 471 MiB, GPU 4725 MiB
[11/24/2023-04:06:55] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[11/24/2023-04:06:57] [E] Error[9]: [standardEngineBuilder.cpp::isValidDLAConfig::2189] Error Code 9: Internal Error (Default DLA is enabled but layer (Unnamed Layer* 176) [Shuffle] + (Unnamed Layer* 177) [Shuffle] is not supported on DLA and falling back to GPU is not enabled.)
[11/24/2023-04:06:57] [E] Error[2]: [builder.cpp::buildSerializedNetwork::417] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed.)
Segmentation fault (core dumped)
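
From the error on the last [E] line, it looks like layer (Unnamed Layer* 176) [Shuffle] is not supported on DLA and the build fails because falling back to the GPU is not enabled. If I understand correctly, trtexec has an --allowGPUFallback flag for exactly this case, so I am guessing the command would become something like the following (same paths as above; I have not confirmed this is the right fix):

./trtexec --onnx=…/data/resnet50/ResNet50.onnx --int8 --useDLACore=1 --allowGPUFallback --loadInputs=~/program/nagaraj/tensor_rt_practice/pytorch_to_trt/input_tensor.dat

Is adding --allowGPUFallback the correct way to handle DLA-unsupported layers here, or is something else wrong with my setup?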
Thanks and regards,
Nagaraj Trivedi