Hello,
I tried with a TensorFlow model, but I hit the same issue with Shuffle layers…
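For context, here is roughly how I exported the model to ONNX (a simplified sketch assuming the stock Keras ResNet50; my exact script may differ, but the producer, opset, and input shape match the log below):

# Rough sketch of my export step (simplified; exact options may differ).
# The log below confirms: producer tf2onnx 1.15.1, opset 13, NHWC input.
import tensorflow as tf
import tf2onnx

model = tf.keras.applications.ResNet50(weights="imagenet")
# Batch dimension left dynamic, which is why trtexec later overrides
# the input shape to 1x224x224x3.
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec,
                           opset=13, output_path="resnet50_tf.onnx")

Then I ran: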
/usr/src/tensorrt/bin/trtexec --onnx=resnet50_tf.onnx --best --useDLACore=0 --allowGPUFallback
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=resnet50_tf.onnx --best --useDLACore=0 --allowGPUFallback
[09/27/2023-09:18:07] [I] === Model Options ===
[09/27/2023-09:18:07] [I] Format: ONNX
[09/27/2023-09:18:07] [I] Model: resnet50_tf.onnx
[09/27/2023-09:18:07] [I] Output:
[09/27/2023-09:18:07] [I] === Build Options ===
[09/27/2023-09:18:07] [I] Max batch: explicit
[09/27/2023-09:18:07] [I] Workspace: 16 MiB
[09/27/2023-09:18:07] [I] minTiming: 1
[09/27/2023-09:18:07] [I] avgTiming: 8
[09/27/2023-09:18:07] [I] Precision: FP32+FP16+INT8
[09/27/2023-09:18:07] [I] Calibration: Dynamic
[09/27/2023-09:18:07] [I] Refit: Disabled
[09/27/2023-09:18:07] [I] Sparsity: Disabled
[09/27/2023-09:18:07] [I] Safe mode: Disabled
[09/27/2023-09:18:07] [I] Restricted mode: Disabled
[09/27/2023-09:18:07] [I] Save engine:
[09/27/2023-09:18:07] [I] Load engine:
[09/27/2023-09:18:07] [I] NVTX verbosity: 0
[09/27/2023-09:18:07] [I] Tactic sources: Using default tactic sources
[09/27/2023-09:18:07] [I] timingCacheMode: local
[09/27/2023-09:18:07] [I] timingCacheFile:
[09/27/2023-09:18:07] [I] Input(s)s format: fp32:CHW
[09/27/2023-09:18:07] [I] Output(s)s format: fp32:CHW
[09/27/2023-09:18:07] [I] Input build shapes: model
[09/27/2023-09:18:07] [I] Input calibration shapes: model
[09/27/2023-09:18:07] [I] === System Options ===
[09/27/2023-09:18:07] [I] Device: 0
[09/27/2023-09:18:07] [I] DLACore: 0(With GPU fallback)
[09/27/2023-09:18:07] [I] Plugins:
[09/27/2023-09:18:07] [I] === Inference Options ===
[09/27/2023-09:18:07] [I] Batch: Explicit
[09/27/2023-09:18:07] [I] Input inference shapes: model
[09/27/2023-09:18:07] [I] Iterations: 10
[09/27/2023-09:18:07] [I] Duration: 3s (+ 200ms warm up)
[09/27/2023-09:18:07] [I] Sleep time: 0ms
[09/27/2023-09:18:07] [I] Streams: 1
[09/27/2023-09:18:07] [I] ExposeDMA: Disabled
[09/27/2023-09:18:07] [I] Data transfers: Enabled
[09/27/2023-09:18:07] [I] Spin-wait: Disabled
[09/27/2023-09:18:07] [I] Multithreading: Disabled
[09/27/2023-09:18:07] [I] CUDA Graph: Disabled
[09/27/2023-09:18:07] [I] Separate profiling: Disabled
[09/27/2023-09:18:07] [I] Time Deserialize: Disabled
[09/27/2023-09:18:07] [I] Time Refit: Disabled
[09/27/2023-09:18:07] [I] Skip inference: Disabled
[09/27/2023-09:18:07] [I] Inputs:
[09/27/2023-09:18:07] [I] === Reporting Options ===
[09/27/2023-09:18:07] [I] Verbose: Disabled
[09/27/2023-09:18:07] [I] Averages: 10 inferences
[09/27/2023-09:18:07] [I] Percentile: 99
[09/27/2023-09:18:07] [I] Dump refittable layers: Disabled
[09/27/2023-09:18:07] [I] Dump output: Disabled
[09/27/2023-09:18:07] [I] Profile: Disabled
[09/27/2023-09:18:07] [I] Export timing to JSON file:
[09/27/2023-09:18:07] [I] Export output to JSON file:
[09/27/2023-09:18:07] [I] Export profile to JSON file:
[09/27/2023-09:18:07] [I]
[09/27/2023-09:18:07] [I] === Device Information ===
[09/27/2023-09:18:07] [I] Selected Device: Xavier
[09/27/2023-09:18:07] [I] Compute Capability: 7.2
[09/27/2023-09:18:07] [I] SMs: 6
[09/27/2023-09:18:07] [I] Compute Clock Rate: 1.109 GHz
[09/27/2023-09:18:07] [I] Device Global Memory: 7773 MiB
[09/27/2023-09:18:07] [I] Shared Memory per SM: 96 KiB
[09/27/2023-09:18:07] [I] Memory Bus Width: 256 bits (ECC disabled)
[09/27/2023-09:18:07] [I] Memory Clock Rate: 1.109 GHz
[09/27/2023-09:18:07] [I]
[09/27/2023-09:18:07] [I] TensorRT version: 8001
[09/27/2023-09:18:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +354, GPU +0, now: CPU 372, GPU 4477 (MiB)
[09/27/2023-09:18:11] [I] Start parsing network model
[09/27/2023-09:18:11] [I] [TRT] ----------------------------------------------------------------
[09/27/2023-09:18:11] [I] [TRT] Input filename: resnet50_tf.onnx
[09/27/2023-09:18:11] [I] [TRT] ONNX IR version: 0.0.7
[09/27/2023-09:18:11] [I] [TRT] Opset version: 13
[09/27/2023-09:18:11] [I] [TRT] Producer name: tf2onnx
[09/27/2023-09:18:11] [I] [TRT] Producer version: 1.15.1 37820d
[09/27/2023-09:18:11] [I] [TRT] Domain:
[09/27/2023-09:18:11] [I] [TRT] Model version: 0
[09/27/2023-09:18:11] [I] [TRT] Doc string:
[09/27/2023-09:18:11] [I] [TRT] ----------------------------------------------------------------
[09/27/2023-09:18:11] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[09/27/2023-09:18:11] [W] [TRT] ShapedWeights.cpp:173: Weights resnet50/predictions/MatMul/ReadVariableOp:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[09/27/2023-09:18:11] [I] Finish parsing network model
[09/27/2023-09:18:11] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 478, GPU 4681 (MiB)
[09/27/2023-09:18:11] [W] Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x224x224x3
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/conv1_conv/Conv2D__6 is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/pool1_pad/Pad is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/avg_pool/Mean is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 122) [Shape] device type to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 123) [Constant] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 124) [Gather] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/avg_pool/Mean_Squeeze__614 is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/predictions/MatMul/ReadVariableOp:0 is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 127) [Shape] device type to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 128) [Constant] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] (Unnamed Layer* 129) [Concatenation]: DLA only supports concatenation on the C dimension.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 129) [Concatenation] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 130) [Constant] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 131) [Gather] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 132) [Shuffle] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 134) [Shape] device type to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 135) [Constant] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 136) [Gather] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 137) [Shuffle] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/predictions/BiasAdd/ReadVariableOp:0 is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 139) [Shuffle] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer resnet50/predictions/Softmax is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 142) [Shuffle] is not supported on DLA, falling back to GPU.
[09/27/2023-09:18:11] [W] [TRT] DLA only supports FP16 and Int8 precision type. Switching (Unnamed Layer* 143) [Shape] device type to GPU.
[09/27/2023-09:18:11] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 478 MiB, GPU 4681 MiB
[09/27/2023-09:18:11] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[09/27/2023-09:18:12] [W] [TRT] Input tensor has less than 4 dimensions for resnet50/predictions/BiasAdd. At least one shuffle layer will be inserted which cannot run on DLA.
[09/27/2023-09:18:13] [I] [TRT] ---------- Layers Running on DLA ----------
[09/27/2023-09:18:13] [I] [TRT] [DlaLayer] {ForeignNode[Conv__435...resnet50/conv1_relu/Relu]}
[09/27/2023-09:18:13] [I] [TRT] [DlaLayer] {ForeignNode[resnet50/pool1_pool/MaxPool...resnet50/conv5_block3_out/Relu]}
[09/27/2023-09:18:13] [I] [TRT] [DlaLayer] {ForeignNode[resnet50/predictions/MatMul]}
[09/27/2023-09:18:13] [I] [TRT] [DlaLayer] {ForeignNode[resnet50/predictions/BiasAdd]}
[09/27/2023-09:18:13] [I] [TRT] ---------- Layers Running on GPU ----------
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] resnet50/predictions/BiasAdd/ReadVariableOp:0
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] resnet50/conv1_conv/Conv2D__6
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] (Unnamed Layer* 139) [Shuffle]
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] resnet50/pool1_pad/Pad
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] resnet50/avg_pool/Mean
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] (Unnamed Layer* 137) [Shuffle]
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] shuffle_resnet50/predictions/MatMul:0
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] shuffle_(Unnamed Layer* 139) [Shuffle]_output
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] shuffle_resnet50/predictions/BiasAdd:0
[09/27/2023-09:18:13] [I] [TRT] [GpuLayer] resnet50/predictions/Softmax
[09/27/2023-09:18:15] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +192, GPU +250, now: CPU 706, GPU 4971 (MiB)
[09/27/2023-09:18:18] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +307, GPU +513, now: CPU 1013, GPU 5484 (MiB)
[09/27/2023-09:18:18] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[09/27/2023-09:18:37] [W] [TRT] No implementation obeys reformatting-free rules, at least 2 reformatting nodes are needed, now picking the fastest path instead.
[09/27/2023-09:18:37] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[09/27/2023-09:18:37] [I] [TRT] Total Host Persistent Memory: 3408
[09/27/2023-09:18:37] [I] [TRT] Total Device Persistent Memory: 0
[09/27/2023-09:18:37] [I] [TRT] Total Scratch Memory: 0
[09/27/2023-09:18:37] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 65 MiB, GPU 13 MiB
[09/27/2023-09:18:37] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +0, now: CPU 1102, GPU 5789 (MiB)
[09/27/2023-09:18:37] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1102, GPU 5789 (MiB)
[09/27/2023-09:18:37] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1101, GPU 5789 (MiB)
[09/27/2023-09:18:37] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1101, GPU 5789 (MiB)
[09/27/2023-09:18:37] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 1101 MiB, GPU 5789 MiB
[09/27/2023-09:18:38] [I] [TRT] Loaded engine size: 65 MB
[09/27/2023-09:18:38] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 1101 MiB, GPU 5792 MiB
[09/27/2023-09:18:38] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1167, GPU 5854 (MiB)
[09/27/2023-09:18:38] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1167, GPU 5854 (MiB)
[09/27/2023-09:18:38] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 1167, GPU 5854 (MiB)
[09/27/2023-09:18:38] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 1167 MiB, GPU 5854 MiB
[09/27/2023-09:18:38] [I] Engine built in 30.4537 sec.
[09/27/2023-09:18:38] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 995 MiB, GPU 5788 MiB
[09/27/2023-09:18:38] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 995, GPU 5788 (MiB)
[09/27/2023-09:18:38] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 995, GPU 5788 (MiB)
[09/27/2023-09:18:38] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 1061 MiB, GPU 5809 MiB
[09/27/2023-09:18:38] [I] Created input binding for input with dimensions 1x224x224x3
[09/27/2023-09:18:38] [I] Created output binding for predictions with dimensions 1x1000
[09/27/2023-09:18:38] [I] Starting inference
[09/27/2023-09:18:41] [I] Warmup completed 18 queries over 200 ms
[09/27/2023-09:18:41] [I] Timing trace has 298 queries over 3.0293 s
[09/27/2023-09:18:41] [I]
[09/27/2023-09:18:41] [I] === Trace details ===
[09/27/2023-09:18:41] [I] Trace averages of 10 runs:
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.94338 ms - Host latency: 9.99422 ms (end to end 10.003 ms, enqueue 9.78402 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.93923 ms - Host latency: 9.99004 ms (end to end 10.0002 ms, enqueue 9.80433 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.97124 ms - Host latency: 10.0221 ms (end to end 10.0319 ms, enqueue 9.8399 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.94576 ms - Host latency: 9.99662 ms (end to end 10.0036 ms, enqueue 9.814 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.95644 ms - Host latency: 10.0073 ms (end to end 10.017 ms, enqueue 9.78196 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.94634 ms - Host latency: 9.99722 ms (end to end 10.0055 ms, enqueue 9.82343 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.95096 ms - Host latency: 10.0018 ms (end to end 10.0099 ms, enqueue 9.84394 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.0939 ms - Host latency: 10.145 ms (end to end 10.1978 ms, enqueue 10.0139 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.2422 ms - Host latency: 10.2931 ms (end to end 10.3034 ms, enqueue 10.0641 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.0344 ms - Host latency: 10.0853 ms (end to end 10.0973 ms, enqueue 9.87219 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.97233 ms - Host latency: 10.0231 ms (end to end 10.0348 ms, enqueue 9.81057 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.0911 ms - Host latency: 10.1421 ms (end to end 10.1524 ms, enqueue 9.95394 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.041 ms - Host latency: 10.0919 ms (end to end 10.1018 ms, enqueue 9.80658 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.1443 ms - Host latency: 10.1951 ms (end to end 10.206 ms, enqueue 9.9538 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.1061 ms - Host latency: 10.157 ms (end to end 10.1698 ms, enqueue 9.98355 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.1818 ms - Host latency: 10.2327 ms (end to end 10.2437 ms, enqueue 9.94077 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.1613 ms - Host latency: 10.2122 ms (end to end 10.2246 ms, enqueue 9.99011 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.3538 ms - Host latency: 10.4142 ms (end to end 10.4237 ms, enqueue 10.1414 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.4486 ms - Host latency: 10.5115 ms (end to end 10.5203 ms, enqueue 10.2751 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.5079 ms - Host latency: 10.5709 ms (end to end 10.5791 ms, enqueue 10.3812 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.86721 ms - Host latency: 9.93015 ms (end to end 9.93904 ms, enqueue 9.7655 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.85227 ms - Host latency: 9.91536 ms (end to end 9.92612 ms, enqueue 9.72375 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.85117 ms - Host latency: 9.91418 ms (end to end 9.92402 ms, enqueue 9.75522 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.86853 ms - Host latency: 9.93145 ms (end to end 9.94155 ms, enqueue 9.65605 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.83875 ms - Host latency: 9.90159 ms (end to end 9.91301 ms, enqueue 9.69023 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.91931 ms - Host latency: 9.9823 ms (end to end 9.99209 ms, enqueue 9.7384 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 9.89434 ms - Host latency: 9.9594 ms (end to end 9.96948 ms, enqueue 9.77319 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.2608 ms - Host latency: 10.3439 ms (end to end 10.3572 ms, enqueue 10.1076 ms)
[09/27/2023-09:18:41] [I] Average on 10 runs - GPU latency: 10.6794 ms - Host latency: 10.7627 ms (end to end 10.8828 ms, enqueue 10.6707 ms)
[09/27/2023-09:18:41] [I]
[09/27/2023-09:18:41] [I] === Performance summary ===
[09/27/2023-09:18:41] [I] Throughput: 98.3727 qps
[09/27/2023-09:18:41] [I] Latency: min = 9.86865 ms, max = 13.1165 ms, mean = 10.1486 ms, median = 10.011 ms, percentile(99%) = 11.8225 ms
[09/27/2023-09:18:41] [I] End-to-End Host Latency: min = 9.87793 ms, max = 13.137 ms, mean = 10.1639 ms, median = 10.022 ms, percentile(99%) = 11.8826 ms
[09/27/2023-09:18:41] [I] Enqueue Time: min = 8.23511 ms, max = 12.8284 ms, mean = 9.94176 ms, median = 9.89532 ms, percentile(99%) = 11.9259 ms
[09/27/2023-09:18:41] [I] H2D Latency: min = 0.0478516 ms, max = 0.079834 ms, mean = 0.0548499 ms, median = 0.0482178 ms, percentile(99%) = 0.0793457 ms
[09/27/2023-09:18:41] [I] GPU Compute Time: min = 9.80591 ms, max = 13.0334 ms, mean = 10.0906 ms, median = 9.96002 ms, percentile(99%) = 11.7383 ms
[09/27/2023-09:18:41] [I] D2H Latency: min = 0.00256348 ms, max = 0.00463867 ms, mean = 0.00310619 ms, median = 0.00280762 ms, percentile(99%) = 0.00463867 ms
[09/27/2023-09:18:41] [I] Total Host Walltime: 3.0293 s
[09/27/2023-09:18:41] [I] Total GPU Compute Time: 3.007 s
[09/27/2023-09:18:41] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[09/27/2023-09:18:41] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[09/27/2023-09:18:41] [I] Explanations of the performance metrics are printed in the verbose logs.
[09/27/2023-09:18:41] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --onnx=resnet50_tf.onnx --best --useDLACore=0 --allowGPUFallback
[09/27/2023-09:18:41] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 996, GPU 5795 (MiB)
I also tried your code to make my model DLA-compatible, but it doesn’t work.
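In case it’s useful, this is roughly the approach I attempted with the TensorRT Python API (a simplified sketch, not my exact script): DLA is set as the default device and GPU fallback is left off, so any layer the DLA cannot run should fail the build instead of silently moving to the GPU.

# Rough sketch of what I tried (simplified, not my exact script):
# build with DLA as the default device and *without* GPU fallback,
# so unsupported layers fail the build instead of falling back.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("resnet50_tf.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)           # DLA needs FP16 or INT8
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
# No trt.BuilderFlag.GPU_FALLBACK here: with it set, the Shuffle
# layers end up on the GPU exactly as trtexec reports above.

# The exported model has a dynamic batch dim, so pin it to 1.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 224, 224, 3), (1, 224, 224, 3),
                  (1, 224, 224, 3))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)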
How can I run my model entirely on the DLA, please?
Thanks.