Hi,
I read about how to use the DLA on this page:
I am running a trtexec command line similar to the one shown on that page, but with my own ONNX file,
and I get the output below. Can anyone explain what it means and what to expect when running this network on the DLA?
Thanks,
Gabi
./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback
&&&& RUNNING TensorRT.trtexec # ./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback
[I] onnx: /mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx
[I] output: prob
[I] useDLACore: 1
[I] fp16
[I] allowGPUFallback
Input filename: /mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx
ONNX IR version: 0.0.4
Opset version: 9
Producer name: pytorch
Producer version: 1.1
Domain:
Model version: 0
Doc string:
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 174) [PluginV2] is not running on DLA, falling back to GPU.
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 177) [PluginV2] is not running on DLA, falling back to GPU.
[W] [TRT] DLA LAYER: CBUF size requirement for layer (Unnamed Layer* 179) [Convolution] is 9banks, which exceeds the limit (8).
[W] [TRT] Default DLA is enabled but layer (Unnamed Layer* 179) [Convolution] is not running on DLA, falling back to GPU.
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 180) [Activation]
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 276) [Activation]
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 278) [Activation]
[W] [TRT] DLA supports only 8 subgraphs per DLA core. Switching to GPU for layer (Unnamed Layer* 279) [Activation]
[I] Average over 10 runs is 73.9307 ms (host walltime is 74.6987 ms, 99% percentile time is 76.5665).
[I] Average over 10 runs is 72.0191 ms (host walltime is 72.4281 ms, 99% percentile time is 76.1508).
[I] Average over 10 runs is 71.4415 ms (host walltime is 71.886 ms, 99% percentile time is 73.8089).
[I] Average over 10 runs is 70.9269 ms (host walltime is 71.1898 ms, 99% percentile time is 74.4588).
[I] Average over 10 runs is 71.2038 ms (host walltime is 71.4089 ms, 99% percentile time is 73.0836).
[I] Average over 10 runs is 69.9025 ms (host walltime is 70.118 ms, 99% percentile time is 72.2458).
[I] Average over 10 runs is 70.6782 ms (host walltime is 70.914 ms, 99% percentile time is 72.2416).
[I] Average over 10 runs is 69.9151 ms (host walltime is 70.18 ms, 99% percentile time is 71.1107).
[I] Average over 10 runs is 69.9766 ms (host walltime is 70.2225 ms, 99% percentile time is 70.7799).
[I] Average over 10 runs is 70.4809 ms (host walltime is 70.6907 ms, 99% percentile time is 72.1121).
&&&& PASSED TensorRT.trtexec # ./bin/trtexec --onnx=/mnt/nvmedisk/Gabi/Retinanet_ir_3class_resnet50_PT_640x512/trt_ir_3c_b1_640x512.onnx --output=prob --useDLACore=1 --fp16 --allowGPUFallback
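For what it's worth, the ten per-run averages in the log work out to roughly 71 ms per inference (about 14 FPS at batch size 1). A quick sketch to summarize them (the values are copied from the log above; the script itself is just illustrative):

```python
# Summarize the ten "Average over 10 runs" figures reported by trtexec above.
from statistics import mean

avg_ms = [73.9307, 72.0191, 71.4415, 70.9269, 71.2038,
          69.9025, 70.6782, 69.9151, 69.9766, 70.4809]

overall_ms = mean(avg_ms)   # overall average compute time per inference
fps = 1000.0 / overall_ms   # rough throughput at batch size 1

print(f"average latency: {overall_ms:.2f} ms (~{fps:.1f} FPS)")
```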