Hello Experts,
CC: @Honey_Patouceul @DaneLLL @amycao @kayccc @icornejo.a @AastaLLL @dusty_nv @Hodu @Jeffli
I'm curious whether the TensorRT SDK uses the GPU alone or a mix of CPU and GPU to run inference and its other functionality.
I'm not sure, but AFAIK it mainly uses the GPU, and possibly the DLA on Xavier as well.
Someone more familiar with this topic may be able to advise better.
Hi,
As Honey_Patouceul said, TensorRT mainly uses the GPU for inference.
On Xavier, TensorRT also supports inference on the DLA.
Thanks.
Hi,
You can get layer-level profiling data directly from our trtexec app with the --dumpProfile flag.
For example, here is the output for a YOLOv3 Tiny model:
/usr/src/tensorrt/bin/trtexec [your/model/info] --dumpProfile
...
[11/04/2020-17:35:48] [I] === Profile (265 iterations ) ===
[11/04/2020-17:35:48] [I] Layer Time (ms) Avg. Time (ms) Time %
[11/04/2020-17:35:48] [I] conv_1 201.47 0.76 6.4
[11/04/2020-17:35:48] [I] leaky_1 76.22 0.29 2.4
[11/04/2020-17:35:48] [I] maxpool_2 53.42 0.20 1.7
[11/04/2020-17:35:48] [I] conv_3 131.26 0.50 4.2
[11/04/2020-17:35:48] [I] leaky_3 38.95 0.15 1.2
[11/04/2020-17:35:48] [I] maxpool_4 27.38 0.10 0.9
[11/04/2020-17:35:48] [I] conv_5 113.38 0.43 3.6
[11/04/2020-17:35:48] [I] leaky_5 20.45 0.08 0.6
[11/04/2020-17:35:48] [I] maxpool_6 15.69 0.06 0.5
[11/04/2020-17:35:48] [I] conv_7 123.34 0.47 3.9
[11/04/2020-17:35:48] [I] leaky_7 11.17 0.04 0.4
[11/04/2020-17:35:48] [I] maxpool_8 9.00 0.03 0.3
[11/04/2020-17:35:48] [I] conv_9 133.43 0.50 4.2
[11/04/2020-17:35:48] [I] leaky_9 6.77 0.03 0.2
[11/04/2020-17:35:48] [I] maxpool_10 5.49 0.02 0.2
[11/04/2020-17:35:48] [I] conv_11 139.17 0.53 4.4
[11/04/2020-17:35:48] [I] leaky_11 4.22 0.02 0.1
[11/04/2020-17:35:48] [I] maxpool_12 9.96 0.04 0.3
[11/04/2020-17:35:48] [I] conv_13 497.09 1.88 15.8
[11/04/2020-17:35:48] [I] leaky_13 6.63 0.03 0.2
[11/04/2020-17:35:48] [I] conv_14 826.62 3.12 26.2
[11/04/2020-17:35:48] [I] leaky_14 2.95 0.01 0.1
[11/04/2020-17:35:48] [I] conv_19 15.74 0.06 0.5
[11/04/2020-17:35:48] [I] conv_15 139.78 0.53 4.4
[11/04/2020-17:35:48] [I] postMul_19 0.27 0.00 0.0
[11/04/2020-17:35:48] [I] leaky_19 2.81 0.01 0.1
[11/04/2020-17:35:48] [I] preMul_19 0.25 0.00 0.0
[11/04/2020-17:35:48] [I] mm1_19 24.34 0.09 0.8
[11/04/2020-17:35:48] [I] mm2_19 6.66 0.03 0.2
[11/04/2020-17:35:48] [I] (Unnamed Layer* 42) [Matrix Multiply]_output copy 4.60 0.02 0.1
[11/04/2020-17:35:48] [I] leaky_15 4.50 0.02 0.1
[11/04/2020-17:35:48] [I] conv_16 42.02 0.16 1.3
[11/04/2020-17:35:48] [I] yolo_17 10.26 0.04 0.3
[11/04/2020-17:35:48] [I] conv_22 376.30 1.42 11.9
[11/04/2020-17:35:48] [I] leaky_22 6.54 0.02 0.2
[11/04/2020-17:35:48] [I] conv_23 45.43 0.17 1.4
[11/04/2020-17:35:48] [I] yolo_24 22.44 0.08 0.7
[11/04/2020-17:35:48] [I] Total 3156.02 11.91 100.0
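If you want to rank the layers by time rather than read the table by eye, the profile above can be post-processed with a few lines of Python. This is only a minimal sketch: it assumes the log keeps the line layout shown here (a timestamp, `[I]`, the layer name, then three numeric columns), and `parse_profile`, `slowest_layers`, and `SAMPLE` are hypothetical names used for illustration, not part of trtexec or TensorRT.

```python
import re

# Matches the "[date-time] [I] " prefix seen on each log line above.
PREFIX = re.compile(r"^\[[^\]]*\]\s+\[I\]\s+")


def parse_profile(log_text):
    """Return a list of (layer, total_ms, avg_ms, percent) tuples."""
    rows = []
    for line in log_text.splitlines():
        body = PREFIX.sub("", line.strip())
        # Layer names may contain spaces, so peel the three numeric
        # columns off the right-hand side instead of splitting from the left.
        parts = body.rsplit(None, 3)
        if len(parts) != 4:
            continue
        name, *nums = parts
        try:
            total_ms, avg_ms, percent = (float(n) for n in nums)
        except ValueError:
            continue  # header, banner, or unrelated log line
        if name == "Total":
            continue  # summary row, not a layer
        rows.append((name, total_ms, avg_ms, percent))
    return rows


def slowest_layers(rows, n=3):
    """Return the n rows with the largest total time, slowest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


# A few lines copied from the profile above, used as sample input.
SAMPLE = """\
[11/04/2020-17:35:48] [I] === Profile (265 iterations ) ===
[11/04/2020-17:35:48] [I] Layer Time (ms) Avg. Time (ms) Time %
[11/04/2020-17:35:48] [I] conv_13 497.09 1.88 15.8
[11/04/2020-17:35:48] [I] conv_14 826.62 3.12 26.2
[11/04/2020-17:35:48] [I] (Unnamed Layer* 42) [Matrix Multiply]_output copy 4.60 0.02 0.1
[11/04/2020-17:35:48] [I] conv_22 376.30 1.42 11.9
[11/04/2020-17:35:48] [I] Total 3156.02 11.91 100.0
"""

if __name__ == "__main__":
    for name, total_ms, avg_ms, percent in slowest_layers(parse_profile(SAMPLE)):
        print(f"{name:10s} {total_ms:8.2f} ms  {percent:5.1f} %")
```

In the profile above, conv_14, conv_13, and conv_22 together account for a bit over half of the total time, so those would be the first layers to look at.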
Thanks.