Hi,
We tried DLA acceleration, but when we run the sample, we find that it is slower. Is this normal, or are we doing something wrong?
Here are the steps:
1. Go into the samples folder of TensorRT, then into the sampleINT8 folder.
2. Compile the code without any modifications to the code or model.
3. Run the resulting executable with the appropriate parameters.
4. Collect the results.
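For reference, the steps above can be sketched as the following commands (paths assume the stock TensorRT sample layout shipped with JetPack; the DLA core index matches the flag shown in the logs below):

```shell
# Build the unmodified sampleINT8 sample and run it with and without DLA
cd /usr/src/tensorrt/samples/sampleINT8
make                                # compile without any code/model changes
cd ../../bin
./sample_int8 mnist useDLACore=1    # run with DLA core 1
./sample_int8 mnist                 # run without DLA for comparison
```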
Our hardware and software information:
AGX Xavier
TensorRT 5.0.3
(1) Running sample_int8 with DLA gave the following output:
nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$ ./sample_int8 mnist useDLACore=1
DLA requested. Disabling for FP32 run since its not supported.
FP32 run:400 batches of size 30 starting at 100
…
Top1: 0.989833, Top5: 1
Processing 12000 images averaged 0.0176231 ms/image and 0.528693 ms/batch.
FP16 run:400 batches of size 30 starting at 100
WARNING: Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
…
Top1: 0.92925, Top5: 0.9675
Processing 12000 images averaged 0.193493 ms/image and 5.80478 ms/batch.
DLA requested. Disabling for Int8 run since its not supported.
INT8 run:400 batches of size 30 starting at 100
…
Top1: 0.990167, Top5: 1
Processing 12000 images averaged 0.0362652 ms/image and 1.08796 ms/batch.
(2) Running sample_int8 without DLA gave the following output:
nvidia@tegra-ubuntu:/usr/src/tensorrt/bin$ ./sample_int8 mnist
FP32 run:400 batches of size 30 starting at 100
…
Top1: 0.989833, Top5: 1
Processing 12000 images averaged 0.0176359 ms/image and 0.529076 ms/batch.
FP16 run:400 batches of size 30 starting at 100
…
Top1: 0.98975, Top5: 1
Processing 12000 images averaged 0.0169346 ms/image and 0.508038 ms/batch.
INT8 run:400 batches of size 30 starting at 100
…
Top1: 0.990167, Top5: 1
Processing 12000 images averaged 0.0144124 ms/image and 0.432372 ms/batch.
From the output above, the FP16 run with DLA is slower than the FP16 run without DLA: more than ten times slower, in fact (5.80 ms/batch vs. 0.51 ms/batch).
Is this normal, or do we need to make some changes?
Thanks