TensorRT model inference fully on DLA is slow due to abnormally slow cudaEventSynchronize time

Hi,

1.
The GPU might need to do some data formatting (layout conversion) for the DLA, which adds extra latency around the DLA subgraph.
You can find the info below:
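To see where that reformatting time goes, one option is to profile the engine with trtexec; the per-layer profile lists any GPU-side reformat layers inserted around the DLA subgraph. This is a sketch, and the model path is a placeholder:

```shell
# model.onnx is a placeholder path for your network.
# --dumpProfile prints per-layer timings, including reformat layers
# that TensorRT inserts between GPU and DLA memory formats.
trtexec --onnx=model.onnx \
        --useDLACore=0 --fp16 --allowGPUFallback \
        --dumpProfile --separateProfileRun
```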

2.
We expect the DLA to be used with low-resolution inputs, so FP16 performance will be slower than on the GPU.

3. INT8 is recommended to improve DLA performance.
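An INT8 DLA engine can be built with trtexec as sketched below; note that DLA INT8 needs calibration data (a calibration cache or per-tensor dynamic ranges). The file names here are placeholders:

```shell
# model.onnx, calibration.cache, and model_dla_int8.engine are placeholders.
# Builds an INT8 engine targeting DLA core 0, with GPU fallback for
# layers the DLA does not support.
trtexec --onnx=model.onnx \
        --useDLACore=0 --int8 --allowGPUFallback \
        --calib=calibration.cache \
        --saveEngine=model_dla_int8.engine
```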

Thanks
