I think TensorRT supports running networks in either INT8 or FP16 on DLA.
I can find the maximum INT8 performance in the following Technical Brief:
This enables up to 105 INT8 Sparse TOPS total on Jetson AGX Orin DLAs.
However, I cannot find the maximum FP16 performance in the same Technical Brief.
Could anyone tell me the max performance of FP16?
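For reference, a quick way to compare the two precisions empirically on your own model is to benchmark it on a DLA core with trtexec, once with `--fp16` and once with `--int8`. This is a sketch assuming an ONNX model file named `model.onnx` (a placeholder); the commands must be run on the Jetson itself, since DLA is only present on the device.

```shell
# Benchmark the model on DLA core 0 in FP16.
# --allowGPUFallback lets layers unsupported by DLA run on the GPU.
trtexec --onnx=model.onnx --useDLACore=0 --fp16 --allowGPUFallback

# Benchmark the same model on DLA core 0 in INT8.
# (INT8 on DLA needs a calibration cache or per-tensor dynamic ranges.)
trtexec --onnx=model.onnx --useDLACore=0 --int8 --allowGPUFallback
```

Comparing the reported throughput/latency between the two runs gives a practical FP16-vs-INT8 ratio for your workload, even without an official peak-TOPS figure.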
Please check the topic below:
Thank you for the information.
I had overlooked that topic.
From it, I understand that NVIDIA cannot share the exact FP16 number here.