I think TensorRT supports running networks in either INT8 or FP16 on DLA.
I can find the maximum INT8 performance in the following Technical Brief:
This enables up to 105 INT8 Sparse TOPs total on Jetson AGX Orin DLAs.
However, I cannot find the maximum FP16 performance in the same Technical Brief.
Could anyone tell me the maximum FP16 performance of the DLA?
Please check the topic below:
Thank you for the information.
I overlooked the topic.
From the topic, I understood that NVIDIA cannot share the exact number here.
Please also refer to the FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ