Confirming expected performance of INT8 vs. FP16 vs. FP32

I’m having a hard time tracking down specs that compare the theoretical performance of INT8/FP16/FP32 operations on the Xavier card. Assuming an efficient deep learning workload (i.e. large batches, large matrix multiply operations), what I see on WikiChip (Tegra Xavier - Nvidia - WikiChip) seems to suggest relative speeds of roughly:

1x speed on FP32
2x speed on FP16
16x speed on INT8

I’d like to confirm that, at least theoretically, this is correct for the Xavier card.
Are there any caveats I should be aware of?
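For reference, here is how I derived those ratios from the peak-throughput figures. The numbers below are my reading of the WikiChip summary for the Xavier GPU and should be verified against official NVIDIA specs; the DLA engines would add further INT8/FP16 capacity on top of these:

```python
# Assumed peak-throughput figures for the Xavier GPU alone
# (illustrative values read off public spec summaries -- verify
# against official NVIDIA documentation before relying on them).
peak_ops_per_sec = {
    "FP32": 1.4e12,   # ~1.4 TFLOPS (CUDA cores)
    "FP16": 2.8e12,   # ~2.8 TFLOPS
    "INT8": 22.6e12,  # ~22.6 TOPS (tensor cores; DLA not included)
}

# Express everything relative to FP32 throughput.
baseline = peak_ops_per_sec["FP32"]
for mode, ops in peak_ops_per_sec.items():
    print(f"{mode}: {ops / baseline:.1f}x relative to FP32")
```

These are theoretical peaks; real workloads will land below them depending on memory bandwidth and kernel efficiency.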

Also, what would the power profile (watts) look like assuming a 100% utilization workload in INT8, FP16, and FP32 mode? Order-of-magnitude estimates are perfectly acceptable if detailed specs aren’t easily accessible.

Hi DavidParks, please check this blog post; in particular, you may find the sections on the GPU, DLA, and benchmarks relevant:

[url]https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/[/url]