I’m having a hard time tracking down specs that compare the theoretical performance of INT8/FP16/FP32 operations on the Xavier card. Assuming an efficient deep learning workload (i.e. large batches, large matrix multiply operations), what I see on wikichips (https://en.wikichip.org/wiki/nvidia/tegra/xavier#GPU) seems to suggest that I can hope for relative speeds of roughly:
1x speed on FP32 (~1.4 TFLOPS on the CUDA cores)
2x speed on FP16 (~2.8 TFLOPS on the CUDA cores)
~16x on INT8 (~22.6 TOPS via the Tensor Cores)
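For concreteness, here's the arithmetic behind the ratios above — a quick sketch based on my reading of the wikichip page, so treat the absolute throughput numbers as assumptions rather than vendor-confirmed specs:

```python
# Rough throughput ratios for Xavier's GPU, using the figures I read
# off wikichip (assumptions, not official NVIDIA specs):
fp32_tflops = 1.4    # CUDA cores, FP32 FMA
fp16_tflops = 2.8    # CUDA cores, packed FP16 (2x FP32 rate)
int8_tops   = 22.6   # Tensor Cores, INT8

for name, ops in [("FP16", fp16_tflops), ("INT8", int8_tops)]:
    print(f"{name}: ~{ops / fp32_tflops:.0f}x vs FP32")
# → FP16: ~2x vs FP32
# → INT8: ~16x vs FP32
```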
I’d like to get confirmation that this is correct, at least theoretically, for the Xavier card.
Are there any caveats I should be aware of?
Also, what would the power profile (watts) look like assuming a 100% utilization workload in INT8, FP16, and FP32 mode? Order-of-magnitude estimates are perfectly acceptable if detailed specs aren’t easily accessible.