Mixed Precision (Tensor) vs raw FP16 / raw FP32 Compute Metrics

There is little specificity from NVIDIA (that I’ve been able to find) regarding the performance of the Xavier AGX on several common metrics. Namely, the technical splash page for the AGX specifies an FP16 metric of 16 TFLOPS. Yet I get the sense that this is NOT raw FP16 compute but rather Tensor Core ‘mixed precision’ compute? Is this assumption correct?

Further, there is no mention of raw FP32 or INT4 compute expectations. Since this architecture is Volta, I presume INT4 is not supported? But what should I expect for FP32? Some sources cite 1.4 TFLOPS, but I’m struggling to find anything official from NVIDIA.


Xavier doesn’t support INT4 currently.

Are you referring to the metrics shared below?
Basically, these are measured by low-level instruction type rather than inference precision.


Hey @AastaLLL, thanks for the response.

Yes, those are the metrics I am curious about. There is a fair amount of detail missing in what is reported publicly. Namely, I’m looking for:

  1. Is the FP16 TFLOPS metric for mixed precision? If so, in what configuration, i.e., FP16 accumulate or FP32 accumulate?
  2. What is the expected FP32 TFLOPS? Some sources cite 1.4 TFLOPS, but I cannot find anything from NVIDIA.
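
For context on where a figure like the cited ~1.4 TFLOPS could come from, here is a back-of-envelope peak-throughput sketch. It assumes the publicly listed Xavier GPU configuration (512 Volta CUDA cores, 64 Tensor Cores, ~1377 MHz max GPU clock); these inputs are my assumptions, not NVIDIA-confirmed values for this calculation:

```python
def peak_tflops(units, flops_per_unit_per_clock, clock_ghz):
    """Theoretical peak = units x FLOPs per unit per clock x clock (GHz)."""
    return units * flops_per_unit_per_clock * clock_ghz / 1e3

# CUDA cores: one FMA = 2 FLOPs per core per clock
# (assumed: 512 cores at 1.377 GHz)
fp32_peak = peak_tflops(512, 2, 1.377)          # ~1.41 TFLOPS

# Tensor Cores: each performs a 4x4x4 half-precision matrix FMA per clock,
# i.e. 64 MACs = 128 FLOPs (assumed: 64 Tensor Cores; the resulting figure
# depends on clock and accumulate configuration)
fp16_tensor_peak = peak_tflops(64, 128, 1.377)  # ~11.3 TFLOPS

print(fp32_peak, fp16_tensor_peak)
```

The FP32 result lines up with the ~1.4 TFLOPS figure circulating in third-party sources, which suggests that number is a theoretical CUDA-core peak rather than a measured value.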


No. It is a low-level profiling figure, measured directly with half-precision calculation.

We don’t have an FP32 TFLOPS score.
You can find detailed FP16 and INT8 compute data below:
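
For anyone wanting to sanity-check such numbers on their own device, the usual conversion is straightforward: an (m x k) @ (k x n) matrix multiply performs 2·m·n·k floating-point operations, so achieved throughput follows directly from the wall-clock time of a timed GEMM. A minimal sketch (the function name and example timing are illustrative, not from NVIDIA tooling):

```python
def achieved_tflops(m, n, k, seconds):
    # An (m x k) @ (k x n) GEMM performs 2*m*n*k FLOPs
    # (one multiply + one add per inner-product term).
    return 2 * m * n * k / seconds / 1e12

# e.g. a 4096^3 half-precision GEMM timed at a hypothetical 25 ms:
print(achieved_tflops(4096, 4096, 4096, 0.025))  # ~5.5 TFLOPS
```

Comparing the measured value against the theoretical peak shows how close a given precision/accumulate configuration gets in practice.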