Why can't Tensor Cores do FP32 arithmetic?

I understand that Tensor Cores are used primarily for low-precision and mixed-precision computation. I have also noticed that they can operate on FP64 data without any loss of precision. How are they able to do this, yet not able to do FP32 arithmetic?

I'm not an internal expert at NVIDIA, but I recall that in the early days Tensor Cores only supported half precision (on the V100). Support for more formats and sizes was added gradually later. It's likely a trade-off, isn't it? Since large models rarely use FP64 for computation, there's probably little demand for Tensor Core support for it.

FP64 Tensor Core support is aimed at the datacenter cards; on all other GPUs it is only included for compatibility, at low speed.

FP64 is mostly used for scientific calculations and has dedicated hardware; FP32 can be done with the normal (non-Tensor-Core) CUDA cores.

To do FP32 with Tensor Cores:
If you have a datacenter card, just use FP64; otherwise there are ways to increase the effective precision by manually combining several lower-precision INT or FP Tensor Core operations, as sketched below.
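
Here is a minimal CPU-side sketch of that splitting idea, not real Tensor Core code: each FP32 operand is rounded to a TF32-like value (10 explicit mantissa bits) plus a residual, and three reduced-precision products are accumulated in FP32. The `to_tf32` helper and the sample numbers are illustrative only.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <cmath>

// Round an FP32 value to TF32-like precision (10 explicit mantissa bits)
// by zeroing the low 13 mantissa bits -- a rough model of what the
// Tensor Core datapath keeps from each input.
static float to_tf32(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFFE000u;   // keep sign, exponent, top 10 mantissa bits
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

int main() {
    float a = 1.2345678f, b = 7.6543210f;
    double exact = static_cast<double>(a) * static_cast<double>(b);

    // A single truncated product: clearly less accurate than FP32.
    float one_pass = to_tf32(a) * to_tf32(b);

    // Split each operand into a TF32 "high" part and a TF32 residual.
    float a_hi = to_tf32(a), a_lo = to_tf32(a - a_hi);
    float b_hi = to_tf32(b), b_lo = to_tf32(b - b_hi);

    // Three reduced-precision products, accumulated in FP32, recover most
    // of the FP32 accuracy: a*b ~= a_hi*b_hi + a_hi*b_lo + a_lo*b_hi
    // (the a_lo*b_lo term is negligible).
    float three_pass = a_hi * b_hi + (a_hi * b_lo + a_lo * b_hi);

    std::printf("exact (FP64)     : %.9f\n", exact);
    std::printf("1 truncated mul  : %.9f  (abs. error %.2e)\n",
                one_pass, std::fabs(one_pass - exact));
    std::printf("3 truncated muls : %.9f  (abs. error %.2e)\n",
                three_pass, std::fabs(three_pass - exact));
    return 0;
}
```

On a real GPU this corresponds to issuing three TF32 (or FP16/BF16) Tensor Core GEMMs instead of one, trading throughput for precision; if I remember correctly, CUTLASS ships examples of this "3xTF32" style of FP32 emulation.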

Wow, FP64 is much slower than FP16 and TF32 on Tensor Cores…

Those numbers probably assume sparse matrices for FP16. Up to the H100 the factor is 1:16 (on the enterprise cards), which is reasonable, as the computational cost grows roughly with the square of the bit width: (64/16)^2 = 16.

With Blackwell, however, it is 1:64; NVIDIA has reduced the FP64 Tensor Core performance.
