I’m using an A100 and the NGC container to tune the HPL benchmark. The documentation mentions that HPL-AI uses Tensor Cores to improve performance, but HPL is not mentioned. Can HPL use the FP64 Tensor Cores, and if so, what is the mechanism?
Thanks for your reply.
FP64 Tensor Cores are used by default for FP64 GEMMs.
So… what is the actual theoretical peak in TFLOPS of the A100 GPU in a Linpack run? Is it simply 9.7 TFLOPS from the CUDA cores, 19.5 TFLOPS from the FP64 Tensor Cores, or some combination of the two? Thanks!
HPL is compute bound, so the theoretical peak is set by the FP64 Tensor Cores.
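As a sanity check on those two numbers, here is a back-of-envelope calculation. The constants (SM count, boost clock, per-SM FP64 rates) are taken from the public A100 (SXM4) specs, not from this thread, so treat them as illustrative:

```python
# Back-of-envelope peak FP64 throughput for an A100 (SXM4).
# Constants are from the public A100 specs; treat as illustrative.
SM_COUNT = 108           # streaming multiprocessors
BOOST_CLOCK_HZ = 1.41e9  # 1410 MHz boost clock
FP64_CORES_PER_SM = 32   # classic FP64 CUDA cores per SM
FLOPS_PER_FMA = 2        # one fused multiply-add = 2 FLOPs

# Peak via the plain FP64 CUDA cores:
cuda_core_peak = SM_COUNT * FP64_CORES_PER_SM * FLOPS_PER_FMA * BOOST_CLOCK_HZ

# FP64 Tensor Cores double the per-SM FP64 FMA rate (64 FMA/clock/SM):
tensor_core_peak = SM_COUNT * 64 * FLOPS_PER_FMA * BOOST_CLOCK_HZ

print(f"CUDA-core FP64 peak:   {cuda_core_peak / 1e12:.1f} TFLOPS")   # 9.7
print(f"Tensor-core FP64 peak: {tensor_core_peak / 1e12:.1f} TFLOPS") # 19.5
```

The two figures are not additive: HPL's DGEMM calls run on the Tensor Cores, so 19.5 TFLOPS is the relevant ceiling for the benchmark.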