Using Tensor Cores in CUDA Fortran

Originally published at: https://developer.nvidia.com/blog/using-tensor-cores-in-cuda-fortran/

Tensor Cores, which are programmable matrix multiply and accumulate units, were first introduced in the V100 GPUs, where they operated on half-precision (16-bit) multiplicands. Tensor Core functionality has been expanded in subsequent architectures, and the Ampere A100 GPUs (compute capability 8.0) added support for other data types, including double precision. Access to…