BMMA on CC 7.2 for hamming distance

Hello,

As far as I know, the tensor cores in the GV100 chip only support floating point numerics (HMMA). The following article states that the GV10B of the Jetson AGX Xavier also supports (U)INT8 (IMMA): https://developer.nvidia.com/blog/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/

So the GV10B tensor cores are a bit like Turing tensor cores. That led me to the following question:

CUDA Toolkit 10.2, as shipped in the current JetPack release, experimentally supports BMMA. Can the Jetson AGX Xavier perform these operations natively? Sadly, the CUDA docs are a bit lacking for compute capability 7.2, and there is no clear statement on this; only 7.0 and 7.5 are mentioned explicitly as far as I can tell. If I missed something, I am happy to stand corrected :)

The background is that I want to accelerate computing Hamming distances between the rows of an AxN matrix and the columns of an NxB matrix, where the contribution of the nth element pair is popcount(a_n XOR b_n). That is exactly what the BMMA operation computes, as far as I understand.
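To make that concrete, here is a minimal CPU reference (plain C++; the function name and the 32-bit word packing are my own illustrative choices, not any NVIDIA layout) of the XOR-popcount "matrix product" described above:

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// D[i][j] = Hamming distance between bit-row i of A and bit-column j of B.
// A holds M rows of N/32 packed 32-bit words; Bt holds K columns of N/32
// words (i.e. B pre-transposed). Each word pair contributes
// popcount(a XOR b), which is what a BMMA tile computes in hardware.
std::vector<std::vector<int>> hammingMatMul(
    const std::vector<std::vector<uint32_t>>& A,
    const std::vector<std::vector<uint32_t>>& Bt)
{
    const size_t M = A.size(), K = Bt.size(), W = A[0].size();
    std::vector<std::vector<int>> D(M, std::vector<int>(K, 0));
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < K; ++j) {
            int acc = 0;
            for (size_t w = 0; w < W; ++w)
                acc += static_cast<int>(std::bitset<32>(A[i][w] ^ Bt[j][w]).count());
            D[i][j] = acc;
        }
    return D;
}
```

The inner loop is exactly the popcount(XOR)-and-accumulate that BMMA would replace with a single tensor-core tile operation.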

To answer my own question: The PTX ISA WMMA specification states that BMMA is not supported on the AGX/NX:

Target ISA Notes

Floating point wmma requires sm_70 or higher.
Integer wmma requires sm_72 or higher.
Sub-byte and single-bit wmma requires sm_75 or higher.
Double precision and alternate floating point precision wmma requires sm_80 or higher.

Hi,

As you said, BMMA is not supported on Xavier.
Xavier currently only supports HMMA and IMMA.

Thanks.

Thank you for the confirmation. You wrote "currently"; does that mean the functionality could be extended? I was assuming that this is a HW limitation.

Hi,

No. Sorry about the confusion.
BMMA requires a Turing GPU, but Xavier is Volta generation.

Thanks.