Interesting to read that Nvidia is reusing the single-precision multipliers for 16-bit integer multiply. I wasn't quite sure whether that was actually the case.
While reducing the exposed width to a power of two certainly tidied up the interface compared to CC 1.x, I still wonder whether at some point Nvidia will expose the full 24 bits again. With the growing importance of long address calculations, a 64x64-bit multiplication could then be done with 9 instead of 16 multiply instructions: splitting each operand into ceil(64/24) = 3 limbs gives 3x3 = 9 partial products, versus 4x4 = 16 with 16-bit limbs. For something whose biggest cost would probably be the opcode space it takes up, that seems like an attractive proposition.
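Just to make the limb arithmetic concrete, here's a plain-Python sketch of a schoolbook multiprecision multiply (nothing Nvidia-specific, and the function name is my own): each operand is split into limbs of the hardware multiplier's width, and every limb pair costs one multiply instruction.

```python
import math

def mul_limbs(a, b, limb_bits):
    """Schoolbook 64x64-bit multiply using limb_bits-wide multiplies.

    Returns (128-bit product, number of partial-product multiplies).
    """
    n = math.ceil(64 / limb_bits)          # limbs per 64-bit operand
    mask = (1 << limb_bits) - 1
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(n)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(n)]
    product, multiplies = 0, 0
    for i in range(n):
        for j in range(n):
            # One hardware multiply per limb pair, shifted into place.
            product += (a_limbs[i] * b_limbs[j]) << ((i + j) * limb_bits)
            multiplies += 1
    return product & ((1 << 128) - 1), multiplies

# 16-bit limbs: 4x4 = 16 multiplies; 24-bit limbs: 3x3 = 9.
_, count16 = mul_limbs(0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF, 16)
_, count24 = mul_limbs(0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF, 24)
print(count16, count24)  # 16 9
```

This ignores the add/carry chain, of course, but the multiply count is where the 24-bit datapath would pay off.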