CC12.0 Integer Throughput

@Greg

The newly released documentation for CUDA 12.8 does not contain instruction throughput information for CC10.0 or CC12.0.

Could you please confirm that INT32 throughput for addition, subtraction, multiplication, bit shift, and logic (AND/OR/XOR) operations is now 128 per cycle per SM on CC10.0 and CC12.0?

I’d be grateful for a response from anyone at NVIDIA.