Interesting to read that Nvidia is reusing the single-precision multipliers for 16-bit integer multiply. I wasn't quite sure whether that was actually the case.
While reducing the exposed width to a power of two certainly tidied up the interface compared to CC 1.x, I still wonder whether at some point Nvidia will expose the full 24 bits again. With the growing importance of long address calculations, a 64x64-bit multiplication could then be done with 9 instead of 16 multiply instructions: splitting each operand into ceil(64/24) = 3 limbs gives 3x3 = 9 partial products, versus 4x4 = 16 with 16-bit limbs. For something whose biggest cost would probably be the opcode space it takes up, that seems like an attractive proposition.
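Just to make the limb arithmetic concrete, here's a plain-Python sketch of a schoolbook multiprecision multiply (nothing Nvidia-specific, and the function name is my own): each operand is split into limbs of the hardware multiplier's width, and every limb pair costs one multiply instruction.

```python
import math

def mul_limbs(a, b, limb_bits):
    """Schoolbook 64x64-bit multiply using limb_bits-wide multiplies.

    Returns (128-bit product, number of partial-product multiplies).
    """
    n = math.ceil(64 / limb_bits)          # limbs per 64-bit operand
    mask = (1 << limb_bits) - 1
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(n)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(n)]
    product, multiplies = 0, 0
    for i in range(n):
        for j in range(n):
            # One hardware multiply per limb pair, shifted into place.
            product += (a_limbs[i] * b_limbs[j]) << ((i + j) * limb_bits)
            multiplies += 1
    return product & ((1 << 128) - 1), multiplies

# 16-bit limbs: 4x4 = 16 multiplies; 24-bit limbs: 3x3 = 9.
_, count16 = mul_limbs(0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF, 16)
_, count24 = mul_limbs(0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF, 24)
print(count16, count24)  # 16 9
```

This ignores the add/carry chain, of course, but the multiply count is where the 24-bit datapath would pay off.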