umad and Array Indexing

shifter1 · April 29, 2009, 1:52pm

I am looking through the CUDA PTX docs and I see that there is a mad instruction which multiplies two numbers together and adds in a third.

In my code, I access a uint64 (long long int) array which results in PTX code like the following:

	mul.lo.u64 	%rd11, %rd9, 8;   	// %rd11 = %rd9 * 8 (byte offset of the element)
	add.u64 	%rd12, %rd3, %rd11;	// %rd12 = %rd3 + %rd11 (address of the element)

Here the base pointer is in %rd3, the element index is in %rd9, and the resulting byte offset is in %rd11. Wouldn’t this be better done as a single mad instruction?
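To make the question concrete, here is a minimal host-side C++ sketch of the arithmetic those two instructions perform, next to the fused multiply-add form being asked about. The function names are made up for illustration, and the sketch assumes 8-byte elements (unsigned long long); it says nothing about what the hardware actually executes.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two PTX instructions from the post.
// base plays the role of %rd3, index the role of %rd9.
constexpr std::uint64_t addr_mul_add(std::uint64_t base, std::uint64_t index) {
    std::uint64_t offset = index * 8;  // mul.lo.u64: low 64 bits of index * 8
    return base + offset;              // add.u64: base + byte offset
}

// The same result written as one multiply-add expression,
// i.e. what a single mad-style instruction would compute.
constexpr std::uint64_t addr_mad(std::uint64_t base, std::uint64_t index) {
    return index * 8 + base;
}

// Compile-time sanity check: 1000 + 3 * 8 == 1024.
static_assert(addr_mul_add(1000, 3) == 1024, "mul + add form");
static_assert(addr_mad(1000, 3) == 1024, "multiply-add form");

// Compares both forms against the compiler's own array indexing.
// The caller must pass an index that is inside the array.
inline bool matches_indexing(const unsigned long long* arr, std::uint64_t index) {
    assert(arr != nullptr);
    const auto base = reinterpret_cast<std::uintptr_t>(arr);
    const auto elem = reinterpret_cast<std::uintptr_t>(&arr[index]);
    return addr_mul_add(base, index) == elem && addr_mad(base, index) == elem;
}
```

Both forms give the same address, so the only open question is which instruction sequence is cheaper on the device.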

Jamie_K · April 29, 2009, 4:51pm

Hm, you would think so.

One thing to consider is that the generated PTX is not the final machine code. It is translated again into a cubin, and disassembling that cubin with decuda may show something different. For example, PTX has an instruction for integer modulus (rem), but the hardware has no such operation, so somewhere along the way it is expanded into dozens of instructions, either in the C-to-PTX step or in the PTX-to-cubin step.

It would be useful to have a list of operations that are actually implemented on the various devices, and their latency and throughput. (Even if mad for u64 existed, it might be slower than shift and add.) Such a list may exist, but I’ve never seen it.
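The "shift and add" alternative works because the multiplier is a power of two: multiplying an unsigned 64-bit index by 8 is the same as shifting it left by 3, including when the result wraps. A small host-side C++ sketch (function names are hypothetical, and it makes no claim about which form the device runs faster):

```cpp
#include <cstdint>

// Byte offset via multiply, as in the PTX from the first post.
constexpr std::uint64_t offset_by_mul(std::uint64_t index) {
    return index * 8;
}

// Byte offset via shift: 8 == 1 << 3, so index * 8 == index << 3.
constexpr std::uint64_t offset_by_shift(std::uint64_t index) {
    return index << 3;
}

// Full address using the shift-and-add form.
constexpr std::uint64_t addr_shift_add(std::uint64_t base, std::uint64_t index) {
    return base + offset_by_shift(index);
}

// Unsigned arithmetic wraps modulo 2^64 in both forms, so they agree
// even for the largest possible index.
static_assert(offset_by_mul(5) == offset_by_shift(5), "small index");
static_assert(offset_by_mul(UINT64_MAX) == offset_by_shift(UINT64_MAX), "wrapping index");
static_assert(addr_shift_add(1000, 3) == 1024, "1000 + (3 << 3)");
```

Whether the compiler or assembler ends up choosing a multiply, a shift, or a fused multiply-add for this is exactly the kind of detail such a per-device list would settle.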
