PTX assembly -- help!

r00ki · March 16, 2011, 6:49am

Hi,

I am using the CUDA runtime API in my application. One statement in my kernel looks like this: X = a*X + b, where X, a, and b are unsigned ints. When I checked the generated PTX, it used mul.lo.u32 and add.u32 instead of a single mad.lo.u32 instruction. I tried inline assembly, but it introduces a lot of unnecessary extra mov instructions.
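For reference, an inline-assembly wrapper of the kind described might look roughly like the sketch below. The original code was not posted, so the names mad_lo_u32, mad_kernel and run_mad are made up for illustration, and the host helper is only there to make the snippet complete. Whether the extra mov instructions seen in the PTX survive into the final machine code is not something this sketch answers.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical wrapper: r = low 32 bits of (a * x + b), requested
// explicitly as a PTX mad.lo.u32 through inline assembly.
__device__ __forceinline__ unsigned int mad_lo_u32(unsigned int a,
                                                   unsigned int x,
                                                   unsigned int b)
{
    unsigned int r;
    asm("mad.lo.u32 %0, %1, %2, %3;" : "=r"(r) : "r"(a), "r"(x), "r"(b));
    return r;
}

// Single-thread kernel that stores one result.
__global__ void mad_kernel(unsigned int a, unsigned int x, unsigned int b,
                           unsigned int *out)
{
    *out = mad_lo_u32(a, x, b);
}

// Hypothetical host helper: launches the kernel once, returns the result.
unsigned int run_mad(unsigned int a, unsigned int x, unsigned int b)
{
    unsigned int *d_out = nullptr;
    unsigned int h_out = 0;

    cudaError_t err = cudaMalloc(&d_out, sizeof(unsigned int));
    assert(err == cudaSuccess);

    mad_kernel<<<1, 1>>>(a, x, b, d_out);

    // cudaMemcpy waits for the kernel to finish before copying.
    err = cudaMemcpy(&h_out, d_out, sizeof(unsigned int),
                     cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_out);
    return h_out;
}
```

The "r" constraint binds each operand to a 32-bit register, which matches the .u32 operands of the instruction.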

So:

Can I force nvcc to use the mad instruction (compiler directives?)
Can I modify the PTX and update the executable? If so, how? (Without using the driver API.)

thanks.

philipjfry · March 16, 2011, 10:19am

The graphics processors themselves do not execute the instruction set described by PTX (although probably a similar one); there is another compilation step that can perform certain optimizations when generating the machine code for a particular GPU. Not much has been revealed about it, but ptx_isa_2.2.pdf states explicitly in Table 54, about floating-point operations: »In particular, mul/add and mul/sub sequences with no rounding modifiers may be optimized to use fused-multiply-add instructions on the target device.« That might be true for integer mad too, since rounding mode is never an issue there.

AFAIK you can modify the PTX code; nvcc.pdf gives an overview of the whole compilation process. It is, however, uncertain whether this actually changes the resulting instruction stream.

tera · March 16, 2011, 10:34am

The optimization of mul + add into mad happens after the PTX stage. If you want to see it, you have to disassemble the .cubin file with one of the available disassemblers: decuda (compute capability 1.x only), nv50dis/nvc0dis together with elfToCubin, or the official cuobjdump (cc 1.x only, unless you have the 4.0rc prerelease).
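A minimal way to try this, assuming a toolkit whose cuobjdump supports your GPU: put the plain C expression in a small kernel, build both the PTX and a cubin, and compare them. The file name lcg.cu, the kernel name lcg_step and the host helper run_lcg_step are made up for this sketch, and sm_XX is a placeholder for your own architecture. What the disassembly shows depends on the GPU and the compiler version.

```cuda
// lcg.cu (hypothetical file name)
//
// Replace sm_XX with your GPU's architecture, then compare the outputs:
//   nvcc -ptx   -arch=sm_XX lcg.cu -o lcg.ptx      (PTX: may show mul.lo + add)
//   nvcc -cubin -arch=sm_XX lcg.cu -o lcg.cubin    (machine code for one GPU)
//   cuobjdump -sass lcg.cubin                      (disassemble the cubin)
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// The statement from the question, written as plain C.
// Any fusing into a multiply-add is left to the compiler.
__global__ void lcg_step(unsigned int *X, unsigned int a, unsigned int b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        X[i] = a * X[i] + b;
}

// Hypothetical host helper: applies one step to every element of x.
std::vector<unsigned int> run_lcg_step(std::vector<unsigned int> x,
                                       unsigned int a, unsigned int b)
{
    if (x.empty())
        return x;  // a launch with zero blocks would be invalid

    const int n = static_cast<int>(x.size());
    const size_t bytes = x.size() * sizeof(unsigned int);
    unsigned int *d_x = nullptr;

    cudaError_t err = cudaMalloc(&d_x, bytes);
    assert(err == cudaSuccess);
    err = cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    lcg_step<<<blocks, threads>>>(d_x, a, b, n);

    // cudaMemcpy waits for the kernel to finish before copying back.
    err = cudaMemcpy(x.data(), d_x, bytes, cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);

    cudaFree(d_x);
    return x;
}
```

Note that the index computation blockIdx.x * blockDim.x + threadIdx.x is itself a multiply followed by an add, so a multiply-add in the disassembly does not necessarily come from the X = a*X + b statement.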
