ptx optimization

e.ping · May 29, 2009, 10:19pm

I was looking through some PTX code generated by nvcc (with /O2 optimization) and it seemed like there were lots of redundant instructions loading immediate values into registers. For instance, in the C/CUDA code I had a float3 with an overloaded multiply operator, like this…
item = item * 0.5

and the PTX code listing looked like this …
mov.f32 %f95, 0f3f000000; // 0.5
mul.f32 %f43, %f43, %f95; //
mov.f32 %f96, 0f3f000000; // 0.5
mul.f32 %f45, %f45, %f96; //
mov.f32 %f97, 0f3f000000; // 0.5
mul.f32 %f47, %f47, %f97; //

So, there are 3 separate instructions loading 0.5 into a register. Is there a reason the code is generated like this? Would the PTX compiler do any further optimization?
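For reference, here is a minimal sketch of the kind of source that would give a listing like the one above. It assumes the overloaded operator is a plain component-wise scale. The names Float3 and halve are hypothetical stand-ins (the original code used CUDA's float3), and the sketch is host-side C++ so it builds without the CUDA toolkit. In a .cu file the functions would also need __device__ qualifiers.

```cpp
// Hypothetical stand-in for CUDA's float3; not the poster's actual type.
struct Float3 {
    float x, y, z;
};

// Assumed component-wise scale: one multiply per component.
inline Float3 operator*(const Float3& v, float s) {
    return Float3{v.x * s, v.y * s, v.z * s};
}

// Mirrors "item = item * 0.5" from the post. Once the operator is inlined,
// the constant appears in three separate multiplies, which matches the
// three mov/mul pairs in the PTX listing.
inline Float3 halve(Float3 item) {
    item = item * 0.5f;
    return item;
}
```

Calling halve on (2, 4, 6) gives (1, 2, 3). The point is only that one source-level multiply expands to three scalar multiplies that all use the same constant.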

Kalle

SPWorley · May 30, 2009, 12:24am

PTX is not the final cubin code the processor runs. PTX is an intermediate form, and ptxas optimizes it further when it generates the cubin.

To look at the code that actually runs, you can try decuda, a disassembler for cubin files.

cbuchner1 · May 30, 2009, 5:43am

Try this code for comparison:

volatile float pointfive = 0.5f;

item *= pointfive;

Christian
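Christian's two lines can be expanded into a self-contained sketch. The apparent intent is to give the constant a single named home so that it is read once and the value is reused for all three components. Whether that changes the final machine code depends on the compiler version, so compare the disassembly rather than the PTX. Float3, operator*= and halve_volatile are hypothetical names, and the code is host-side C++ (a device version would add __device__).

```cpp
// Hypothetical stand-in for CUDA's float3; not the poster's actual type.
struct Float3 {
    float x, y, z;
};

// Assumed component-wise in-place scale.
inline Float3& operator*=(Float3& v, float s) {
    v.x *= s;
    v.y *= s;
    v.z *= s;
    return v;
}

inline Float3 halve_volatile(Float3 item) {
    // volatile forces one real read of the variable; the value is then
    // passed by value to operator*= and reused for x, y and z.
    volatile float pointfive = 0.5f;
    item *= pointfive;
    return item;
}
```

The result is the same as multiplying by the literal 0.5f. Only the way the constant reaches the three multiplies differs.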

e.ping · May 30, 2009, 6:20am

Thanks, that’s a useful utility. Looks like those were turned into multiply instructions with 0.5f as the immediate value.
