How to disable optimization and analyze ptx assembly code

Consider the code

__global__ void someFunc(){
	for(unsigned int i = 0; i < 10000000; i++){
		generateNextRandomNumber(); // value discarded; only the generator state advances
	}
}

which skips ahead some numbers (an offset) in the random sequence. generateNextRandomNumber() is a device function that returns the next random number and updates the generator's state accordingly.

When I run this code in a single block with a single thread and measure the runtime, it takes no time to execute (I call cudaThreadSynchronize() after the kernel launch). No matter how large the loop bound is, it always runs in 0 ms.
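A minimal sketch of timing such a kernel with CUDA events instead of a host timer. It assumes a hypothetical xorshift32 step as a stand-in for generateNextRandomNumber() (the real generator is not shown in the post), and the names skipAhead and timeSkipAheadMs are illustrative only. Waiting on the stop event with cudaEventSynchronize() ensures the kernel has finished before the elapsed time is read. Because the loop's result never leaves the kernel, an optimizing compiler is free to drop the loop entirely, which is consistent with the 0 ms reading described above.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the post's generator: one xorshift32 step that
// updates the state in place and returns the new value.
__device__ unsigned int generateNextRandomNumber(unsigned int &state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Same shape as the kernel in the post: advance the generator n times and
// discard every value. Nothing observable escapes the kernel.
__global__ void skipAhead(unsigned int n)
{
    unsigned int state = 2463534242u;
    for (unsigned int i = 0; i < n; i++) {
        generateNextRandomNumber(state);
    }
}

// Times one launch of skipAhead with CUDA events; returns milliseconds.
float timeSkipAheadMs(unsigned int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    skipAhead<<<1, 1>>>(n);       // one block, one thread, as in the post
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // block the host until the kernel is done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Calling timeSkipAheadMs(10000000u) with different loop bounds is a quick way to see whether the runtime scales with n at all; if it does not, the loop was most likely eliminated at compile time.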

I decided to check whether the compiler optimizes the code. I use the CUDA build rule shipped with the 2.3 SDK. I disabled optimization for both the solution and the .cu file that contains the kernels (Properties -> CUDA Build Rule v2.3.0 -> General -> Optimization set to Disabled (/Od)). After that I built the project with the preprocessed files kept and looked at the ptx file:

	// Options:
	//  Target:ptx, ISA:sm_13, Endian:little, Pointer Size:64
	//  -O3	(Optimization level)
	//  -g0	(Debug level)
	//  -m2	(Report advisories)

Further down the file there was:

.entry _Z22testMersenneOffsetTimev
{
	.loc	2	179	0
	.loc	2	192	0
	} // _Z22testMersenneOffsetTimev

I don’t know how to read this assembly code, but it seems to me that it has been heavily optimized: nothing of the loop appears to be left.

I got the same results when passing the -Xopencc -O0 and/or -Xptxas -O0 command-line options.

Note that I did a project clean/rebuild every time, and the ptx file was updated.

Note that my kernel is in and I looked into the kernels.ptx file.

Note that changing the GPU architecture from 1.3 to 1.2 did have an effect on the ptx file (ISA:sm_13 became ISA:sm_12).


  1. Am I doing something wrong, e.g. looking into the wrong file?

  2. If not, how do I disable optimization?

Sorry for such a long post.

Dear CUDA and Visual Studio gurus,

Any ideas?


Compiling for debugging should turn all the optimizations off. IIRC it will also spill a lot of stuff into local memory, so if you are interested in register-level behaviour it might not be so useful. If you are getting stub functions like that, it is usually a sign that the dead code optimizer is culling everything away. Adding a dummy global or shared memory store of a variable whose result is carried through the code usually cures it.
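The dummy-store fix described in the reply might look like the following sketch. It assumes a hypothetical xorshift32 step as a stand-in for generateNextRandomNumber() (the real generator is not shown), and the names skipAheadAndStore and runSkipAhead are illustrative only. Writing the final generator state to global memory makes the loop's result observable from the host, so the compiler has to keep the loop; since n is only known at run time, it cannot fold the loop into a constant either.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the post's generator: one xorshift32 step.
__device__ unsigned int generateNextRandomNumber(unsigned int &state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// The final state is stored to global memory, so the loop's result is
// observable and can no longer be removed as dead code.
__global__ void skipAheadAndStore(unsigned int n, unsigned int *out)
{
    unsigned int state = 2463534242u;
    for (unsigned int i = 0; i < n; i++) {
        generateNextRandomNumber(state);
    }
    out[0] = state;  // the "dummy" store that keeps the loop alive
}

// Host helper: runs the kernel on one thread and returns the stored state.
unsigned int runSkipAhead(unsigned int n)
{
    unsigned int *d_out = nullptr;
    unsigned int h_out = 0u;
    cudaMalloc(&d_out, sizeof(unsigned int));
    skipAheadAndStore<<<1, 1>>>(n, d_out);
    // cudaMemcpy waits for the preceding kernel on the default stream.
    cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Reading the value back on the host also serves as a cheap check that the kernel really ran the loop, since the result can be compared against the same recurrence computed on the CPU.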

Is there any way to disable the “dead code optimizer”?

Yes, adding some memory writes helps, but it is not a good solution.
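If an unconditional write is too intrusive, one common variant is to guard the store with a comparison against a value passed in at run time. The compiler cannot prove the branch is never taken, so the loop must stay, but with a sentinel the final state never reaches, the store itself is never executed. This is a sketch assuming a hypothetical xorshift32 step as a stand-in for generateNextRandomNumber(); the names skipAheadGuarded and guardedStoreFired are illustrative only. It does not switch off the dead code optimizer; it just leaves it nothing it can prove is dead.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the post's generator: one xorshift32 step.
// From a nonzero seed, xorshift32 never produces 0.
__device__ unsigned int generateNextRandomNumber(unsigned int &state)
{
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// The store happens only if the final state equals a run-time sentinel.
// The branch cannot be resolved at compile time, so the loop is kept.
__global__ void skipAheadGuarded(unsigned int n, unsigned int sentinel,
                                 unsigned int *out)
{
    unsigned int state = 2463534242u;
    for (unsigned int i = 0; i < n; i++) {
        generateNextRandomNumber(state);
    }
    if (state == sentinel) {
        out[0] = state;
    }
}

// Host helper: launches the kernel and reports whether the guarded store ran.
bool guardedStoreFired(unsigned int n, unsigned int sentinel)
{
    unsigned int *d_out = nullptr;
    const unsigned int marker = 0u;  // a value the generator never stores
    unsigned int h_out = 0u;
    cudaMalloc(&d_out, sizeof(unsigned int));
    cudaMemcpy(d_out, &marker, sizeof(unsigned int), cudaMemcpyHostToDevice);
    skipAheadGuarded<<<1, 1>>>(n, sentinel, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out != marker;
}
```

Passing 0 as the sentinel means the store never fires for a nonzero seed, so the timing reflects the loop alone plus one comparison.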

What I want is to see

//  -O0	(Optimization level)

in my ptx file =)

I don’t think that is an optimization you can turn off. There certainly doesn’t appear to be an opencc option for it (and lots of opencc options in the NVISA port don’t seem to do anything anyway).