Preventing code elimination for nvcc?

Skybuck · June 20, 2011, 6:51am

Hello,

As far as I can tell, simple kernel code like the one below gets completely eliminated by the nvcc/CUDA compiler. Is it possible to turn this “code elimination” off? (I already tried -O0, which doesn’t seem to help; the code is still eliminated.)

adder.cu:

__global__ void kernel( int a, int b )
{
	int c;

	c = a + b;
}

adder.ptx:

.version 1.4
.target sm_10, map_f64_to_f32
.entry _Z6kernelii (
	.param .s32 __cudaparm__Z6kernelii_a,
	.param .s32 __cudaparm__Z6kernelii_b)
{
.loc	16	1	0

$LDWbegin__Z6kernelii:
.loc 16 6 0
exit;
$LDWend__Z6kernelii:
} // _Z6kernelii

I’d like to have some simple kernels generated so I can test them in the Visual Studio CUDA debugger or Visual Profiler…

Perhaps this empty kernel will already do the trick, but I’d much rather see a simple addition done so I know the parameters were passed correctly…

Bye,
Skybuck.

hyqneuron · June 20, 2011, 8:28pm

No, anything that does not write to any type of memory gets eliminated by the front-end no matter what the optimization level is.

I think this should be considered as a bug on the front-end’s side.
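The point about memory writes suggests the usual workaround: give the kernel an observable side effect. Below is a minimal sketch, assuming an extra `int *out` parameter is acceptable for the test; that parameter and the host helper `add_on_device` are illustrative additions, not part of the original adder.cu.

```cuda
#include <cuda_runtime.h>

// Storing the sum to global memory gives the kernel an observable side
// effect, so the addition can no longer be discarded as dead code.
// The 'out' parameter is an assumption added for this sketch.
__global__ void kernel(int a, int b, int *out)
{
    *out = a + b;
}

// Hypothetical host helper: launches the kernel with a single thread and
// copies the stored sum back into *result. Returns false on any CUDA error.
bool add_on_device(int a, int b, int *result)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return false;

    kernel<<<1, 1>>>(a, b, d_out);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (ok && cudaMemcpy(result, d_out, sizeof(int),
                         cudaMemcpyDeviceToHost) != cudaSuccess) {
        ok = false;
    }
    cudaFree(d_out);
    return ok;
}
```

Calling `add_on_device(2, 3, &r)` from `main` and checking `r` on the host also answers the original concern about whether the parameters were passed correctly, since the sum comes back from the device.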

tera · June 21, 2011, 12:19am

You can at least preserve the code up to the PTX stage by giving nvcc the flag --opencc-options=-O0. ptxas will still optimize it away though, even with --ptxas-options=-O0.

Skybuck · June 21, 2011, 5:48am

So this means even hand-writing a PTX file would be useless, since hand-written assembler instructions would be “optimized/eliminated” away ?! :(

Assuming ptxas is called before the ptx file is executed by the driver ?

Or perhaps ptxas is only for cubins or so ?

I tried executing such an unoptimized PTX file, which had 6 registers and some movs. According to Visual Profiler it only executed 2 instructions or so, so this could be an indication that the runtime environment and/or driver optimizes the files before running them, and this code gets eliminated…

The CUDA Driver API does contain this enumeration:

	//
	// Level of optimizations to apply to generated code (0 - 4), with 4
	// being the default and highest level of optimizations.
	// Option type: unsigned int
	//
	CU_JIT_OPTIMIZATION_LEVEL,

I am not yet sure what it’s for… or which api function it’s for… something about “online” compiler…

Ok I see it’s for:

cuModuleLoadDataEx

So perhaps with this API it’s possible to prevent the code elimination from happening when loading the PTX file…

However there is a little problem… the api is not really for a file… it’s only for “an image”, which is some kind of memory structure.
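The “image” can simply be the PTX text read into memory as a NUL-terminated string. The following is a rough sketch of that idea using the driver API; the helper names `read_text_file` and `load_ptx_unoptimized` are hypothetical, and it assumes `cuInit` has already been called and a context is current. Whether optimization level 0 actually keeps dead code in the generated machine code is not confirmed anywhere in this thread.

```cuda
#include <cuda.h>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file into a string.
// Returns an empty string if the file cannot be opened.
std::string read_text_file(const char *path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::string();
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Hypothetical helper: hands in-memory PTX to the driver's JIT compiler
// with CU_JIT_OPTIMIZATION_LEVEL set to 0 and stores the handle in *module.
// Assumes cuInit() was called and a CUDA context is current.
CUresult load_ptx_unoptimized(const std::string &ptx, CUmodule *module)
{
    CUjit_option options[] = { CU_JIT_OPTIMIZATION_LEVEL };
    // Integer option values are passed by value, packed into void* slots.
    void *values[] = { reinterpret_cast<void *>(static_cast<uintptr_t>(0)) };
    // c_str() is NUL-terminated, as required for a PTX image.
    return cuModuleLoadDataEx(module, ptx.c_str(), 1, options, values);
}
```

With a context in place, something like `load_ptx_unoptimized(read_text_file("adder.ptx"), &module)` would then load the file from the first post; the returned `CUresult` should be compared against `CUDA_SUCCESS` before the module is used.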

hyqneuron · June 21, 2011, 2:47pm

If you want control over what code goes into the image, perhaps you really have to use an assembler. I’m working on one; PathScale also has one. Progress on PathScale’s side seems to have stopped, though, at least as far as we can see (perhaps if you buy their Enzo they will tell you more).

Skybuck · June 21, 2011, 2:50pm

Hmmm what will the input be to your assembler ? Will it be PTX ? or perhaps some higher language ? :)

hyqneuron · June 21, 2011, 3:03pm

It takes the format of the output of cuobjdump, and it does no optimization.

So far it’s only partially functional because I haven’t implemented the rules for quite a lot of instructions.

It won’t be complete for at least another two months unless someone else is willing to take up my work…

Skybuck · June 21, 2011, 4:14pm

Could you perhaps copy & paste a small example of such a cuobjdump? It might be interesting to include such examples on your website as well, so people can get an idea of what this is all about.

hyqneuron · June 21, 2011, 6:40pm

You can try this: cuobjdump -sass cubinFileName_or_executableFileName

Just make sure you’re on toolkit 4.0

Skybuck · June 21, 2011, 8:58pm

Yeah, but the problem is I have no decent kernels yet (or any decent cubins !?!), so perhaps you can provide a little example ?

(I am not even sure what a cubin is… I think I asked a question about that… maybe I forgot the answer.)

I did take a look at cubin.pdf or something like that… with the micro instruction set, looks somewhat interesting !

Also I am not sure if cubins would be useful for me ?!?

Perhaps they can be loaded and used as an image to LoadDataEx ?
