Some silly questions about loops

I’ve made 3 kernels:

[codebox]__global__ void inc()
{
	float i;
	for (i = 0; i < 100000; i++)
	{
		i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; //10 times
	}
	a = i; // presumably a __device__ global; its declaration is not shown in the post
}[/codebox]

[codebox]__global__ void inc2()
{
	float i;
	for (i = 0; i < 1000000; i++)
	{
		i++;
	}
	a = i;
}[/codebox]

[codebox]__global__ void inc3()
{
	float i;
	for (i = 0; i < 10000; i++)
	{
		i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;  i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
		i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;  i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
		i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;  i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
		i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;  i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
		i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;  i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; //100 times
	}
	a = i;
}[/codebox]

So the number of ++ operations should be the same for every kernel, but the profiler shows this data:

grid: 8;1;1; block: 100;0;0

gpu time (inc; inc2; inc3): 4200.42; 80048.4; 365.568

instructions (inc; inc2; inc3): 472778; 8.00005e+06; 41254

Why does the third kernel execute so fast, and with so few instructions?

By the way (1): where can I read about the *.ptx and *.cubin file formats?

By the way (2): what is "decuda" useful for, if the *.ptx files are already accessible? I don't understand.

By the way (3): where can I find low-level information, for example how the memory controller on the video card works, which algorithms it uses, etc.?

Please forgive my silly questions, I'm just learning :)

Surely those three pieces of code, as written, aren’t the actual kernels you are profiling?

Emm… no, these are the kernels I'm profiling. Here is a screenshot:

OK, just so I understand this correctly: you have three kernels (but not the code you have posted, which is mostly nonsense), and you would like someone to explain why they profile differently?

f*ck, I should sleep more…
avidday, thank you for showing me that I'm an idiot :)

My guess is that in all three cases the compiler can figure out at compile time what the final value of the loop will be, and simply assigns it to the variable a without doing any actual work…