I’ve made 3 kernels:
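[codebox]__global__ void inc()
{
    // a is assumed to be a __device__ float declared elsewhere
    float i;
    for (i = 0; i < 100000; i++)
    {
        i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; // 10 times
    }
    a = i; // write the final value to a
}[/codebox]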
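[codebox]__global__ void inc2()
{
    // a is assumed to be a __device__ float declared elsewhere
    float i;
    for (i = 0; i < 1000000; i++)
    {
        i++; // once
    }
    a = i; // write the final value to a
}[/codebox]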
[codebox]__global__ void inc3()
{
    // a is assumed to be a __device__ float declared elsewhere
    float i;
    for (i = 0; i < 10000; i++)
    {
        i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
        i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
        i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
        i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; i++;i++;i++;i++;i++;i++;i++;i++;i++;i++;
        i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; i++;i++;i++;i++;i++;i++;i++;i++;i++;i++; // 100 times
    }
    a = i; // write the final value to a
}[/codebox]
So the number of ++ operations should be the same for every kernel, but the profiler shows the following (GPU time and instruction counts are listed in the order inc; inc2; inc3):
grid: 8;1;1; block: 100;0;0
gpu time: 4200.42; 80048.4; 365.568
instructions: 472778; 8.00005e+06; 41254
Why does the third kernel execute so fast and have so few instructions?
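One thing that may be worth checking: in all three kernels `i` is both the loop counter and the variable being incremented, so every `i++` in the body also moves the loop closer to its limit. Below is a minimal host-side sketch that counts the source-level `++` operations for each loop shape. The names `count_increments` and `print_counts` are hypothetical helpers, not part of the kernels. It assumes the device's float arithmetic matches the host for these values, which should hold because every value stays well below 2^24. It only models the source code; the compiler may merge consecutive increments, so the profiler's instruction counts will not match these numbers exactly.

```cpp
#include <cstdio>

// Host-side model of the kernel loops: the float counter i is bumped
// per_body times inside the loop body and once more by the for-header.
// Returns the total number of ++ operations applied to i.
long count_increments(float limit, int per_body)
{
    long total = 0;
    float i;
    for (i = 0; i < limit; i++) {
        for (int k = 0; k < per_body; k++) {
            i++;      // stands in for one i++ written in the kernel body
            total++;
        }
        total++;      // the i++ in the for-header
    }
    return total;
}

// Prints the totals for the loop shapes used in inc, inc2, and inc3.
void print_counts()
{
    std::printf("inc : %ld\n", count_increments(100000.0f, 10));
    std::printf("inc2: %ld\n", count_increments(1000000.0f, 1));
    std::printf("inc3: %ld\n", count_increments(10000.0f, 100));
}
```

Calling `print_counts()` from a `main` should give 100001, 1000000, and 10100, so under these assumptions the three loops differ by roughly a factor of ten each, which is at least in the same direction as the profiler numbers.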
By the way 1: where can I read about the *.ptx and *.cubin file formats?
By the way 2: what is “decuda” useful for if the *.ptx files are already accessible? I don’t understand.
By the way 3: where can I find low-level information about, for example, how the memory controller on the video card works and which algorithms it uses?
Please forgive my silly questions, I’m just learning :)