Execution time is very HIGH in one case and very LOW in another. Why?

My problem is this.

Example:

__global__ void kernel(unsigned short* dataDevice, uchar4* output)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;

    if ((j < width) && (i < height))
    {
        float luminance;
        float variable;
        ........
        ......
        // calculation of the correct index
        ....
        ....
        for (int k = 0; k < index; k++)
        {
            luminance += (float)dataDevice[........];
            variable += 1;  // for example
        }
        ......
        output[i*width+j] = make_uchar4(luminance, luminance, luminance, 255);
    }
}

With this version the execution time is VERY HIGH.

If the kernel is identical except for the final store, for example “output[i*width+j] = make_uchar4(100,100,100,255);” or a store of any other variable that is not luminance, such as

“output[i*width+j] = make_uchar4(variable,variable,variable,255);”

then the execution time is VERY LOW.

Why? Is it the reads and writes involving luminance? A wait? A memory conflict?

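The compiler is able to optimize away the whole luminance calculation if its result is not used. In the fast version the loads from dataDevice are then never executed, so the two timings do not measure the same work.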
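A minimal sketch of that effect, not the original kernel: the names sumKernel, clampToByte and count are hypothetical, and the real index calculation is replaced by a plain sum over the first count input values. The template flag decides at compile time whether the sum reaches the final store. In the `false` instantiation luminance is dead, so the compiler is free to drop the loop and its global memory reads; the clamp to 0..255 is an added assumption so the float fits in a byte.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: clamp a float into the 0..255 range of a uchar4 channel.
__host__ __device__ inline unsigned char clampToByte(float v)
{
    return (unsigned char)fminf(fmaxf(v, 0.0f), 255.0f);
}

// Hypothetical stand-in for the kernel in the question.
// useResult == true : the sum is stored, so the loop must run.
// useResult == false: a constant is stored, so the sum is never used.
template <bool useResult>
__global__ void sumKernel(const unsigned short* data, uchar4* output,
                          int width, int height, int count)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;

    if (j < width && i < height)
    {
        float luminance = 0.0f;
        for (int k = 0; k < count; k++)
            luminance += (float)data[k];   // global memory read per iteration

        unsigned char v = useResult ? clampToByte(luminance) : 100;
        output[i * width + j] = make_uchar4(v, v, v, 255);
    }
}

int main()
{
    const int width = 64, height = 64, count = 16;
    std::vector<unsigned short> host(count, 10);

    unsigned short* data = nullptr;
    uchar4* output = nullptr;
    cudaMalloc(&data, count * sizeof(unsigned short));
    cudaMalloc(&output, width * height * sizeof(uchar4));
    cudaMemcpy(data, host.data(), count * sizeof(unsigned short),
               cudaMemcpyHostToDevice);

    dim3 threads(32, 16);
    dim3 blocks((width + threads.x - 1) / threads.x,
                (height + threads.y - 1) / threads.y);

    uchar4 first;

    // Variant 1: the stored value depends on the loop.
    sumKernel<true><<<blocks, threads>>>(data, output, width, height, count);
    cudaMemcpy(&first, output, sizeof(uchar4), cudaMemcpyDeviceToHost);
    printf("stored luminance: %d\n", first.x);

    // Variant 2: the stored value is a constant, the loop result is unused.
    sumKernel<false><<<blocks, threads>>>(data, output, width, height, count);
    cudaMemcpy(&first, output, sizeof(uchar4), cudaMemcpyDeviceToHost);
    printf("stored constant:  %d\n", first.x);

    cudaFree(data);
    cudaFree(output);
    return 0;
}
```

Whether the loop really disappears can be checked by inspecting the generated machine code of both instantiations, for example with `cuobjdump -sass` on the compiled binary.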

For example, with nThreads(32,16) and nBlock(200,400): with luminance ~ 8000 ms (I disabled the “Timeout Detection and Recovery”),

without luminance ~ 0.002 ms.
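That configuration launches 32 × 16 × 200 × 400 = 40,960,000 threads. A figure of 0.002 ms for that many threads is suspiciously small even for an empty kernel, so it may also be worth checking how the time is measured: kernel launches are asynchronous, and a host timer stopped without synchronizing only measures the launch call. A hedged sketch of timing with CUDA events follows; fillKernel and totalThreads are hypothetical names, and the kernel is a trivial placeholder rather than the one from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel, only here so there is something to time.
__global__ void fillKernel(uchar4* output, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        output[idx] = make_uchar4(100, 100, 100, 255);
}

// Hypothetical helper: number of threads created by a launch configuration.
long long totalThreads(dim3 blocks, dim3 threads)
{
    return (long long)blocks.x * blocks.y * blocks.z *
           (long long)threads.x * threads.y * threads.z;
}

int main()
{
    printf("threads in the posted configuration: %lld\n",
           totalThreads(dim3(200, 400), dim3(32, 16)));

    const int n = 1 << 20;
    uchar4* output = nullptr;
    cudaMalloc(&output, n * sizeof(uchar4));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events are recorded in the same stream as the kernel, so the
    // elapsed time covers the kernel execution, not just the launch call.
    cudaEventRecord(start);
    fillKernel<<<(n + 255) / 256, 256>>>(output, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // A failed launch also returns almost immediately, so check for errors.
    cudaError_t err = cudaGetLastError();
    printf("kernel time: %f ms (%s)\n", ms, cudaGetErrorString(err));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(output);
    return 0;
}
```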
