My problem is this.
Example:
__global__ void kernel(unsigned short* dataDevice, uchar4* output)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if ((j < width) && (i < height))
    {
        float luminance;
        float variable;
        // ...
        // compute the correct index
        // ...
        for (int k = 0; k < index; k++)
        {
            luminance += (float)dataDevice[........];
            variable += 1;  // for example
        }
        // ...
        output[i*width+j] = make_uchar4(luminance, luminance, luminance, 255);
    }
}
Execution time: VERY, VERY high.
If the kernel is identical except for the final line, which instead of “output[i*width+j] = make_uchar4(luminance,luminance,luminance,255);” writes a constant, for example “output[i*width+j] = make_uchar4(100,100,100,255);”, or another variable that is not luminance, for example
“output[i*width+j] = make_uchar4(variable,variable,variable,255);”
Execution time: VERY, VERY low.
Why? Is it the read/write with luminance? A wait? A conflict? WHY?
tera
October 28, 2010, 12:53pm
3
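The compiler is able to optimize away the whole luminance calculation, including the loop's reads from dataDevice, if its result is not used. The fast variants never perform that work, so their timings are not comparable with the version that writes luminance.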
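To see the effect in isolation, here is a minimal sketch rather than the original kernel. The kernel names (sumUsed, sumUnused), the helper run, the image size, the loop count and the indexing scheme are all made up for illustration. Both kernels contain the same loop, but only the first stores the sum, so only the first is obliged to actually execute the loop and its global memory reads.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: the sum is written to `out`, so the loop must run.
__global__ void sumUsed(const unsigned short* data, uchar4* out,
                        int width, int height, int count)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if (j < width && i < height) {
        float luminance = 0.0f;
        for (int k = 0; k < count; k++)   // made-up indexing, wraps around the image
            luminance += (float)data[(i * width + j + k) % (width * height)];
        unsigned char v = (unsigned char)fminf(luminance, 255.0f);
        out[i * width + j] = make_uchar4(v, v, v, 255);
    }
}

// Same loop, but the sum is never stored: the compiler is free to drop it.
__global__ void sumUnused(const unsigned short* data, uchar4* out,
                          int width, int height, int count)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if (j < width && i < height) {
        float luminance = 0.0f;
        for (int k = 0; k < count; k++)
            luminance += (float)data[(i * width + j + k) % (width * height)];
        out[i * width + j] = make_uchar4(100, 100, 100, 255);
    }
}

typedef void (*KernelFn)(const unsigned short*, uchar4*, int, int, int);

struct Result { float ms; uchar4 first; };

// Launches one kernel, times it with CUDA events, returns time and first pixel.
Result run(KernelFn kernel)
{
    const int width = 512, height = 512, count = 200;  // arbitrary test sizes
    const size_t n = (size_t)width * height;
    std::vector<unsigned short> host(n, 1);            // every input value is 1

    unsigned short* data = nullptr;
    uchar4* out = nullptr;
    cudaMalloc(&data, n * sizeof(unsigned short));
    cudaMalloc(&out, n * sizeof(uchar4));
    cudaMemcpy(data, host.data(), n * sizeof(unsigned short),
               cudaMemcpyHostToDevice);

    dim3 block(32, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    kernel<<<grid, block>>>(data, out, width, height, count);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);          // wait until the kernel has finished

    Result r;
    cudaEventElapsedTime(&r.ms, start, stop);
    cudaMemcpy(&r.first, out, sizeof(uchar4), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(data);
    cudaFree(out);
    return r;
}

int main()
{
    Result used = run(sumUsed);
    Result unused = run(sumUnused);
    printf("sum used:   %f ms (first pixel %d)\n", used.ms, (int)used.first.x);
    printf("sum unused: %f ms (first pixel %d)\n", unused.ms, (int)unused.first.x);
    return 0;
}
```

The time is read only after cudaEventSynchronize because kernel launches are asynchronous; a host timer stopped right after the launch, without synchronization, would mostly measure launch overhead. How large the gap between the two kernels is depends on the GPU and compiler version, so treat the printed numbers as indicative only.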
For example, with nThreads (32,16) and nBlock (200,400): with luminance ~ 8000 ms !!! (I disabled the “Timeout Detection and Recovery”.)
Without luminance ~ 0.002 ms.