Dear njuffa,
I think that __mul24 was a red herring since a small kernel can be constructed which has a
problem which appears to be the problem I had which does not use __mul24(). Instead it seems
that the problem arises from the difference between using a macro and setting a const int.
Here is the code:
global void stereoKernel(int disparityMinSSD,
int width,
int height,
size_t out_pitch,
const int BLOCK_W) { #ifdef CONST
const int X = (blockIdx.xBLOCK_W) + threadIdx.x; #else #define X ((blockIdx.xBLOCK_W) + threadIdx.x) #endif
const int Y = blockIdx.yROWSperTHREAD;
}
When CONST is defined and X is zero disparityMinSSD may be set to -1 (ie using const int X)
but when it is not defined the same elements of disparityMinSSD are set to 2147483647
This is a cut down version and disparityMinSSD is only set for the first row in each
grid element.
I do not know that this is really what C++ should do? The original code was far more
convoluted and the essential substraction from X and int–float conversions were buried
inside other code.
Notice also that the macros do wrap stuff in () to ensure correct precedence when
expanded but this does not prevent the problem.
This seems to be the result of standard sign propagation rules.
“const int X” is signed, while (depending on how BLOCK_W is defined) “((blockIdx.x*BLOCK_W) + threadIdx.x)” most likely is unsigned.
Thus for X=0 the value of “X-1” is either -1 if X is signed or 4294967295 if X is unsigned. Why you end up with half that in the second case I don’t immediately see but might be traced down on a complete, compilable example.
I have gone back and tried __mul24()
The documentation says its return type is 32-bit int (I think that means signed int).
The upshot is a macro definition using __mul24() has the same problem as a macro
definition using *.
Bill
ps: the new code fragment is
#ifdef CONST #ifdef MUL24
const int X = (__mul24(blockIdx.x,BLOCK_W) + threadIdx.x); #else
const int X = (blockIdx.xBLOCK_W) + threadIdx.x; #endif #else #ifdef MUL24 #define X (__mul24(blockIdx.x,BLOCK_W) + threadIdx.x) #else #define X ((blockIdx.xBLOCK_W) + threadIdx.x) #endif #endif
blockIdx and threadIdx are of type uint3. So it seems to me tera is on the right track in suspecting that the issue at hand stems from mixed signed / unsigned computation, leading to unexpected results due to C/C++ type promotion rules.
In particular, for integer types of identical bit-width, the unsigned type is considered “wider”, and in an expression containing both signed int and unsigned int operands the signed int operands are thus converted to unsigned int first, which turns any negative number into a large positive number.
Dear tera and njuffa,
Many thanks for your help.
Just a short reply to confirm that if variable X is of type const unsigned int then I get the same
answers as if X is replaced by a macro #define X. Also I get the same answers if I use
(__mul24(blockIdx.x,BLOCK_W) + threadIdx.x) or ((blockIdx.x*BLOCK_W) + threadIdx.x).
The 2147483647 problem turned out to be a bug elsewhere and (after fixing)
when X is zero const float x_tex = X - 1; does indeed set x_tex to 4.29497e+09
Once again many thanks
Bill
ps: I have removed not__mul24_bug.tar.gz from Langdon, William, W B