Problem with replacing __mul24

Has anyone experienced nvcc compiler problems when replacing macros that use __mul24
with int variables computed using *?

I inherited some convoluted code which does things like
#define X (__mul24(blockIdx.x,BLOCK_W) + threadIdx.x)

BLOCK_W = 96, and I would expect 0 <= blockIdx.x < 96.

If I replace this with
const int X = (blockIdx.x*BLOCK_W) + threadIdx.x;

I do not get the same answers.
Confused? You bet.

All help and comments are welcome.

PS: nvcc release 5.0, V0.2.1221

--compiler-options -fno-strict-aliasing -arch sm_13

Could you post a minimal compilable and runnable example that reproduces this issue?

Dear njuffa,
I think that __mul24 was a red herring: I can construct a small kernel that does not use
__mul24() but still shows what appears to be the same problem I had. Instead, the problem
seems to arise from the difference between using a macro and using a const int.
Here is the code:

__global__ void stereoKernel(int *disparityMinSSD,
                             int width,
                             int height,
                             size_t out_pitch,
                             const int BLOCK_W) {
#ifdef CONST
  const int X = (blockIdx.x*BLOCK_W) + threadIdx.x;
#else
#define X ((blockIdx.x*BLOCK_W) + threadIdx.x)
#endif
  const int Y = blockIdx.y /* ... rest of the expression not shown in the post */;

  if((blockIdx.x*BLOCK_W + threadIdx.x) < width && Y < height) {
    const float x_tex = X - 1;
    disparityMinSSD[(Y*out_pitch) + blockIdx.x*BLOCK_W + threadIdx.x] = /* ... not shown in the post */;
  }
}

When CONST is defined (i.e. using const int X) and X is zero, disparityMinSSD may be set to -1,
but when it is not defined the same elements of disparityMinSSD are set to 2147483647.
This is a cut-down version, and disparityMinSSD is only set for the first row in each
grid element.
I do not know whether this is really what C++ should do. The original code was far more
convoluted, and the essential subtraction from X and the int-to-float conversions were buried
inside other code.
Notice also that the macros do wrap everything in parentheses to ensure correct precedence when
expanded, but this does not prevent the problem.
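To make that last point concrete: parentheses in a macro only control how the expansion binds to the surrounding operators; they do not change the type of the expanded expression. The host-side C++ sketch below uses bx and tx as hypothetical stand-ins for blockIdx.x and threadIdx.x (both unsigned int in CUDA, since blockIdx and threadIdx are uint3) and the BLOCK_W value of 96 quoted in the post.

```cpp
#include <type_traits>

// Hypothetical host-side stand-ins: bx and tx play the roles of blockIdx.x and
// threadIdx.x, which are unsigned int in CUDA (members of uint3).
constexpr unsigned int bx = 1u;
constexpr unsigned int tx = 2u;
constexpr int BLOCK_W = 96;  // value quoted in the original post

#define X_BARE bx * BLOCK_W + tx       // no outer parentheses
#define X_PAREN ((bx * BLOCK_W) + tx)  // parenthesized, like the macros in the post

// Parentheses control precedence: 2 * X_BARE expands to 2 * bx * BLOCK_W + tx.
constexpr unsigned int twice_bare = 2 * X_BARE;    // 2*1*96 + 2
constexpr unsigned int twice_paren = 2 * X_PAREN;  // 2*(1*96 + 2)

// ...but not type: the int operand BLOCK_W is converted to unsigned int, so the
// expansion (and therefore X - 1) is unsigned arithmetic either way.
static_assert(std::is_same<decltype(X_PAREN - 1), unsigned int>::value,
              "the parenthesized macro is still unsigned");
static_assert(std::is_same<decltype(X_BARE), unsigned int>::value,
              "the bare macro is unsigned too");
```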


This seems to be the result of the standard signed/unsigned conversion rules.

“const int X” is signed, while (depending on how BLOCK_W is defined) “((blockIdx.x*BLOCK_W) + threadIdx.x)” most likely is unsigned.
Thus for X=0 the value of “X-1” is either -1 if X is signed or 4294967295 if X is unsigned. Why you end up with half that in the second case I don’t immediately see, but it might be tracked down with a complete, compilable example.
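The two outcomes described above can be reproduced on the host in plain C++. In this sketch, the function names are hypothetical, bx and tx stand in for blockIdx.x and threadIdx.x (unsigned int in CUDA), block_w stands in for BLOCK_W, and a 32-bit unsigned int is assumed.

```cpp
// Hypothetical host-side helpers; bx/tx stand in for blockIdx.x/threadIdx.x (unsigned int).

// const int X = (bx*block_w) + tx;  then X - 1: the unsigned sum is converted to int
// when X is initialized, so the subtraction is signed and X == 0 gives -1.
constexpr int x_minus_one_const_int(unsigned int bx, unsigned int tx, int block_w) {
    return static_cast<int>((bx * block_w) + tx) - 1;
}

// #define X ((bx*block_w) + tx)  then X - 1: the expression stays unsigned int, and
// 0u - 1 wraps around modulo 2^32 to 4294967295 (assuming 32-bit unsigned int).
constexpr unsigned int x_minus_one_macro(unsigned int bx, unsigned int tx, int block_w) {
    return ((bx * block_w) + tx) - 1;
}
```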

Dear tera,

I have put the whole example in

Many thanks.
PS: Let me know when you have it, and then I can remove the gzipped tarball.

I have gone back and tried __mul24().
The documentation says its return type is a 32-bit int (I think that means signed int).
The upshot is that a macro definition using __mul24() has the same problem as a macro
definition using *.
PS: the new code fragment is:

#ifdef CONST
#ifdef MUL24
const int X = (__mul24(blockIdx.x,BLOCK_W) + threadIdx.x);
#else
const int X = (blockIdx.x*BLOCK_W) + threadIdx.x;
#endif
#else
#ifdef MUL24
#define X (__mul24(blockIdx.x,BLOCK_W) + threadIdx.x)
#else
#define X ((blockIdx.x*BLOCK_W) + threadIdx.x)
#endif
#endif
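The signed return type of __mul24() does not change the outcome for the macro: even though the product is an int, adding threadIdx.x (an unsigned int) converts the sum back to unsigned int. The real __mul24() is a device-only intrinsic, so this host-side sketch uses a hypothetical mul24_stub() that only mimics its int return type with an ordinary multiplication (no 24-bit semantics).

```cpp
#include <type_traits>

// Hypothetical host stand-in for __mul24(): returns int like the intrinsic, but is a
// plain 32-bit multiply; only its return type matters for this illustration.
constexpr int mul24_stub(int a, int b) { return a * b; }

// Mirrors #define X (__mul24(blockIdx.x,BLOCK_W) + threadIdx.x) with bx/tx standing in
// for blockIdx.x/threadIdx.x. int + unsigned int is evaluated as unsigned int.
constexpr unsigned int x_mul24_macro(unsigned int bx, unsigned int tx, int block_w) {
    return mul24_stub(static_cast<int>(bx), block_w) + tx;
}

static_assert(std::is_same<decltype(mul24_stub(0, 96) + 0u), unsigned int>::value,
              "int + unsigned int has type unsigned int");
```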

I don’t immediately see why you get half the expected result, and I’m not willing to wade through the whole project to make it compilable for me.

Anyway, just replace const int X with const unsigned int X, and your problem should be gone.
Also have a look at the C integer promotion rules.
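Declaring X as const unsigned int makes the variable behave exactly like the macro. If the kernel actually intends X - 1 to be -1 at X == 0, the consistent alternative is to convert the unsigned built-ins to int before doing any arithmetic. The sketch below shows both directions on the host; the function names are hypothetical, bx/tx stand in for blockIdx.x/threadIdx.x, and which variant is right depends on what the original stereo code expects, which the thread does not say.

```cpp
// bx/tx are hypothetical stand-ins for blockIdx.x/threadIdx.x (unsigned int in CUDA).

// Option 1 (suggested above): const unsigned int X = (blockIdx.x*BLOCK_W) + threadIdx.x;
// Same type as the macro expansion, so X - 1 matches the macro version exactly.
constexpr unsigned int x_unsigned(unsigned int bx, unsigned int tx, int block_w) {
    return (bx * block_w) + tx;
}

// Option 2 (signed throughout, hypothetical): convert the unsigned operands first, e.g.
// #define X ((int)blockIdx.x * BLOCK_W + (int)threadIdx.x)
// so that the macro and a const int X both give X - 1 == -1 when X == 0.
constexpr int x_signed(unsigned int bx, unsigned int tx, int block_w) {
    return static_cast<int>(bx) * block_w + static_cast<int>(tx);
}
```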

blockIdx and threadIdx are of type uint3. So it seems to me tera is on the right track in suspecting that the issue at hand stems from mixed signed / unsigned computation, leading to unexpected results due to C/C++ type promotion rules.

In particular, for integer types of identical bit-width, the unsigned type is considered “wider”, and in an expression containing both signed int and unsigned int operands the signed int operands are thus converted to unsigned int first, which turns any negative number into a large positive number.
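A short host-side illustration of this rule, assuming 32-bit int and unsigned int (the helper names are hypothetical): in a mixed int/unsigned int expression the int operand is converted to unsigned int, so -1 becomes UINT_MAX (4294967295). Comparisons follow the same rule, which is worth keeping in mind for bounds checks such as (blockIdx.x*BLOCK_W + threadIdx.x) < width in the kernel posted earlier.

```cpp
#include <climits>

// int -> unsigned int conversion is defined as reduction modulo UINT_MAX + 1.
constexpr unsigned int minus_one_as_unsigned = static_cast<unsigned int>(-1);

// In x + offset with unsigned x and int offset, offset is converted to unsigned int
// first; the sum wraps around instead of going negative.
constexpr unsigned int add_offset(unsigned int x, int offset) { return x + offset; }

// Comparisons use the same conversion: a is converted to unsigned int before comparing,
// so a negative int is never "less than" an unsigned zero.
constexpr bool less_mixed(int a, unsigned int b) { return a < b; }
```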

Dear tera and njuffa,
Many thanks for your help.
Just a short reply to confirm that if the variable X is of type const unsigned int, I get the same
answers as when X is replaced by the macro (#define X). I also get the same answers whether I use
(__mul24(blockIdx.x,BLOCK_W) + threadIdx.x) or ((blockIdx.x*BLOCK_W) + threadIdx.x).

The 2147483647 problem turned out to be a bug elsewhere, and (after fixing it)
when X is zero, const float x_tex = X - 1; does indeed set x_tex to 4.29497e+09.
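That value follows directly from the unsigned arithmetic: for X == 0 the unsigned X - 1 is 4294967295, and converting it to float rounds to a nearby representable value (4294967296.0f under the usual round-to-nearest mode), which prints as 4.29497e+09 with %g. A host-side sketch, with hypothetical function names and assuming 32-bit unsigned int and IEEE single-precision float:

```cpp
// Hypothetical helpers modeling: const float x_tex = X - 1;

// X declared as (or expanding to) unsigned int: X - 1 wraps to 4294967295u, which a
// float (24-bit significand) cannot represent exactly, so the conversion rounds it
// (to 2^32 under the default round-to-nearest mode).
constexpr float x_tex_unsigned(unsigned int X) { return static_cast<float>(X - 1); }

// X declared as const int: X - 1 is -1, which converts to -1.0f exactly.
constexpr float x_tex_signed(int X) { return static_cast<float>(X - 1); }
```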

Once again, many thanks.
PS: I have removed not__mul24_bug.tar.gz from