Compiler bug on CUDA 3.2

I found a bug, probably in the compiler. The kernel works only(!) when the GPU debug info option is enabled; no other option matters. With debug info it produces correct results. Without it, the kernel returns an unknown error (cudaErrorUnknown) and does not write any results. I will try to isolate a test case, but I am not sure that will be possible.

The kernel has a loop:

int index = a[…]; // a is a device array
for (x = x1; x < x2; x++)
  for (y = y1; y < y2; y++)
    for (z = z1; z < z2; z++)
    {
      b[index] = x*stride2 + y*stride + z; // b is a device array
      index++;
    }

The problem seems to be in this loop, because the fix is to perform a dummy operation on index at the start of the loop body:

for (x = x1; x < x2; x++)
  for (y = y1; y < y2; y++)
    for (z = z1; z < z2; z++)
    {
      index = index + (z >> 20); // z>>20 is 0, but the compiler does not know that
      b[index] = x*stride2 + y*stride + z; // b is a device array
      index++;
    }

This version runs correctly without GPU debug info enabled.
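For what it is worth, the extra term really is zero for any non-negative z below 2^20 (1048576), so the workaround should not change which elements of b get written. Here is a small host-side C++ check of that claim. The function name workaroundIsNoOp is made up for illustration and is not part of the kernel:

```cpp
#include <cassert>

// Returns true if (z >> 20) == 0 for every z in [zBegin, zEnd),
// i.e. if "index = index + (z >> 20)" leaves index unchanged.
// Assumes non-negative z, as in a loop over array coordinates.
bool workaroundIsNoOp(int zBegin, int zEnd) {
    assert(zBegin >= 0);
    for (int z = zBegin; z < zEnd; z++) {
        if ((z >> 20) != 0) {
            return false;  // z has reached 2^20, so the term is no longer zero
        }
    }
    return true;
}
```

If the real z range can reach 2^20 or more, the workaround would change the indexing, so it is only a no-op under that range assumption.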

So I think it is either a compiler bug or a PTX generator bug.

I cannot try it on other CUDA versions right now. I use Windows 7, driver 260.99, and MSVS 8.0.
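Before blaming the compiler, one check that may be worth doing is to replay the loop for a single thread on the host and confirm that index never leaves the bounds of b. The sketch below is plain C++ rather than device code and rests on several assumptions: the stored value is x*stride2 + y*stride + z, startIndex stands in for the value read from a[…], and b.size() is the number of elements allocated for b. The names ReplayResult and replayLoop are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of replaying the loop on the host.
struct ReplayResult {
    long long writes;         // number of stores the loop attempted
    long long firstBadIndex;  // first index outside [0, capacity), or -1 if none
};

// Host-side replay of the kernel loop for ONE thread (hypothetical helper).
// Out-of-range stores are recorded instead of being performed.
ReplayResult replayLoop(std::vector<int>& b, long long startIndex,
                        int x1, int x2, int y1, int y2, int z1, int z2,
                        int stride2, int stride) {
    assert(stride2 >= 0 && stride >= 0);
    const long long capacity = static_cast<long long>(b.size());
    ReplayResult r{0, -1};
    long long index = startIndex;
    for (int x = x1; x < x2; x++)
        for (int y = y1; y < y2; y++)
            for (int z = z1; z < z2; z++) {
                if (index < 0 || index >= capacity) {
                    if (r.firstBadIndex < 0) r.firstBadIndex = index;
                } else {
                    b[static_cast<std::size_t>(index)] = x * stride2 + y * stride + z;
                }
                r.writes++;
                index++;
            }
    return r;
}
```

If firstBadIndex comes back as anything other than -1 for the real loop bounds and start index of some thread, the kernel is writing outside b. That could plausibly produce an error that appears or disappears with unrelated code changes, although this sketch cannot prove that is what happens here.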

Can you post code that shows the problem? That snippet isn’t enough by itself.

When arbitrary no-op code changes “fix” a mysterious problem, I tend to think of threading races in the code, not a compiler issue.

I will work on a short test case later. There are no suspicious operations in the code. With the fix, the program runs for a long time without errors, while without the fix the kernel just does not write any results and returns an unknown error.

I found another fix:

while (z <= z2)
{
  z++;
  b[index] = x*stride2 + y*stride + (z-1);
  index++;
}

I am pretty sure it is a compiler or PTX generator bug, because there are no shared variables there. Besides, if it were a race, the kernel would just produce wrong results; instead it returns an unknown error and either writes nothing to b or writes zeroes.
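One detail to keep in mind when comparing the two variants: as posted, the while version tests z <= z2 where the original for loop tests z < z2. If z starts at z1 in both (an assumption, since the initialization is not shown), the while version performs one more store per inner loop. This may simply be a typo in the post, but a host-side C++ sketch makes the difference concrete. The function names are made up for illustration:

```cpp
#include <cassert>

// Trip count of the original inner loop: for (z = z1; z < z2; z++)
int forLoopWrites(int z1, int z2) {
    int writes = 0;
    for (int z = z1; z < z2; z++) writes++;
    return writes;
}

// Trip count of the rewritten loop as posted, assuming z starts at z1:
// while (z <= z2) { z++; ...; index++; }
int whileLoopWrites(int z1, int z2) {
    int writes = 0;
    int z = z1;
    while (z <= z2) {
        z++;
        writes++;
    }
    return writes;
}

// Extra stores done by the while variant per inner loop when z1 <= z2.
int extraWritesPerRow(int z1, int z2) {
    assert(z1 <= z2);
    return whileLoopWrites(z1, z2) - forLoopWrites(z1, z2);
}
```

If the real code has the same off-by-one, the two variants do not write the same set of elements, so the comparison between them says less about the compiler than it first appears.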

I found a compiler bug in another kernel, a huge one with a lot of control flow. The compiler simply died with an external exception while reading a small address. I rearranged the branches and was able to complete compilation. However, if the program takes a certain branch, the kernel returns cudaErrorUnknown. I suspect it may be a bug in the PTX-to-machine-code step in the driver or runtime. I use a GeForce 465 and driver 260.99. I am now uninstalling CUDA 3.2 and installing CUDA 3.0, because I need emulation mode to check the program's algorithm. By the way, if I generate GPU debug info, the program behaves quite differently: it does not return cudaErrorUnknown but simply hangs the driver. The compiler also reports different register usage for some reason. It is all pointless.