Hey, I’m using CUDA to do some bit operations. I love it because so far I’m at about a 50x speedup over the 4-core version on my CPU! The problem is that I’m running into some very strange compiler behavior that causes a crash once the data set gets large. The code snippet below illustrates the issue (this bit gets executed about 1.5 billion times):
if (row1 + k <= maxTexFetch && row2 + k <= maxTexFetch) // THIS WILL ALWAYS BE TRUE! maxTexFetch is 0x7fffffff!
{
    if (mask[maskIdx + k] && (tex1Dfetch(tex, row1 + k) ^ tex1Dfetch(tex, row2 + k)) & mask[maskIdx + k])
        goto keep_going;
}
else
{
    // IF WE UN-COMMENT THE STATEMENT BELOW, IT STOPS CRASHING
    // (which shows this branch never actually executes, because a fetch at index 0x7ffffffe would definitely crash)
    //if (mask[maskIdx + k] && (tex1Dfetch(tex, maxTexFetch - 1) ^ tex1Dfetch(tex, maxTexFetch - 1)) & mask[maskIdx + k])
    //;
}
Unfortunately, that workaround isn’t viable in my full program, so I’m stalled until I can get this fixed.
Below is some of my system info:
- CUDA toolkit 3.0
- Intel Q6600 processor (2.4 GHz quad core)
- GeForce 8800 GTS with 512 MB RAM
- Windows Vista 32-bit
- 2 GB system RAM
- nvcc compiler built on Tue_Feb_23_16:37:32_PST_2010, release 3.0, V0.2.1221

I’ve made a light-weight program that reproduces the problem on my 8800 GTS if it would be helpful.