Very strange compilation behavior

Hey, I’m using CUDA to do some bit operations. I love it, because so far I’m up to about a 50x speedup over the 4-core version on my CPU! The problem is that I’m running into some very strange compiler behavior that causes the program to crash once the data set gets large. The code snippet below shows the issue (this part gets executed about 1.5 billion times):

if (row1 + k <= maxTexFetch && row2 + k <= maxTexFetch) // THIS WILL ALWAYS BE TRUE! maxTexFetch is 0x7fffffff!
{
    if (mask[maskIdx + k] && (tex1Dfetch(tex, row1 + k) ^ tex1Dfetch(tex, row2 + k)) & mask[maskIdx + k])
        goto keep_going;
}
else
{
    // IF WE UN-COMMENT THE STATEMENT BELOW, IT STOPS CRASHING
    //   (which shows this never actually gets executed, because a fetch to 0x7ffffffe would definitely cause a crash)
    //if (mask[maskIdx + k] && (tex1Dfetch(tex, maxTexFetch - 1) ^ tex1Dfetch(tex, maxTexFetch - 1)) & mask[maskIdx + k])
    //    ;
}
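Comparing against 0x7fffffff can never reject anything: every int is <= INT_MAX, and if row1 + k overflows, the behavior is undefined in C/C++, so the compiler is free to treat the test as always true and drop the else branch entirely. A guard that can actually fail compares against the number of elements really bound to the texture. A minimal sketch, where the helper fetchInBounds and the element count numElements are hypothetical names, not part of the original code:

```cuda
// Hypothetical helper, callable from host and device code: true only when
// row + k is a valid index into a buffer holding numElements elements.
// The comparison is arranged so that row + k is never computed, because
// signed int overflow is undefined behavior.
__host__ __device__ inline bool fetchInBounds(int row, int k, int numElements)
{
    if (row < 0 || k < 0 || numElements <= 0)
        return false;                 // negative offsets are never valid
    return row < numElements - k;     // same as row + k < numElements, without overflow
}
```

In the kernel, the outer test would then read if (fetchInBounds(row1, k, texElems) && fetchInBounds(row2, k, texElems)), where texElems (also hypothetical) is the element count actually bound to tex. Note that this still says nothing about mask[maskIdx + k], which would need its own check against the size of mask.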

The workaround doesn’t work for my full program, so I’m stalled until I can get it fixed.

Below is some of my system info:

  • CUDA toolkit 3.0

  • Intel Q6600 Processor (2.4 GHz quad core)

  • GeForce 8800GTS with 512 MB RAM

  • Windows Vista 32 bit

  • 2GB system RAM

  • nvcc compiler built on Tue_Feb_23_16:37:32_PST_2010, release 3.0, V0.2.1221

  • I’ve made a lightweight program that reproduces the crash on my 8800GTS, if that would be helpful

Are you aware of the texture size limit? It’s 128MB. Or maybe you bind it to a CUDA array? That limit is even smaller. Try getting rid of the texture fetches.

I thought about that as well. The texture is under 4MB. I also tried it with global memory and got the same results. Thanks for the idea, though.
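The size limit discussed here can be checked on the host before binding. As I understand the CUDA programming guide's limits table, the maximum width of a 1D texture reference bound to linear memory is 2^27 elements, which is where the "128MB" figure comes from, although it counts elements rather than bytes; textures bound to CUDA arrays have a much smaller limit. The constant below assumes that figure, and fitsLinearTexture is a hypothetical helper; check the table for your own toolkit and device:

```cuda
#include <cstddef>

// Assumed limit (verify against your toolkit's programming guide): maximum
// width, in elements rather than bytes, of a 1D texture bound to linear memory.
const std::size_t kMaxTex1DLinearElems = std::size_t(1) << 27;

// Hypothetical helper: true if a linear buffer of `bytes` bytes, made of
// elements `elemSize` bytes wide, fits within the assumed width limit.
inline bool fitsLinearTexture(std::size_t bytes, std::size_t elemSize)
{
    if (elemSize == 0 || bytes % elemSize != 0)
        return false;  // not a whole number of elements
    return bytes / elemSize <= kMaxTex1DLinearElems;
}
```

For a 4MB texture of ints, fitsLinearTexture(std::size_t(4) << 20, sizeof(int)) is true: that is 2^20 elements, far below 2^27, which is consistent with the size limit not being the cause here.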

The texture fetches (or global memory accesses, if I do it that way) are what’s causing the crash, though. If I remove them, or even just have it always fetch from index 0, it no longer crashes. Other things I’ve found:

  • If I set maxTexFetch to anything between (texture size - 5000) and the texture size, it seems to always crash

  • If I set maxTexFetch to less than (texture size - 7000), it no longer crashes

  • Checking whether the index I’m fetching is >= 0 doesn’t seem to have any effect

  • The oddest behavior: if I set maxTexFetch to 0x7fffffff (I’m using ints) and put a statement in the else branch (which is never executed), it doesn’t crash. Without the statement, it does crash.
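When a kernel "crashes", the error code returned by the runtime helps narrow down the cause: a fault while the kernel runs, such as a bad memory access, is reported as an error like cudaErrorLaunchFailure ("unspecified launch failure"), while a kernel killed by the Windows display watchdog, which applies if the 8800GTS also drives the display, reports cudaErrorLaunchTimeout. A minimal error-checking sketch, where cudaOk is a hypothetical helper name:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a CUDA runtime error with some context and
// returns false, or returns true when err is cudaSuccess.
inline bool cudaOk(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s (error %d)\n",
                     what, cudaGetErrorString(err), static_cast<int>(err));
        return false;
    }
    return true;
}

// Intended use after a launch (kernel name and arguments are placeholders):
//   myKernel<<<grid, block>>>(/* ... */);
//   if (!cudaOk(cudaGetLastError(), "kernel launch")) return 1;   // bad launch configuration
//   if (!cudaOk(cudaDeviceSynchronize(), "kernel run")) return 1; // errors raised while running
// On CUDA 3.0, call cudaThreadSynchronize() instead of cudaDeviceSynchronize().
```

Checking both calls matters because a launch returns before the kernel finishes; errors that occur during execution only show up at the next synchronizing call.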

What about the mask array?
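The mask question can be checked on the host before launching: in the snippet, mask[maskIdx + k] is read before either texture fetch, and unlike row1 + k and row2 + k its index is never compared against anything. A sketch, assuming (hypothetically) that k runs from 0 to width - 1 and that the host knows every row1, row2 and maskIdx value the kernel will use; indicesInRange and its parameters are illustrative names:

```cuda
#include <cstddef>
#include <vector>

// Hypothetical pre-launch check. Assumes the kernel reads tex[row + k] for
// every row offset in `rows` (all row1 and row2 values) and mask[maskIdx + k]
// for every offset in `maskIdxs`, with k in [0, width).
// Returns true only if every one of those indices is inside its buffer.
bool indicesInRange(const std::vector<int> &rows, const std::vector<int> &maskIdxs,
                    int width, std::size_t texElems, std::size_t maskElems)
{
    if (width <= 0)
        return true;                                   // no reads at all
    const std::size_t w = static_cast<std::size_t>(width);
    for (int r : rows)                                 // last read is r + width - 1
        if (r < 0 || static_cast<std::size_t>(r) + w > texElems)
            return false;
    for (int m : maskIdxs)                             // last read is m + width - 1
        if (m < 0 || static_cast<std::size_t>(m) + w > maskElems)
            return false;
    return true;
}
```

If this returns false for the real index arrays, the out-of-range read would be one that no choice of maxTexFetch can prevent.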