Arbitrary kernel crashes: device crashes in "random" locations

It appears that my kernel is crashing at arbitrary locations in the code. My colleague suggests that we are using either too many registers or too much memory. Some of the crashes are a result of memory access. Others are not. For example:

[codebox]
//does not crash
Ap[0] = …
for(…)
if(…)
//crashes
Ap[0] = ...[/codebox]

I was stuck on this one for a while, then it just went away. I made no changes that would obviously affect this code. My first thought was to check that the memory was actually declared, which it was.

[codebox]//both crash
float tstar = cos(1);
float tstar = cosf(1);
//does not crash
float tstar = __cosf(1);[/codebox]

This one was really annoying. For some reason, executing the cos of anything at this one spot caused the kernel to crash. It only went away when I replaced it with the fast cosine.

Earlier today it was crashing on a global memory access that had never caused a crash before. Note that when I skipped over the line of code that was crashing (either the cosine or Ap[0]), the kernel would finish properly. This means that only one of these problems exists at a given time. Any time one problem goes away, another one pops up. And yes, I am compiling with optimizations turned off.

My question is this: what factors can cause kernels to crash on correct code? The crashes happen at particular spots reliably, then they choose another particular spot to crash. This leads me to believe that it has something to do with the size of the program. I think that changing completely unrelated code causes the memory state to be different in the computer. I tested the same code on a different machine when it was crashing on cosine, but the same problem occurred.

Side question: which memory space does statically declared memory from device code allocate from? Is typing float array[10]; in device code global or shared memory? And is this deallocated upon leaving a code block {}?

Thanks.

“Too much memory” means you’d have a failed cudaMalloc, not a random kernel crash after a successful alloc.
“Too many registers” means the kernel would fail to launch, not a random crash.
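
To see which of these cases you are actually hitting, it may help to check the error code at each stage. The following is only a rough sketch, assuming a CUDA runtime recent enough to provide cudaDeviceSynchronize; the kernel scale and the helper run_checked_launch are made-up names for illustration, not anything from your code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to have something to launch.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard the last, partially filled block
        data[i] *= factor;
}

// Hypothetical helper: returns 0 on success, 1 on the first CUDA error.
int run_checked_launch(int n)
{
    float *d_data = NULL;

    // "Too much memory" is reported here, as a failed allocation.
    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    // Launch-time failures (for example, too many resources requested
    // for the launch) are reported here, before the kernel runs at all.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    // Failures during execution (for example, a bad address) only
    // surface once the host waits for the kernel to finish.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_data);
        return 1;
    }

    cudaFree(d_data);
    return 0;
}

int main()
{
    return run_checked_launch(1 << 20);
}
```

If the first two checks pass and only the synchronize reports an error, memory and register limits are unlikely to be the cause.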

Just like on the CPU, GPU crashes tend to come from accessing memory through incorrect pointers: uninitialized, invalidated, inappropriate, or beyond some allocated bound.
A common “gotcha” is accessing host pointers from the device, which will indeed crash.
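
As an illustration of the host-pointer gotcha and of an out-of-bounds guard, here is a minimal sketch. The names add_one and run_host_pointer_demo are invented for this example, and it assumes a setup without managed or unified memory, where a pointer from malloc is not valid on the device.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to each element.
__global__ void add_one(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // without this guard, extra threads in the
        v[i] += 1.0f;           // last block would write past the allocation
}

// Hypothetical helper: returns 0 if the kernel ran and produced the
// expected values, 1 otherwise.
int run_host_pointer_demo(int n)
{
    size_t bytes = n * sizeof(float);
    float *h_v = (float *)malloc(bytes);
    if (h_v == NULL)
        return 1;
    for (int i = 0; i < n; ++i)
        h_v[i] = (float)i;

    int threads = 128;
    int blocks = (n + threads - 1) / threads;

    // WRONG: h_v points to host memory. Passing it to the kernel compiles,
    // but the device dereferences an address it does not own.
    // add_one<<<blocks, threads>>>(h_v, n);

    // RIGHT: allocate on the device and copy the data over first.
    float *d_v = NULL;
    if (cudaMalloc((void **)&d_v, bytes) != cudaSuccess) {
        free(h_v);
        return 1;
    }
    cudaMemcpy(d_v, h_v, bytes, cudaMemcpyHostToDevice);
    add_one<<<blocks, threads>>>(d_v, n);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_v, d_v, bytes, cudaMemcpyDeviceToHost);

    // Element i started as i, so it should now be i + 1.
    int ok = (err == cudaSuccess) &&
             (h_v[0] == 1.0f) && (h_v[n - 1] == (float)n);
    if (!ok)
        printf("demo failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_v);
    free(h_v);
    return ok ? 0 : 1;
}

int main()
{
    return run_host_pointer_demo(1000);
}
```

Bugs of this kind can appear to move around when unrelated code changes, because the bad address lands somewhere different each time the program layout shifts.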

Post a small, complete kernel that actually crashes (not just one line) and the problem may be easy to spot.