cudaErrorLaunchFailure -- potential causes?

I have an application that performs several GPU operations successfully, then suddenly returns the error code cudaErrorLaunchFailure. Before getting into the details of my code, could anyone tell me some general causes of this error? I could not find a detailed description in the documentation. Note that I am working with very large arrays (close to the limit of allocatable device memory), but I do not get any allocation errors. For smaller data sets, my application runs smoothly. Here are a few specs for my system:

-Windows XP 64
-Using one Tesla C1060
-CUDA 2.3

Thank you for any help!

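In my experience, this error appeared when I launched a kernel with an invalid grid/block configuration: for instance, a grid whose z dimension was not 1 (on this generation of hardware, grids are limited to two dimensions, so gridDim.z must be 1), or similar mistakes.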