Unspecified launch failure when run on C2070: program works fine on GTS 250 but gives a vague error

Attached is a file containing several kernels. My program calls the computeAggregate kernel, which works fine on a GTS 250 (Windows or 64-bit Linux, latest drivers, toolkit and SDK). However, on a Linux machine with two C2070s (of which I am only trying to use one), I get the error “cutilCheckMsg() CUTIL CUDA error : Kernel execution failed : unspecified launch failure.” That machine runs Ubuntu with the 260.19.21 drivers and has the 3.2 toolkit and SDK (the same as the other machine). Both machines have enough memory, and the same execution configuration is used on each. I’ve also attached the makefile I use to compile on Linux machines (note: this makefile also works fine on a laptop with a 9500M, although the program runs much slower there).
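cutilCheckMsg() only reports that the kernel failed. A minimal sketch of doing the same check directly with the runtime API, assuming a current toolkit, is below; reportCudaError and checkKernelLaunch are hypothetical helper names, not part of CUTIL or the attached code. On the 3.2 toolkit cudaDeviceSynchronize() did not exist yet; cudaThreadSynchronize() played the same role there.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a readable message for a failed CUDA call
// and pass the error code through unchanged.
static cudaError_t reportCudaError(cudaError_t err, const char *label)
{
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
    return err;
}

// Hypothetical helper: check a kernel launch in two steps.
//  - cudaGetLastError() catches errors in the launch itself, such as an
//    invalid grid/block configuration.
//  - cudaDeviceSynchronize() waits for the kernel to finish and catches
//    errors raised while it executed, such as "unspecified launch failure".
static cudaError_t checkKernelLaunch(const char *label)
{
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    return reportCudaError(err, label);
}

// Usage (kernel name and arguments are placeholders):
//   computeAggregate<<<grid, block>>>(/* ... */);
//   if (checkKernelLaunch("computeAggregate") != cudaSuccess) { /* bail out */ }
```

Because kernel launches are asynchronous, an error reported by a later, unrelated call may really come from an earlier kernel; checking right after each launch while debugging narrows down which kernel is at fault.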

Both machines also emit the warning “Warning: Cannot tell what pointer points to, assuming global memory space”, but since the pointer does in fact point to global memory, this should not cause any problems.

I wrote a simple kernel called testKernel, plus a quick wrapper to test its functionality, and it works fine on both machines with the same execution configuration (threads/grid) as the other kernel.

Any insights or suggestions are very welcome; I’ve been banging my head against this one for a while.
Makefile.txt (14.5 KB)
gpuDB_kernels.cu (2.29 KB)

An unspecified launch failure is usually caused by an out-of-bounds memory access. Fermi enforces much stricter shared and local memory protection than earlier cards do. I would suggest running your code through cuda-memcheck on the Tesla and seeing whether it reports anything.
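To illustrate the kind of guard that typically fixes these errors, here is a hypothetical block-sum kernel in which every global and shared memory index is checked. blockSum, blocksFor, inBounds and THREADS_PER_BLOCK are made-up names for this sketch, not taken from the attached file. Running a program under the checker is just cuda-memcheck ./yourprogram; in recent toolkits cuda-memcheck has been replaced by compute-sanitizer.

```cuda
#include <cuda_runtime.h>

// Hypothetical example: a block-wide sum with every access bounds-checked.
#define THREADS_PER_BLOCK 256  // must be a power of two for the reduction below

// Number of blocks needed so that blocks * threads >= n (rounds up).
__host__ __device__ inline int blocksFor(int n, int threads)
{
    return (n + threads - 1) / threads;
}

// True when idx addresses a valid element of an array of length n.
__host__ __device__ inline bool inBounds(int idx, int n)
{
    return idx >= 0 && idx < n;
}

// Each block sums up to THREADS_PER_BLOCK elements of in[] into out[blockIdx.x].
// Must be launched with exactly THREADS_PER_BLOCK threads per block: the shared
// array is sized for that, and more threads would index past its end.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float partial[THREADS_PER_BLOCK];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // The last block is usually only partly full: threads past the end of
    // the input load 0 instead of reading beyond in[n - 1].
    partial[tid] = inBounds(idx, n) ? in[idx] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory; tid + stride always stays below
    // blockDim.x, so partial[] is never indexed out of range.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = partial[0];
}

// Usage (device pointers d_in and d_out are placeholders):
//   int blocks = blocksFor(n, THREADS_PER_BLOCK);
//   blockSum<<<blocks, THREADS_PER_BLOCK>>>(d_in, d_out, n);
```

Here d_out must hold at least blocksFor(n, THREADS_PER_BLOCK) floats; the per-block partial sums still need to be added up afterwards, either on the host or with a second launch.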

Thanks for the idea; that was very helpful. I did not know cuda-memcheck existed. It’s definitely reporting some errors, which I’m working through now.