I’ve got a codebase with a dozen or so CUDA kernels, all developed on a box with a GTX 480, where it runs just fine: cuda-memcheck and cuda-gdb don’t spot any memory errors, and I’ve run it continuously for more than a day. However, when I move it over to a new machine we just got with a GTX 580, I start getting unspecified launch failures. It’s not even consistent: cuda-memcheck reports invalid reads/writes in different kernels depending on the run (and sometimes no errors are reported at all, just an ULF). This happens even in very simple kernels that can be verified by inspection, and in kernels from CUFFT (which I have to believe are validated…).
Has anyone else seen this behavior? Is it likely an issue with the card itself? I’m running the 580 on driver 260.19.29, CUDA 3.2.
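In case it’s relevant, here is roughly how I check each launch for errors (a minimal sketch; `myKernel` and `checkLaunch` are placeholders for my real kernels and error-handling). Since a ULF is reported asynchronously, I force a `cudaThreadSynchronize` (the CUDA 3.2-era API) after each launch so the failure gets attributed to the right kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder standing in for one of my real kernels.
__global__ void myKernel(float *d_out) { /* ... */ }

// cudaGetLastError catches launch-configuration errors;
// cudaThreadSynchronize forces the kernel to finish so an asynchronous
// "unspecified launch failure" surfaces here, not at some later,
// unrelated runtime call.
static void checkLaunch(const char *name)
{
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaThreadSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", name, cudaGetErrorString(err));
}

int main()
{
    float *d_out;
    cudaMalloc(&d_out, 256 * sizeof(float));
    myKernel<<<1, 256>>>(d_out);
    checkLaunch("myKernel");
    cudaFree(d_out);
    return 0;
}
```

Even with every launch instrumented like this, the failing kernel changes from run to run on the 580.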