Good idea. The bugs I found are:
The first has to do with reading and writing aligned structures in device memory.
The second has to do with reading constant memory. For some reason this fails in some of my kernels, which read out zero or other weird values unless I prefix the kernel with __syncthreads();
This one is kind of arcane, and I’m not sure exactly when or why it occurs. All I did was create a minimal program that still demonstrated the bug for NVidia.
The third is that under Linux, when you use CUDA in a shared library, you sometimes get a nasty segmentation fault when the program terminates.
All of these were reported to NVidia and are supposedly fixed in the internal development release.
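To make the first two items concrete, here is a rough sketch of the kind of code involved. It is not the minimal program sent to NVidia and it is not guaranteed to trigger either bug. The `Particle` struct, the `kScale` constant, and the `scaleParticles` and `runRoundTrip` names are all made up for illustration. It copies an aligned structure through device memory, reads a value from constant memory, and puts the `__syncthreads()` workaround at the top of the kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical 16-byte aligned structure (an assumption, not the original type).
struct __align__(16) Particle {
    float x, y, z, w;
};

// Hypothetical value stored in constant memory.
__constant__ float kScale;

__global__ void scaleParticles(const Particle* in, Particle* out, int n)
{
    // Workaround from the post: a barrier at the very start of the kernel,
    // before constant memory is read. Every thread reaches it.
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        Particle p = in[i];   // aligned structure read from device memory
        p.x *= kScale;        // constant memory read
        out[i] = p;           // aligned structure write to device memory
    }
}

// Runs the round trip and returns the number of mismatching elements,
// or -1 if a CUDA call failed.
int runRoundTrip()
{
    const int n = 256;
    static Particle hostIn[n], hostOut[n];
    for (int i = 0; i < n; ++i) {
        hostIn[i].x = (float)i;
        hostIn[i].y = 1.0f;
        hostIn[i].z = 2.0f;
        hostIn[i].w = 3.0f;
    }

    const float scale = 2.0f;
    Particle *devIn = 0, *devOut = 0;
    if (cudaMemcpyToSymbol(kScale, &scale, sizeof(scale)) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&devIn, n * sizeof(Particle)) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&devOut, n * sizeof(Particle)) != cudaSuccess) return -1;
    cudaMemcpy(devIn, hostIn, n * sizeof(Particle), cudaMemcpyHostToDevice);

    scaleParticles<<<(n + 127) / 128, 128>>>(devIn, devOut, n);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(hostOut, devOut, n * sizeof(Particle), cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    if (err != cudaSuccess) return -1;

    // Compare every field against the value computed on the host.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hostOut[i].x != hostIn[i].x * scale || hostOut[i].y != hostIn[i].y ||
            hostOut[i].z != hostIn[i].z || hostOut[i].w != hostIn[i].w) {
            ++mismatches;
        }
    }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", runRoundTrip());
    return 0;
}
```

If you want to check whether your own toolkit is affected, comparing the mismatch count with and without the `__syncthreads()` line is one place to start, though whether it reproduces anything will depend on the release and the kernel.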