All of the code I have written runs slower on 2.0 Beta (by varying amounts, but always slower). Since it gives the same results on both versions, I don't think there is a problem with my system, libraries, or runtime. (I hope!)
I would like to show profiles from the CUDA profiler, but I do not have the data for 1.0. To get it, I would have to uninstall the current CUDA and video drivers, reinstall the older versions, and run everything again.
I thought some of you might have had similar experiences.
(FYI, my code uses about 20~30 registers, about 60 KB of shared memory, and about 12 KB of common memory, with fully coalesced access to global memory.)
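For anyone who wants to compare the two versions without the profiler, here is a minimal sketch of timing a kernel with CUDA events. The kernel `scale`, the array size, and the repeat count are placeholders I made up for illustration, not the code discussed above; substitute your own kernel and launch configuration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; replace with the kernel being compared.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;      // assumed problem size
    const int repeats = 100;    // assumed number of timed launches
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = 0;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization cost is not timed.
    scale<<<blocks, threads>>>(d_data, n, 1.0f);

    // Record events around the timed launches in the default stream.
    cudaEventRecord(start, 0);
    for (int r = 0; r < repeats; ++r)
        scale<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the stop event has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("average kernel time: %f ms\n", ms / repeats);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Building the same source under each toolkit and comparing the printed averages gives a rough per-kernel number. Compiling with `nvcc -Xptxas -v` also prints the register and shared memory usage per kernel, which may show whether the newer compiler allocates resources differently.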
My comment is not about productivity but about a serious CUDA 2.0 limitation. For some reason, nobody from NVIDIA has said a single word about it. Please check out this thread: Watchdog problem
Similar questions have been posted more than once before, by me and by other forum visitors, and there is still no answer.
Hopefully the watchdog problem on secondary cards will be clarified this time. Thanks in advance.