Is CUDA 2.0 faster than the previous versions?

Hi.

I use an 8800 GTX and recently moved from CUDA 1.0 to the 2.0 beta, because nvcc in 1.0 does not work well when the code uses many registers.

Anyway, I recompiled the software I wrote for CUDA 1.0 with the 2.0 beta and ran it, and I found that execution was always slower in 2.0.

For example, one of my kernels took about 300 ms with 1.0, but it takes about 400-500 ms with the 2.0 beta.

The code compiles without any problems on the CUDA 2.0 beta.

Do I have to pass some options to nvcc to match the performance of 1.0?

Any ideas?

How can this be reproduced?
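
One way to make the comparison reproducible is a small timing harness that can be built unchanged with each toolkit. What follows is only a sketch under assumptions: `scaleKernel` and `timeKernelMs` are hypothetical names, the kernel is a trivial stand-in for the real one, and it assumes the toolkit provides the CUDA event API (on a toolkit without it, a host timer around a synchronizing call would be the substitute).

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: scales an array in place.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Returns the average time per launch in milliseconds,
// or -1.0f if the arguments are invalid or a CUDA call fails.
float timeKernelMs(int n, int repeats)
{
    if (n <= 0 || repeats <= 0)
        return -1.0f;

    float *d_data = 0;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialisation cost is not measured.
    scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    // Time `repeats` back-to-back launches between two events.
    cudaEventRecord(start, 0);
    for (int r = 0; r < repeats; ++r)
        scaleKernel<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float totalMs = 0.0f;
    cudaEventElapsedTime(&totalMs, start, stop);
    cudaError_t err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);

    return (err == cudaSuccess) ? totalMs / repeats : -1.0f;
}
```

Calling something like `timeKernelMs(1 << 20, 100)` from `main` under each toolkit, with the same driver and compiler flags, would give one number per version to post. Building with `nvcc --ptxas-options=-v` also prints the register count per kernel, which may show whether the two compilers allocate registers differently.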

Thanks for the fast reply.

Every program I wrote takes more time on the 2.0 beta (by varying amounts, but always slower). Because they give the same results on both versions, I think there is no problem with my system, libraries, or runtime. (I hope!)

I would like to show the profiles from the CUDA profiler, but I do not have the data for 1.0. To get it, I would have to uninstall the current CUDA toolkit and video drivers, reinstall the older versions, and run everything again.

I thought some of you might have had similar experiences.

(FYI, my kernels use about 20-30 registers, about 60 KB of shared memory, and about 12 KB of common memory, with fully coalesced accesses to global memory.)
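
Resource figures like these can be compared against the limits the device itself reports, since a kernel close to a limit may behave differently when the compiler changes its register allocation. The following is a rough sketch, not a full occupancy calculation: `KernelNeeds`, `fitsDevice`, and `fitsQueriedDevice` are hypothetical names, and the register check ignores allocation granularity.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical description of one kernel launch's resource needs.
struct KernelNeeds {
    size_t sharedBytesPerBlock;
    size_t constantBytes;
    int    regsPerThread;
    int    threadsPerBlock;
};

// True if the launch fits the limits reported in `prop`.
// Simplification: register allocation granularity is ignored.
bool fitsDevice(const cudaDeviceProp &prop, const KernelNeeds &k)
{
    long long regsNeeded = (long long)k.regsPerThread * k.threadsPerBlock;
    return k.sharedBytesPerBlock <= prop.sharedMemPerBlock
        && k.constantBytes <= prop.totalConstMem
        && k.threadsPerBlock <= prop.maxThreadsPerBlock
        && regsNeeded <= prop.regsPerBlock;
}

// Queries device `dev`; returns false if the query fails
// or the launch does not fit.
bool fitsQueriedDevice(int dev, const KernelNeeds &k)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
        return false;
    return fitsDevice(prop, k);
}
```

Filling `KernelNeeds` with the values reported by `nvcc --ptxas-options=-v` for each toolkit and passing it to `fitsQueriedDevice(0, needs)` would show whether either build is near a per-block limit.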

My comment is not about productivity but about a serious CUDA 2.0 limitation. For some reason, nobody from NVIDIA has said a single word about it. Please check out this thread: Watchdog problem

A similar question has been posted more than once before by me and some other forum visitors, still with no answer.

Hopefully, this time the watchdog problem on secondary cards will be clarified. Thanks in advance.