My program generates lots of small computation tasks that run asynchronously while other computations are done on the CPU at the same time. It works great with the 174.55 driver and CUDA 2.0 Beta 2. I have now tried the recently released CUDA 2.0 with the 177.84 driver. Unfortunately, the CUDA part suffers a dramatic performance drop after migrating to the 177.84 driver. I had actually noticed the same performance loss with the 177.35 driver, but I hoped the bug would be fixed in the official release. The problem is in the driver, not in the CUDA toolkit: I rolled back to 174.55, re-compiled the program with CUDA 2.0, and performance is OK.
With the 177.84 driver I also see much higher kernel-mode CPU usage than with the older driver.
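One way to put a number on the kernel-mode CPU usage is to sample the process's user and kernel CPU time before and after the CUDA section under each driver. The following is a minimal sketch, not code from the program described here: `CpuTimes`, `sample_cpu_times` and `run_workload` are hypothetical names, and `run_workload` is only a CPU-bound placeholder standing in for the real CUDA section. It assumes `GetProcessTimes` on Windows and `getrusage` on other systems.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/resource.h>
#endif

// CPU time consumed so far by this process, in seconds, split into
// user-mode and kernel-mode (system) parts. Hypothetical helper type.
struct CpuTimes {
    double user;
    double kernel;
};

// Samples the cumulative CPU times of the current process.
// Returns zeros if the underlying OS call fails.
static CpuTimes sample_cpu_times() {
    CpuTimes t = {0.0, 0.0};
#ifdef _WIN32
    FILETIME creation, exit_time, kernel, user;
    if (GetProcessTimes(GetCurrentProcess(), &creation, &exit_time,
                        &kernel, &user)) {
        ULARGE_INTEGER k, u;
        k.LowPart = kernel.dwLowDateTime;
        k.HighPart = kernel.dwHighDateTime;
        u.LowPart = user.dwLowDateTime;
        u.HighPart = user.dwHighDateTime;
        // FILETIME counts 100-nanosecond intervals.
        t.kernel = static_cast<double>(k.QuadPart) * 1e-7;
        t.user = static_cast<double>(u.QuadPart) * 1e-7;
    }
#else
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        t.user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec * 1e-6;
        t.kernel = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec * 1e-6;
    }
#endif
    return t;
}

// Placeholder for the section being measured (assumption: in the real
// program this would be the code that issues and waits on CUDA work).
// Here it just sums i * 0.5 for i in [0, iterations).
static double run_workload(int iterations) {
    volatile double acc = 0.0;
    for (int i = 0; i < iterations; ++i) {
        acc += i * 0.5;
    }
    return acc;
}
```

Calling `sample_cpu_times()` immediately before and after the measured section and subtracting the two samples gives the user and kernel CPU seconds spent in it. Comparing the kernel figure between the 174.55 and 177.84 drivers for the same run would show how much of the extra CPU load is kernel-mode time.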
My program is not GPU-intensive: it works fine on an 8600GT and even on an 8400GS. In fact, it does most of its computations on the CPU and uses the GPU as a co-processor for certain pieces of code.
I use an nForce 570 SLI motherboard, an AMD Athlon 64 X2 3800 CPU, Windows XP SP2, and 2 GB of RAM. I work with the driver API. Unfortunately, I can't post the full code of my program here, but I can share the CUDA-related code fragments.
I’m ready to answer any questions.