Vista driver problem?

I have helped integrate an RC5-72 CUDA core into the distributed.net client. We are encountering a problem on both 32-bit and 64-bit Vista that is not present on other platforms. On every platform other than Vista, the client works correctly with every driver version 177.x and later that we have tested. On both 32-bit and 64-bit Vista, however, the CUDA core works with driver 178.x but fails with driver versions 180.x and 181.x. We have also tested the 185.x BETA driver on Vista and the core works correctly there (running code simultaneously on the GPU and CPU seems broken in 185.x, but some breakage is not surprising in an unofficial driver). The client is compiled with Visual Studio 2003 using CUDA 2.0. The discussion, along with the code, is in bug report 4030, "(cuda) Implement new RC5-72 core for nVidia CUDA video cards".

Additional information: there is a report that when the client fails on Vista using driver 180.x or 181.x, the GPU gets downclocked. In the reported case, the default core clock is 650 MHz and it drops to 301 MHz when the client hangs; the default shader clock is 1620 MHz and it drops to 602 MHz. The full client source, including the CUDA core, is at [url="http://http.distributed.net/pub/dcti/source/pub-20090127.tar.gz"]http://http.distributed.net/pub/dcti/source/pub-20090127.tar.gz[/url]

Any ideas?

I have been involved in testing the distributed.net client on CUDA and have tested numerous drivers with the client.

When running the 'dnetc -stress' test, which is intended to find errors, I got the following results:

178.24 Passes ALL
180.60 Core #0 Test #4 HANGS
180.84 Core #0 Test #4 HANGS
181.00 Core #0 Test #4 HANGS
181.20 Core #0 Test #4 HANGS
181.22 Core #0 Test #4 HANGS
185.20 Passes ALL (at roughly half speed)

A HANG means that the client just stops responding and has to be killed through Task Manager.

The test system I used is:

Windows Vista Ultimate 64bit
Intel Core2Duo E6550
GeForce 8600GT

I hope you guys can help to solve this issue.

PS.
When I compile the client with the CUDA 2.1 libraries, I get the same results for the test!

Dear nVidia,

Any help or information regarding this issue would be helpful.

Thanks,

On Vista64 with an 8800 GTS/320 and the 182.50 drivers, the client was getting nearly 200 Mkeys/sec on average. With the 185.x series drivers the client no longer crashes with lost events, but the keyrate is less than half of what it was previously. Was something changed that would affect performance that heavily?

With the 182 series and earlier drivers, the client would occasionally 'lose' CUDA events [i.e. an event would be recorded but never completed]. When this happened, calls to cudaEventSynchronize would return cudaErrorUnknown rather than any useful error, and quite often the performance of the card would tank until the entire process was killed and restarted. Even terminating the thread and restarting the client [within the same process] was insufficient to restore full speed. Attempting to forcibly terminate the current thread's context, or to create a new context on the same thread, failed: the former allowed attach/detach but no other operations, and the latter threw the same unknown errors.
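For reference, the failure mode described above showed up in an event pattern roughly like the following. This is a minimal sketch, not the actual distributed.net core: the kernel name rc5_72_kernel, its arguments, and the launch configuration are placeholders.

[code]
// Minimal sketch of the event pattern described above (CUDA runtime API,
// roughly as used in the CUDA 2.0-era client). The kernel below is a
// placeholder, not the real RC5-72 core.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void rc5_72_kernel(unsigned int *results)
{
    results[blockIdx.x * blockDim.x + threadIdx.x] = 0;  // placeholder work
}

bool run_one_pass(unsigned int *d_results)
{
    cudaEvent_t done;
    if (cudaEventCreate(&done) != cudaSuccess)
        return false;

    // Queue the kernel and record an event behind it.
    rc5_72_kernel<<<128, 256>>>(d_results);
    cudaEventRecord(done, 0);

    // With the 182.x-and-earlier Vista drivers this is where the problem
    // surfaced: once an event was "lost", this call returned
    // cudaErrorUnknown instead of completing or reporting a specific error.
    cudaError_t err = cudaEventSynchronize(done);
    cudaEventDestroy(done);

    if (err != cudaSuccess) {
        fprintf(stderr, "cudaEventSynchronize: %s\n", cudaGetErrorString(err));
        return false;  // further work on this thread was effectively useless
    }
    return true;
}
[/code]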

The failure behavior during normal operation seemed to be the same as the failure during the '-stress' test: somehow the card would lose an event [or the context would get corrupted] and all further work on that thread was useless. Terminating the thread and starting a new one seemed to get things working again, but at a much lower speed. I was unable to determine whether the context was entirely leaked [leading to an eventual out-of-memory exception] or just caused some sort of underclocking of the card.
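As an illustration of the workaround mentioned above (restarting the crunching thread inside the same process): in the CUDA 2.x runtime a context is bound to the host thread, so a freshly created worker thread picks up a new implicit context on its first runtime call. The sketch below is a simplified, hypothetical version of that recovery path, not the client's actual threading code.

[code]
// Simplified sketch: restart GPU work on a fresh host thread so that it
// gets a new implicit CUDA context (CUDA 2.x runtime semantics). This is
// an illustration, not the distributed.net client's actual code.
#include <windows.h>
#include <process.h>
#include <cstdio>
#include <cuda_runtime.h>

static unsigned __stdcall crunch_thread(void *)
{
    // The first runtime call on this thread creates a fresh context.
    cudaSetDevice(0);

    void *buf = 0;
    cudaError_t err = cudaMalloc(&buf, 1 << 20);
    if (err != cudaSuccess) {
        fprintf(stderr, "allocation on new thread failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // ... re-run the core loop here. In practice this got work going again,
    // but at a much lower keyrate, as described above.

    cudaFree(buf);
    return 0;
}

void restart_gpu_worker(void)
{
    // Spawn a replacement worker after the broken thread has exited.
    HANDLE h = (HANDLE)_beginthreadex(NULL, 0, crunch_thread, NULL, 0, NULL);
    if (h) {
        WaitForSingleObject(h, INFINITE);
        CloseHandle(h);
    }
}
[/code]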

For now, the 185 drivers have stopped the crashing from what I can tell, but at a pretty severe cost in performance.

Any ideas here? The 186.x BETA drivers improve performance somewhat, but it’s still a mere fraction of the 182 driver performance with the 8800 GTS/320 on Vista64.

Is this even being looked into? Can someone confirm whether this is expected behavior? We'd be happy to work with someone on the CUDA team to troubleshoot, but the response so far has been pretty poor.