I have helped integrate an RC5-72 CUDA core into the distributed.net client. We are encountering a problem on both 32-bit and 64-bit Vista that is not present on other platforms. On all platforms other than Vista, the client works correctly with every driver version from 177.x onward that we’ve tested. On both 32-bit and 64-bit Vista, however, the CUDA core works with driver 178.x but fails with driver versions 180.x and 181.x. We have also tested the BETA 185.x driver on Vista, and the core works correctly there (running code simultaneously on the GPU and CPU seems broken in 185.x, but it’s not surprising that an unofficial driver has some problem). The client is compiled with Visual Studio 2003 using CUDA 2.0. The discussion, along with the code, is in bug report 4030, “(cuda) Implement new RC5-72 core for nVidia CUDA video cards”.
On Vista64 with an 8800 GTS/320 and the 182.50 drivers, the client was averaging nearly 200 Mkeys/sec. With the 185.x series drivers the client no longer crashes with lost events, but the keyrate is less than half of what it was previously. Was something changed that would affect performance that heavily?
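To separate time spent on the GPU from host-side overhead when comparing driver versions, a small timing harness along these lines could be run under each driver. This is only a sketch: `placeholder_kernel` and `time_launch_ms` are hypothetical names, and the kernel is a stand-in workload, not the actual RC5-72 core.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in workload (an assumption for illustration, NOT the RC5-72 core):
// each thread runs a short integer loop and stores the result.
__global__ void placeholder_kernel(unsigned int *out, unsigned int iters) {
    unsigned int idx = threadIdx.x + blockIdx.x * blockDim.x;
    unsigned int v = idx;
    for (unsigned int i = 0; i < iters; ++i) {
        v = v * 1664525u + 1013904223u;
    }
    out[idx] = v;
}

// Times one launch with CUDA events. d_out must hold blocks * threads
// elements. Returns elapsed GPU time in milliseconds, or -1.0f on any error.
float time_launch_ms(unsigned int *d_out, int blocks, int threads,
                     unsigned int iters) {
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    float ms = -1.0f;
    cudaEventRecord(start, 0);
    placeholder_kernel<<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop, 0);

    // Only trust the elapsed time if the wait, the launch, and the
    // elapsed-time query all succeeded.
    bool ok = cudaEventSynchronize(stop) == cudaSuccess
           && cudaGetLastError() == cudaSuccess
           && cudaEventElapsedTime(&ms, start, stop) == cudaSuccess;
    if (!ok) ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

If the event-measured time per launch is similar across drivers while the overall keyrate drops, the slowdown would more likely be in launch or synchronization overhead than in the kernel itself.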
With the 182 series and earlier drivers, the client would occasionally ‘lose’ CUDA events [i.e. an event would be recorded but never completed]. When this happened, calls to cudaEventSynchronize would return cudaErrorUnknown rather than any useful error, and quite often the performance of the card would tank until the entire process was killed and restarted. Even terminating the thread and restarting the client [within the same process] was insufficient to restore full speed. Attempting to forcibly terminate the current thread’s context, or to create a new context on the same thread, also failed: the former allowed attach/detach but no other operations, and the latter threw the same unknown errors.
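For anyone trying to reproduce the lost-event failure, the pattern in question looks roughly like the following. It is a minimal sketch under assumptions: `dummy_kernel` and `launch_and_wait` are hypothetical names, and the kernel is a trivial placeholder rather than the real core. It checks the status of every step so that the first failing call is the one reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial placeholder kernel (an assumption; stands in for the real core).
__global__ void dummy_kernel(unsigned int *out) {
    out[threadIdx.x] = threadIdx.x;
}

// Launches the kernel, records an event behind it, and waits on that event.
// d_out must hold at least 64 elements. Returns the first non-success
// status encountered, or cudaSuccess.
cudaError_t launch_and_wait(unsigned int *d_out) {
    cudaEvent_t done;
    cudaError_t err = cudaEventCreate(&done);
    if (err != cudaSuccess) return err;

    dummy_kernel<<<1, 64>>>(d_out);
    err = cudaGetLastError();                    // launch-time errors
    if (err == cudaSuccess) err = cudaEventRecord(done, 0);
    if (err == cudaSuccess) err = cudaEventSynchronize(done);

    if (err != cudaSuccess) {
        // A "lost" event would surface here as a failed synchronize.
        fprintf(stderr, "CUDA failure: %s (%d)\n",
                cudaGetErrorString(err), (int)err);
    }
    cudaEventDestroy(done);
    return err;
}
```

Calling `launch_and_wait` in a loop on a buffer from `cudaMalloc` and stopping at the first non-success return gives a small test case that does not depend on the rest of the client.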
The failure behavior during normal operation seemed to be the same as the failure during the ‘-stress’ test: somehow the card would lose an event [or the context would get corrupted], and all further work on that thread was useless. Terminating the thread and starting a new one seemed to get things working again, but at a much lower speed… I was unable to determine whether the context was entirely leaked [leading to an eventual out-of-memory exception] or whether this just caused some sort of underclocking of the card…
For now, the 185 drivers have stopped the crashing from what I can tell, but at a pretty severe cost in performance.
Any ideas here? The 186.x BETA drivers improve performance somewhat, but it’s still a mere fraction of the 182 driver performance with the 8800 GTS/320 on Vista64.
Is this even being looked into? Can someone confirm whether this is expected behavior? We’d be happy to work with someone on the CUDA team to troubleshoot, but the response thus far has been pretty poor…