CUDA 3.2 Driver Broken? Oops...

Hi There!

I fear the worst – the CUDA 3.2 driver is “broken”. Platform: Linux, 32-bit, Ubuntu 9.04.

I am running a complex genetic algorithm on CUDA. We finished the project a long time back, in the CUDA 2.3 days…

So, today I re-compiled it on a CUDA 3.2 installation and found that it no longer returns correct results.
All results were zeroes… OMG! Unbelievable…
The same code ran rock solid on CUDA 2.3.

So, keeping the driver at 260.19 (the one that comes with CUDA 3.2 for 32-bit Linux), I just changed the toolkit back to CUDA 2.3.
No change! The problem persisted.

So, I downgraded the system to the 190.53 driver (the one that comes with CUDA 2.3 for Linux) and then everything works!!

I think this problem could be related to what was posted in http://forums.nvidia.com/index.php?showtopic=186015

Is NVIDIA aware of this problem?
It would be difficult for me to put together a bug report on this. For one, genetic algorithms are very complex to control and debug; for another, my bandwidth is limited. It would take many hours to produce a repro case.

Any help, guys?

Thanks,
Best Regards,
Sarnath

That driver version most certainly isn’t broken. I have it in production with both the CUDA 2.3 and 3.2 toolkits on our cluster on a mixture of GT200 and GF100 cards and it works perfectly, including “legacy” code written in the pre-Fermi, pre-3.0 toolkit era.

Are you sure it isn’t just execution parameters? Could it be that by recompiling with the newer toolkit and compiler, the kernel register consumption has changed and kernels are no longer launching?
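To make that concrete, this is the kind of minimal check I mean (the kernel and buffer names below are placeholders, not your code); compiling with nvcc --ptxas-options=-v will also tell you whether the per-kernel register count changed between toolkits:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Stand-in kernel; the point is the checks after the launch, not the kernel.
    __global__ void myKernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main()
    {
        const int n = 1 << 20;
        float *d_data = 0;
        cudaMalloc((void **)&d_data, n * sizeof(float));

        myKernel<<<n / 256, 256>>>(d_data, n);

        // A launch that asks for more registers or shared memory than the
        // device can supply fails silently unless you ask for the error.
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
            fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

        // Errors raised while the kernel runs only surface at the next sync.
        err = cudaThreadSynchronize();
        if (err != cudaSuccess)
            fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

        cudaFree(d_data);
        return 0;
    }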

Not really. I compiled with the older toolkit and ran the executable on the latest driver. It failed. That is how I ruled out the toolkit.

I just changed the driver and then re-ran the same executable and it ran fine…

We check errors for all the calls. In any case, I will recheck what you said. Thanks a LOT for your hints on a Friday evening here!

FYI, the driver seems semi-broken to me too. I developed an application involving prime numbers, PSieve-CUDA, and since the 260 drivers came out I’ve gotten more and more reports of problems with them. I’m using them myself, and the app fails for me in some cases but not others. This app uses integers exclusively, so there’s no floating-point issue. It’s also embarrassingly parallel, so it’s easy to adjust runtime parameters. It uses almost no memory (a few MB at most), and only registers and constants are accessed in the inner loop. And I always compile with the 2.3 SDK, so that version isn’t an issue.

In the first ranges I tested, the driver wasn’t a problem. I did notice a problem when I overloaded the card with about eight times as many CUDA threads as the GPU should need. For some reason, this makes the app run a bit faster. I have a couple of versions of the app, and for BOINC I have to disable PThreads, which otherwise allow running the app on multiple GPUs at the same time. When using PThreads I didn’t notice the problem either, and got the small speedup I was looking for. But without PThreads and with that overload I got computation errors (which means the GPU miscalculated something, and the CPU caught it).

I’m going to try running the new range with PThreads tomorrow and see if it helps. But for now you might try decreasing your CUDA thread count, if you can. You might also try breaking up your computation into smaller (or perhaps larger) pieces per kernel run. I suspect that might be the cause of the newest errors, and I’m going to explore that tomorrow too.
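To be clear about what I mean by breaking it up: replace one huge launch with several smaller ones. This is only a generic sketch with a dummy kernel, not PSieve-CUDA’s actual code:

    #include <cuda_runtime.h>

    // Dummy kernel standing in for the real per-candidate work.
    __global__ void workKernel(unsigned long long *out,
                               unsigned long long start, int count)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < count)
            out[i] = start + i;
    }

    // Cover the same range with several short launches instead of one long one.
    // chunkSize and threadsPerBlock are the knobs to experiment with.
    void runRangeInChunks(unsigned long long *d_out,
                          unsigned long long rangeStart,
                          int totalCount, int chunkSize, int threadsPerBlock)
    {
        for (int done = 0; done < totalCount; done += chunkSize) {
            int count  = (totalCount - done < chunkSize) ? totalCount - done
                                                         : chunkSize;
            int blocks = (count + threadsPerBlock - 1) / threadsPerBlock;
            workKernel<<<blocks, threadsPerBlock>>>(d_out + done,
                                                    rangeStart + done, count);
            cudaThreadSynchronize();   // keep each piece separately checkable
        }
    }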

I need actual repro cases before I can dig any deeper. If the driver was just “broken,” it never would have passed internal QA. Obviously, we can’t test everything, so maybe we missed something, but just saying that it doesn’t work doesn’t do any good. I need code that works with one driver but not another.

Just as a side note, I experienced problems as well (my app ran stably with the previous drivers but crashed with the newer ones), but it turned out that the cause was a bug in the program which, for some reason, just didn’t have any effect with the older drivers.

Cheers
Ceearem

I experienced lower performance on my homebrew FFT kernel with this driver too; I rolled back to 3.0 and things went back to normal.

Ceearem, could you give me an idea of what your bug was? It might help others if anyone else has the same problem.

As for my app, it seems that when it runs inside a separate PThread, there’s no problem. But this is very inconvenient in certain situations. Also, when the bug does appear, it’s not always in the same step of the calculation. This makes me suspect some kind of race condition.

I’ll post back when I have a verified case that passes on older drivers.

OK, I have a test case for you:

  1. Get TPSieve-CUDA. The source code is on that GitHub link I posted earlier, on the redc branch.
  2. On 64-bit Linux, run “./tpsieve-cuda-boinc-x86_64-linux -p420700e9 -P420701000e6 -k 1201 -K 9999 -N 3000000 -c 60 -M 2 -T -m 64 --device 0”. If it completes correctly it should print that it found 208 factors. If it fails, which it does on 260.19.* drivers, it won’t print that and will print a “computation error” message to stderr.txt.

I haven’t had errors in calculations, but I’ve noticed a serious decrease in performance using CUSP compared to version 3.0. I describe the problem here:

http://forums.nvidia.com/index.php?showtopic=184785

The good news is that my fears were wrong!

Apologies to NVIDIA…

It was a “cudaMemcpy” bug that CUDA 2.3 did not complain about – meaning there was silent memory corruption!
The software did not have error checks on that memcpy, and at a few other places as well (as avidday rightly pointed out).
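For anyone else who lands here: the fix boiled down to wrapping the runtime calls so nothing can fail silently. Roughly this kind of thing (a sketch with made-up buffer names, not our actual code):

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Abort loudly on any runtime-API error instead of ignoring the return code.
    #define CUDA_CHECK(call)                                            \
        do {                                                            \
            cudaError_t e = (call);                                     \
            if (e != cudaSuccess) {                                     \
                fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                        cudaGetErrorString(e));                         \
                exit(EXIT_FAILURE);                                     \
            }                                                           \
        } while (0)

    int main()
    {
        const size_t n = 1024;
        float  h_buf[1024] = {0};
        float *d_buf = 0;

        CUDA_CHECK(cudaMalloc((void **)&d_buf, n * sizeof(float)));
        CUDA_CHECK(cudaMemcpy(d_buf, h_buf, n * sizeof(float),
                              cudaMemcpyHostToDevice));
        CUDA_CHECK(cudaFree(d_buf));
        return 0;
    }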

After fixing the bug, the results of the complex simulation are the same across driver versions!

Sorry about that Tim,

Best Regards,
Sarnath

What kind of bug was it?

Several things, but mostly faulty memory accesses (e.g. a memcpy with a larger-than-allocated size, etc.) which worked in earlier CUDA versions [and I routinely run stuff on 3-4 different Linux systems]. I think I also remember that there were out-of-bounds shared memory accesses. Btw, does anyone know a good toolchain (or can point me to a post about one) on Linux for finding memory access errors and memory leaks in an MPI-based multi-GPU code? I mean, there are bugs which might only occur when running more than 27 (3x3x3 grid) MPI processes (each with a separate GPU hooked to it).

Cheers

Ceearem