edit: I am a dirty liar
apparently pkg2, not pkg0, is the one you want, because it’s the only one that includes the 32-bit compatibility libraries.
So I tested it… sure enough, the bad state after ctrl+c interrupt seems to be fixed. However, it’s still not usable due to a new bug it introduces.
Most of the sdk won’t build against it… e.g.:
[codebox]Mandelbrot.cpp: In function ‘void keyboardFunc(unsigned char, int, int)’:
Mandelbrot.cpp:641: warning: deprecated conversion from string constant to ‘char*’
ptxas /tmp/tmpxft_00001001_00000000-2_Mandelbrot_sm11.ptx, line 1797; warning : Double is not supported. Demoting to float
/usr/lib64/libnvidia-tls.so.1: undefined reference to `__unixTLSWWMPDlsym(void*, char const*)’
collect2: ld returned 1 exit status
make[1]: *** […/…/bin/linux/release/Mandelbrot] Error 1
make[1]: Leaving directory `/usr/local/cuda/sdk/C/src/Mandelbrot’[/codebox]
Google has turned up only one other reference to this issue so far:
http://www.nvnews.net/vbulletin/showthread.php?t=137769
Looks like I’m still holding on to 2.2 for now :( .
And hopefully this driver fixes things.
190.32 for x86
190.32 for x86-64
The board is mangling my links; the files are on an FTP server at:
download.nvidia.com/XFree86/Linux-x86/190.32/NVIDIA-Linux-x86-190.32-pkg0.run
download.nvidia.com/XFree86/Linux-x86_64/190.32/NVIDIA-Linux-x86_64-190.32-pkg2.run
Works! I think I can now move to cuda 2.3 safely! However, this driver won’t allow the recent OpenCL sdk to build. The driver released with the OpenCL sdk (190.29) of course allows the sdk to build… and also appears to have the ctrl+c issue fixed as well.
thx for posting here!
Jeremy
We are now at driver 195.17, cuda 3.0 beta.
Occasionally we still run into this bad-state problem. When it happens, our memory test program shows that the data read back is the opposite of the expected data.
We have seen this several times over the past several months on our cluster. Reloading the nvidia kernel module fixes the problem.
Unfortunately this time the error cannot be reliably reproduced.
I am going to file a bug report and I think it might be a good idea to post here as well.
I will be happy to provide any information if needed.
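For anyone who wants to script the module reload, here is a minimal sketch. It assumes the module is named `nvidia`, that `modprobe` is on the path, that the script runs as root, and that nothing (X, a running job) is still holding the GPU. The function names are made up for illustration.

```python
import subprocess


def reload_commands(module="nvidia"):
    """Return the commands that unload and then reload a kernel module."""
    return [["modprobe", "-r", module], ["modprobe", module]]


def reload_module(module="nvidia", run=subprocess.run):
    """Run the unload/load commands in order; stop at the first failure.

    `run` is injectable so the logic can be exercised without root access.
    """
    for cmd in reload_commands(module):
        if run(cmd).returncode != 0:
            return False
    return True
```

Calling `reload_module()` returns `False` if the unload fails, which usually means something is still using the device.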
-gshi
Interested for our own system:
Do you still have problems with CUDA 3.0? Which driver version are you using at the moment?
We still have this problem with CUDA 3.0 but it happens much less often and we cannot reliably reproduce it with our test program.
The driver version is 195.36.24
If you have a similar problem, please share your information.
Not yet. But since I want to make our GPUs available to a lot of users, and I now know that the problems still occur, I am considering running a memory test after each job.
I know that you published a memtest utility. Which of its tests do you run after each job?
I will let you know whether we run into the same issue as soon as I have results, but I think that will still take a while.
After each job, we run one pass of the stress test with the following arguments:
./cuda_memtest --stress --num_passes 1 --num_iterations 100 --device
In practice we found this is enough to flush out errors caused by the driver.
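If you want to hook this into a job epilogue, a rough sketch could look like the following. The device id after `--device` is not shown in the command above, so `device_id` here is a placeholder you would fill in per GPU. The sketch also assumes that `cuda_memtest` exits with a non-zero status when it finds errors; if your build does not, you would need to inspect its output instead. The function names are hypothetical.

```python
import subprocess


def memtest_command(device_id, binary="./cuda_memtest"):
    """Build the one-pass stress test command for a single GPU."""
    return [binary, "--stress", "--num_passes", "1",
            "--num_iterations", "100", "--device", str(device_id)]


def gpu_passed(device_id, run=subprocess.run):
    """Return True if the memtest process exited with status 0.

    Assumption: a non-zero exit status means the test found errors.
    `run` is injectable so the check can be tried without a GPU.
    """
    result = run(memtest_command(device_id))
    return result.returncode == 0
```

A scheduler epilogue could call `gpu_passed(i)` for each GPU the job used and take the node offline, or reload the driver, when it returns `False`.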