185.18.10 CUDA does not work, 180.X series sorta works....

Hi All,

I posted this on the NVNews site, but they’re referring me here. None of the 185.X series drivers work with GPUGrid. The 185 drivers try to initialize the work unit, but immediately report it as 100% complete and subsequently fail the task. The newer 180.X series drivers (180.37 and up, I suppose) work, but with a lot of temporary screen freezes and an occasional screen lock that does not recover. Here are the bug reports for an earlier 180 series driver (which worked, with screen freezes) and a 185 series driver (which did not work) for debugging purposes.

Mike Doerner

Anybody at Nvidia working on this?

Mike D

Bueller?

This is not much of a repro case…

You stated that you originally reported this on NVNews. Did anyone else confirm that they were also experiencing the same problem?

It’s not entirely clear to me what this failure even looks like from your current description. I’d like to see the log(s) from GPUGRID/BOINC that include the failure.

Does this reproduce if you run GPUGRID while X is NOT running?

I’ve only tried running BOINC in X. How do you run it in the shell?

Basically, under the 185 driver, a task will download, and as soon as CUDA starts to work on it, it claims the task is 100% complete and tries to upload. Under the 180 driver, the task will start the timer and progress to 0.295% within a few minutes. The card is a 9600 GSO, if that helps. OpenSUSE 11.1 and KDE 4.2.2.

Mike D

I’ve never tried to run it under X, but I certainly wouldn’t expect stellar performance under those conditions. GPUGRID tends to consume all free time on a GPU, so even if everything were working perfectly, performance wouldn’t be zippy.

All you need to do is run the ‘boinc’ executable. I’d suggest reading the BOINC documentation for more information on all the options.

Also, I’d still like to see the following information:

  1. You’re the only person that I’m aware of who has reported this. Has anyone else reported this problem that you know of?

  2. Please attach the BOINC log that includes the failure.

??? Sorry if the nvidia-bugreport doesn’t show what you need. Have you run any tasks from GPUGrid under the 185 driver?

Mike D

Since I don’t know of anyone else running CUDA on Linux for GPUGrid, that may make my case unique… :">

After poking around some more documentation, I found this little gem in the CUDA release notes, here: CUDA 2.2 Release Notes

The ‘offending’ snippet is here…

When compiling with GCC, special care must be taken for structs that contain 64-bit integers. This is because GCC aligns long longs to a 4 byte boundary by default, while NVCC aligns long longs to an 8 byte boundary by default. Thus, when using GCC to compile a file that has a struct/union, users must give the -malign-double option to GCC. When using NVCC, this option is automatically passed to GCC.

OK, now when I recompile the 185.18.14 driver with -malign-double, CUDA exhibits a different behavior. Instead of CUDA trying to process the tasks, and immediately calling them 100% complete, it now just sits there with the tasks in the queue and doesn’t do anything. (I think this is an improvement??!?!)

To me, it looks like it’s a flag issue with the way gcc compiles the driver. Either I need ALL the other flags to make this thing work, or I need to grab a copy of Nvidia’s NVCC compiler and re-compile the driver with that compiler. Anybody know where I can grab NVCC? Thanks.

Mike Doerner

OK, I’ve got the CUDA toolkit on my system (OpenSUSE 11.1 and KDE 4.2.2), but I don’t see a flag in NVIDIA-Linux-x86_64-185.18.14-pkg2.run to use the nvcc compiler. I think using nvcc or getting the appropriate gcc flags will solve this problem, since -malign-double fixed the “immediate 100% completion” problem. Presently, BOINC grabs the tasks from GPUGrid, but does not change the status of the 1st task from “Ready to Start” to “Running”.

Mike Doerner

You can’t use nvcc to compile the host driver interface. nvcc is designed for compiling CUDA code into executable GPU payloads and preprocessing/annotating host source with CUDA driver API functions to get those payloads running on the GPU. The intermediate host code produced by nvcc must be passed to the host C compiler for compilation into host object code. It has nothing to do with driver installation or compilation.
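
To make that split concrete, here’s a rough, hypothetical sketch (nothing to do with the GPUGRID application itself): nvcc turns the kernel below into a GPU payload, rewrites the <<<...>>> launch into CUDA API calls, and hands the remaining host code to gcc.

```
// Hypothetical minimal .cu file, purely to illustrate what nvcc compiles.
// The __global__ function becomes a GPU payload; everything in main() is
// ordinary host code that nvcc passes through to the host C++ compiler (gcc).
#include <cstdio>

__global__ void addOne(int *x)          // device code, compiled by nvcc
{
    *x += 1;
}

int main()                              // host code, compiled by gcc
{
    int h_x = 41;
    int *d_x = NULL;
    cudaMalloc((void **)&d_x, sizeof(int));
    cudaMemcpy(d_x, &h_x, sizeof(int), cudaMemcpyHostToDevice);
    addOne<<<1, 1>>>(d_x);              // rewritten by nvcc into runtime/driver calls
    cudaMemcpy(&h_x, d_x, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    printf("result = %d\n", h_x);
    return 0;                           // none of this touches the kernel module build
}
```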

NVIDIA ship their host drivers/gpu firmware as a binary blob with a kernel interface wrapper which needs to be compiled against a given kernel configuration and source tree to produce a kernel module. That must be done with the same compiler that built the target kernel, which will be your vendor gcc.

OK, that makes sense. Then what flags should be enabled? Without the -malign-double flag, CUDA says it’s 100% complete on a task even though in reality it has just begun computation. With -malign-double set in CFLAGS, CUDA doesn’t start computation, but then again it doesn’t screw up either. I’d like to know what flags MUST be enabled to get the driver to compile properly. This worked in the 180.X series drivers, but has never worked correctly on the 185.X series drivers on my system. FWIW, gcc 4.3.3 is the default in OpenSUSE 11.1.

Mike Doerner

You shouldn’t have to set any compiler flags. The NVIDIA installer will (and must) use the exact compiler flags used to build the running kernel, which it will read out of the kernel configuration file. It should be a complete “hands off” process.

You seem to have latched onto something in the CUDA release notes, but it is a complete red herring. What is discussed in the release notes is a remark about data alignment when compiling user space host code which will share data with CUDA kernels running on the GPU. It has absolutely nothing to do with building the kernel driver module.
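
To illustrate what that note is actually about, here’s a minimal hypothetical sketch of the situation it describes: a struct shared between host code and a CUDA kernel.

```
// Hypothetical example of the alignment issue the release notes describe.
// 'Mixed' pairs a 32-bit int with a 64-bit long long. On 32-bit x86, gcc
// packs the long long on a 4-byte boundary by default (sizeof == 12), while
// nvcc-compiled device code assumes 8-byte alignment (sizeof == 16). Unless
// the host objects are built with -malign-double, the two sides disagree on
// the layout and the kernel reads garbage. None of this applies to building
// the kernel driver module.
#include <cstdio>

struct Mixed {
    int       flag;    // 4 bytes
    long long value;   // 8 bytes; its alignment depends on the host compiler flags
};

__global__ void readValue(const Mixed *in, long long *out)
{
    *out = in->value;  // only correct if host and device agree on the struct layout
}

int main()
{
    // If this differs from what the device code assumes, data copied to the
    // GPU is silently misinterpreted.
    printf("host sizeof(Mixed) = %zu\n", sizeof(Mixed));
    return 0;
}
```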

OK, so I’m back to square 1. How do you get the 185.X series of CUDA to function? The standard installation does not work. Using the same installation procedure with any 180.X series driver gets CUDA working, though with frozen screens on occasion…

Mike Doerner

How is 185.x not functioning? Can you point me to where you provided the information that I requested last week?

I presume you are basing this “it doesn’t work” purely on the fact that gpugrid doesn’t complete tasks. You mentioned you have installed the CUDA 2.2 toolkit. If you install the 2.2 SDK you should be able to build the examples therein and see if they pass or not. That will provide independent confirmation of whether the CUDA driver is working properly or not.

Bug reports are included in the top post. What other information do you require? CUDA doesn’t run without the X-server, I don’t think.

Compared to the help I’ve received on the NVNews site, this isn’t very helpful. If you want me to help you get more information beyond the 2 bug reports and a description of the problem, you need to give me a specific step-by-step procedure for what you’d like me to try.

Dammit, Jim, I’m a mechanical engineer, not a CUDA developer… :thumbup:

PS I’ll add a BOINC log like you’ve requested in the next post…

Mike Doerner

How do I do this? What test examples should I use? How does this test BOINC/GPUGrid? Please be as specific as possible, I’m a mechanical engineer, not a computer engineer. A step-by-step procedure would be helpful.

Mike Doerner

Go to the same place you downloaded the CUDA toolkit from. Follow the onscreen instructions to select the CUDA SDK version for your linux distribution and then install it.

Any of them will achieve the desired result, but the simplest is deviceQuery. It will use the CUDA driver API to query your GPU and print out its CUDA capabilities and hardware features. If that works, your CUDA installation and drivers work.

It doesn’t. It tests CUDA and confirms you have working drivers. If you can compile and run the SDK examples, then your problem lies somewhere other than the CUDA drivers.
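
For reference, deviceQuery boils down to roughly the following (a sketch, not the actual SDK source): enumerate the CUDA devices through the runtime API and print a few of their capabilities.

```
// Rough sketch of what the SDK's deviceQuery sample does. If this builds
// with nvcc and runs without an error, the driver and toolkit are installed
// correctly.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d, %lu MB of global memory\n",
               i, prop.name, prop.major, prop.minor,
               (unsigned long)(prop.totalGlobalMem / (1024 * 1024)));
    }
    return 0;
}
```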

OK, it seems 185.18.14 is exhibiting a slightly different problem. 185.18.14 does not start any GPUGrid tasks within BOINC. Here’s the latest bug report, screenshot, and BOINC log…

Mike Doerner