Cuda support for legacy GPUs

I’m trying to use an older laptop to try some Cuda-related code. It has a Quadro FX 2700M GPU. If I go to CUDA GPUs - Compute Capability | NVIDIA Developer this GPU is NOT listed under the Cuda-Enabled Quadro products. However, if I click on the link on that page to the ‘legacy CUDA GPUs’ page, this GPU is listed under the Cuda-Enabled Quadro Products. So I’m hopeful.

So I installed Ubuntu 14.04 on my 64-bit laptop computer and the NVidia 340.98 driver, as I understand the latter is the latest 64-bit Linux driver that is compatible with this GPU. And I installed Cuda 6.5 (which is the latest version of Cuda compatible with this driver). (And Ubuntu 14.04 is the latest Ubuntu compatible with Cuda 6.5, I believe.)

Anyway - got everything working and tried to run a simple cuda program:

#include <stdio.h>
int main(void)
{
    int count;
    cudaGetDeviceCount(&count);
    printf("count = %d\n",count);
    return 0;
}

The result is ‘count = 0’. So it does not appear to be seeing my GPU. If I run ‘nvidia-smi -a’ I get a long list of stats about the GPU (many listed as ‘N/A’) but I’m not sure if there is something there that IDs the problem.

I have also tried to run ‘sudo nvidia-xconfig --enable-all-gpus’ but that also does not seem to help.

Is there something I need to do to get this ‘legacy’ GPU recognized by Cuda?

All cuda API calls return an error code. You should always be checking those when you are having trouble with a CUDA code.

Even better, run the CUDA sample code deviceQuery and report what it says. Also include the output of nvidia-smi.

My guess would be you have 1 of 2 issues:

  1. Improper GPU driver install
  2. optimus issue

For item 1, the exact method you used to install 340.98 and then CUDA 6.5 could have corrupted your 340.98 driver install.

For item 2, something like this is what I am referring to:

https://devtalk.nvidia.com/default/topic/977952/cuda-setup-and-installation/quadro-k1100m-cuda-support/

To have a good shot of resolving item 1, start over with a clean install of the OS and just install CUDA 6.5. It should pull in a usable driver.
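A minimal sketch of gathering the two outputs requested above (nvidia-smi and deviceQuery) into one file that can be pasted into a reply. The function name collect_cuda_report and the report file name are hypothetical, and the default deviceQuery path assumes the samples were built under /usr/local/cuda/samples, which may not match a given install.

```sh
# Hypothetical helper: gather nvidia-smi and deviceQuery output in one file.
# $1 = report file to write
# $2 = path to the deviceQuery binary (assumed default location if omitted)
collect_cuda_report() {
    report="$1"
    dq="${2:-/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery}"
    {
        echo "=== nvidia-smi ==="
        if command -v nvidia-smi >/dev/null 2>&1; then
            # Keep going even if the tool reports a failure.
            nvidia-smi 2>&1 || echo "nvidia-smi exited with status $?"
        else
            echo "nvidia-smi not found on PATH"
        fi
        echo "=== deviceQuery ==="
        if [ -x "$dq" ]; then
            "$dq" 2>&1 || echo "deviceQuery exited with status $?"
        else
            echo "deviceQuery not found at $dq"
        fi
    } > "$report"
}

# Example: collect_cuda_report cuda-report.txt
```

Writing both outputs to a single file keeps the driver view (nvidia-smi) and the CUDA runtime view (deviceQuery) side by side, which is usually what is needed to tell a driver problem from a toolkit problem.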

@txbob, Thanks for the comments. I had actually started by running deviceQuery - but when that failed, I turned to the simpler program defined above. For the record, when running deviceQuery, I get the following output:

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

So, not much help there. Given the two potential sources you mentioned, I’ll start w/ your Item #2. As your referenced discussion suggested, I searched optimus Cuda and found some instructions for engaging the Optimus set for Thinkpads. (I was encouraged by this b/c I’m trying to do all this on a Lenovo W700 Thinkpad). The particular instructions were for a Thinkpad W520. However, the problem seems to be that I don’t have an ‘Optimus’ option w/in my BIOS (I can only switch between a PCIe graphics option when ‘docked’ or to use the ‘Internal’ graphics card - but there does not seem to be a way to specify my NVidia GPU over the GPU that is probably attached to the motherboard of the laptop.) So the Optimus route does not seem to work for me. (My W700 is either too old - or I have the wrong BIOS installed to get Optimus functioning…)

As for your Item #1, here’s how I got Cuda 6.5 going on my system (w/ comments on why I did each step):

  1. Started with a clean operating install of Ubuntu 14.04
  2. If I try to use any kind of advanced packaging tool (e.g. sudo apt-get install cuda) or even download the cuda 6.5 DEB file, it automatically tries to install Cuda 8.0. So I had to use the ‘distribution-independent package’ install method as follows.
  3. First, I got the 340.98 Nvidia driver running by doing the following:
sudo apt-get --purge remove nvidia-*
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
  4. Then go to the ‘Additional Drivers’ app and specifically select the NVidia 340.98 driver and ‘apply changes’ and then reboot.
  5. Then go to CUDA Toolkit 6.5 | NVIDIA Developer and go to the Linux x86 tab and download the cuda_6.5.14_linux_64.run file (which btw takes at least an hour to download!)
  6. I then extracted the three *.run files w/in this file by running:
sh cuda_6.5.14_linux_64.run -extract=/home/eric/Downloads/

6a. This creates three *.run files:

cuda-samples-linux-6.5.14-18745345.run
cuda-linux64-rel-6.5.14-18749181.run
NVIDIA-Linux-x86_64-340.29.run

6b. Btw, if instead of 340.98 which I installed above, I try to move forward with the 340.29 driver here, when I try to reboot, I am unable to even log-in and am forced to reinstall the O/S and start over!
7. Make the run files executable and run the Cuda install with:

sudo chmod +x *.run
sudo ./cuda-linux64-rel-6.5.14-18749181.run

8. Then set the PATH and library path variables by editing the /etc/environment file to the following:

PATH="/usr/local/cuda-6.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
LD_LIBRARY_PATH="/usr/local/cuda-6.5/lib64"
  9. Reboot
  10. Install the samples and run deviceQuery with the following commands
sudo ./cuda-samples-linux-6.5.14-18745345.run
cd /usr/local/cuda/samples
sudo chown -R eric:eric .
cd 1_Utilities/deviceQuery
make
./deviceQuery
  11. … and upon doing the above, I get the output I gave at the top of this post.

So, unless someone can see a mistake in the process I’ve used above, it appears that Cuda running on my laptop is not able to access the NVidia GPU. Is it just not possible to ‘turn on’ this GPU for Cuda access b/c it’s too old? Or is there some command or process that would allow me to do that? For example, is the BIOS I’m running out of date? How would I find an updated BIOS that could give me Optimus functionality?

I believe the issue is with the 340.98 driver you pulled in.

Just because you found a driver somewhere on the web that seems to work with your laptop, does not mean it has been packaged properly to support CUDA. This:

sudo add-apt-repository ppa:graphics-drivers/ppa

is not necessarily the best place to go to get drivers. The people who packaged up drivers there are not NVIDIA, and they may not know about or care about properly packaging an NVIDIA driver to support CUDA.

Your best bet (IMO, if you want to make this work) is to grind your way through the process of getting an NVIDIA driver to work. That is, a driver retrieved from an NVIDIA-maintained web site or NVIDIA-maintained repository.

I don’t know much about the design of that W700. If it is old enough, it may predate the Optimus era, in which case there is effectively no Intel iGPU and there is only an NVIDIA dGPU. You can discover this in linux by running:

lspci |grep VGA

and see what display adapters are listed. If the only display adapter listed is a FX2700 (NVIDIA) GPU, then this is not really an optimus laptop.

If both Intel and NVIDIA VGA adapters are listed, then likely the reason for the display failure at step 6b in your sequence is because the NVIDIA driver standard install method is corrupting the X stack of the Intel iGPU which is actually driving the display. In that case you would want to follow a procedure to install the 340.29 (or other NVIDIA-genuine CUDA display driver) while not installing the OpenGL libs, which are what will corrupt the existing X display stack for the Intel iGPU.

A process of that type is outlined here:

https://devtalk.nvidia.com/default/topic/878117/-solved-titan-x-for-cuda-7-5-login-loop-error-ubuntu-14-04-/

And after having said all that, I still offer no guarantees. There may be something about the design of the W700 that makes this extraordinarily difficult, e.g. some interaction that only allows the dGPU to be powered up if a proper Lenovo-approved Windows display driver is running (i.e. generically an “optimus issue”).
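The lspci check described above could be scripted roughly as follows. This is only a sketch: the function name and the labels it prints are made up for illustration. It also looks at "3D controller" lines, on the assumption that some hybrid laptops list the NVIDIA dGPU that way rather than as a VGA controller.

```sh
# Hypothetical helper: read `lspci` output on stdin and report which
# display adapters are present. The printed labels are made up for this sketch.
classify_display_adapters() {
    awk '
        /VGA compatible controller|3D controller/ {
            if ($0 ~ /NVIDIA/) nvidia = 1
            else if ($0 ~ /Intel/) intel = 1
        }
        END {
            if (nvidia && intel) print "intel+nvidia"
            else if (nvidia)     print "nvidia-only"
            else                 print "no-nvidia"
        }
    '
}

# Example: lspci | classify_display_adapters
```

A result of nvidia-only corresponds to the "not really an optimus laptop" case; intel+nvidia corresponds to the case where the standard driver install may disturb the Intel X stack.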

When running the lspci |grep VGA command I get

01:00.0 VGA compatible controller: NVIDIA Corporation G94GLM [Quadro FX 2700M] (rev a1)

so it does seem like this is not an Optimus laptop. So if I go with the advice that the NVidia driver I install should be the one bundled with the Cuda software, then advice on the discussion boards suggests I should do the following:

sudo apt-get remove nvidia-cuda-*

<download the file: cuda-repo-ubuntu1404_6.5-14_amd64.deb … and then run:>

sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
sudo apt-get update
sudo apt-get install cuda

The problem is that upon trying to run the last command it appears that it is about to install Cuda 8.0. But I know that is not compatible with my GPU. So I suppose I need to go the route of manually installing that 340.29 driver. And, as you rightly point out, this will break the X server. Can you point me at a set of directions to fix the X server after this happens? (I assume the directions start by hitting Ctrl-Alt-F1?)

Thanks

OK that is not an optimus laptop then.

The best advice I can offer is to carefully follow the instructions here:

http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Getting_Started_Linux.pdf

Given that there is no intel display adapter, this should not break the x stack.

The instructions you have shown are approximately correct for the package manager install method. (Note that there are 2 possible install methods, the other being the runfile install method.) However you would want to be sure that any previous display driver installs were carefully removed also. If in doubt, use the methods described in section 2.6 of the above document to fully remove any previous installs. For example:

sudo apt-get --purge remove nvidia*

in addition to what you have shown

These are actually the instructions I originally tried to follow. However, just to make sure, I just reinstalled the operating system and started from the top. And it fails. I can see two major problems:

First, I go to CUDA Toolkit 6.5 | NVIDIA Developer and download cuda-repo-ubuntu1404_6.5-14_amd64.deb for Cuda version 6.5.

Then, following step 3.6 of the document you referenced, I run the commands there:

sudo dpkg -i cuda-repo-ubuntu1404_6.5-14_amd64.deb
sudo apt-get update
sudo apt-get install cuda

But this installs Cuda 8.0! (not Cuda 6.5, which is the latest version that is compatible with my GPU). I think the problem is with the ‘sudo apt-get update’ command. <= isn’t that telling the o/s to find the ‘latest/greatest’ version of the software?

And the second (and probably related) problem is that it also installs Nvidia driver 361.93

So, anyway - once it’s all installed, I try to perform some of the post-install steps but they don’t seem to really work. Then, upon reboot, I get into that ‘can’t login loop bug’ referenced previously where it looks like the X-server is broken.

Any suggestions on what I might do next?

Yes, I forgot that when you are installing a legacy CUDA toolkit, the package manager method will pick up the latest toolkit unless you ask for an older one. This is actually covered in section 3.8 of the previously linked document.

And driver 361.93 will not work with that old GPU.

So you can try starting the process over again from a clean setup and do

sudo apt-get install cuda-6-5 (or it may be cuda-6.5)

Alternatively you can try the runfile installer method. It’s possible that the previous attempt with that (and driver 340.29) did not work for you because you did not remove the nouveau driver, which is also covered in the linked document.
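Before committing to a package manager install, it may help to preview which driver packages apt would pull in. The sketch below assumes the -s (simulate) option of apt-get, which prints the planned actions as lines starting with "Inst" and "Conf" without changing the system. The helper name is hypothetical, and the package name in the example is the one suggested above, which may or may not exist in the repo.

```sh
# Hypothetical helper: read `apt-get -s install ...` output on stdin and
# print the packages named nvidia-* that would be installed.
planned_nvidia_packages() {
    awk '$1 == "Inst" && $2 ~ /^nvidia-/ { print $2 }'
}

# Example (simulation only, nothing is installed):
#   apt-get -s install cuda-6-5 | planned_nvidia_packages
```

If the list shows a driver package outside the 340.xx family, the install can be abandoned before it replaces a working driver.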

Upon your suggestion, I first tried the easier ‘sudo apt-get install cuda-6.5’ but it failed as well - as it tried to install a later graphics driver (I think 360.xx or something close to that).

So I went back to trying to install via the runfiles. Long story short - this fails as well. Here’s what I did:

  1. Clean install of Ubuntu 14.04
  2. I understand section 4.6 in the linked document talks about the Nouveau drivers - but the instructions there are really not that explicit. Luckily, I did find some additional instructions in the accepted answer at apt - Installing and testing CUDA in Ubuntu 14.04 - Ask Ubuntu and on other discussion boards about blacklisting Nouveau…
  3. I first blacklisted the Nouveau driver. I did this by adding the following lines to the /etc/modprobe.d/blacklist.conf file:
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
  4. And rebooting
  5. Then I purged any existing nvidia drivers (honestly I don’t think there were any) w/ ‘sudo apt-get --purge remove nvidia-*’
  6. I dropped to a text console w/ Ctrl-Alt-F1 and ran
sudo service lightdm stop
sudo killall Xorg
cd <to the folder w/ the extracted *.run files>
sudo ./NVIDIA-Linux-x86_64-340.29.run

6a. This starts with an error about the distribution-provided pre-install script failing. However, I just okay’d through that and continued.
6b. It proceeds to try to build the NVIDIA kernel module - but then gives the error:

ERROR: Unable to build the NVIDIA kernel module.

6c. Then tells me:

ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' 
for details.  You may find suggestions on fixing installation problems in the README
available on the Linux driver download page at www.nvidia.com

6d. Taking a look at this log file, the first error it flags seems to be related to the command

test -e include/generated/autoconf.h -a -e include/config/auto.conf

indicating that the ‘Kernel configuration is invalid.’ <= it might not be able to find the autoconf.h or auto.conf files… But these files do appear to exist in the /usr/src/linux-headers-4.4.0-31-generic/ folder structure.

So I have no idea how to tackle these cryptic errors… At this point, I think I’m at the end of the line. Thanks for all the help anyway, @txbob!

Regarding item 2, on a runfile install, it’s not sufficient to just blacklist nouveau. Any nouveau driver must be removed from the initrd image as well. Section 4.3.5 covers the exact process for an Ubuntu distro, and it looks like you skipped this step:

sudo update-initramfs -u

That is right out of the linked document section 4.3.5, I’m not sure how it could have been any clearer.

However that is not the source of your kernel module build failure, I don’t think.

Unfortunately, there’s not really enough information to go on. I’ve done successful installs with 14.04.01 but haven’t tried 14.04.05 which appears to be what you are using.

The package manager method should be workable as well, but it’s necessary to get the package manager to install an older/appropriate driver. I rarely use the package manager install (partly due to lack-of-control issues like this) so I don’t know the exact method to force it to choose a 340.xx or 343.xx driver from the NVIDIA repo, which should work with that GPU.
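A rough sketch of the nouveau steps discussed above for an Ubuntu runfile install. The two helper names are hypothetical. The file name blacklist-nouveau.conf and the two lines written to it are an assumption about how this is commonly set up, so check them against the install guide. The commands that need root are left as comments on purpose.

```sh
# Hypothetical helper: write a nouveau blacklist file to the given path.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Hypothetical helper: read `lsmod` output on stdin; succeeds if the
# nouveau module appears in the list.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Intended use on the real system (needs root, so shown as comments):
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u      # rebuild the initrd without nouveau
#   sudo reboot
#   lsmod | nouveau_loaded && echo "nouveau is still loaded"
```

The final lsmod check is there because a blacklist entry alone does not help if the old initrd still loads nouveau at boot.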

I’m not seeing a section 4.3.5 in the linked document: http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Getting_Started_Linux.pdf

In section 4.6 (on page 15 of document: DU-05347-001_v6.5 (which is what comes up when I click on the above link)) entitled “Interaction with Nouveau” it says the Nouveau drivers may be installed into the root filesystem (initramfs) and may cause the Display Driver installation to fail and talks about renaming the initramfs-$(uname -r).img file. But in my case $(uname -r) returns ‘4.4.0-31-generic’ and there is no file named ‘/boot/initramfs-4.4.0-31-generic.img’. There is a file called ‘initrd.img-4.4.0-31-generic’. It appears that this Ubuntu version is using the ‘initial ramdisk’ setup instead of the ‘initial RAM file system’… But, bottom line, the directions there seem unclear to me.

Is there another document to which you are referring?

Yes, I was looking at the current install guide:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#axzz4R8fhBerV

Sorry. The dracut command is used in Fedora/CentOS/RHEL, but not deb based systems. The corresponding tool in the deb based systems is update-initramfs

I don’t believe this is the crux of the issue you’re currently having, however.
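The file naming difference that caused the confusion above (initramfs-<release>.img on dracut-based distributions versus initrd.img-<release> on Ubuntu) can be checked with a small sketch like this one. The function name is hypothetical, and it only looks for the two names mentioned in this thread.

```sh
# Hypothetical helper: print the initial ramdisk image for a kernel release,
# trying the Ubuntu/Debian name first and the dracut-style name second.
# $1 = boot directory (default /boot), $2 = kernel release (default uname -r)
find_initrd_image() {
    bootdir="${1:-/boot}"
    release="${2:-$(uname -r)}"
    for candidate in "$bootdir/initrd.img-$release" "$bootdir/initramfs-$release.img"; do
        if [ -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: find_initrd_image /boot "$(uname -r)"
```

Whichever file is found, on deb based systems the image is regenerated with update-initramfs rather than renamed by hand.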

I agree. But just to be sure, I used the current install guide you ID’d and repeated the process from the top. Still running into the same failure with the kernel. A quick search indicated that the problem may be that the kernel was compiled with a newer version of gcc.

At this point, I’m going to move on. Thanks for your help anyway.

In case anyone else comes along, I was able to get this working with the 340.98 driver downloaded from here:

http://www.nvidia.com/Download/driverResults.aspx/107868/en-us

along with a Quadro FX3700 (a “legacy” GPU).

I set up a test install with Ubuntu 14.04.5. Drivers like the 340.29 that comes with the CUDA 6.5 installer will not work with these newer Ubuntu kernels, possibly due to this:

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-331-updates/+bug/1409190

However the offending issue was fixed in 340.98 (runfile installer). Going back to comment #9 in this thread, I believe if you replace this:

sudo ./NVIDIA-Linux-x86_64-340.29.run

with this:

sudo ./NVIDIA-Linux-x86_64-340.98.run

(using the file downloaded from the link above) the compile issue (kernel module build failure) there will be avoided. After that step, I was successfully able to run compiled CUDA codes.

$ uname -a
Linux bob-Precision-WorkStation-T7500 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ nvidia-smi
Mon Nov 28 21:34:15 2016
+------------------------------------------------------+
| NVIDIA-SMI 340.98     Driver Version: 340.98         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro FX 3700      Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   48C   P12    N/A /  N/A |     37MiB /   511MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+
$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro FX 3700"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    1.1
  Total amount of global memory:                 511 MBytes (536150016 bytes)
  (14) Multiprocessors, (  8) CUDA Cores/MP:     112 CUDA Cores
  GPU Clock rate:                                1250 MHz (1.25 GHz)
  Memory Clock rate:                             800 Mhz
  Memory Bus Width:                              256-bit
  Maximum Texture Dimension Size (x,y,z)         1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(8192, 8192), 512 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  768
  Maximum number of threads per block:           512
  Max dimension size of a thread block (x,y,z): (512, 512, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 1)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             256 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 1, Device0 = Quadro FX 3700
Result = PASS
$
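For anyone repeating this, the two checks visible in the output above (the driver version reported by nvidia-smi and the final Result line from deviceQuery) could be scripted along these lines. The helper names are hypothetical, and version_at_least relies on sort -V from GNU coreutils. The version comparison is only meaningful within the 340.xx branch, since newer branches such as 361.93 do not support these GPUs.

```sh
# Hypothetical helper: read `nvidia-smi` output on stdin, print the driver version.
driver_version() {
    sed -n 's/.*Driver Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 is at least version $2.
# Uses `sort -V` (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read deviceQuery output on stdin; succeeds on PASS.
device_query_passed() {
    grep -q '^Result = PASS'
}

# Example:
#   v="$(nvidia-smi | driver_version)"
#   version_at_least "$v" 340.98 && echo "driver $v is 340.98 or newer"
#   ./deviceQuery | device_query_passed && echo "CUDA sees the GPU"
```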

Wow - I concur. That 340.98 driver solved the problem. I can now run the samples (at least deviceQuery) too!

Thanks for hanging in there and solving this @txbob! You rock.