I’m trying to build an Accelerator program for the first time and PGI Accelerator (or CUDA) doesn’t appear to be seeing the NVIDIA hardware.
The build of my little test program goes okay and ends with the message: “Inner sequential loop scheduled on accelerator”. But when I run the program, I get “call to cuInit returned error 100: No device”.
When I run pgaccelinfo -v, I get:
CUDA Driver Version 2030
could not initialize CUDA runtime, error code=100
libamdcalcl.so not found
No accelerators found.
Check that you have installed the CUDA or CAL libraries properly
Check that your LD_LIBRARY_PATH environment variable points to the CUDA or CAL runtime installation directory
Check the permissions on your device
pgfortran -V reports “pgfortran 10.2-1 64-bit target on x86-64 Linux -tp nehalem-64” and it is running on CentOS release 5.4 (Final) (x86_64-redhat-linux-gnu GNU/Linux) on Linux release 2.6.18-164.11.1.el5.
The PGI Installation manual says that the CUDA software is installed as part of the PGI installation but it does not mention CAL. (What is CAL exactly?)
libcuda.so is present and is on the LD_LIBRARY_PATH. I don’t know where to look for libamdcalcl.so, but it is not in /lib, /usr/lib, /usr/local/lib, /opt/pgi/linux86-64/10.2/lib, nor /opt/pgi/linux86-64/10.2/libso.
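A quick way to confirm which directories on LD_LIBRARY_PATH actually contain a given library is a small shell function along these lines. This is only a sketch: `find_on_path` is a made-up helper name, not a PGI or CUDA tool, and it only looks at the colon-separated list it is given (it does not consult the loader cache or /etc/ld.so.conf).

```shell
#!/bin/sh
# find_on_path LIBNAME [PATHLIST]
# Hypothetical helper: prints every directory in the colon-separated PATHLIST
# (default: $LD_LIBRARY_PATH) that contains a file named LIBNAME.
# Exit status is 0 if at least one directory matched, 1 otherwise.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
find_on_path() (
    lib=$1
    pathlist=${2-${LD_LIBRARY_PATH-}}
    status=1
    IFS=:
    set -f                      # do not glob-expand path entries
    for dir in $pathlist; do
        [ -n "$dir" ] || continue   # skip empty entries such as "a::b"
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir"
            status=0
        fi
    done
    exit "$status"
)

# Example use (not run here):
#   find_on_path libcuda.so || echo "libcuda.so is not on LD_LIBRARY_PATH"
```

If the function prints nothing for libcuda.so, the path is the first thing to fix; if it prints a directory, the library is at least findable and the problem lies elsewhere.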
Please advise where I should go from here. Is there a tool that can detect the NVIDIA hardware? Once I’ve determined that it can be detected, how do I get the PGI Accelerator to see it? Thanks!
Neil L. Jackson
P.S. The hardware is two Intel Xeon 5500 Series processors and a TESLA C1060 card.
P.P.S. nvidia-installer reports that I’m running version 190.18 of the Nvidia driver (and that the latest available version is 256.35). I tried running nvidia-settings, but it complains that I’m not running X-Windows on the Nvidia card (I can’t see any reason why I would) and consequently fails to provide any information.
I would first try updating your NVIDIA driver to something more recent. You can download the CUDA 3.0 drivers here.
If you have the CUDA SDK installed and built, you can try running ‘deviceQuery’ ($SDK_INSTALL_PATH/C/bin/linux/release/deviceQuery); if it doesn’t see your card then the PGI compilers won’t see your card either.
If it doesn’t see your card, check to make sure your driver is loaded:
lsmod | grep nvidia
You should see something similar to:
nvidia 10840968 38
i2c_core 56129 3 i2c_ec,nvidia,i2c_i801
If not, you can run (as root):
modprobe -v nvidia
Also check to make sure the devices have been created in Linux; run:
ls -l /dev/nvidia*
It should show something similar to:
crw-rw-rw- 1 root root 195, 0 Jun 21 09:53 /dev/nvidia0
crw-rw-rw- 1 root root 195, 1 Jun 21 09:54 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Jun 21 09:53 /dev/nvidiactl
You can add devices by running (as root):
mknod /dev/nvidia0 c 195 0
mknod /dev/nvidia1 c 195 1
mknod /dev/nvidiactl c 195 255
You will also have to make sure these devices have appropriate permissions to allow you to access them. If all of that checks out, let me know and we can try something else.
(CAL is part of the AMD Stream SDK; you can ignore any references to it.)
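The checks above (module loaded, device nodes present, permissions usable) can be rolled into one small script. The following is a rough sketch, not an NVIDIA or PGI tool: the function names are invented for illustration, and it assumes the module is listed under the name `nvidia` in /proc/modules and that the first card is /dev/nvidia0.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME appears as the first field of a line in MODULES_FILE
# (default: /proc/modules, which is what lsmod reads).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# node_ok PATH
# Succeeds if PATH is a character device the current user can read and write.
node_ok() {
    [ -c "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

# check_nvidia: runs the checks and reports each result; returns 1 on any failure.
check_nvidia() {
    rc=0
    if module_loaded nvidia; then
        echo "ok: nvidia kernel module is loaded"
    else
        echo "FAIL: nvidia kernel module is not loaded (as root: modprobe -v nvidia)"
        rc=1
    fi
    for node in /dev/nvidiactl /dev/nvidia0; do
        if node_ok "$node"; then
            echo "ok: $node is accessible"
        else
            echo "FAIL: $node is missing or not readable/writable by this user"
            rc=1
        fi
    done
    return $rc
}

# Example use (not run here):
#   check_nvidia
```

Run it as the ordinary user who will run the accelerator program, since root can open the nodes regardless of their permissions.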
Thank you for this information.
As you suggested, the driver was not loading; there was no entry in /proc/modules or /dev. I reinstalled the NVIDIA driver (devdriver_3.1_linux_64_256.35), and the module now loads:
/sbin/lsmod | grep nvid*
nvidia 11148864 0
i2c_core 56641 3 nvidia,i2c_ec,i2c_i801
And (although I had to add them by hand):
ls -l /dev/nvid*
crwxrwxrwx 1 root root 195, 0 Jul 9 16:14 /dev/nvidia0
crwxrwxrwx 1 root root 195, 1 Jul 9 16:14 /dev/nvidia1
crwxrwxrwx 1 root root 195, 255 Jul 9 16:15 /dev/nvidiact1
I also reinstalled the CUDA Toolkit (cudatoolkit_3.1_linux_64_rhel5.4) and the SDK samples (gpucomputingsdk_3.1_linux).
The Tesla card now shows up in the KDE “Device Manager” tool.
However, pgaccelinfo still gives:
CUDA Driver Version 3010
No accelerators found.
I wasn’t able to build all the programs in the CUDA SDK samples (the build gets a few programs in and then the linker aborts because it cannot find -lGLU), but I did persuade the deviceQuery program to build. It reports:
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount FAILED CUDA Driver and Runtime version may be mismatched.
Press <ENTER> to Quit…
I will try support at NVIDIA, but if you have any further suggestions they would be welcome. Thank you.
I found the solution to the problem halfway through CUDA_Release_Notes_3.1.txt, under “Known Issues”. If one isn’t running X-Windows (on the accelerator), then one must run a script at startup to make the card available. (As usual, the solution was: RTFM! I had just assumed that the driver install/setup program would actually set the card up in a usable state.)
The commands in this script are essentially the same as the commands dholt suggested, with possibly the only difference being the permissions on the devices.
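For readers without the release notes to hand, a startup script of this kind might look roughly like the sketch below. It is not a copy of the script in the release notes. The function names and the DRY_RUN switch are my own additions, the major number 195 and control-node minor 255 are taken from the mknod commands earlier in this thread, and counting cards from lspci output is an assumption about how the number of nodes would be chosen.

```shell
#!/bin/sh
# emit CMD...: prints the command when DRY_RUN is 1 (the default in this sketch),
# otherwise runs it. DRY_RUN is a hypothetical switch added for safe testing.
emit() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# count_gpus: reads lspci output on stdin and prints how many NVIDIA
# "3D controller" or "VGA compatible controller" devices it lists.
count_gpus() {
    grep -i nvidia | grep -c -E '3D controller|VGA compatible controller' || true
}

# make_nodes N: creates /dev/nvidia0 .. /dev/nvidia(N-1) and /dev/nvidiactl
# as world read/write character devices (mode 666).
make_nodes() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        emit mknod -m 666 "/dev/nvidia$i" c 195 "$i"
        i=$((i + 1))
    done
    emit mknod -m 666 /dev/nvidiactl c 195 255
}

# At boot, as root, something like (not run here):
#   /sbin/modprobe nvidia && DRY_RUN=0 make_nodes "$(/sbin/lspci | count_gpus)"
```

With DRY_RUN left at its default the script only prints the mknod commands, which makes it easy to compare them with what was typed by hand before running anything as root.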
pgaccelinfo now shows the Tesla card. (In fact it shows three Tesla C1060 cards (Devices 0, 1, and 2). This might be some sort of mirage but I suppose it’s just conceivable that there are three cards in the machine – that would explain why there are three copies of the Tesla install CD, for example. I’ll need to investigate further…)
Thank you for your assistance.