I’m running a Linux-based GPU cluster with S1070s:
% cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
We have the CUDA 3 SDK/Toolkit/Driver installed:
% cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.15 Fri Mar 12 00:29:13 PST 2010
GCC version: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
The NVIDIA_GPU_Computing_SDK sample programs for both CUDA and OpenCL build successfully.
However, I am only able to run the CUDA samples.
If I try to run any of the OpenCL samples, such as oclDeviceQuery, I get an error message:
% ./oclDeviceQuery
oclDeviceQuery.exe Starting…
OpenCL SW Info:
Error -1001 in clGetPlatformIDs Call !!!
!!! Error # -1000 () at line 42 , in file oclDeviceQuery.cpp !!!
Exiting…
Error -1001 means that clGetPlatformIDs() was unable to find any valid OpenCL platform.
It is defined as CL_PLATFORM_NOT_FOUND_KHR in the cl_khr_icd extension spec.
I also tried running the program as root, with similar results.
I had the same problem: after updating the devdriver, clGetPlatformIDs returned -1001, i.e. CL_PLATFORM_NOT_FOUND_KHR.
Then I also updated the cuda-toolkit and recompiled everything, but I still got -1001.
After googling, I found a report that running as root works. Running the OpenCL program once as root somehow solved the issue: that run was successful, and after that even normal-user runs started to work.
Maybe it creates something somewhere that needs root rights?
You can try using “strace” to see where the program fails.
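A minimal sketch of the strace idea, in case it helps. The function name failed_dev_opens is made up for illustration, and the filter assumes the interesting failures are open()/openat() calls on paths under /dev; adjust the pattern if your trace looks different.

```shell
#!/bin/sh
# failed_dev_opens: read strace output on stdin and print only the
# open()/openat() calls on /dev/* paths that returned -1.
# (Hypothetical helper name, for illustration only.)
failed_dev_opens() {
    grep -E '^(\[pid +[0-9]+\] )?open(at)?\(' \
        | grep '"/dev/' \
        | grep -E '= -1 '
}

# Usage (assumes strace is installed and ./oclDeviceQuery is the SDK sample):
#   strace -f -e trace=file ./oclDeviceQuery 2>&1 | failed_dev_opens
```

The `-e trace=file` option limits the trace to system calls that take a file name, which keeps the output short enough to read.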
One more addition: after rebooting the machine (which I do quite seldom, as it is a dedicated development server), the problem comes back, and I have to run the program as root again to get rid of the -1001 error.
Without X running, “/dev/nvidiactl” is missing, and the program fails on init.
After starting X, one can find:
autobot@mylly:~/lib_debug/bin$ ls /dev/nv*
/dev/nvidia0 /dev/nvidiactl
But as there are 2 cards installed, the library also tries to open the other card:
open("/dev/nvidia1", O_RDWR) = -1 ENOENT (No such file or directory)
and that causes the program to fail.
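To check which device files are missing on a machine like this, something along these lines could help. It is only a sketch: the function name missing_nvidia_nodes is made up here, and it assumes the library expects /dev/nvidiactl plus one /dev/nvidiaN per card, as the strace line above suggests.

```shell
#!/bin/sh
# missing_nvidia_nodes DIR COUNT
# Print the device files expected for COUNT GPUs (nvidiactl plus
# nvidia0 .. nvidia<COUNT-1>) that do not exist in DIR.
# (Hypothetical helper; the naming scheme is an assumption based on
# the ls and strace output in this thread.)
missing_nvidia_nodes() {
    dir=$1
    count=$2
    [ -e "$dir/nvidiactl" ] || echo "$dir/nvidiactl"
    i=0
    while [ "$i" -lt "$count" ]; do
        [ -e "$dir/nvidia$i" ] || echo "$dir/nvidia$i"
        i=$((i + 1))
    done
}

# Example for a two-card machine:
#   missing_nvidia_nodes /dev 2
```

An empty result means every expected file is present; any path printed is one the library would fail to open.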
This problem and its solution are discussed in the CUDA/OpenCL toolkit release notes for Linux. If you are not running X11, you need to install a boot-time script to create the device files.
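A rough sketch of what such a boot-time script might look like, not a copy of the one in the release notes. The helper names count_nvidia_gpus and print_mknod_commands are hypothetical. Character-device major number 195 and minor 255 for nvidiactl are the values the NVIDIA Linux driver normally uses, but please check them against the release notes for your driver version.

```shell
#!/bin/sh
# Sketch: create the NVIDIA device files at boot when X is not running.
# Assumption: major 195, minor N for /dev/nvidiaN, minor 255 for nvidiactl.

# count_nvidia_gpus: read `lspci` output on stdin and count NVIDIA
# "3D controller" and "VGA compatible controller" entries.
count_nvidia_gpus() {
    grep -i nvidia | grep -c -E '3D controller|VGA compatible controller'
}

# print_mknod_commands COUNT: print (do not run) the mknod commands
# needed for COUNT GPUs plus the control device.
print_mknod_commands() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "mknod -m 666 /dev/nvidiactl c 195 255"
}

# Typical use from an init script, as root (left commented out so that
# running this file has no side effects):
#   /sbin/modprobe nvidia || exit 1
#   print_mknod_commands "$(lspci | count_nvidia_gpus)" | sh
```

Printing the commands first makes it easy to inspect them before piping them to a shell as root. Mode 666 lets non-root users open the devices, which matches the symptom above where runs only worked after root had touched the driver.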