Problem running OpenCL programs

I'm running a Linux-based GPU cluster with S1070s:

% cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)

We have the CUDA3 SDK/Toolkit/Driver installed:
% cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 195.36.15 Fri Mar 12 00:29:13 PST 2010
GCC version: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)

The NVIDIA_GPU_Computing_SDK sample programs for both CUDA and OpenCL build successfully.
However, I am only able to run the CUDA samples.
If I try to run any of the OpenCL samples, such as oclDeviceQuery, I get an error message:

% ./oclDeviceQuery
oclDeviceQuery.exe Starting…

OpenCL SW Info:

Error -1001 in clGetPlatformIDs Call !!!

!!! Error # -1000 () at line 42 , in file oclDeviceQuery.cpp !!!

Exiting…

Error -1001 means that clGetPlatformIDs() was unable to find any valid OpenCL platform.
It is defined as CL_PLATFORM_NOT_FOUND_KHR in the cl_khr_icd extension spec.

Also tried running the program as root with similar results.

Does anyone have a suggestion for any sort of diagnostic I can run to understand why this is happening, e.g. system log output?
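One cheap diagnostic, offered as a sketch rather than a known fix: the cl_khr_icd extension has the ICD loader on Linux look for vendor registration files in /etc/OpenCL/vendors, and a loader that finds none reports CL_PLATFORM_NOT_FOUND_KHR. The helper below (list_icds is a made-up name) prints each .icd file it finds together with the library name on its first line. It assumes the driver on this system registers itself through that directory.

```shell
# list_icds [DIR]: print every *.icd registration file in DIR together with
# the first line of the file (the library the ICD loader would try to load).
# DIR defaults to /etc/OpenCL/vendors, the location named in cl_khr_icd.
# The function name is hypothetical; it is not part of any SDK.
list_icds() {
    dir=${1:-/etc/OpenCL/vendors}
    for f in "$dir"/*.icd; do
        [ -f "$f" ] || continue   # the glob matched nothing
        printf '%s -> %s\n' "$f" "$(head -n 1 "$f")"
    done
}

# Usage:
#   list_icds
```

No output would suggest that no vendor is registered in that directory. Output with -1001 still showing up would point further down the stack, for example at the device files.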

If you are trying to run the programs from a remote machine using ssh, it may be the same issue reported in this post: The Official NVIDIA Forums | NVIDIA

I had the same problem: after updating the devdriver, clGetPlatformIDs returned -1001, which is CL_PLATFORM_NOT_FOUND_KHR.

Then I also updated the cuda-toolkit and recompiled everything, but I still got -1001.

After googling, I found a suggestion that running as root works. Running the OpenCL program once as root somehow solved the issue: the run was successful, and after that even normal user runs started to work.

Maybe it creates something somewhere that needs root rights?

You can try using "strace" to see where the program fails.
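A rough sketch of what the strace suggestion could look like in practice. The filter function name (failed_nvidia_opens) is made up for illustration, and the sketch assumes strace is installed and that the interesting failures are open() calls on /dev/nvidia* paths:

```shell
# failed_nvidia_opens: read strace output on stdin and keep only the
# open()/openat() calls on /dev/nvidia* paths that returned -1.
# The function name is hypothetical.
failed_nvidia_opens() {
    grep -E 'open(at)?\(.*/dev/nvidia.* = -1 '
}

# Usage (strace writes its trace to stderr, hence the 2>&1):
#   strace -f -e trace=file ./oclDeviceQuery 2>&1 | failed_nvidia_opens
```

An ENOENT in the output would mean the device file does not exist. An EACCES would point at permissions instead.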

OK, one more addition. After rebooting the machine (which I do quite seldom, as it is a dedicated development server), the problem comes back, and I have to run the program as root again to get rid of the -1001 error.

Just to update, this problem still exists with the newest driver:

NVRM: loading NVIDIA UNIX x86_64 Kernel Module 260.19.26 Mon Nov 29 00:53:44 PST 2010

Without X running, "/dev/nvidiactl" is missing, and the program fails on init.
After starting X, one can find:
autobot@mylly:~/lib_debug/bin$ ls /dev/nv*
/dev/nvidia0 /dev/nvidiactl

But as there are 2 cards installed, the library also tries to open the other card:
open("/dev/nvidia1", O_RDWR) = -1 ENOENT (No such file or directory)
and that causes the program to fail.
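A quick way to see which device files are absent, as a minimal sketch. missing_nvidia_nodes is a hypothetical helper, and the number of cards is passed in by hand rather than detected:

```shell
# missing_nvidia_nodes DIR COUNT: print the device files that are expected
# for COUNT cards (DIR/nvidiactl plus DIR/nvidia0 .. DIR/nvidia<COUNT-1>)
# but do not exist. The function name is hypothetical.
# -e only tests for existence; use -c to insist on a character device.
missing_nvidia_nodes() {
    dir=$1
    count=$2
    [ -e "$dir/nvidiactl" ] || echo "$dir/nvidiactl"
    i=0
    while [ "$i" -lt "$count" ]; do
        [ -e "$dir/nvidia$i" ] || echo "$dir/nvidia$i"
        i=$((i + 1))
    done
}

# Usage for a machine with 2 cards:
#   missing_nvidia_nodes /dev 2
```

With the listing shown above (only /dev/nvidia0 and /dev/nvidiactl present), this would print /dev/nvidia1.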

This problem and how to solve it are discussed in the CUDA/OpenCL toolkit release notes for Linux. If you are not running X11, you need to install a boot-time script to create the device files.

Could you give me the link, please? I can't find this script.

Thanks

Edit: I found it:

  • In order to run CUDA applications, the CUDA module must be
    loaded and the entries in /dev created. This may be achieved
    by initializing X Windows, or by creating a script to load the
    kernel module and create the entries.

An example script (to be run at boot time):

#!/bin/bash

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
  NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i;
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi