Cannot run any CUDA kernels: CUDA runtime doesn't recognize NVIDIA GPU

Hi everybody,

This is a repost of what I already wrote here; since nobody answered, I am trying to raise some attention. ;-)

I’ve installed the CUDA Toolkit 2.3 as well as the SDK and the NVIDIA driver (versions 190.18 and 190.42), and whenever I try to run a CUDA-capable application the CUDA runtime tells me:

[codebox]~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release$ ./scan

cudaSafeCall() Runtime API error in file <scan.cu>, line 100 : no CUDA-capable device is available.[/codebox]

But the deviceQuery tells me:

[codebox]./deviceQuery

CUDA Device Query (Runtime API) version (CUDART static linking)

There is 1 device supporting CUDA

Device 0: “Quadro FX 570M”

CUDA Driver Version: 2.30

CUDA Runtime Version: 2.30

CUDA Capability Major revision number: 1

CUDA Capability Minor revision number: 1

Total amount of global memory: 133496832 bytes

Number of multiprocessors: 4

Number of cores: 32

Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 16384 bytes

Total number of registers available per block: 8192

Warp size: 32

Maximum number of threads per block: 512

Maximum sizes of each dimension of a block: 512 x 512 x 64

Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

Maximum memory pitch: 262144 bytes

Texture alignment: 256 bytes

Clock rate: 0.95 GHz

Concurrent copy and execution: Yes

Run time limit on kernels: Yes

Integrated: No

Support host page-locked memory mapping: No

Compute mode: Default (multiple host threads can use this device simultaneously)

Test PASSED[/codebox]

So it looks like I am having the same issue as alecn2002 had in the original posting. In his case, though, the graphics card had overheated, and that was the cause of the errors. My Quadro FX 570M seems to be running just fine: the temperature is OK and 3D applications work, so I assume the card itself is healthy. I tried reinstalling the driver (versions 190.18 as well as 190.42), but it didn’t help.

I read that this may happen if you don’t have read/write permissions on the NVIDIA device nodes, but that is not the case here:

[codebox]ll /dev/nv*

crw-rw-rw- 1 root root 195, 0 2009-12-11 14:12 /dev/nvidia0

crw-rw-rw- 1 root root 195, 255 2009-12-11 14:12 /dev/nvidiactl

crw-rw-rw- 1 root kmem 10, 144 2009-12-11 14:12 /dev/nvram

[/codebox]

I am running Ubuntu 9.10 with NVIDIA driver 190.42 and CUDA Toolkit + SDK 2.3, and I compiled the examples with GCC 4.3.4.

[codebox]$ gcc --version

gcc (Ubuntu 4.3.4-5ubuntu1) 4.3.4

Copyright © 2008 Free Software Foundation, Inc.

This is free software; see the source for copying conditions. There is NO

warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.[/codebox]

Does anybody have any idea about what the reason might be that I cannot run any CUDA example?

Any help is highly appreciated! ;-)

Marius

Is this 32- or 64-bit? I am running the exact same driver and toolkit versions on my development box with a pair of GTX 275s under 64-bit Ubuntu 9.04, and it just works.

Hi Avidday,

I am running the 32-bit version of Ubuntu 9.10 with kernel 2.6.31-16-generic. I have tried reinstalling nearly everything (except Ubuntu itself), but nothing helped.

Ubuntu 9.10 isn’t officially supported, so you might be out of luck.

I don’t see why that should be a problem, but I am going to try it on 9.04 as well.

I just tried the following drivers without luck:

NVIDIA-Linux-x86-190.53-pkg1

NVIDIA-Linux-x86-195.22-pkg1

And I noticed something really odd: when running bandwidthTest I get the following output:

[codebox]NVIDIA_GPU_Computing_SDK/C/bin/linux/release/bandwidthTest

Running on......

	  device 0:Quadro FX 570M

Quick Mode

Host to Device Bandwidth for Pageable memory

cudaSafeCall() Runtime API error in file <bandwidthTest.cu>, line 643 : no CUDA-capable device is available.[/codebox]

How can it find my device, yet the runtime isn’t able to use it?

Which brings me to my last point: what is the difference between deviceQuery using the runtime to gather information about the device and actually using the device?

What it most likely means is that the driver is working fine, but the toolkit libraries aren’t. There are two APIs in CUDA: the direct driver API and the runtime API. The deviceQuery-type stuff uses the driver API, which talks straight to the driver (libcuda), and that looks like it works. Most SDK stuff (and most CUDA code) uses the runtime API, which requires a toolkit library (libcudart). That looks like it doesn’t work, and changing drivers and/or compiler versions probably won’t help - it may well be a libc incompatibility or something else in the Linux runtime for 9.10 that is different from 9.04.
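A quick way to see that split for yourself is to probe the device through both APIs from one small program. This is only a sketch of mine (compile with "nvcc probe.cu -lcuda"; the file and variable names are made up):

[codebox]/* probe.cu: query the device count through both CUDA APIs */
#include <stdio.h>
#include <cuda.h>          /* driver API  (libcuda)   */
#include <cuda_runtime.h>  /* runtime API (libcudart) */

int main(void)
{
    /* Driver API: talks straight to libcuda, like deviceQueryDrv does */
    int drvCount = 0;
    CUresult dr = cuInit(0);
    if (dr == CUDA_SUCCESS)
        cuDeviceGetCount(&drvCount);
    printf("driver API : cuInit=%d, %d device(s)\n", (int)dr, drvCount);

    /* Runtime API: goes through libcudart, like scan and bandwidthTest */
    int rtCount = 0;
    cudaError_t rt = cudaGetDeviceCount(&rtCount);
    printf("runtime API: %s, %d device(s)\n", cudaGetErrorString(rt), rtCount);
    return 0;
}[/codebox]

If the driver line reports your device but the runtime line errors out, the problem is on the libcudart side.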

Okay, I did a fresh test install of Ubuntu 9.04 and installed the 2.3 Toolkit as well as the SDK and the 190.53 driver.

And still, I am having the same problem as before: deviceQuery as well as deviceQueryDrv both report the correct values, but whenever I try to run any program the runtime tells me that no CUDA-capable device could be found!
This seems to me to be a runtime bug (which, by the way, also exists in the 3.0 beta 1).

I am out of ideas and am wondering why no one else seems to have had this problem. Maybe it is related to the NVIDIA card I am using (the Quadro FX 570M)?

It is either a 32-bit problem or a problem related to your hardware. I can’t tell you which, because I don’t have access to either a 32-bit system or a mobile card like that to test with, I am sorry.

Can you try one of the CUDA driver API examples (no runtime)? On my CUDA 3.0 install these are:

matrixMulDrv
simpleTextureDrv
vectorAddDrv

Except that I have a similar problem under 64-bit Red Hat Enterprise - deviceQueryDrv works, but deviceQuery does not. I’ve spent a couple of days trying to figure out how to bypass the runtime stuff, and it is pretty close to impossible for me. There has to be a common bug in the runtime somewhere that is preventing things from working right. At this point I think it might be worthwhile trying to find the exact call that fails and seeing what might be done to dig a little deeper into this problem. Attempting to get around the runtime when everything has been built to depend on it is really frustrating.

I tried matrixMulDrv and simpleTextureDrv and they both run fine. Thank you for pointing those out - I will be digging into them a lot!

The difference between driver and runtime CUDA in the generic case:

driver: uses nvcc to make a cubin; cu* calls then load the cubin and run it.

runtime: uses nvcc and g++ to compile a mixed CUDA/C++ source and links against the cudart library, e.g.:

libcudart.so.3 => /usr/local/cuda/lib64/libcudart.so.3 (0x00007f3b09db5000)
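The loading half of the driver path looks roughly like this - a sketch only, where "matrixMul.cubin" and the kernel name are placeholders, and error checking and kernel parameter setup are omitted:

[codebox]/* drv.c: driver-API flow - load a cubin and launch a kernel by name */
#include <cuda.h>

int main(void)
{
    CUdevice   dev;
    CUcontext  ctx;
    CUmodule   mod;
    CUfunction fn;

    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);              /* explicit context creation */
    cuModuleLoad(&mod, "matrixMul.cubin");  /* cubin built with nvcc -cubin */
    cuModuleGetFunction(&fn, mod, "matrixMul");
    cuFuncSetBlockShape(fn, 16, 16, 1);     /* 2.x/3.0-era launch interface */
    cuLaunchGrid(fn, 8, 8);
    cuCtxSynchronize();
    cuCtxDestroy(ctx);
    return 0;
}[/codebox]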

If the runtime flavour doesn’t work, I would:

verify that the binary is linked against the correct cudart (I use LD_LIBRARY_PATH to control which one is picked up); note that this is CUDA 3.0 on amd64:

$ ldd -r volumeRender

[codebox]

    linux-vdso.so.1 =>  (0x00007fffc00cf000)

libcudart.so.3 => /usr/local/cuda/lib64/libcudart.so.3 (0x00007f3b09db5000)

libGL.so.1 => /usr/lib/libGL.so.1 (0x00007f3b09b91000)

libGLU.so.1 => /usr/lib/libGLU.so.1 (0x00007f3b09920000)

libX11.so.6 => /usr/lib/libX11.so.6 (0x00007f3b095e5000)

libXi.so.6 => /usr/lib/libXi.so.6 (0x00007f3b093d5000)

libXmu.so.6 => /usr/lib/libXmu.so.6 (0x00007f3b091bc000)

libglut.so.3 => /usr/lib/libglut.so.3 (0x00007f3b08f78000)

libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f3b08c67000)

libm.so.6 => /lib/libm.so.6 (0x00007f3b089e5000)

libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f3b087cf000)

libc.so.6 => /lib/libc.so.6 (0x00007f3b0847a000)

libpthread.so.0 => /lib/libpthread.so.0 (0x00007f3b0825e000)

libdl.so.2 => /lib/libdl.so.2 (0x00007f3b0805a000)

librt.so.1 => /lib/librt.so.1 (0x00007f3b07e51000)

libGLcore.so.1 => /usr/lib/libGLcore.so.1 (0x00007f3b06418000)

libnvidia-tls.so.1 => /usr/lib/tls/libnvidia-tls.so.1 (0x00007f3b06316000)

libXext.so.6 => /usr/lib/libXext.so.6 (0x00007f3b06103000)

libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007f3b05ee7000)

libXt.so.6 => /usr/lib/libXt.so.6 (0x00007f3b05c83000)

/lib64/ld-linux-x86-64.so.2 (0x00007f3b09ff5000)

libXau.so.6 => /usr/lib/libXau.so.6 (0x00007f3b05a7f000)

libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00007f3b0587a000)

libSM.so.6 => /usr/lib/libSM.so.6 (0x00007f3b05671000)

libICE.so.6 => /usr/lib/libICE.so.6 (0x00007f3b05456000)

libuuid.so.1 => /lib/libuuid.so.1 (0x00007f3b05252000)

[/codebox]

And check nvcc:

[codebox]x@desktop:/opt/cudabin$ which nvcc

/usr/local/cuda/bin/nvcc

x@desktop:/opt/cudabin$ nvcc --version

nvcc: NVIDIA ® Cuda compiler driver

Copyright © 2005-2009 NVIDIA Corporation

Built on Mon_Oct_26_09:40:14_PDT_2009

Cuda compilation tools, release 3.0, V0.2.1221

$ ldd -r `which nvcc`

linux-vdso.so.1 =>  (0x00007fff121ff000)

libpthread.so.0 => /lib/libpthread.so.0 (0x00007f86a3014000)

libdl.so.2 => /lib/libdl.so.2 (0x00007f86a2e0f000)

libz.so.1 => /usr/lib/libz.so.1 (0x00007f86a2bf8000)

libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f86a28e8000)

libm.so.6 => /lib/libm.so.6 (0x00007f86a2665000)

libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f86a244f000)

libc.so.6 => /lib/libc.so.6 (0x00007f86a20fb000)

/lib64/ld-linux-x86-64.so.2 (0x00007f86a3257000)

[/codebox]

Then maybe you can narrow it down to a bug or unsupported usage. Maybe try the CUDA 3.0 beta.

I get:

[codebox]$ ldd -r volumeRender

    libcudart.so.3 => /usr/local/cuda/lib64/libcudart.so.3 (0x00002b8709dc3000)

    libGL.so.1 => /usr/lib64/libGL.so.1 (0x0000003858600000)

    libGLU.so.1 => /usr/lib64/libGLU.so.1 (0x0000003855600000)

    libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000003855200000)

    libXi.so.6 => /usr/lib64/libXi.so.6 (0x000000385cc00000)

    libXmu.so.6 => /usr/lib64/libXmu.so.6 (0x0000003853200000)

    libglut.so.3 => /usr/lib64/libglut.so.3 (0x00002b870a003000)

    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003865800000)

    libm.so.6 => /lib64/libm.so.6 (0x0000003852a00000)

    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003862000000)

    libc.so.6 => /lib64/libc.so.6 (0x0000003852600000)

    libdl.so.2 => /lib64/libdl.so.2 (0x0000003852e00000)

    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b870a249000)

    librt.so.1 => /lib64/librt.so.1 (0x0000003853600000)

    libGLcore.so.1 => /usr/lib64/libGLcore.so.1 (0x0000003860800000)

    libnvidia-tls.so.1 => /usr/lib64/tls/libnvidia-tls.so.1 (0x000000366bc00000)

    libXext.so.6 => /usr/lib64/libXext.so.6 (0x0000003855a00000)

    libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000003854600000)

    libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x0000003854a00000)

    libXt.so.6 => /usr/lib64/libXt.so.6 (0x0000003864c00000)

    libXxf86vm.so.1 => /usr/lib64/libXxf86vm.so.1 (0x00002b870a466000)

    /lib64/ld-linux-x86-64.so.2 (0x0000003852200000)

    libSM.so.6 => /usr/lib64/libSM.so.6 (0x000000385b000000)

    libICE.so.6 => /usr/lib64/libICE.so.6 (0x0000003859600000)

[/codebox]

Which is clearly different, but I don’t know what that means.

For version I get:

[codebox]$ which nvcc

/usr/local/cuda/bin/nvcc

[mrosing@bouredhat include]$ nvcc --version

nvcc: NVIDIA ® Cuda compiler driver

Copyright © 2005-2009 NVIDIA Corporation

Built on Mon_Oct_26_09:40:14_PDT_2009

Cuda compilation tools, release 3.0, V0.2.1221

bin]$ ldd -r nvcc

    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003853200000)

    libdl.so.2 => /lib64/libdl.so.2 (0x0000003852e00000)

    libz.so.1 => /usr/lib64/libz.so.1 (0x0000003853a00000)

    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003865800000)

    libm.so.6 => /lib64/libm.so.6 (0x0000003852a00000)

    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003862000000)

    libc.so.6 => /lib64/libc.so.6 (0x0000003852600000)

    /lib64/ld-linux-x86-64.so.2 (0x0000003852200000)

[/codebox]

So I have the same nvcc version, but the libraries have different codes. What does the code mean? Edit: I notice that my system libraries come from lib64 where yours come from lib, so that might be a difference.

I just verified that the cudart is linked to the correct rt and not some other installed location.

Each binary can use shared libraries (like DLLs on Windows). The binary does not know the actual location of the shared libraries on the filesystem; the dynamic loader determines that. ldd just shows what the loader binds the binary to. I run this command whenever I want to verify that a binary is using the libraries I expect.

Edit: I use Debian testing, where the lib layout is different (lib64 vs. lib). The hex code is just a load address and is expected to differ between systems.

One thing to do would be to check the syslog (/var/log/messages) and also the kernel messages: run dmesg after running the binary that doesn’t work.

Nothing in messages, but dmesg has:

[codebox]volumeRender[31970]: segfault at 0000000000000000 rip 0000003858693980 rsp 00007fff8fdad840 error 4

Mandelbrot[31972]: segfault at 0000000000000000 rip 0000003858693980 rsp 00007fffb0b380d0 error 4[/codebox]

which doesn’t mean much to me. A bad pointer, if it is trying to read or write address 0?

When you run Mandelbrot, does it say “segmentation fault” on the console, or does it report a CUDA failure? Something is trying to dereference a NULL (0) pointer.

It says a lot more, but it ends in a segfault:

[codebox]$ ./Mandelbrot

[ CUDA Mandelbrot & Julia Set ]

Initializing GLUT…

freeglut (./Mandelbrot): Unable to create direct context rendering for window ‘./Mandelbrot’

This may hurt performance.

Loading extensions: No error

Error: failed to get minimal extensions for demo

This sample requires:

OpenGL version 1.5

GL_ARB_vertex_buffer_object

GL_ARB_pixel_buffer_object

Segmentation fault

[/codebox]

I think the starting point is that it isn’t getting a context - if it can’t see the devices, there isn’t much else it can do.

Here’s what happens with something simple like the template:

[codebox]$ ./template

cudaSafeCall() Runtime API error in file <template.cu>, line 76 : CUDA version is insufficient for CUDART version.[/codebox]

Line 76 is:

cutilSafeCall( cudaMalloc( (void**) &d_idata, mem_size));

which is the first call into the runtime library. I didn’t see this error before, but I just reinstalled everything to make sure I’m actually starting from scratch.
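For what it’s worth, that message is the runtime comparing version numbers: the loaded driver reports an older CUDA version than the libcudart the binary uses. A minimal sketch to print both sides (cudaDriverGetVersion/cudaRuntimeGetVersion should be available in any 2.2+ runtime):

[codebox]/* vers.cu: compare the driver's CUDA version with the runtime's */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);   /* 0 here means no driver loaded */
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("driver supports CUDA %d, runtime is CUDA %d\n",
           driverVersion, runtimeVersion);
    /* "CUDA version is insufficient for CUDART version" corresponds to
       driverVersion < runtimeVersion */
    return 0;
}[/codebox]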

These problems seem different from the original poster’s. 1. No direct rendering context: are you running over an SSH connection? You need to be on the same box. If you are, is the “nvidia” driver set in xorg.conf? Is the correct NVIDIA driver installed? glxgears and glxinfo should work.

I’m running over VNC. When I put the nvidia driver in xorg.conf the system won’t boot. The correct driver is installed - but I think you are right that graphics output won’t work directly. I’ll have to use file outputs or figure out what VNC needs to see for video information.

glxgears works!

[codebox]8077 frames in 5.1 seconds = 1597.871 FPS

[root@bouredhat ~]# glxinfo

name of display: :1.0

display: :1 screen: 0

direct rendering: No

server glx vendor string: SGI

server glx version string: 1.2

server glx extensions:

GLX_ARB_multisample, GLX_EXT_visual_info, GLX_EXT_visual_rating, 

GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_OML_swap_method, 

GLX_SGI_make_current_read, GLX_SGIS_multisample, GLX_SGIX_hyperpipe, 

GLX_SGIX_swap_barrier, GLX_SGIX_fbconfig, GLX_MESA_copy_sub_buffer

client glx vendor string: NVIDIA Corporation

client glx version string: 1.4

client glx extensions:

GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_visual_info, 

GLX_EXT_visual_rating, GLX_EXT_import_context, GLX_SGI_video_sync, 

GLX_NV_swap_group, GLX_NV_video_out, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 

GLX_SGI_swap_control, GLX_ARB_create_context, GLX_NV_float_buffer, 

GLX_ARB_fbconfig_float, GLX_EXT_fbconfig_packed_float, 

GLX_EXT_texture_from_pixmap, GLX_EXT_framebuffer_sRGB, 

GLX_NV_present_video, GLX_NV_copy_image, GLX_NV_multisample_coverage

GLX version: 1.2

GLX extensions:

GLX_ARB_multisample, GLX_EXT_visual_info, GLX_EXT_visual_rating, 

GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, GLX_SGIX_fbconfig, 

GLX_ARB_get_proc_address

OpenGL vendor string: Mesa project: www.mesa3d.org

OpenGL renderer string: Mesa GLX Indirect

OpenGL version string: 1.2 (1.5 Mesa 6.5.1)

OpenGL extensions:

GL_ARB_depth_texture, GL_ARB_imaging, GL_ARB_multitexture, 

GL_ARB_point_parameters, GL_ARB_point_sprite, GL_ARB_shadow, 

GL_ARB_texture_border_clamp, GL_ARB_texture_cube_map, 

GL_ARB_texture_env_add, GL_ARB_texture_env_combine, 

GL_ARB_texture_env_dot3, GL_ARB_texture_mirrored_repeat, 

GL_ARB_texture_non_power_of_two, GL_ARB_window_pos, GL_EXT_abgr, 

GL_EXT_bgra, GL_EXT_blend_color, GL_EXT_blend_func_separate, 

GL_EXT_blend_minmax, GL_EXT_blend_subtract, GL_EXT_draw_range_elements, 

GL_EXT_framebuffer_object, GL_EXT_fog_coord, GL_EXT_multi_draw_arrays, 

GL_EXT_packed_pixels, GL_EXT_rescale_normal, GL_EXT_secondary_color, 

GL_EXT_separate_specular_color, GL_EXT_shadow_funcs, GL_EXT_stencil_wrap, 

GL_EXT_texture3D, GL_EXT_texture_edge_clamp, GL_EXT_texture_env_add, 

GL_EXT_texture_env_combine, GL_EXT_texture_env_dot3, 

GL_EXT_texture_lod_bias, GL_EXT_texture_object, GL_EXT_vertex_array, 

GL_ATI_texture_mirror_once, GL_IBM_texture_mirrored_repeat, 

GL_NV_blend_square, GL_NV_texture_rectangle, GL_NV_texgen_reflection, 

GL_SGIS_generate_mipmap, GL_SGIS_texture_lod, GL_SGIX_depth_texture, 

GL_SGIX_shadow

visual x bf lv rg d st colorbuffer ax dp st accumbuffer ms cav

id dep cl sp sz l ci b ro r g b a bf th cl r g b a ns b eat


0x22 16 tc 0 16 0 r y . 5 6 5 0 0 16 0 0 0 0 0 0 0 None

0x23 16 tc 0 16 0 r y . 5 6 5 0 0 16 8 16 16 16 0 0 0 None

0x24 16 tc 0 24 0 r y . 5 6 5 8 0 16 8 16 16 16 16 0 0 None

0x25 16 tc 0 24 0 r . . 5 6 5 8 0 16 8 16 16 16 16 0 0 None

[/codebox]

That is good to know - thanks! But what about the error “CUDA version is insufficient for CUDART version”?

What does that mean?

Edit: if I run bandwidthTest I get the same error, and when I run deviceQuery I now get “no devices” as well. It looks like the driver (CUDA version) does not match the library (CUDART), even though they are what I downloaded together. There must be a way to figure out which versions need to match - maybe the kernel I’m building the driver module against needs to change?
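One way to see which driver the kernel has actually loaded (as opposed to which package you installed) is to read /proc/driver/nvidia/version, which the nvidia kernel module creates. A minimal sketch:

[codebox]/* nvver.c: print the version string of the loaded NVIDIA kernel module */
#include <stdio.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/driver/nvidia/version", "r");
    if (!f) {
        perror("/proc/driver/nvidia/version (module not loaded?)");
        return 1;
    }
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}[/codebox]

If the version printed there is older than what the installer claimed, the old kernel module is still loaded, and a reboot (or module reload) is needed before the runtime will match.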