SDK sample code failures only on samples that launch a kernel

hello all, thanks for looking:

fedora core 9, 64-bit, a Tesla C1060 board and a 9800 GT board, latest driver.

for example, the “release” build of simpleMultiGPU when run gives an error:
cutilCheckMsg() CUTIL CUDA error: reduceKernel() execution failed.
in file <simpleMultiGPU.cpp>, line 92 : invalid device function .
Segmentation fault

it’s the line in simpleMultiGPU.cpp (90) that launches the kernel code.
the emurelease simpleMultiGPU works.

this goes for any examples that run a “kernel”, all give some similar error.

is there a linux kernel module i’m missing that has to be loaded to run the GPU SDK examples?
(the other examples run fine, e.g. the emurelease builds)

thanks,
ek

What does the deviceQuery program from the SDK tell you?

Device 0: "GeForce 9800 GT"
  Major revision number: 1
  Minor revision number: 1
  Total amount of global memory: 536150016 bytes
  Number of multiprocessors: 14
  Number of cores: 112
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.51 GHz
  Concurrent copy and execution: Yes

Device 1: "Tesla C1060"
  Major revision number: 1
  Minor revision number: 3
  Total amount of global memory: 4294705152 bytes
  Number of multiprocessors: 30
  Number of cores: 240
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 16384
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 1.30 GHz
  Concurrent copy and execution: Yes

Test PASSED

thanks–ek

That is strange. I don’t have a clue what is wrong: the driver is apparently loaded, since deviceQuery works. Does the bandwidth test also work?

yes, on both devices too (--device=0 or --device=1).

this has me stumped too. any other ideas?

also, let me get this straight: in emulation, it runs on the CPUs, not the GPUs, right?

eddie

btw: i added cudaSetDevice(dev); (dev = 1 for the c1060 bd, or dev = 0 for the 9800gt bd) to simpleMultiGPU.cpp, and the results and the error are the same.

yep.

I honestly have no clue, and it looks like mfatica is on holidays, so maybe send him a pm? (he knows a lot about linux & cuda)

what driver are you using? are you mixing 2.1 toolkit with a 2.0 SDK? what version of libcuda.so is actually installed? what kind of machine is this, and what motherboard is it using?

i let the NVIDIA web site choose what got installed (by choosing the linux 64bit/FC9 drivers, etc)

NVIDIA-Linux-x86_64-180.06-pkg2.run

cuda-linux64-rel-nightly-2.1.1635-3065709.run

cuda-sdk-linux-2.10.1126.1520-3141441.run

(which is all correct, i bet…)

as for HW: (see a post from yesterday for the output of “deviceQuery”)

9800gt video bd

c1060 tesla bd.

EVGA mb. NOTE: it’s an “nForce 780i SLI” mb with three PCI-E x16 slots

(which means almost EVERYTHING on the MB is made by NVIDIA) (should buy NVIDIA stock…)

8 gigs ram, 500g hd, 850w ps, etc. except for cdroms/dvd’s and chassis, it’s all new.

the PN for the MB: 132-CK-NF78 (nForce 780i SLI)

if i forgot anything or you need more info, please let me know. --eddie

just wondering - could a segmentation fault be a hardware error?

Let me ask a few more questions:

  1. is it possible to write a multicpu (multigpu) software without a kernel?

  2. is the memory on cuda compliant boards memory mapped?

(could the segmentation fault be a host system fault?)

thanks, eddie

Just to make sure you have installed what you said, check the devices and the libraries:

$ lspci | grep VGA
$ glxinfo | grep rendering
$ glxinfo | grep NVIDIA
$ ldconfig -p | grep cuda

$ lspci | grep VGA
03:00.0 VGA compatible controller: nVidia Corporation Unknown device 0614 (rev a2)

$ glxinfo | grep rendering
direct rendering: Yes

$ glxinfo | grep NVIDIA
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation
OpenGL vendor string: NVIDIA Corporation
OpenGL version string: 2.1.2 NVIDIA 177.82

$ ldconfig -p | grep cuda
    libicudata.so.38 (libc6,x86-64) => /usr/lib64/libicudata.so.38
    libcufftemu.so.2 (libc6,x86-64) => /usr/local/cuda/lib/libcufftemu.so.2
    libcufftemu.so (libc6,x86-64) => /usr/local/cuda/lib/libcufftemu.so
    libcufft.so.2 (libc6,x86-64) => /usr/local/cuda/lib/libcufft.so.2
    libcufft.so (libc6,x86-64) => /usr/local/cuda/lib/libcufft.so
    libcudart.so.2 (libc6,x86-64) => /usr/local/cuda/lib/libcudart.so.2
    libcudart.so (libc6,x86-64) => /usr/local/cuda/lib/libcudart.so
    libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
    libcuda.so.1 (libc6) => /usr/lib/libcuda.so.1
    libcuda.so (libc6,x86-64) => /usr/lib64/libcuda.so
    libcuda.so (libc6) => /usr/lib/libcuda.so
    libcublasemu.so.2 (libc6,x86-64) => /usr/local/cuda/lib/libcublasemu.so.2
    libcublasemu.so (libc6,x86-64) => /usr/local/cuda/lib/libcublasemu.so
    libcublas.so.2 (libc6,x86-64) => /usr/local/cuda/lib/libcublas.so.2
    libcublas.so (libc6,x86-64) => /usr/local/cuda/lib/libcublas.so

there is an output of “deviceQuery” listed in one of my replies.

thanks for looking – eddie
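A side note on reading the ldconfig listing above: `grep cuda` also matches libicudata (the ICU library, unrelated to CUDA) and the toolkit libraries under /usr/local/cuda, while the library that ships with the driver is libcuda.so. Below is a minimal Python sketch that keeps only the libcuda.so entries. The function name `cuda_driver_libs` and the regular expressions are illustrative assumptions, not part of any NVIDIA tool, and the sample text is a subset of the lines posted above.

```python
import re

# One `ldconfig -p` entry looks like:
#   libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")

def cuda_driver_libs(ldconfig_output):
    """Return (name, flags, path) tuples for libcuda.so entries only.

    Hypothetical helper: it skips libcudart, libcufft, libicudata, etc.,
    which a plain `grep cuda` would also match.
    """
    found = []
    for line in ldconfig_output.splitlines():
        m = ENTRY.match(line)
        if m and re.fullmatch(r"libcuda\.so(\.\d+)*", m.group(1)):
            found.append(m.groups())
    return found

# Subset of the listing posted in this thread.
sample = """\
    libicudata.so.38 (libc6,x86-64) => /usr/lib64/libicudata.so.38
    libcudart.so.2 (libc6,x86-64) => /usr/local/cuda/lib/libcudart.so.2
    libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
    libcuda.so.1 (libc6) => /usr/lib/libcuda.so.1
"""

for name, flags, path in cuda_driver_libs(sample):
    print(name, flags, path)
```

The listing itself does not show which driver version each libcuda.so belongs to. On a typical Linux install libcuda.so.1 is a symlink to a file named after the driver version, so resolving the printed paths (for example with `os.path.realpath`) is one way to see it, assuming the driver was installed that way.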

You stated that you installed 180.06, yet the output you just provided indicates that you have 177.82. Did you switch drivers, or is your driver installation broken (due to a known Ubuntu bug)?

that’s interesting, but the install files that i listed the other day are still on my computer. like i said, i let the NVIDIA web site do the choosing (fc9, 64-bit). these are the files in my Download directory:

cuda-linux64-rel-nightly-2.1.1635-3065709.run

cuda-sdk-linux-2.10.1126.1520-3141441.run

NVIDIA-Linux-x86_64-180.06-pkg2.run

could there be a problem with the install files?

thanks, eddie

The only problem that I see here is that your driver installation is broken. If (re)installing 180.06 does not fix the issue, then please generate and attach an nvidia-bug-report.log.
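For anyone hitting the same symptom, the mismatch spotted above (the 180.06 installer versus the 177.82 reported by glxinfo) can be checked mechanically. This is a rough Python sketch that only compares the two strings posted in this thread. The helper names and regular expressions are assumptions made for illustration and have only been matched against these two strings.

```python
import re

def driver_from_glx(line):
    """Driver version from an 'OpenGL version string' line, or None."""
    m = re.search(r"NVIDIA\s+(\d+(?:\.\d+)+)", line)
    return m.group(1) if m else None

def driver_from_installer(filename):
    """Driver version from an NVIDIA-Linux-*.run file name, or None."""
    m = re.match(r"NVIDIA-Linux-[^-]+-(\d+(?:\.\d+)+)", filename)
    return m.group(1) if m else None

# Both strings are copied from the posts in this thread.
running = driver_from_glx("OpenGL version string: 2.1.2 NVIDIA 177.82")
expected = driver_from_installer("NVIDIA-Linux-x86_64-180.06-pkg2.run")

if running != expected:
    print(f"driver mismatch: running {running}, installer is {expected}")
```

With the values from this thread it prints `driver mismatch: running 177.82, installer is 180.06`. On Linux the version of the loaded kernel module can usually also be read from /proc/driver/nvidia/version, which is another place to look when the installer and the running driver disagree.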

that certainly took care of the “segmentation fault” on kernel SDK samples.

wish i knew how that old driver got on the system (new harddrive).

i’m happy - thanks for the help.

eddie

Just a guess: perhaps you installed the driver from the Ubuntu repos, and then only double-clicked the installation file for 180, maybe ;).