Segmentation Fault after rebooting

Hi there,

I’ve spent the last couple of days installing Fedora 13 and CUDA on my new 4 x C2070 machine, and yesterday I was quite excited that everything seemed to finally be working.

But today, when I fired up the PC, I got a Segmentation Fault when running the SDK sample programs.

What I did the last couple of days was:

Install Fedora 13

Install development driver from the NVIDIA homepage

Install CUDA Toolkit

Install GPU computing SDK

Compile SDK programs from “C” directory (required the installation of some packages and some use of ldconfig)

Run SDK programs - they all worked

Install Matlab 2010b

Run Matlab 2010b CUDA tests (it worked out of the box)

Install free version of CULA

Run CULA examples

Install Intel Parallel Studio XE

Compile MAGMA with Intel compilers

Run MAGMA test programs (seemed to work out of the box)

Now today I:

Start up my computer

Try to run MAGMA tests - I get a Segmentation Fault

Go back to CUDA SDK and run some of the samples in “C” - Segmentation Fault

Recompile the samples of the “C” folder in the SDK - this seems to work

Run the recompiled files - still Segmentation Fault

Except the “deviceQuery” program which correctly returns my 4 Tesla C2070 cards.

I’m guessing that the problem is allocating memory on the Tesla card, as matrixMul returns:

Device 0: "Tesla C2070" with Compute 2.0 capability

Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)

Segmentation fault (core dumped)

I’ve checked that /usr/local/cuda/bin is in $PATH and that /usr/local/cuda/lib64 and /usr/local/cuda/lib are in LD_LIBRARY_PATH.

Most likely I’m just being stupid, but I can’t figure out which setting I’ve forgotten.

Any suggestions as to what I’m missing?

Sincerely,

Christian Fisker

OK, stepping through matrixMul, the segmentation fault occurs at these lines:

cutilSafeCall(cudaMalloc((void**) &d_A, mem_size_A));

cutilSafeCall(cudaMalloc((void**) &d_B, mem_size_B));

Is there any reason these lines would fail just because I rebooted the system?
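Since the first cudaMalloc in a program is usually also where the CUDA context gets created, one way to narrow this down is to probe each device in its own process. The following is only a minimal sketch: the file name probe_device.cu and the helper parseDeviceIndex are made up for illustration, and a crash inside the driver cannot be caught by error codes, so the point is simply to see which device index crashes and which does not.

```cpp
// probe_device.cu (hypothetical name); build with: nvcc probe_device.cu -o probe_device
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: parse a device index from text.
// Returns -1 unless the text is a whole number in [0, deviceCount).
int parseDeviceIndex(const char* text, int deviceCount) {
    if (text == NULL || *text == '\0') return -1;
    char* end = NULL;
    long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value >= deviceCount) return -1;
    return static_cast<int>(value);
}

#ifdef __CUDACC__  // everything below needs nvcc and the CUDA runtime
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const char* arg = (argc > 1) ? argv[1] : "0";
    int dev = parseDeviceIndex(arg, count);
    if (dev < 0) {
        std::fprintf(stderr, "usage: %s <device index 0..%d>\n", argv[0], count - 1);
        return 1;
    }

    err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice(%d): %s\n", dev, cudaGetErrorString(err));
        return 1;
    }

    // cudaFree(0) is a common way to force context creation on the selected device.
    err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "context creation: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Try a small allocation (1 MiB) and release it again.
    void* p = NULL;
    err = cudaMalloc(&p, 1 << 20);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(p);

    std::printf("device %d: context creation and 1 MiB allocation succeeded\n", dev);
    return 0;
}
#endif
```

Running it once per card (./probe_device 0, ./probe_device 1, and so on) should show whether the crash is tied to one particular device or happens on all of them.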

I checked the driver version again - seems correct.

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  270.41.19  Mon May 16 23:32:08 PDT 2011

GCC version:  gcc version 4.4.5 20101112 (Red Hat 4.4.5-2) (GCC)

Any help would be appreciated!

Sincerely,

Christian Fisker

After reading some earlier posts, in matrixMul I changed the line

cutilSafeCall(cudaGetDevice(&devID));

to

cutilSafeCall(cudaSetDevice(1));

This forces the code to run on Device 1 instead of Device 0. Now matrixMul runs fine.

I assume this is because Device 0 has a monitor attached. Somehow, the programs must have been using a different device yesterday.

Next, I need to find out whether there is a way to make “cudaGetDevice” ignore devices with a monitor connected, or at least make it choose the card with a monitor last.
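As far as the runtime API goes, cudaGetDevice only reports the device the calling thread is currently using (device 0 unless something else was set), so the selection has to be done by the program itself. A possible sketch is below. It assumes that the kernelExecTimeoutEnabled field of cudaDeviceProp is a reasonable hint that a display is driving the card, which is typically but not necessarily the case. The DeviceInfo struct and pickComputeDevice function are made-up names for illustration, not part of CUDA or the SDK.

```cpp
// pick_device.cu (hypothetical name); build with: nvcc pick_device.cu -o pick_device
#include <cstdio>
#include <vector>

// Hypothetical summary of the one property this sketch looks at.
struct DeviceInfo {
    int index;         // CUDA device index
    bool hasWatchdog;  // mirrors cudaDeviceProp::kernelExecTimeoutEnabled
};

// Returns the first device without a kernel run-time limit.
// If every device has one, falls back to the first device; -1 if the list is empty.
int pickComputeDevice(const std::vector<DeviceInfo>& devices) {
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (!devices[i].hasWatchdog) return devices[i].index;
    }
    return devices.empty() ? -1 : devices[0].index;
}

#ifdef __CUDACC__  // everything below needs nvcc and the CUDA runtime
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        std::fprintf(stderr, "could not query the device count\n");
        return 1;
    }

    // Collect the watchdog flag for every device that can be queried.
    std::vector<DeviceInfo> devices;
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) != cudaSuccess) continue;
        DeviceInfo info = { i, prop.kernelExecTimeoutEnabled != 0 };
        devices.push_back(info);
    }

    int dev = pickComputeDevice(devices);
    if (dev < 0) {
        std::fprintf(stderr, "no usable CUDA device found\n");
        return 1;
    }
    if (cudaSetDevice(dev) != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice(%d) failed\n", dev);
        return 1;
    }
    std::printf("using device %d\n", dev);
    return 0;
}
#endif
```

An alternative that needs no code changes is the CUDA_VISIBLE_DEVICES environment variable: starting a program with CUDA_VISIBLE_DEVICES=1,2,3 should hide device 0 from it, so the remaining cards are renumbered from 0 and the default device is no longer the one with the monitor.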

/Christian