Segmentation Fault after rebooting

CFisker · August 11, 2011, 9:51am

Hi there,

I’ve spent the last couple of days installing Fedora 13 and CUDA on my new 4 x C2070 machine, and yesterday I was quite excited that everything seemed to finally be working.

But today, when I fired up the PC, I got a Segmentation Fault when I ran the SDK sample programs.

What I did the last couple of days was:

Install Fedora 13

Install development driver from the NVIDIA homepage

Install CUDA Toolkit

Install GPU computing SDK

Compile SDK programs from “C” directory (required the installation of some packages and some use of ldconfig)

Run SDK programs - they all worked

Install Matlab 2010b

Run Matlab 2010b CUDA tests (it worked out of the box)

Install free version of CULA

Run CULA examples

Install Intel Parallel Studio XE

Compile MAGMA with Intel compilers

Run MAGMA test programs (seemed to work out of the box)

Now today I:

Start up my computer

Try to run MAGMA tests - I get a Segmentation Fault

Go back to CUDA SDK and run some of the samples in “C” - Segmentation Fault

Recompile the samples of the “C” folder in the SDK - this seems to work

Run the recompiled files - still Segmentation Fault

Except the “deviceQuery” program which correctly returns my 4 Tesla C2070 cards.

I’m guessing that the problem is with allocating memory on the Tesla card, as matrixMul returns:

Device 0: "Tesla C2070" with Compute 2.0 capability

Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)

Segmentation fault (core dumped)

I’ve checked that /usr/local/cuda/bin is in $PATH and that /usr/local/cuda/lib64 and /usr/local/cuda/lib are in LD_LIBRARY_PATH.

Most likely I’m just being stupid, but I can’t figure out which setting I’ve forgotten.

Any suggestions as to what I’m missing?

Sincerely,

Christian Fisker

CFisker · August 11, 2011, 12:33pm

Ok, going through matrixMul, the segmentation fault occurs at these lines:

cutilSafeCall(cudaMalloc((void**) &d_A, mem_size_A));

cutilSafeCall(cudaMalloc((void**) &d_B, mem_size_B));

Is there any reason these lines would fail just because I rebooted the system?

I checked the driver version again - seems correct.

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  270.41.19  Mon May 16 23:32:08 PDT 2011

GCC version:  gcc version 4.4.5 20101112 (Red Hat 4.4.5-2) (GCC)

Any help would be appreciated!

Sincerely,

Christian Fisker

CFisker · August 11, 2011, 1:44pm

After reading some earlier posts, in matrixMul I changed the line

cutilSafeCall(cudaGetDevice(&devID));

to

cutilSafeCall(cudaSetDevice(1));

This forces the code to run on Device 1 instead of Device 0. Now matrixMul runs fine.

I assume this is because Device 0 has a monitor connected. Somehow, the programs must have been using another Device yesterday.

Next, I need to find out if there is a way to make “cudaGetDevice” ignore devices with a monitor connected, or at least make it choose the card with a monitor last.
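One possible approach to the selection question above, sketched here without any CUDA dependency so it compiles anywhere: cudaDeviceProp has a kernelExecTimeoutEnabled field, which is typically set on a card that drives a display, so a program could prefer the first device where it is not set and fall back to a flagged device only when nothing else is available. The DeviceInfo struct and pickComputeDevice function are hypothetical names made up for this sketch; in a real program the flag would be filled in from cudaGetDeviceProperties for each device reported by cudaGetDeviceCount. Whether the flag actually marks the monitor card on this particular machine is an assumption that would need checking.

```cpp
#include <vector>

// Hypothetical stand-in for the fields of interest. In a real CUDA program
// these would be read from cudaDeviceProp after calling
// cudaGetDeviceProperties(&prop, id) for every id below cudaGetDeviceCount.
struct DeviceInfo {
    int id;                 // CUDA device ordinal
    bool hasKernelTimeout;  // mirrors cudaDeviceProp::kernelExecTimeoutEnabled
};

// Return the ordinal of the first device without a kernel run-time limit.
// If every device has one, fall back to the first device in the list, so a
// display card is only chosen last. Returns -1 when the list is empty.
int pickComputeDevice(const std::vector<DeviceInfo>& devices) {
    if (devices.empty()) {
        return -1;
    }
    for (const DeviceInfo& d : devices) {
        if (!d.hasKernelTimeout) {
            return d.id;
        }
    }
    return devices.front().id;
}
```

With the layout described in this thread (device 0 flagged, devices 1 to 3 not), pickComputeDevice returns 1, and that value would then be passed to cudaSetDevice in place of the hard-coded 1. Another option that needs no code change is the CUDA_VISIBLE_DEVICES environment variable, which limits the devices the runtime enumerates.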

/Christian
