CudaMalloc?

Hi all,

I am new to CUDA programming. I am working on a project to parallelize a sequential program.

I began with CUDA 2.3 on a GPU X295 card. Then I obtained access to a server with a Fermi card and CUDA 3.0 on it.

When I try to run my program on this Fermi server, I get a segmentation fault at the first cudaMalloc in the program.

I have tried some simple programs and they run well on the Fermi.

The messages I have from cuda-gdb are:

Breakpoint 1, cuda_allocation () at kernel.cu:133
133 size_mat=mat_length_max*VProd(cells)*sizeof(int);
Current language: auto; currently c++
(cuda-gdb) n
134 size_site=nSites*sizeof(Site);
(cuda-gdb)
135 size_neigh=nebrTabMax*sizeof(int);
(cuda-gdb)
136 size_nebrEl=nSites*sizeof(int);
(cuda-gdb)
137 size_inertia = nTypes;
(cuda-gdb)
138 size_massDev = nTypes;
(cuda-gdb)
140 size_interactionTypeDev1 = nebrTabMax;
(cuda-gdb)
141 size_sigDev = nTypes*nTypes;
(cuda-gdb)
142 size_epsDev = nTypes*nTypes;
(cuda-gdb)
143 size_productElectroMomentsDev = nTypes*nTypes;
(cuda-gdb)
145 widthTex = ceil(sqrt(nSites));
(cuda-gdb)
147 channelDesc_V = cudaCreateChannelDesc();
(cuda-gdb)
149 cudaMalloc((void**)&mat, size_mat);
(cuda-gdb)
Segmentation fault

The program runs normally (though, I guess, more slowly) only when I pass the -G option to the nvcc CUDA compiler; otherwise it fails with a segmentation fault at exactly the line shown above.

Here is what gdb reports for the core file generated when the program fails:

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /home/shkurti/Prove/13000configuration/program…done.
[New Thread 13970]
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libm.so.6…Reading symbols from /usr/lib/debug/lib/libm-2.11.1.so…done.
done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /usr/lib/libcuda.so.1…(no debugging symbols found)…done.
Loaded symbols for /usr/lib/libcuda.so.1
Reading symbols from /usr/local/cuda/lib64/libcudart.so.3…(no debugging symbols found)…done.
Loaded symbols for /usr/local/cuda/lib64/libcudart.so.3
Reading symbols from /lib/libc.so.6…Reading symbols from /usr/lib/debug/lib/libc-2.11.1.so…done.
done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libstdc++.so.6…(no debugging symbols found)…done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/libpthread.so.0…Reading symbols from /usr/lib/debug/lib/libpthread-2.11.1.so…done.
done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libz.so.1…(no debugging symbols found)…done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/libdl.so.2…Reading symbols from /usr/lib/debug/lib/libdl-2.11.1.so…done.
done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/librt.so.1…Reading symbols from /usr/lib/debug/lib/librt-2.11.1.so…done.
done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/libgcc_s.so.1…(no debugging symbols found)…done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2…Reading symbols from /usr/lib/debug/lib/ld-2.11.1.so…done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `./program'.
Program terminated with signal 11, Segmentation fault.

#0 0x00007f75a7ae555e in ?? () from /usr/lib/libcuda.so.1
(gdb) bt
#0 0x00007f75a7ae555e in ?? () from /usr/lib/libcuda.so.1
#1 0x00007f75a7aea99e in ?? () from /usr/lib/libcuda.so.1
#2 0x00007f75a7afdecd in ?? () from /usr/lib/libcuda.so.1
#3 0x00007f75a7b03729 in ?? () from /usr/lib/libcuda.so.1
#4 0x00007f75a7b048b5 in ?? () from /usr/lib/libcuda.so.1
#5 0x00007f75a7b07ee4 in ?? () from /usr/lib/libcuda.so.1
#6 0x00007f75a7b08e2a in ?? () from /usr/lib/libcuda.so.1
#7 0x00007f75a7b0921c in ?? () from /usr/lib/libcuda.so.1
#8 0x00007f75a7c69120 in ?? () from /usr/lib/libcuda.so.1
#9 0x00007f75a7c831db in ?? () from /usr/lib/libcuda.so.1
#10 0x00007f75a794d082 in ?? () from /usr/lib/libcuda.so.1
#11 0x00007f75a78c1f75 in ?? () from /usr/lib/libcuda.so.1
#12 0x00007f75a79020a6 in ?? () from /usr/lib/libcuda.so.1
#13 0x00007f75a78c1c55 in ?? () from /usr/lib/libcuda.so.1
#14 0x00007f75a78c13ed in ?? () from /usr/lib/libcuda.so.1
#15 0x00007f75a78baf91 in ?? () from /usr/lib/libcuda.so.1
#16 0x00007f75a7773d2a in ?? () from /usr/lib/libcuda.so.1
#17 0x00007f75a775bb29 in ?? () from /usr/lib/libcuda.so.1
#18 0x00007f75a7804ab9 in ?? () from /usr/lib/libcuda.so.1
#19 0x00007f75a74b51ce in ?? () from /usr/local/cuda/lib64/libcudart.so.3
#20 0x00007f75a74aabdb in ?? () from /usr/local/cuda/lib64/libcudart.so.3
#21 0x00007f75a74af00c in ?? () from /usr/local/cuda/lib64/libcudart.so.3
#22 0x00007f75a74a8f89 in cudaMalloc () from /usr/local/cuda/lib64/libcudart.so.3
#23 0x000000000043682f in cuda_allocation ()
#24 0x0000000000407aa1 in main (argc=1, argv=0x7fff93e07cc8) at main.c:266

Does anybody have any idea how I can fix this kind of problem?

Are you re-compiling for the 2.x architecture?

The nvcc flags in my Makefile are these:

NVCC_FLAGS = -O0 -use_fast_math --ptxas-options=-v
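
The flags above do not name a target architecture, so nvcc uses its default. As far as I know, in that toolkit generation the default targets compute capability 1.x and relies on the driver JIT-compiling the embedded PTX when run on Fermi. One thing to try, offered as an assumption and not as a confirmed fix for this thread, is to request Fermi code generation explicitly with -arch=sm_20:

```make
# Same flags as above, plus an explicit Fermi (compute capability 2.0) target.
# The -arch=sm_20 addition is a suggestion, not something the poster reported using.
NVCC_FLAGS = -O0 -arch=sm_20 -use_fast_math --ptxas-options=-v
```

Every object file and library linked into the program would need to be rebuilt with the same toolkit for this to be a fair test.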

I see you’re loading the pthread library.
Is this cudaMalloc inside a pthread?

No, it is not inside a pthread. The cudaMalloc calls are made in an ordinary function executed by the host.

How about posting some actual code?

Here are the relevant snippets. I should stress that with CUDA 2.3 on the X295 I do not get a segmentation fault at this point. The problem appears only when I run the program on the server with the Fermi card and CUDA 3.0.

int* mat;

size_mat=mat_length_max*VProd(cells)*sizeof(int); // mat_length_max is a constant (40 in my case) and VProd is a macro that computes cells.x*cells.y*cells.z (in my case cells.x=cells.y=cells.z=2)

cudaMalloc((void**)&mat, size_mat); // here there is a segmentation fault and this is the first cudaMalloc of the program

cudaError_t error=cudaGetLastError();
if(error != cudaSuccess) printf("cudaMalloc: %s\n", cudaGetErrorString(error));
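
To make the requested size concrete, here is a minimal host-side sketch of the size computation using the values quoted in the comment above. The Cells struct and the mat_bytes helper are hypothetical stand-ins (only the x, y, z members, mat_length_max, and VProd are named in the post), and no CUDA call is involved:

```cpp
#include <cstddef>

// Hypothetical stand-in for the poster's cells type; only x, y, z are known.
struct Cells { int x, y, z; };

// Value quoted in the post.
const int mat_length_max = 40;

// VProd as described in the post: the product of the three components.
#define VProd(v) ((v).x * (v).y * (v).z)

// Hypothetical helper: number of bytes passed to the first cudaMalloc.
std::size_t mat_bytes(const Cells& cells) {
    return static_cast<std::size_t>(mat_length_max) * VProd(cells) * sizeof(int);
}
```

With cells.x=cells.y=cells.z=2 this gives 40*8 = 320 elements, so mat_bytes returns 320*sizeof(int), which is 1280 bytes on a platform where int is 4 bytes. That is a very small request, so the size by itself is unlikely to explain the crash.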

Are you recompiling this code for the Fermi card with a CUDA 3.0 or later toolkit, or are you trying to run the CUDA 2.3 version directly on the Fermi machine? I am pretty certain the latter is not going to work.

Yes, before I run the code on Fermi, I recompile it with the CUDA 3.0 toolkit on the server that has the Fermi GPU.

Well then there’s something in your code that we’re not seeing.

You say you have:

int* mat;
int size_mat=320*sizeof(int);
cudaMalloc((void**)&mat, size_mat);

There is nothing wrong with that code, and it runs perfectly well on Fermi with CUDA 3.0.

The problem cannot be here.

I have the same problem, also after moving from a 200 series card to my brand new 580s.
Compilation works fine, with no warnings, but the first cudaMalloc that gets called returns an invalid argument error.
I don't see how that's possible, since this is the entire call:
SC( cudaMalloc((void**)&m_pSimInput_CUDA, 3 * 256 * 320 * sizeof(unsigned char)) );

where m_pSimInput_CUDA is declared in the class header:
unsigned char *m_pSimInput_CUDA;

and SC is just a macro for CUDA_SAFE_CALL

All of the SDK examples compile and run just fine of course.

As with ardis, everything runs fine on my 295 machine.

Solved: I was using some binaries compiled pre-Fermi, of course. I forgot I was linking to them…