cudaMalloc causes segmentation fault: 2 MB is far from my 1.2 GB card memory limit

Everything is in the title:

I am doing some allocations using CUDA 4.0, 64-bit Ubuntu and a GTX 470, and for an unexplained reason a cudaMalloc makes my program crash.

This is a part of my code:

//nbprot = 3 and the lengths are about 10000
HANDLE_ERROR( cudaMallocHost((void**)&results,sizeof(int3 *)*nbprot));
HANDLE_ERROR( cudaMalloc((void**)&res_dev,sizeof(int3 *)*nbprot));
HANDLE_ERROR( cudaMalloc((void**)&tab_prot_dev,sizeof(char *)*nbprot));

for(i=0;i<nbprot;i++){
	HANDLE_ERROR( cudaMalloc((void**)&res_dev[i],sizeof(int3)*leng[i]));
	HANDLE_ERROR( cudaMalloc((void**)&tab_prot_dev[i],sizeof(char)*(leng[i]+1)));
	HANDLE_ERROR( cudaMallocHost((void**)&results[i],sizeof(int3)*leng[i],cudaHostAllocDefault)); //page-locked memory
}

The error occurs on the first iteration of the loop, at the first instruction.

It started when I added the cudaMallocHost allocations, so I thought that was related, but even when I remove that part my cudaMalloc still crashes. I changed the size of the memory allocated and it still crashes.

I know that I am not using too much memory on my card, since I do not make any other allocation before (or after) and my total amount of device memory allocated is about 2 MB.

Maybe I have to do it another way.

The key here is to think about where your pointers are pointing. You appear to be trying to make an array of pointers to other arrays on the device. The problem is that you first allocate the top-level array on the device, then try to pass pointers to those top-level memory locations to a host function. As soon as cudaMalloc() tries to write the pointer to the newly allocated memory segment in the loop, it dereferences a device pointer on the host and segfaults.

To allocate arrays like this, you have to do it backwards. Collect the device addresses of the subarrays in a host array, then allocate the top-level device array and cudaMemcpy() the pointers from host to device.

Better still, don't allocate storage this way on the device if you have any choice. Pointers to pointers with CUDA are painful to set up (as you can see) and not very good for performance. Multidimensional arrays are best represented as a 1D array with precomputed strides for each dimension.
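
For illustration only (an untested sketch, and offsets and total are just names I made up for this post, not your code): precompute a prefix sum of the lengths on the host, make a single device allocation, and index subarray i at the base pointer plus offsets[i].

// flattened alternative: one allocation, precomputed offsets (sketch only, not compiled)
int offsets[3];               // one entry per subarray (nbprot = 3 here)
int total = 0;
for(i=0;i<nbprot;i++){
	offsets[i] = total;   // starting index of subarray i in the flat buffer
	total += leng[i];     // prefix sum of the subarray lengths
}

int3 *res_flat_dev;
cudaMalloc((void**)&res_flat_dev, sizeof(int3)*total);   // single device allocation

// element j of subarray i is res_flat_dev[offsets[i] + j]

A single cudaMemcpy can then move the whole structure between host and device in one transfer, instead of one transfer per subarray.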

No, I am not. Even if I remove the cudaMallocHost part, the program crashes (it seems weird because I have already done this a thousand times).

Can you give me a simple example?

I will do that later, maybe, but I want this way to work.

Thank you

But you are:

cudaMalloc((void**)&res_dev[i],sizeof(int3)*leng[i]);

cudaMalloc((void**)&tab_prot_dev[i],sizeof(char)*(leng[i]+1));

Both res_dev and tab_prot_dev are device pointers. They cannot be indirected in host code because they do not contain valid host addresses. Further, I suspect that you are not declaring them correctly either, although we don't get to see that code. To repeat what Seibert advised, do each of the nested device allocations into a host array, then copy that array from host memory to an array of pointers allocated on the device.

Obviously there is something that I misunderstand.

That’s why I do a cudaMalloc and not a simple malloc, right?

Here is my minimal test program:

int main(){
	int3 **res_dev;
	int i;
	int nbprot=3;
	int leng[3];

	leng[0]=123;
	leng[1]=15;
	leng[2]=10;

	cudaMalloc((void**)&res_dev,sizeof(int3 *)*nbprot);

	for(i=0;i<nbprot;i++)
		cudaMalloc((void**)&res_dev[i],sizeof(int3)*leng[i]);

	for(i=0;i<nbprot;i++)
		cudaFree(res_dev[i]);
	cudaFree(res_dev);

	return 0;
}

This is what I think I do:

-allocate a device array of pointers

-in each cell of the array, allocate an array

It seems like a simple issue, but there is something about the semantics that I don't understand here.

No. cudaMalloc stores a device address in a host pointer. That pointer cannot be indirected in host code - it does not contain a valid host address, it contains an address in GPU memory.

Your repro case should be written something like this (warning: this has not been near a compiler):

#include <stdlib.h>

int main(){
  int3 **res_dev, **res_host;
  int i;
  int nbprot=3;
  int leng[3];

  leng[0]=123;
  leng[1]=15;
  leng[2]=10;

  // host-side array that will hold the device pointers to each subarray
  res_host = (int3 **)malloc(sizeof(int3 *)*nbprot);
  // top-level array of pointers, allocated in device memory
  cudaMalloc((void**)&res_dev,sizeof(int3 *)*nbprot);

  // allocate each subarray on the device, storing its address in the host array
  for(i=0;i<nbprot;i++)
    cudaMalloc((void**)&res_host[i],sizeof(int3)*leng[i]);

  // copy the array of device pointers from the host to the device
  cudaMemcpy(res_dev, res_host, sizeof(int3 *)*nbprot, cudaMemcpyHostToDevice);

  for(i=0;i<nbprot;i++)
    cudaFree(res_host[i]);
  cudaFree(res_dev);
  free(res_host);

  return 0;
}

Device and host pointers are not interchangeable. Hopefully this example also shows why you should take Seibert’s very sound advice and consider abandoning jagged arrays in favour of linear memory.
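
To make the linear-memory alternative concrete, here is roughly what the device side looks like once everything lives in a single buffer (sketch only, not compiled; the names are illustrative, and the offsets and lengths arrays would also need to be copied into device memory):

// one block per subarray of the flat buffer; offsets/leng are small device arrays (illustrative)
__global__ void process(int3 *flat, const int *offsets, const int *leng)
{
	int prot = blockIdx.x;              // which subarray this block works on
	int3 *sub = flat + offsets[prot];   // start of that subarray in the single allocation
	for(int j = threadIdx.x; j < leng[prot]; j += blockDim.x){
		sub[j].x = j;               // placeholder work on element j
		sub[j].y = 0;
		sub[j].z = 0;
	}
}

With the jagged layout, every res_dev[prot][j] access costs an extra pointer load from global memory before the data itself can be fetched; with the flat layout the indexing is plain arithmetic in registers.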

At some point in the past three years (now I feel old), I posted how to create an array of arrays on the device. I can't find it now, but needless to say it was extremely painful. Don't do it, because it also performs terribly.

Okay! That is a big revelation for me :D!

I couldn't find it either, but I will not do that.

Thank you all for your time and advice.