How to deal with dynamically allocated 3-dimensional arrays in device memory?

Florimond · April 13, 2013, 5:31pm

Hello,

I have to port a pre-existing “host-only” backpropagation implementation to CUDA. Let me briefly introduce the purpose of the array I’m talking about…
The subject is neural networks. The current “host-only” implementation uses several arrays, but let’s just talk about the one that holds the weights of the connections between neurons.
It looks like this:

w[layer][i][j]

This array gives the weight of the connection between neuron “i” of layer “layer” and neuron “j” of layer “layer - 1”. When the program starts, the user chooses a number of layers and, for each layer, a number of neurons. The “w” array is then built dynamically using multiple malloc calls.
The algorithms then update the weights in this array during the training phase of the network. Obviously, this array has to be in device memory when running the CUDA version.
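For illustration only, the host-side allocation pattern described above might look roughly like the following sketch. The helper names allocWeights and freeWeights, the neuron-count vector n, and the choice to leave w[0] empty (assuming the input layer has no incoming weights) are hypothetical and not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: builds the jagged array w[layer][i][j] with one
// allocation per row. n[l] is the neuron count of layer l; layer 0 is
// assumed to be the input layer. Allocation error checks are omitted.
double ***allocWeights(const std::vector<int> &n)
{
    assert(n.size() >= 2);
    double ***w = static_cast<double ***>(std::malloc(n.size() * sizeof(double **)));
    w[0] = nullptr; // assumption: no weights lead into the input layer
    for (std::size_t l = 1; l < n.size(); ++l) {
        w[l] = static_cast<double **>(std::malloc(n[l] * sizeof(double *)));
        for (int i = 0; i < n[l]; ++i) {
            // One row per neuron i of layer l, one entry per neuron j of layer l - 1.
            w[l][i] = static_cast<double *>(std::calloc(n[l - 1], sizeof(double)));
        }
    }
    return w;
}

// Hypothetical counterpart that releases everything allocWeights created.
void freeWeights(double ***w, const std::vector<int> &n)
{
    for (std::size_t l = 1; l < n.size(); ++l) {
        for (int i = 0; i < n[l]; ++i)
            std::free(w[l][i]);
        std::free(w[l]);
    }
    std::free(w);
}
```

With n = {3, 4, 2}, this performs one allocation for the top level, two for the layer tables, and six for the rows, which is the set of pointers that would each need a device-side counterpart.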

–About CUDA–

This array has to persist between kernel calls and be accessible from the host at the end of a training session.
First, I thought about the classic cudaMalloc function called from host code. But I don’t see how I could easily allocate such an array in device memory from the host: I would have to keep track of several pointers and issue numerous cudaMemcpy calls to update the pointers in device memory. It does not look like an efficient solution to me.
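One way the pointer bookkeeping described above is sometimes avoided, sketched here as an assumption and not as something proposed in this thread, is to store all weights in a single contiguous buffer and compute the index of w[layer][i][j] from per-layer offsets. The type FlatWeights and its members are hypothetical names. The sketch is host-only C++; the same buffer could then be moved to the device with a single cudaMalloc and cudaMemcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat layout: every weight lives in one contiguous buffer.
struct FlatWeights {
    std::vector<int> n;              // neurons per layer, n[0] = input layer
    std::vector<std::size_t> offset; // offset[l] = index of the first weight of layer l
    std::vector<double> data;        // all weights, layer after layer, row-major

    explicit FlatWeights(const std::vector<int> &neurons)
        : n(neurons), offset(neurons.size() + 1, 0)
    {
        assert(n.size() >= 2);
        // Layer 0 has no incoming weights, so offset[0] == offset[1] == 0.
        for (std::size_t l = 1; l < n.size(); ++l)
            offset[l + 1] = offset[l] + static_cast<std::size_t>(n[l]) * n[l - 1];
        data.assign(offset.back(), 0.0);
    }

    // Equivalent of w[layer][i][j]; valid for layer >= 1.
    double &at(int layer, int i, int j)
    {
        return data[offset[layer] + static_cast<std::size_t>(i) * n[layer - 1] + j];
    }
};
```

The trade-off is that indexing needs a small offset table instead of pointer chasing, but only three flat arrays (n, offset, data) have to be copied between host and device.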
Then I wondered whether I could build the array directly from device code, using a global initialization kernel launched with <<<1,1>>>. Something like this:

typedef struct
{
…
int L; // number of layers, not counting the input layer
…
double ***w;
…
} WorkingData;

HOST CODE, FROM CPP CLASS

WorkingData* dev_workingData; // pointer to device memory
cudaMalloc((void**)&dev_workingData, sizeof(WorkingData));
CUDA_initWorkingData(dev_workingData, L);

CODE FROM CU FILE

__global__ void initWorkingData(WorkingData* p_workingData, int p_L)
{
	...
	p_workingData->w = (double ***)malloc( (p_L+1) * sizeof(double **) );
	...
}

extern "C" void CUDA_initWorkingData(WorkingData* p_workingData, int p_L)
{
	initWorkingData<<<1,1>>>(p_workingData, p_L);
}

The problem is that this code gives me a “calling a host function(“malloc”) from a global function” error with Visual Studio 2010, despite the “compute_20,sm_20” option in the “Code generation” parameter of my project and CU file (and I have a compute capability 2.1 device).

I have also read here and there that calling malloc inside a kernel should be avoided.

Here are my questions:

Why do I get this error with Visual Studio?
Why should malloc be avoided inside a kernel?
Is it OK to write such a kernel, intended only to be launched with <<<1,1>>>?
Any hints about a different way of doing what I want to do?

Thank you
(Sorry, but the [CODE] tag seems to be buggy and messes up all the code if I use more than one instance of it.)

Florimond · April 14, 2013, 2:27am

OK, I found the answer to my first question in this thread: “Kernel Malloc sm_20” (CUDA Programming and Performance, NVIDIA Developer Forums).

I had to remove sm_10 from the “Code generation” setting, since in-kernel malloc is only available for compute capability 2.0 and higher.

As you have probably understood, I’m a beginner. I do search for answers myself, but I would welcome some help.

Thanks.
