Allocate a non-constant number of arrays on the device

Hi everyone,

I’m a new user of CUDA, so sorry if my question looks stupid. I read several topics in this forum but I didn’t find a solution :-(

I’m converting C code to CUDA code. The “kernel” algorithm uses dynamic programming (each new value depends on previous computations). Each thread runs this algorithm and builds an N*M matrix. Both dimensions depend on the input data: N is the same for all threads, but M differs per thread.

So I want to allocate T arrays of (N*M) floats on the device. In the following code, N is “input_size”, M is given by the “reference_size” array, and T corresponds to “number_reference”. Allocating the arrays themselves is no problem (assuming I haven’t made a mistake):

[codebox]
float* compute_score(float2** references_vec, float2* input_vec, int number_reference, int* reference_size, int input_size)
[…]
    for(int i = 0; i < number_reference; i++) {
        float* devPtr;
        size_t pitch; // cudaMallocPitch expects a size_t*, not an int*
        cudaMallocPitch((void**)&devPtr, &pitch, reference_size[i]*sizeof(float), input_size);
    }[/codebox]

or something like that.
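One detail worth noting with cudaMallocPitch: each row may be padded, so elements have to be addressed through the returned pitch (in bytes), not through the logical row width. A minimal sketch of the standard indexing idiom, reusing your variable names:

```cuda
// Address element (r, c) of a pitched allocation inside a kernel.
// `pitch` is a byte count, so the row arithmetic goes through a char*.
__global__ void read_pitched(float* devPtr, size_t pitch, int n_rows, int n_cols)
{
    for (int r = 0; r < n_rows; r++) {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < n_cols; c++) {
            float value = row[c]; // element (r, c)
            // ... use value ...
        }
    }
}
```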

My problem is how to store the “devPtr” pointers in an array so that each thread can find its own array. (I guess a float** array declared in host code cannot be dereferenced on the device, since it lives in host memory.)

I read that cudaGetSymbolAddress works only for __device__ variables.

One solution is to use a 3D array, but that means a huge waste of memory, since every slice would have to be padded to the largest M.

I tried many things but none works.

Thank you for your help !

Hi,

first you have to allocate memory on the device for the pointers to the arrays. It could look like this (not tested):

float **devPtr;
cudaMalloc((void**)&devPtr, number_of_arrays*sizeof(float*));

float **devPtrDummy;
devPtrDummy = (float**)malloc(number_of_arrays*sizeof(float*));

Now there’s an array of float* on the device and on the host.

The second step is to loop over all the pointers and to allocate the desired number of array elements for each array:

for(int i = 0; i < number_of_arrays; i++)

  cudaMalloc((void**)(devPtrDummy+i), length_of_array[i]*sizeof(float));

Now you have allocated enough memory for the arrays on the device. But these arrays are not yet reachable from the device, so copy the pointer table over:

cudaMemcpy((void*)devPtr, (void*)devPtrDummy, number_of_arrays*sizeof(float*), cudaMemcpyHostToDevice);

Now devPtr points to an array (on the device) of pointers to arrays (on the device).
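Putting the three steps together, a complete sketch might look like this (untested; number_of_arrays and length_of_array are your variables, error checking omitted):

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

// Build a device array of device pointers, one per (variable-length) array.
float** alloc_jagged(int number_of_arrays, const int* length_of_array)
{
    // Step 1: space on the device for the pointer table itself.
    float** devPtr;
    cudaMalloc((void**)&devPtr, number_of_arrays * sizeof(float*));

    // Host-side staging copy of the pointer table.
    float** devPtrDummy = (float**)malloc(number_of_arrays * sizeof(float*));

    // Step 2: one device allocation per array; the pointers land on the host.
    for (int i = 0; i < number_of_arrays; i++)
        cudaMalloc((void**)(devPtrDummy + i), length_of_array[i] * sizeof(float));

    // Step 3: publish the pointer table to the device.
    cudaMemcpy(devPtr, devPtrDummy, number_of_arrays * sizeof(float*),
               cudaMemcpyHostToDevice);

    free(devPtrDummy);
    return devPtr; // pass this to a kernel; devPtr[i] is only valid on the device
}
```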

Hope I answered your question :unsure:

Thank you, it looks like it works (I still have to try it in my complete code).

I have one more question:

I tried the following code :

[codebox]__device__ float** devPtr;

__global__ void myFunc() {
    float* tmp = devPtr[num_thread]; // num_thread is computed correctly
    ...
}

void init() {
    // int size computed correctly
    cudaMalloc((void**)&devPtr, size);
    float** tmp = (float**) malloc(size);
    for(int i = 0; i < N; i++) {
            ...
    }
    cudaMemcpy(devPtr, tmp, ...);
    ...
    myFunc<<<dimGrid,dimBlock>>>();
    ...
}[/codebox]

My problem is that in the kernel function, devPtr[num_thread] causes a segfault (I work in emulation mode). Apparently devPtr is (nil) inside the kernel, and I don’t understand why: cudaMalloc returns no error and the pointer exists.

If I pass devPtr as a parameter of the kernel function, everything works. So apparently devPtr isn’t initialized in global scope. Maybe I made a mistake, but I thought that __device__ float** devPtr was global and persistent.
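For what it’s worth, the working variant (passing the pointer table as a kernel argument) could be sketched like this; it sidesteps the __device__ symbol entirely. A hedged, untested sketch with assumed names:

```cuda
#include <cuda_runtime.h>

// Variant that works: hand the device pointer table to the kernel directly.
__global__ void myFunc(float** ptrs)
{
    int num_thread = blockIdx.x * blockDim.x + threadIdx.x;
    float* tmp = ptrs[num_thread]; // each thread picks up its own array
    // ...
}

void init(float** devPtr, dim3 dimGrid, dim3 dimBlock)
{
    // devPtr was filled in on the host side as before; just pass it along.
    myFunc<<<dimGrid, dimBlock>>>(devPtr);
}
```

(If the __device__ symbol is really wanted, the usual route is to set it from the host with cudaMemcpyToSymbol rather than taking its address with & in host code.)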

Thank you