Array pointing to multiple arrays on the device

I have a CPU code that I am trying to set up on the GPU which involves an array of pointers where each address in the array points to one of two different arrays. Very succinctly,

double ** ptrArray_h = new double * [size];

ptrArray_h[i] = &array1_h[some index];
ptrArray_h[i+1] = &array2_h[some other index];

Then, I can do this when using the ptrArray:

value = *ptrArray[index];

I would like to do this on the device and am aware that my pointer array cannot simply be copied as is, because the device arrays (array1_d and array2_d) have different memory addresses. Typically I would advocate against pointer arrays, but I’m not certain how I could go about doing this without this type of indirection. If there is a cleaner way to do this, I’m open to it.
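For concreteness, here is a minimal CPU-only sketch of that pattern; the sizes, indices, and values are made up purely for illustration:

#include <cstdio>

int main (void)
{
    /* made-up sizes and contents, purely for illustration */
    double array1_h[4] = { 10.0, 11.0, 12.0, 13.0 };
    double array2_h[3] = { 20.0, 21.0, 22.0 };

    const int size = 2;
    double **ptrArray_h = new double * [size];
    ptrArray_h[0] = &array1_h[2];   /* points at an element of array1_h */
    ptrArray_h[1] = &array2_h[1];   /* points at an element of array2_h */

    for (int i = 0; i < size; i++) {
        printf ("%f\n", *ptrArray_h[i]);   /* prints 12.0, then 21.0 */
    }
    delete [] ptrArray_h;
    return 0;
}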

It is not clear what the question is. CUDA device code supports a large subset of C++, so

(1) You can create the same kind of data structures on the device that you create on the host, including an array of pointers to arrays of elements of type T.
(2) You can copy data structures like an array of pointers to arrays of elements of type T just like you normally do in C++, by performing a deep copy.

Some people like to represent 2D matrices as an array of pointers to 1D vectors of T (not something I would advise, for performance reasons). Here is an example of how such a matrix can be moved between host and device. Your use case seems closely related.

#include <cstdio>
#include <cstdlib>
#include <complex>
#include "cuComplex.h"

#define N  (2)
#define M  (3)

typedef std::complex<float> T;

__global__ void print_device_matrix (cuComplex** mat)
{
    printf ("matrix on device:\n");
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            printf ("(%f, %f)  ", cuCrealf (mat[i][j]), cuCimagf (mat[i][j]));
        }
        printf ("\n");
    }
}

int main (void)
{
    /* allocate host "matrix" */
    T **mat = (T**)malloc (N * sizeof (mat[0]));
    for (int i = 0; i < N; i++) {
        mat[i] = (T *)malloc (M * sizeof (mat[0][0]));
    }
    
    /* fill in host "matrix" */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            mat[i][j] = T (float(i)+1, float(j)+1);
        }
    }

    /* print host "matrix" */
    printf ("matrix on host:\n");
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            printf ("(%f, %f)  ", real(mat[i][j]), imag(mat[i][j]));
        }
        printf ("\n");
    }

    /* allocate device "matrix" */
    T **tmp = (T**)malloc (N * sizeof (tmp[0]));
    for (int i = 0; i < N; i++) {
        cudaMalloc ((void **)&tmp[i], M * sizeof (tmp[0][0]));
    }
    cuComplex **matD = 0;
    cudaMalloc ((void **)&matD, N * sizeof (matD[0]));

    /* copy "matrix" from host to device */
    cudaMemcpy (matD, tmp, N * sizeof (matD[0]), cudaMemcpyHostToDevice);
    for (int i = 0; i < N; i++) {
        cudaMemcpy (tmp[i], mat[i], M * sizeof (matD[0][0]), cudaMemcpyHostToDevice);
    }
    free (tmp);

    /* print device "matrix" */
    print_device_matrix<<<1,1>>> (matD);

    /* free host "matrix" */
    for (int i = 0; i < N; i++) {
        free (mat[i]);
    }
    free (mat);
    
    /* free device "matrix" */
    tmp = (T**)malloc (N * sizeof (tmp[0]));
    cudaMemcpy (tmp, matD, N * sizeof (matD[0]), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; i++) {
        cudaFree (tmp[i]);
    }
    free (tmp);
    cudaFree (matD);

    return EXIT_SUCCESS;
}

I’ll have to study your code. Thank you for the response. I’m just unsure how this works with the references to two different arrays. If I can create the data structure on the device then fair enough - I can do that.

In my example, each row of the 2D matrix is stored in a separate array of T. The matrix itself is represented by an array of row pointers. Unless I overlooked something, your use case is a 2D matrix with two rows called array1 and array2, and your array of row pointers is called ptrArray.

This may be of interest.

I think maybe I’m doing a poor job of explaining what I’m doing. I also may just not be following both of your recommendations very well. Allow me to try again.

array1_h is of size Ni
array2_h is of size Np

My ptrArray is pointing to particular addresses in each of these separately allocated arrays:

ptrArray_h[i] = &array1_h[some index];
ptrArray_h[i+1] = &array2_h[some other index];

So ptrArray[i] holds the address of a single element of either array1_h or array2_h. When I access ptrArray[index] I get a memory address inside one of those two arrays, and when I dereference it, i.e. *ptrArray[i], I get the value stored there. To be literal about the code above: *ptrArray_h[i] gives me the value of array1_h[some index], and *ptrArray_h[i+1] gives me the value of array2_h[some other index].

I guess I’m not entirely sure that I have just a plain 2D array here. I have two separately allocated arrays that need to be accessed conditionally based on this data structure. I need to be able to access these two different arrays, allocated on my device, through this ptrArray, which in my mind is just a clever way of avoiding an if statement on the device:

if (index == i) val = array1_h[some index];
else            val = array2_h[some other index];

Does this make some sense? If it does and your recommendations still hold, I will study them and see if I can make sense of them. Thanks for the quick replies.

Hi,
It seems more like a C question than a CUDA one.

Here is one idea:

Assuming that both arrays use the same data type, just allocate a single array of size Ni + Np.
Then use an index-mapping array, i.e. an array that contains indices into the unified array.
Now you can do any kind of arbitrary mapping, as sketched below.
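A minimal sketch of that idea, assuming both arrays hold doubles; the names (unified, idxMap, gather), sizes, and values are invented for illustration, and error checking is omitted:

#include <cstdio>

__global__ void gather (const double *unified, const int *idxMap, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = unified[idxMap[i]];   /* arbitrary mapping into the unified array */
    }
}

int main (void)
{
    const int Ni = 4, Np = 3, n = 2;
    /* conceptually, array1 (Ni elements) followed by array2 (Np elements) */
    double unified_h[Ni + Np] = { 10, 11, 12, 13, 20, 21, 22 };
    int    idxMap_h[n]        = { 2, Ni + 1 };   /* array1[2] and array2[1] */

    double *unified_d = 0, *out_d = 0;
    int *idxMap_d = 0;
    cudaMalloc ((void **)&unified_d, (Ni + Np) * sizeof (unified_d[0]));
    cudaMalloc ((void **)&idxMap_d,  n * sizeof (idxMap_d[0]));
    cudaMalloc ((void **)&out_d,     n * sizeof (out_d[0]));
    cudaMemcpy (unified_d, unified_h, (Ni + Np) * sizeof (unified_d[0]), cudaMemcpyHostToDevice);
    cudaMemcpy (idxMap_d,  idxMap_h,  n * sizeof (idxMap_d[0]), cudaMemcpyHostToDevice);

    gather<<<1,32>>> (unified_d, idxMap_d, out_d, n);

    double out_h[n];
    cudaMemcpy (out_h, out_d, n * sizeof (out_d[0]), cudaMemcpyDeviceToHost);
    printf ("%f %f\n", out_h[0], out_h[1]);   /* expect 12 and 21 */

    cudaFree (unified_d);
    cudaFree (idxMap_d);
    cudaFree (out_d);
    return 0;
}

The caveat is that this requires both arrays to live in a single allocation; the reply further down notes that they have to stay separate here because of MPI.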

It would have been good to mention that at the start. So my understanding was mistaken: this is not a 2D matrix-like data structure. But it does not matter what it is. Any data structure you can construct using pointers on the host, you can likewise construct on the device, and you can copy between the two using an appropriate deep-copy process. A list can be copied, a tree can be copied, a DAG can be copied.
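To make that concrete for this particular case, here is a minimal sketch (error checking omitted; the sizes, offsets, and the deref kernel are invented for illustration). The essential point is that ptrArray_d has to be filled with device addresses, computed from array1_d and array2_d, not with the host addresses stored in ptrArray_h:

#include <cstdio>
#include <cstdlib>

__global__ void deref (double **ptrArray, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = *ptrArray[i];   /* reads from either array1_d or array2_d */
    }
}

int main (void)
{
    const int Ni = 4, Np = 3, size = 2;
    double array1_h[Ni] = { 10, 11, 12, 13 };
    double array2_h[Np] = { 20, 21, 22 };

    /* plain (flat) copies of the two data arrays */
    double *array1_d = 0, *array2_d = 0;
    cudaMalloc ((void **)&array1_d, Ni * sizeof (array1_d[0]));
    cudaMalloc ((void **)&array2_d, Np * sizeof (array2_d[0]));
    cudaMemcpy (array1_d, array1_h, Ni * sizeof (array1_d[0]), cudaMemcpyHostToDevice);
    cudaMemcpy (array2_d, array2_h, Np * sizeof (array2_d[0]), cudaMemcpyHostToDevice);

    /* build the pointer array on the host, but out of device addresses */
    double **tmp = (double **)malloc (size * sizeof (tmp[0]));
    tmp[0] = array1_d + 2;   /* mirrors ptrArray_h[i]   = &array1_h[some index]       */
    tmp[1] = array2_d + 1;   /* mirrors ptrArray_h[i+1] = &array2_h[some other index] */

    /* copy the pointer array itself to the device */
    double **ptrArray_d = 0;
    cudaMalloc ((void **)&ptrArray_d, size * sizeof (ptrArray_d[0]));
    cudaMemcpy (ptrArray_d, tmp, size * sizeof (ptrArray_d[0]), cudaMemcpyHostToDevice);
    free (tmp);

    double *out_d = 0;
    cudaMalloc ((void **)&out_d, size * sizeof (out_d[0]));
    deref<<<1,32>>> (ptrArray_d, out_d, size);

    double out_h[size];
    cudaMemcpy (out_h, out_d, size * sizeof (out_d[0]), cudaMemcpyDeviceToHost);
    printf ("%f %f\n", out_h[0], out_h[1]);   /* expect 12 and 21 */

    cudaFree (array1_d);
    cudaFree (array2_d);
    cudaFree (ptrArray_d);
    cudaFree (out_d);
    return EXIT_SUCCESS;
}

This is the same deep-copy pattern as the matrix example above; the only differences are that the two "rows" have different lengths and that the stored pointers can land in the middle of them rather than at their start.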

The arrays need to be kept separate because they are communicated differently via MPI_Send and MPI_Recv.

This is what I’m going to need to do a bit of reading on. I want to just check one last thing before I give the check-mark to your solution.

I have array1_h, which I then (deep?) copy to the device as array1_d.
I have array2_h, which I then (deep?) copy to the device as array2_d.

I have ptrArray_h, which I need to deep copy to the device as ptrArray_d.

If I want *ptrArray_d[i] to give me the value of array1_d[some index] then deep copying array(1/2)_h to array(1/2)_d is also required?

I really appreciate your help pointing me to the terms needed to solve this problem.