How to cudaMalloc a two-dimensional array?

diego_ldu777 · January 13, 2019, 5:28pm

Hi everyone

I am having problems passing my matrix to the device. The host matrix is created without problems in the allocate_init_2Dmatrix function, which also fills it with values. However, the pointer ad, which should correspond to the matrix on the device, does not seem to carry anything into the solver function, and the function returns immediately. I placed a printf inside the solver to check the value behind the pointer, but nothing is printed. I implemented the allocation following the options I found, but I think the problem is that what goes to the kernel should be the address of a float pointer-to-pointer, i.e. a float ***. This is my code; if someone can help me I would appreciate it a lot.

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#define MAX_ITER 1000000
#define MAX 100 //maximum value of the matrix element
#define TOL 0.000001

// Generate a random float number with the maximum value of max
float rand_float(int max){
  return ((float)rand()/(float)(RAND_MAX)) * max;
}


// Allocate 2D matrix
void allocate_init_2Dmatrix(float ***mat,  int n, int m){
  int i, j;
  *mat = (float **) malloc(n * sizeof(float *));
  for(i = 0; i < n; i++) {
    (*mat)[i] = (float *)malloc(m * sizeof(float));
    for (j = 0; j < m; j++)
      (*mat)[i][j] = rand_float(MAX);
  }

}


// solver
__global__ void solver(float ***matd, int n, int m){
  float diff = 0, temp;
  int done = 0, cnt_iter = 0;
  int j = blockIdx.x*blockDim.x + threadIdx.x;
  int i = blockIdx.y*blockDim.y + threadIdx.y;
  printf("valor de matd:%f\n",***matd);
  while (!done && (cnt_iter < MAX_ITER)){
    diff = 0;
    if (i < n - 1 && j < m - 1){
      temp = (*matd)[i][j];
      (*matd)[i][j] = 0.2 * ((*matd)[i][j] + (*matd)[i][j - 1] + (*matd)[i - 1][j] + (*matd)[i][j + 1] + (*matd)[i + 1][j]);
      diff += abs((*matd)[i][j] - temp);
      // printf("diff:%f\n",diff);
    }

    if (diff/n/n < TOL)
      done = 1;
    cnt_iter ++;
  }

  if (done)
    printf("Solver converged after %d iterations\n", cnt_iter);
  else
    printf("Solver not converged after %d iterations\n", cnt_iter);

}

int main(int argc, char *argv[]) {
  int n;
  float **a,**ad;
  struct timeval start, end, start1, end1;
  double mtime, seconds, useconds, x, mtime1, seconds1, useconds1, y;
  gettimeofday(&start, NULL);
  dim3 DimGrid(10);
  dim3 DimBlock(128);
  if (argc < 2) {
    printf("Call this program with two parameters: matrix_size communication \n");
    printf("\t matrix_size: Add 2 to a power of 2 (e.g. : 18, 1026)\n");

    exit(1);
  }

  n = atoi(argv[1]);
  float *temph[n];
  printf("Matrix size = %d\n", n);
  allocate_init_2Dmatrix(&a, n, n);

  //  cudaMalloc(&ad,n*n*sizeof(float ));
  //  cudaMemcpy(ad,a,n*n*sizeof(float),cudaMemcpyHostToDevice);
  // Allocate 2D array in Device
  cudaMalloc((void **)&ad,n*sizeof(float *));
  for (int i = 0; i < n; i++){
    cudaMalloc(&temph[i], n*sizeof(float));
  }
  cudaMemcpy(ad,temph,n*sizeof(float *),cudaMemcpyHostToDevice);

  for (int i = 0; i < n; i++){
    cudaMemcpy(temph[i],a[i],n*sizeof(float *),cudaMemcpyHostToDevice);
  }

  gettimeofday(&start1, NULL);
  //printf("valor de ad:%f\n",&a);
  solver<<<DimGrid, DimBlock>>>(&ad, n, n);
  cudaMemcpy(a,ad,n*n*sizeof(float),cudaMemcpyDeviceToHost);
  cudaFree(ad);
  gettimeofday(&end1, NULL);
  seconds1 = end1.tv_sec - start1.tv_sec;
  useconds1 = end1.tv_usec - start1.tv_usec;
  mtime1 = ((seconds1)*1000+ useconds1/1000);
  y=mtime1/1000;
  printf("\nTiempo calculo de funcion solver es: %g segundos", y);
  gettimeofday(&end, NULL);
  seconds = end.tv_sec - start.tv_sec;
  useconds = end.tv_usec - start.tv_usec;
  mtime = ((seconds)*1000+ useconds/1000);
  x=mtime/1000;
  printf("\nTiempo total de programa: %g segundos\n", x);

  return 0;
}

Robert_Crovella · January 14, 2019, 2:10am

In the future I suggest you start a new thread, rather than adding to this thread that is about 6.5 years old.

There are a variety of errors in your code.

1. You cannot pass &ad as an argument to your kernel. The address of that variable is a host pointer. Attempting to dereference a host pointer in device code is illegal. Furthermore, this construct is not needed for the 2D allocation scheme you seem to have in mind.
2. We need to convert your kernel code to use (matd) directly, not (*matd).
3. Your kernel appears to be doing Jacobi iteration/relaxation, and is using the four neighbors at each point. However, your bounds checking only tests for 2 edges. You need to test for all 4 edges, to prevent out-of-bounds access when accessing the -1 and +1 neighbors.
4. Your device setup code was not quite correct. When copying the data for each row pointer, the size must be computed for (float) storage, not (float *) storage; your host-to-device cudaMemcpy loop used n*sizeof(float *).
5. The copy of data from device to host was not correct. The copy there needs to be similar to the copy in the direction from host to device (i.e. with a loop, using your temph pointer storage).
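
As a minimal sketch of the last point above, assuming the same layout as in the question (a host array of device row pointers, like temph, and a host matrix of separately allocated rows, like a), the copy back to the host might look like this. The helper name copy_rows_to_host is hypothetical and is not part of the posted code.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copy an n x m matrix back from the device, one row at a time.
// d_rows is a HOST array holding the DEVICE row pointers (temph in the question);
// h_mat is the host matrix of n separately malloc'ed rows (a in the question).
cudaError_t copy_rows_to_host(float **h_mat, float *const *d_rows, int n, int m) {
  for (int i = 0; i < n; i++) {
    cudaError_t err = cudaMemcpy(h_mat[i], d_rows[i], m * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;  // stop at the first failing row
  }
  return cudaSuccess;
}
```

After the kernel launch this could be called as copy_rows_to_host(a, temph, n, n). Each temph[i], and then ad itself, would be released with cudaFree afterwards.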

Here’s a modified version of the code that runs without execution errors (I have no idea if the calculations are correct or not). In the future I suggest using proper CUDA error checking and running your code with cuda-memcheck before asking others for help:

$ cat t367.cu
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#define MAX_ITER 1000000
#define MAX 100 //maximum value of the matrix element
#define TOL 0.000001

// Generate a random float number with the maximum value of max
float rand_float(int max){
  return ((float)rand()/(float)(RAND_MAX)) * max;
}

// Allocate 2D matrix
void allocate_init_2Dmatrix(float ***mat,  int n, int m){
  int i, j;
  *mat = (float **) malloc(n * sizeof(float *));
  for(i = 0; i < n; i++) {
    (*mat)[i] = (float *)malloc(m * sizeof(float));
    for (j = 0; j < m; j++)
      (*mat)[i][j] = rand_float(MAX);
  }

}

// solver
__global__ void solver(float **matd, int n, int m, bool debug){
  float diff = 0, temp;
  int done = 0, cnt_iter = 0;
  int j = blockIdx.x*blockDim.x + threadIdx.x;
  int i = blockIdx.y*blockDim.y + threadIdx.y;
  if (debug) printf("valor de matd:%f\n",**matd);
  while (!done && (cnt_iter < MAX_ITER)){
    diff = 0;
    if ((i < n - 1) && (j < m - 1) && (i > 0) && (j > 0)){
      temp = (matd)[i][j];
      (matd)[i][j] = 0.2 * ((matd)[i][j] + (matd)[i][j - 1] + (matd)[i - 1][j] + (matd)[i][j + 1] + (matd)[i + 1][j]);
      diff += abs((matd)[i][j] - temp);
      // printf("diff:%f\n",diff);
    }

    if (diff/n/n < TOL)
      done = 1;
    cnt_iter ++;
  }
  if (debug){
    if (done)
      printf("Solver converged after %d iterations\n", cnt_iter);
    else
      printf("Solver not converged after %d iterations\n", cnt_iter);
  }
}

int main(int argc, char *argv[]) {
  int n;
  float **a,**ad;
  struct timeval start, end, start1, end1;
  double mtime, seconds, useconds, x, mtime1, seconds1, useconds1, y;
  gettimeofday(&start, NULL);
  dim3 DimGrid(10);
  dim3 DimBlock(128);
  if (argc < 2) {
    printf("Call this program with two parameters: matrix_size communication \n");
    printf("\t matrix_size: Add 2 to a power of 2 (e.g. : 18, 1026)\n");

    exit(1);
  }

  n = atoi(argv[1]);
  float *temph[n];
  printf("Matrix size = %d\n", n);
  allocate_init_2Dmatrix(&a, n, n);

  // Allocate 2D array in Device
  cudaMalloc((void **)&ad,n*sizeof(float *));
  for (int i = 0; i < n; i++){
    cudaMalloc(&temph[i], n*sizeof(float));
  }
  cudaMemcpy(ad,temph,n*sizeof(float *),cudaMemcpyHostToDevice);

  for (int i = 0; i < n; i++){
    cudaMemcpy(temph[i],a[i],n*sizeof(float),cudaMemcpyHostToDevice);
  }
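
The "proper CUDA error checking" suggested above can be sketched with a small macro. This is one common pattern rather than anything taken from the listing; the names CUDA_CHECK and alloc_zero_row are hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: report where a CUDA runtime call failed, then exit.
#define CUDA_CHECK(call)                                               \
  do {                                                                 \
    cudaError_t err_ = (call);                                         \
    if (err_ != cudaSuccess) {                                         \
      fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__, __LINE__, \
              cudaGetErrorString(err_));                               \
      exit(EXIT_FAILURE);                                              \
    }                                                                  \
  } while (0)

// Hypothetical usage: allocate one device row of m floats and zero it,
// checking the result of every runtime call.
float *alloc_zero_row(int m) {
  float *d_row = NULL;
  CUDA_CHECK(cudaMalloc(&d_row, m * sizeof(float)));
  CUDA_CHECK(cudaMemset(d_row, 0, m * sizeof(float)));
  return d_row;
}
```

A kernel launch returns no error code itself, so after a launch one would typically check CUDA_CHECK(cudaGetLastError()) and then CUDA_CHECK(cudaDeviceSynchronize()) to surface launch and execution errors.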

diego_ldu777 · January 15, 2019, 2:25am

Hello Robert, thank you very much for your prompt reply. I have been able to fix the problem and pass the matrix to the GPU. The problem I have now is with the iterations: since the solver uses Gauss-Seidel, I do not know how to implement the condition so that it works well. I have put the code I have so far in a new topic, as you recommended. I am new to CUDA, so I hope you can guide me a bit on how to implement it.

https://devtalk.nvidia.com/default/topic/1046233/cuda-programming-and-performance/gauss-seidel-iterations-gpu/

knasiotis · September 7, 2023, 2:46pm

float **cpuArray = someValid2DHostarray;

for(int i=0; i<5; i++)
{
   cudaMemcpy(h_temp[i], cpuArray[i], 3*sizeof(float), cudaMemcpyHostToDevice);
}

I know this thread is ancient; however, this memcpy fails with: argument of type "float" is incompatible with parameter of type "void *". I am completely lost as to how this can be solved.
I hope you are having a good time all these years.

Robert_Crovella · September 7, 2023, 3:00pm

With a proper definition of h_temp, which you haven’t shown. The definition you need is something like:

float *h_temp[5];

as shown here
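
To illustrate why the declaration matters: cudaMemcpy takes a void * destination, so h_temp[i] must itself be a pointer. If h_temp were declared as, say, float *h_temp (an assumption, since the declaration was not shown), h_temp[i] would be a plain float, which would match the reported error. A minimal sketch with the 5 x 3 sizes from the snippet, using a hypothetical helper upload_rows:

```cuda
#include <cuda_runtime.h>

#define ROWS 5
#define COLS 3

// Hypothetical helper: allocate ROWS device rows and fill them from a host 2D array.
// h_temp is a HOST array whose elements are DEVICE pointers, so h_temp[i] has
// type float* and converts to the void* that cudaMemcpy expects.
cudaError_t upload_rows(float *h_temp[ROWS], float **cpuArray) {
  for (int i = 0; i < ROWS; i++) {
    cudaError_t err = cudaMalloc(&h_temp[i], COLS * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(h_temp[i], cpuArray[i], COLS * sizeof(float),
                     cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;
  }
  return cudaSuccess;
}
```

A caller would declare float *h_temp[ROWS]; on the host and pass it together with the existing cpuArray.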

knasiotis · September 7, 2023, 3:06pm

You are right, I wonder why that is. I tried dynamically allocating it.

Robert_Crovella · September 7, 2023, 3:10pm

You can do a dynamic allocation if you wish, then you need to start with something like

float **h_temp;

The topic of multidimensional array allocation has a lot of variety and has been covered in many places. Here is a summary post.
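
As a rough sketch of the dynamic variant, assuming the same pointer-table scheme used earlier in the thread: h_temp is allocated on the host with calloc, each row is allocated on the device, and the table of row pointers is then copied to the device so a kernel can index it as matd[i][j]. The helper name alloc_device_2d and its parameters are hypothetical.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: build a device-side float** for an n x m matrix.
// Returns the DEVICE pointer table, or NULL on failure. *h_temp_out receives the
// HOST array of DEVICE row pointers, still needed for later copies and cleanup.
float **alloc_device_2d(int n, int m, float ***h_temp_out) {
  float **h_temp = (float **)calloc(n, sizeof(float *));  // host array, rows start NULL
  float **d_table = NULL;
  if (h_temp == NULL) return NULL;
  bool ok = (cudaMalloc(&d_table, n * sizeof(float *)) == cudaSuccess);
  for (int i = 0; ok && i < n; i++)
    ok = (cudaMalloc(&h_temp[i], m * sizeof(float)) == cudaSuccess);
  // The kernel needs the row pointers in device memory, so copy the table over.
  ok = ok && (cudaMemcpy(d_table, h_temp, n * sizeof(float *),
                         cudaMemcpyHostToDevice) == cudaSuccess);
  if (!ok) {  // undo any partial allocation; cudaFree(NULL) is a no-op
    for (int i = 0; i < n; i++) cudaFree(h_temp[i]);
    cudaFree(d_table);
    free(h_temp);
    return NULL;
  }
  *h_temp_out = h_temp;
  return d_table;
}
```

The returned table is what would be passed to a kernel taking float **matd, while row data is copied through h_temp[i] exactly as with the fixed-size array.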
