Allocating an array of pitched arrays

mjmawson · September 28, 2011, 3:20pm

Is it possible to allocate an array of 2D pitched arrays, by looping over an allocated array of pointers and calling cudaMallocPitch for each entry instead of a linear malloc? The goal is to store several 2D pitched arrays. I tried allocating and freeing them inside recursive calls as I need them, but that costs more time than the pitched memory saves.

Something like this?

size_t d_pitchBytes;

int dim[NumberOfArrays]; // size of each pitched array

/* allocate storage for an array of pointers */

  double **a = (double **)malloc(NumberOfArrays * sizeof(double *));

/* for each pointer, allocate storage for a pitched array */

  for (i = 0; i < NumberOfArrays; i++) {

    cudaMallocPitch((void**) &a[i], &d_pitchBytes, dim[i]*sizeof(double), dim[i]);

  }

mjmawson · September 28, 2011, 4:50pm

Update: This is my first attempt at putting the pitched arrays within another array.

size_t d_pitchBytes;

size_t memcpyPitch;

double **d_u;

double **d_res;

double **d_rhs;

int numGrids=12;

int dimU[numGrids];//These values are initialised elsewhere, but the array is included for completeness

	//Allocate Device Memory

	cudaMalloc((void**) &d_u, numGrids*sizeof(double));

	cudaMalloc((void**) &d_rhs, numGrids*sizeof(double));

	cudaMalloc((void**) &d_res, numGrids*sizeof(double));

	for(i=0;i<numGrids;i++){		

	cudaMallocPitch((void**) &d_u[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	if(i==0){

		memcpyPitch=d_pitchBytes;

	

	}

	cudaMallocPitch((void**) &d_rhs[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	cudaMallocPitch((void**) &d_res[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	Pitch[i]=d_pitchBytes/sizeof(double);

	}

It throws a cudaError_enum exception at the cudaMallocPitch call. Any suggestions would be appreciated. Thanks.

MarkusM · September 29, 2011, 7:50am

You can’t access d_rhs[i] or d_res[i] from the host, since the pointer arrays themselves are in device memory. You can get it to work by allocating as in your first post (a host array of pointers filled by cudaMallocPitch) and afterwards copying that array of device pointers to the device. But, as usual, pointers to pointers are a bad idea in CUDA. Do your pitched arrays have different lengths? If they all have the same length, you should be able to flatten them into a single pitched allocation, with dim*NumberofArrays as the y-dimension. (You could also take a look at cudaMalloc3D, which also pitches the allocation.)

mjmawson · September 29, 2011, 10:04am

The arrays are all different sizes. I was flattening them all into a 1D array, but I wanted to pad the arrays. Previously I was allocating and freeing each array in recursive calls of a function in the program. That let me use padding, which improved kernel performance, but the gain was cancelled out by the constant allocation and freeing. I want to see whether dereferencing the pointers is faster or slower than constantly allocating and freeing arrays on the device.

Latest attempt at the code; it still gives an enum error when performing the pitched malloc.

int dimU[numGrids];//These values are initialised elsewhere, but the array is included for completeness

	double **d_u; 

	double **d_rhs;

	double **d_res;

	double *d_uMalloc;

	double *d_rhsMalloc;

	double *d_resMalloc;

        size_t d_pitchBytes;

        size_t memcpyPitch;

        int *dimU;

        int *Pitch;

Pitch=(int *)malloc(numGrids*sizeof(int));

        dimU=(int *)malloc(numGrids*sizeof(int));

	cudaMalloc((void**) &d_u, numGrids*sizeof(double *));

	cudaMalloc((void**) &d_res, numGrids*sizeof(double *));

	cudaMalloc((void**) &d_rhs, numGrids*sizeof(double *));

	d_uMalloc=(double *)malloc(numGrids*sizeof(double));

	d_rhsMalloc=(double *)malloc(numGrids*sizeof(double));

	d_resMalloc=(double *)malloc(numGrids*sizeof(double));

	for(i=0;i<numGrids;i++){		

	cudaMallocPitch((void**) &d_uMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	if(i==0){

		memcpyPitch=d_pitchBytes;

		}

	cudaMallocPitch((void**) &d_rhsMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	cudaMallocPitch((void**) &d_resMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	Pitch[i]=d_pitchBytes/sizeof(double);

	}
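
For comparison, the padded-flattening route mentioned in this post can be done with a single allocation: give each grid its own padded row stride and a byte offset into one flat buffer. Below is a minimal host-only C++ sketch of that bookkeeping, not code from the thread. The names roundUp, GridLayout and planLayout are hypothetical, the grids are assumed square with side dimU[i], and the alignment is an assumed parameter rather than a value queried from the device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: round a byte count up to a multiple of `align`.
static std::size_t roundUp(std::size_t bytes, std::size_t align) {
    assert(align > 0);
    return (bytes + align - 1) / align * align;
}

// Placement of one square grid inside a single flat buffer.
struct GridLayout {
    std::size_t offsetBytes;  // where the grid starts in the flat buffer
    std::size_t pitchBytes;   // padded row stride in bytes
};

// Compute offsets and padded pitches for square grids of side dimU[i].
// `align` is an assumed alignment, not a value reported by the CUDA runtime.
static std::vector<GridLayout> planLayout(const std::vector<int>& dimU,
                                          std::size_t align,
                                          std::size_t* totalBytes) {
    std::vector<GridLayout> layout;
    std::size_t offset = 0;
    for (int n : dimU) {
        std::size_t pitch = roundUp(n * sizeof(double), align);
        layout.push_back({offset, pitch});
        offset += pitch * n;  // n rows, each `pitch` bytes long
    }
    *totalBytes = offset;
    return layout;
}
```

For example, with sides 129, 65 and 33 and a 256-byte alignment, the pitches come out as 1280, 768 and 512 bytes and the buffer needs 231936 bytes in total. The device buffer would then be allocated once with a plain cudaMalloc of totalBytes, and grid i would start at the base pointer plus offsetBytes.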

SpongePatoche · September 29, 2011, 10:10am

Hi,

I’m not sure I fully understand, but I also work with an array of arrays.

I’m no expert on 2D arrays, but I would do something like this:

// vars

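const int numGrids = 12;  // const so the array sizes below are constant expressions

double **d_res;           // device array of pointers to the pitched 2D device arrays

double **h_d_tmp;         // host array holding the same device pointers

size_t pitchBytes[numGrids];  // cudaMallocPitch reports the pitch as size_t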

int dimU[numGrids];

// init host/device memory

h_d_tmp = new double*[numGrids];

for (int i=0 ; i < numGrids ; i++)

{

     cudaMallocPitch((void**) &h_d_tmp[i], &pitchBytes[i], dimU[i] * sizeof(double), dimU[i]);

}

cudaMalloc((void**) &d_res, numGrids * sizeof(double*));

cudaMemcpy(d_res, h_d_tmp, numGrids * sizeof(double*), cudaMemcpyHostToDevice);

// do the work by passing d_res to a kernel

/// ...

// free host/device memory

for (int i=0 ; i < numGrids ; i++)

{

     cudaFree(h_d_tmp[i]);

}

cudaFree(d_res);

delete[] h_d_tmp;

mjmawson · September 29, 2011, 10:19am

I think it’s the malloc of the host array that’s screwing me up. Should it be:

double **d_uMalloc;

d_uMalloc=(double *)malloc(numGrids*sizeof(double));

Or am I using too many/few dereferencing operators?

mjmawson · September 29, 2011, 2:41pm

This is what I’ve got now. Allocations seem to work but I get enum errors at the memcpy2d.

int dimU[numGrids];//These values are initialised elsewhere, but the array is included for completeness

        double **d_u; 

        double **d_rhs;

        double **d_res;

        double **d_uMalloc;

        double **d_rhsMalloc;

        double **d_resMalloc;

        size_t h_pitchBytes = dimension*sizeof(double);

	size_t memcpyPitch;

	

	cudaMalloc((void**) &d_u, numGrids*sizeof(double *));

	cudaMalloc((void**) &d_res, numGrids*sizeof(double *));

	cudaMalloc((void**) &d_rhs, numGrids*sizeof(double *));

	d_uMalloc=(double**)malloc(numGrids*sizeof(double*));

	d_rhsMalloc=(double**)malloc(numGrids*sizeof(double*));

	d_resMalloc=(double**)malloc(numGrids*sizeof(double*));

	for(i=0;i<numGrids;i++){		

	cudaMallocPitch((void**) &d_uMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	if(i==0){

		memcpyPitch=d_pitchBytes;

		printf("%i \n",memcpyPitch);

	}

	cudaMallocPitch((void**) &d_rhsMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	cudaMallocPitch((void**) &d_resMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

	Pitch[i]=d_pitchBytes/sizeof(double);

	}

	printf("%s \n","Device Memory Allocation Completed");

	

cudaMemcpy(d_u,d_uMalloc,numGrids*sizeof(double *), cudaMemcpyHostToDevice);	

cudaMemcpy(d_res,d_resMalloc,numGrids*sizeof(double *), cudaMemcpyHostToDevice);	

cudaMemcpy(d_rhs,d_rhsMalloc,numGrids*sizeof(double *), cudaMemcpyHostToDevice);	

cudaMemcpy2D(d_u[0], memcpyPitch, h_u, h_pitchBytes,dimU[0]*sizeof(double), dimU[0], cudaMemcpyHostToDevice);

cudaMemcpy2D(d_rhs[0], memcpyPitch, h_rhs, h_pitchBytes,dimU[0]*sizeof(double), dimU[0], cudaMemcpyHostToDevice);

cudaMemcpy2D(d_res[0], memcpyPitch, h_rhs, h_pitchBytes,dimU[0]*sizeof(double), dimU[0], cudaMemcpyHostToDevice);

SpongePatoche · September 29, 2011, 3:00pm

This is what I’ve got now. Allocations seem to work but I get enum errors at the memcpy2d.

...

		

cudaMemcpy2D(d_u[0], memcpyPitch, h_u, h_pitchBytes,dimU[0]*sizeof(double), dimU[0], cudaMemcpyHostToDevice);

cudaMemcpy2D(d_rhs[0], memcpyPitch, h_rhs, h_pitchBytes,dimU[0]*sizeof(double), dimU[0], cudaMemcpyHostToDevice);

cudaMemcpy2D(d_res[0], memcpyPitch, h_rhs, h_pitchBytes,dimU[0]*sizeof(double), dimU[0], cudaMemcpyHostToDevice);

You can’t dereference device memory from the host: d_u[0], d_rhs[0] and d_res[0] are not valid in host code.

On the host you can manipulate (assign, copy) pointers to device memory, but you can’t access what they point to.

What are those last 3 lines meant to do?

mjmawson · September 29, 2011, 3:03pm

I want to copy host arrays into d_u[0], d_rhs[0] and d_res[0]. Is it possible to do this? If d_u, d_res and d_rhs were 1D I could.

SpongePatoche · September 29, 2011, 3:15pm

int dimU[numGrids];//These values are initialised elsewhere, but the array is included for completeness

        double **d_u; 

        double **d_rhs;

        double **d_res;

        double **d_uMalloc;

        double **d_rhsMalloc;

        double **d_resMalloc;

        size_t h_pitchBytes = dimension*sizeof(double);

        size_t memcpyPitch;

cudaMalloc((void**) &d_u, numGrids*sizeof(double *));

        cudaMalloc((void**) &d_res, numGrids*sizeof(double *));

        cudaMalloc((void**) &d_rhs, numGrids*sizeof(double *));

        d_uMalloc=(double**)malloc(numGrids*sizeof(double*));

        d_rhsMalloc=(double**)malloc(numGrids*sizeof(double*));

        d_resMalloc=(double**)malloc(numGrids*sizeof(double*));

        for(i=0;i<numGrids;i++)

        {

             // Each allocation reports its own pitch, so copy with d_pitchBytes right after allocating.

             cudaMallocPitch((void**) &d_uMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

             cudaMemcpy2D(d_uMalloc[i], d_pitchBytes, h_u[i], h_pitchBytes,dimU[i]*sizeof(double), dimU[i], cudaMemcpyHostToDevice);

             if(i==0){

                memcpyPitch=d_pitchBytes;

                printf("%zu \n",memcpyPitch);

             }

             cudaMallocPitch((void**) &d_rhsMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

             cudaMemcpy2D(d_rhsMalloc[i], d_pitchBytes, h_rhs[i], h_pitchBytes,dimU[i]*sizeof(double), dimU[i], cudaMemcpyHostToDevice);

             cudaMallocPitch((void**) &d_resMalloc[i], &d_pitchBytes, dimU[i]*sizeof(double), dimU[i]);

             cudaMemcpy2D(d_resMalloc[i], d_pitchBytes, h_res[i], h_pitchBytes,dimU[i]*sizeof(double), dimU[i], cudaMemcpyHostToDevice);

             Pitch[i]=d_pitchBytes/sizeof(double);

        }

        printf("%s \n","Device Memory Allocation Completed");

cudaMemcpy(d_u,d_uMalloc,numGrids*sizeof(double *), cudaMemcpyHostToDevice);    

cudaMemcpy(d_res,d_resMalloc,numGrids*sizeof(double *), cudaMemcpyHostToDevice);        

cudaMemcpy(d_rhs,d_rhsMalloc,numGrids*sizeof(double *), cudaMemcpyHostToDevice);

Ok sorry, the day was long …

mjmawson · September 29, 2011, 3:24pm

I want to set the values of the pitched arrays at d_u[0], d_rhs[0] and d_res[0] to the values stored in the host arrays h_u, h_rhs and h_res.

mjmawson · September 29, 2011, 3:39pm

Thanks. If I wanted to pass one of the pitched arrays to a kernel would I pass d_u[i] or d_uMalloc[i]? Thanks for your help. I’m still getting used to device and host pointers.

SpongePatoche · September 29, 2011, 4:24pm

It depends on what you want to pass:

// You can pass to a kernel

d_u / d_res / d_rhs                              // device arrays of device pointers

d_uMalloc[X] / d_rhsMalloc[X] / d_resMalloc[X]   // device pointers stored in a host array

// You can't pass

d_u[X] / d_rhs[X] / d_res[X]               // the host can't dereference a pointer array that lives in device memory

d_uMalloc / d_rhsMalloc / d_resMalloc      // host arrays, which the device can't read

And to free the device memory you need to do it like this:

for (int i=0 ; i < numGrids ; i++)

{

     cudaFree(d_uMalloc[i]);

     cudaFree(d_rhsMalloc[i]);

     cudaFree(d_resMalloc[i]);

}

cudaFree(d_u);

cudaFree(d_rhs);

cudaFree(d_res);

free(d_uMalloc);

free(d_rhsMalloc);

free(d_resMalloc);

mjmawson · September 30, 2011, 11:21am

Ok, so let’s say I have a function that needs to be able to pass any of the d_uMalloc arrays to a kernel within said function. Would I have to declare an array of pointers that points to each d_uMalloc for every x, and then pass that array to my function?
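
One possible reading of the question above: d_uMalloc is already a host array of pointers, so the function can take that array together with the per-grid pitches and pick d_uMalloc[level] at the point where the kernel is launched. Here is a minimal host-only C++ sketch of that plumbing, written so it runs without a GPU. The names scaleGrid and processLevel are hypothetical, a plain function stands in for the kernel, and ordinary host buffers stand in for the cudaMallocPitch allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a kernel: receives ONE pitched grid by pointer plus its pitch
// (in elements, like Pitch[i] in the thread) and scales every valid element.
static void scaleGrid(double* grid, int pitchElems, int dim, double factor) {
    assert(pitchElems >= dim);
    for (int row = 0; row < dim; ++row)
        for (int col = 0; col < dim; ++col)
            grid[row * pitchElems + col] *= factor;  // row stride is the pitch
}

// Hypothetical host function: takes the host-side array of grid pointers
// (the role d_uMalloc plays in the thread) and works on one chosen level.
static void processLevel(double** grids, const int* pitchElems,
                         const int* dimU, int level) {
    // With CUDA this call would be a kernel launch that receives grids[level].
    scaleGrid(grids[level], pitchElems[level], dimU[level], 2.0);
}
```

The padding columns beyond dim in each row are never touched, which is why the pitch has to travel with the pointer.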
