Guide: cudaMalloc3D and cudaArray's

spadflyer12 · July 10, 2011, 12:29am

Hi All,

I’m writing this short guide as a reference for those who wish to use cudaMalloc3D with cudaArray’s allocated using cudaMalloc3DArray. This includes using the 3D textures and 2DLayered textures bound to 3D cudaArrays. After looking around on google for a bit and not finding much I figured that others could probably use this information.

In the following code I use a type called ‘cudaMatrix’ extensively. ‘cudaMatrix’ is simply a front end for managing linear memory allocated with cudaMalloc3D(). I’ve included a header file that defines this class and also includes some working examples of how to use cudaMalloc3D() and cudaMemcpy3D().

Ok, so the root of the problem: Populating a 3D texture with device generated data without having to copy the data back to the host.

Step 1: Setup data in linear memory allocated with cudaMalloc3D.

Information on this can be found in this guide here. Or you can use the cudaMatrixT front end, which makes this really easy.

cudaMatrixT<float> initial_data(nx,ny,nz);

Elements in the cudaMatrix object can be accessed simply through the overloaded parentheses operator. For the purposes of this example my data is just a bunch of random floats.

__global__

void fill_kernel(cudaMatrix<float> data,int3 dims,cudaMatrixT<curandState>random_states)

{

	unsigned int idx = threadIdx.x;

	unsigned int idy = threadIdx.y;

	unsigned int idz = threadIdx.z;

	unsigned int gidx = blockIdx.x*blockDim.x+idx;

	unsigned int gidy = blockIdx.y*blockDim.y+idy;

	unsigned int gidz  = blockIdx.z*blockDim.z+idz;

	if((gidx < dims.x)&&(gidy < dims.y)&&(gidz < dims.z))

	{

		data(gidx,gidy,gidz) = (curand_uniform(&random_states(gidx,gidy,gidz))*100);

	}

}

Step 2: Allocate a cudaArray with cudaMalloc3DArray()

cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

	cudaExtent extent;

	extent.width = nx; // Note, for cudaArrays the width field is the width in elements, not bytes

	extent.height = ny;

	extent.depth = nz;

	cudaArray *array = 0;

	cudaMalloc3DArray(&array,&desc,extent,cudaArrayLayered);

Step 3: Copy data in linear memory to a 3D cudaArray using cudaMemcpy3D

cudaPitchedPtr matrixPtr = initial_data.getptr(); // This is the pointer returned from cudaMalloc3D()

	cudaMemcpy3DParms params = {0}; // Initialize to 0

	params.srcPtr = matrixPtr;

	params.dstArray = array;

	params.kind = cudaMemcpyDeviceToDevice;

	params.extent = extent; // This is the extent used to allocate the cudaArray 'array'

	cudaMemcpy3D(&params);

Step 4: Bind the cudaArray to a texture reference

texture<float,cudaTextureType2DLayered,cudaReadModeElementType> texref;

cudaBindTextureToArray(texref,array); // This is using the high-level api

Step 5: Access the data through the texture reference interface

__global__

void check_kernel(cudaMatrixf data,int3 dims)

{

	unsigned int idx = threadIdx.x;

	unsigned int idy = threadIdx.y;

	unsigned int idz = threadIdx.z;

	unsigned int gidx = blockIdx.x*blockDim.x+idx;

	unsigned int gidy = blockIdx.y*blockDim.y+idy;

	unsigned int gidz  = blockIdx.z*blockDim.z+idz;

	float mydata;

	float texdata;

	if((gidx < dims.x)&&(gidy < dims.y)&&(gidz < dims.z))

	{

		texdata = tex2DLayered(texref,gidx,gidy,gidz);

		mydata = data(gidx,gidy,gidz);

		printf(" mydata = %f, texdata = %f @ %i, %i\n",mydata,texdata,gidx,gidy);

	}

}

And that is how you use cudaMalloc3DArray(), cudaMemcpy3D() with an array as the destination, and the bind it to a texture reference. I hope that this was useful. If anyone has any suggestions on how I could improve this or if I did something terribly wrong I’d greatly appreciate the feedback.

**Note, cudamatrix.h and cudamatrix_types.h should be .cuh, but the forum wouldn’t let me upload those file types.

cudamatrix.h (14.1 KB)

cudamatrix_types.h (574 Bytes)

texture_memory.cu (3.55 KB)

Topic		Replies	Views
c++array, cudaarray, texture CUDA Programming and Performance	1	4546	October 18, 2008
cudaMemcpy3D for allocating arrays for textures only? CUDA Programming and Performance	3	1183	March 2, 2011
Binding device memory to 3D texture CUDA Programming and Performance	2	7867	September 25, 2009
Writing to a 3D cudaArray CUDA Programming and Performance	1	8089	March 21, 2010
Transferring multidimensional arrays to GPU CUDA Programming and Performance	1	4541	May 15, 2009
Sample code for linear memory -> 3d texture CUDA Programming and Performance	1	4288	October 6, 2008
Writing to 3D texture CUDA Programming and Performance	10	20013	December 28, 2010
Has anyone got a WORKING example of 3D matrices in CUDA? CUDA Programming and Performance	2	3309	September 7, 2010
3D texture CUDA Programming and Performance	0	993	October 17, 2008
CUDA Array cudaMallocArray Vs CudaAllocArray3D Is there a guideline on selecting MallocArray Vs Mall CUDA Programming and Performance	1	3301	June 3, 2009

Guide: cudaMalloc3D and cudaArray's

Related topics