cuMemAllocPitch for 3D allocations?

revane · June 21, 2008, 1:45pm

cuMemcpy2D requires that any pitch passed to it places the start of every row of a 2D allocation in linear memory on a properly aligned address. cuMemAllocPitch allocates memory that satisfies this constraint for us. If we know the row pitch may not satisfy the constraint, we can use cuMemcpy2DUnaligned instead.
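To make the role of the pitch concrete, here is a minimal host-only sketch of what a pitched 2D copy does: each row is copied separately, and the pitch is the byte distance between the starts of consecutive rows. `copyRows2D` is a hypothetical helper written for illustration; it is not part of the CUDA API and says nothing about how the driver implements cuMemcpy2D.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Host-only analogue of a pitched 2D copy (hypothetical helper, not a CUDA call).
// Copies `height` rows of `widthBytes` bytes each. Row r starts at
// base + r * pitch in each buffer, so the two pitches may differ.
void copyRows2D(void* dst, std::size_t dstPitch,
                const void* src, std::size_t srcPitch,
                std::size_t widthBytes, std::size_t height)
{
    // A pitch smaller than the row width would make rows overlap.
    assert(dstPitch >= widthBytes && srcPitch >= widthBytes);

    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dstPitch, s + row * srcPitch, widthBytes);
    }
}
```

The padding bytes between `widthBytes` and the pitch are never touched, which is why a padded destination can hold tightly packed source rows.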

My question is this: how do we satisfy the pitch constraints for 3D allocations? (I’m assuming cuMemcpy3D has the same pitch restrictions as cuMemcpy2D.) There’s no 3D equivalent of cuMemAllocPitch (unless its height parameter can simply be height * depth).
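The height * depth idea amounts to storing the volume as one tall 2D allocation in which slice z occupies rows z * height through z * height + height - 1. A minimal sketch of the resulting address arithmetic follows. `offset3D` is a hypothetical helper, and the sketch only illustrates the layout; whether cuMemcpy3D accepts memory allocated this way is not confirmed anywhere in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element (x, y, z) in a volume stored as height * depth
// consecutive rows, each `pitch` bytes apart (hypothetical helper).
std::size_t offset3D(std::size_t x, std::size_t y, std::size_t z,
                     std::size_t pitch, std::size_t height,
                     std::size_t elemSize)
{
    assert(y < height);                  // y indexes a row within one slice
    assert((x + 1) * elemSize <= pitch); // element must lie inside its row
    std::size_t row = z * height + y;    // row index in the tall 2D allocation
    return row * pitch + x * elemSize;
}
```

With this layout every row starts at a multiple of the pitch, so if row 0 is aligned, all rows in all slices are aligned too, and the distance between slices is pitch * height bytes.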

Aside: Is cuMemAllocPitch just a small wrapper around cuMemAlloc that does row padding? What are the row alignment requirements?

Ailleur · June 21, 2008, 1:50pm

I’m not sure if this is any help at all, but I have worked with cudaMalloc3DArray:

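    // Assumed to be declared earlier, as in the SDK sample (not shown in this post):
    //   cudaArray *a_Data;
    //   cudaChannelFormatDesc floatTex = cudaCreateChannelDesc<float>();
    //   float *h_Data;   // host volume of DATA_W * DATA_H * DATA_D floats
    const cudaExtent volumeSize = make_cudaExtent(DATA_W, DATA_H, DATA_D);
    CUDA_SAFE_CALL( cudaMalloc3DArray(&a_Data, &floatTex, volumeSize) );

    // Page-locked host staging buffer, tightly packed: pitch == row width in bytes.
    cudaPitchedPtr pagelockedPtr;
    pagelockedPtr.pitch = volumeSize.width * sizeof(float);
    pagelockedPtr.xsize = volumeSize.width;
    pagelockedPtr.ysize = volumeSize.height;
    size_t size = volumeSize.width * volumeSize.height * volumeSize.depth * sizeof(float);
    CUDA_SAFE_CALL( cudaMallocHost(&(pagelockedPtr.ptr), size) );
    memcpy(pagelockedPtr.ptr, h_Data, size);

    // Copy the staging buffer into the 3D array.
    cudaMemcpy3DParms copyParams = {0};   // assumed: declared and zeroed as in the SDK sample
    copyParams.srcPtr   = pagelockedPtr;
    copyParams.dstArray = a_Data;
    copyParams.extent   = volumeSize;
    copyParams.kind     = cudaMemcpyHostToDevice;
    CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );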

This is taken and modified from the texture3d SDK example. It uses page-locked memory. The sample also has a branch for non-page-locked memory, which I have not copied here; you can check it out, but I needed page-locked memory for my program to work.

I have not paid attention to aligning anything and the dimensions are not powers of 2.

revane · June 23, 2008, 2:24pm

Hi,

I’m actually more interested in allocations of linear device memory than host memory. In my experiments it appears that alignment is not a concern when transferring to and from host memory. That is, the transfers don’t fail and the data is copied correctly. I have however noticed a performance penalty if the host memory doesn’t have rows aligned properly.

I’ve been suspecting that the answer to my “aside” is that each row must start on a boundary that is a multiple of some power of two. That would explain why allocations for power-of-two textures have width_in_bytes == pitch.
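If that guess is right, the padding would amount to rounding the row width up to the next multiple of the alignment. Here is a minimal sketch of that arithmetic. `paddedPitch` is a hypothetical helper, and any alignment value passed to it is an assumption; the real requirement depends on the device, and the pitch actually returned by the allocator is the only reliable value.

```cpp
#include <cassert>
#include <cstddef>

// Rounds widthBytes up to the next multiple of `alignment` (hypothetical helper).
// `alignment` must be a nonzero power of two; the caller's choice of value
// is an assumption, not a documented CUDA constant.
std::size_t paddedPitch(std::size_t widthBytes, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Adding alignment - 1 and clearing the low bits rounds up to a multiple.
    return (widthBytes + alignment - 1) & ~(alignment - 1);
}
```

Under an assumed 64-byte alignment, a row of 100 floats (400 bytes) would get a 448-byte pitch, while a row of 128 floats (512 bytes) would keep width_in_bytes == pitch, consistent with the power-of-two observation. Note that this only holds for power-of-two widths at least as large as the alignment; a 16-byte row would still be padded to 64 bytes.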
