cuMemallocPitch for 3D allocations?

cuMemcpy2D has the restriction that any provided pitches must cause row beginnings of a 2D allocation in linear memory to be aligned properly. cuMemallocPitch allocates memory to satisfy this constraint for us. We can also use cuMemcpy2DUnaligned if we know the row pitch may not satisfy the constraint.

My question is this: how do we satisfy the pitch restraints for 3D allocations? (I’m assuming cuMemcpy3D has the same pitch restrictions as cuMemcpy2D). There’s no 3D equivalent to cuMemallocPitch (unless it’s height parameter can simply be height * depth).

Aside: Is cuMemallocPitch just a small wrapper around cuMemcpy2D that does row padding? What are the row alignment requirements?

Im not sure if this is any help at all but i have worked with cudaMalloc3DArray:

const cudaExtent volumeSize = make_cudaExtent(DATA_W, DATA_H, DATA_D);

	CUDA_SAFE_CALL( cudaMalloc3DArray(&a_Data, &floatTex, volumeSize) );

cudaPitchedPtr pagelockedPtr;

 Â  Â pagelockedPtr.pitch = volumeSize.width*sizeof(float);

 Â  Â pagelockedPtr.xsize = volumeSize.width;

 Â  Â pagelockedPtr.ysize = volumeSize.height;

 Â  Â size_t size = volumeSize.width*volumeSize.height*volumeSize.depth*sizeof(float);

 Â  Â CUDA_SAFE_CALL( cudaMallocHost(&(pagelockedPtr.ptr), size) );

 Â  Â memcpy(pagelockedPtr.ptr, h_Data, size);

copyParams.srcPtr   = pagelockedPtr;

copyParams.dstArray = a_Data;

   Â copyParams.extent   = volumeSize;

   Â copyParams.kind   Â  = cudaMemcpyHostToDevice;

 Â  Â CUDA_SAFE_CALL( cudaMemcpy3D(&copyParams) );

This is taken and modified from the texture3d sdk exemple. It uses page locked memory but there is also a branch for non page locked memory in the sample which i have not copied here, you can check it out but i needed page locked memory for my program to work.

I have not paid attention to aligning anything and the dimensions are not powers of 2.


I’m actually more interested in allocations of linear device memory than host memory. In my experiments it appears that alignment is not a concern when transferring to and from host memory. That is, the transfers don’t fail and the data is copied correctly. I have however noticed a performance penalty if the host memory doesn’t have rows aligned properly.

I’ve been suspecting that the answer to my “aside” is that the alignment restrictions are that rows need to be on a multiple of a power-of-two boundary. That is, allocations for power-of-two textures have width_in_bytes == pitch.