Copying memory to and from 3D pitched pointers

Hi, I wrote the following two functions for copying memory to/from 3D pitched pointers (modified from older functions I found on this forum):

void copy3DHostToPitchedPtr(float *_src, cudaPitchedPtr _dst, int width, int height, int depth)
{
  cudaExtent copy_extent = make_cudaExtent(width*sizeof(float), height, depth);
  cudaMemcpy3DParms copyParams = {0};
  float *h_source = _src;

  copyParams.srcPtr = make_cudaPitchedPtr((void*)h_source, copy_extent.width, copy_extent.width/sizeof(float), copy_extent.height);
  copyParams.dstPtr = _dst;
  copyParams.kind   = cudaMemcpyHostToDevice;
  copyParams.extent = copy_extent;

  CUDA_SAFE_CALL(cudaMemcpy3D(&copyParams));
  CUT_CHECK_ERROR("Host -> Device Memcpy failed\n");
}

and:

void copy3DPitchedPtrToHost(cudaPitchedPtr _src, float *_dst, int width, int height, int depth)
{
  cudaExtent copy_extent = make_cudaExtent(width*sizeof(float), height, depth);
  cudaMemcpy3DParms copyParams = {0};
  float *h_dest = _dst;

  copyParams.srcPtr = _src;
  copyParams.dstPtr = make_cudaPitchedPtr((void*)h_dest, width*sizeof(float), width, height);
  copyParams.kind   = cudaMemcpyDeviceToHost;
  copyParams.extent = copy_extent;

  CUDA_SAFE_CALL(cudaMemcpy3D(&copyParams));
  CUT_CHECK_ERROR("Device -> Host Memcpy failed\n");
}

I also wrote a small test kernel that takes a 3D pitched pointer and populates it with values derived from the element indices:

__global__ void testKernel(cudaPitchedPtr v, int width, int height, int depth) {
	//calculate x y z coordinates just for test purposes
	unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
	unsigned int y = blockIdx.x*blockDim.y + threadIdx.y;
	unsigned int z = blockIdx.x*blockDim.z + threadIdx.z;

	char* devPtr = (char *) v.ptr;
	size_t pitch = v.pitch;
	size_t slicePitch = pitch*height;
	char* slice = devPtr + z*slicePitch;
	float* row = (float*)(slice + y*pitch);

	//overwrite element
	row[x] = x + y + z;
}

However, when I execute the main function I get an error while copying back from device to host (‘invalid configuration argument’). I also find it quite weird that if I do not call my test kernel, but simply copy the memory host->device and then device->host straight away, I get no errors and a correct memory transfer. Here is the main function:

void cudaTest(float *v, int width, int depth, int height) {
	cudaPitchedPtr vGPU_;
	width_  = width;
	height_ = height;
	depth_  = depth;

	//allocate memory
	cudaExtent pExtent = make_cudaExtent(width*sizeof(float), height, depth);
	CUDA_SAFE_CALL(cudaMalloc3D(&vGPU_, pExtent));

	//copy host->device
	copy3DHostToPitchedPtr(v, vGPU_, width_, height_, depth_);

	//kernel execution
	dim3 threadBlock(8, 8, 2);
	dim3 gridBlock(iDivUp(width_, threadBlock.x), iDivUp(height_, threadBlock.y), iDivUp(depth_, threadBlock.z));
	testKernel<<<gridBlock, threadBlock>>>(vGPU_, width_, height_, depth_);

	//copy device->host
	copy3DPitchedPtrToHost(vGPU_, v, width_, height_, depth_);
}

I am not too sure what I am doing wrong with such a simple piece of code. I am using a GeForce GTX 280 (compute capability 1.3) and CUDA 4.0.

“Invalid configuration argument” is a kernel launch error. cudaMemcpy and its derivatives will also return errors from earlier kernel launches.

In other words, test to make sure that your kernel is launching correctly. You likely have a grid dimension that is 0 (or one that exceeds the device’s limits).
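For example, something along these lines around the launch (just a sketch, using the gridBlock/threadBlock/vGPU_ names from your cudaTest(); needs <cstdio>) would tell you whether the launch configuration itself is the problem:

// print the launch configuration and check the launch itself (sketch only)
printf("grid = (%u, %u, %u)  block = (%u, %u, %u)\n",
       gridBlock.x, gridBlock.y, gridBlock.z,
       threadBlock.x, threadBlock.y, threadBlock.z);

testKernel<<<gridBlock, threadBlock>>>(vGPU_, width_, height_, depth_);

cudaError_t launchErr = cudaGetLastError();   // picks up the launch failure directly
if (launchErr != cudaSuccess)
    fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(launchErr));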

Thanks for the reply. When I get the error I also get a line number, which points to the cudaMemcpy3D call in the copy3DPitchedPtrToHost function. I double-checked the grid size just to make sure, and the values appear to be legal. Could it be something in the size of the host destination pitched pointer?

void copy3DPitchedPtrToHost(cudaPitchedPtr _src, float *_dst, int width, int height, int depth)
{
  cudaExtent copy_extent = make_cudaExtent(width*sizeof(float), height, depth);
  cudaMemcpy3DParms copyParams = {0};
  float *h_dest = _dst;

  copyParams.srcPtr = _src;
  copyParams.dstPtr = make_cudaPitchedPtr((void*)h_dest, width*sizeof(float), width, height);
  copyParams.kind   = cudaMemcpyDeviceToHost;
  copyParams.extent = copy_extent;

  CUDA_SAFE_CALL(cudaMemcpy3D(&copyParams));
  CUT_CHECK_ERROR("Device -> Host Memcpy failed\n");
}

Hmm, well you can still get a kernel launch error through CUDA_SAFE_CALL, because CUDA_SAFE_CALL only reports whatever cudaMemcpy3D returns. cudaMemcpy3D will return an error from your kernel launch, and CUDA_SAFE_CALL will make it look as though the error originated from cudaMemcpy3D itself. The error you say you are getting only comes from a kernel launch failure. cudaMemcpy3D can return cudaSuccess, cudaErrorInvalidValue, cudaErrorInvalidDevicePointer, cudaErrorInvalidPitchValue, cudaErrorInvalidMemcpyDirection, or an error from any earlier asynchronous call (your kernel).

Try the following changes to your main code:

int main(void)
{
   cudaError_t status;

   // host code....

   // kernel launch ....

   // Immediately following your kernel launch:
   cudaDeviceSynchronize();
   status = cudaGetLastError();
   if (status != cudaSuccess) { fprintf(stderr, "kernel error: %s\n", cudaGetErrorString(status)); }

   // rest of your code...
}

If that doesn’t work, here is how I set up my cudaMemcpy3D parameters for a device-to-host copy:

cudaMemcpy3DParms p = { 0 };

// h_extent is the extent used when allocating the 3D device memory
int x = h_extent.width / sizeof(float);
int y = h_extent.height;
int z = h_extent.depth;

p.srcPtr.ptr   = h_pitchedptr.ptr;
p.srcPtr.pitch = h_pitchedptr.pitch;
p.srcPtr.xsize = x;
p.srcPtr.ysize = y;

p.dstPtr.ptr   = (void*)src;        // 'src' here is the host buffer being copied into
p.dstPtr.pitch = x * sizeof(float);
p.dstPtr.xsize = x;
p.dstPtr.ysize = y;

p.extent.width  = x * sizeof(float);
p.extent.height = y;
p.extent.depth  = z;

p.kind = cudaMemcpyDeviceToHost;

But like I said, I’m pretty confident that the problem is in your kernel.

You are absolutely spot on. It seems that the problem is the z size of my grid, which throws that error whenever the third dimension is > 1. I looked around the forum and it seems that, even though gridBlock is a 3D vector, the only legal value for its third dimension is 1. However, it is now unclear to me how to generate the indices within a kernel if I have a 3D thread block and only a 2D grid…

Yeah the problem is probably that compute capability 1.3 does not support 3D grids.

What you can do is use the y-dimension of your grid for both the y and z indices (so the grid’s y dimension has to cover height*depth rows):

gidy    = blockDim.y*blockIdx.y + threadIdx.y;   // runs over 0 .. height*depth - 1
y_index = gidy % height;                         // row within the slice
z_index = gidy / height;                         // slice index
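Put together, here is a sketch of what the kernel and the launch could look like with the z slices folded into the grid’s y dimension (this reuses the iDivUp helper and the variable names from your cudaTest(); it is only one possible mapping, untested):

__global__ void testKernel(cudaPitchedPtr v, int width, int height, int depth)
{
	unsigned int x    = blockIdx.x*blockDim.x + threadIdx.x;
	unsigned int gidy = blockIdx.y*blockDim.y + threadIdx.y;   // folded y/z index
	unsigned int y    = gidy % height;                         // row within the slice
	unsigned int z    = gidy / height;                         // slice index
	if (x >= width || z >= depth) return;                      // skip the padding threads

	char  *devPtr     = (char *)v.ptr;
	size_t slicePitch = v.pitch * height;
	float *row        = (float *)(devPtr + z*slicePitch + y*v.pitch);
	row[x] = x + y + z;
}

// launch with a 2D block and a 2D grid (gridDim.z stays 1):
dim3 threadBlock(8, 8);
dim3 gridBlock(iDivUp(width_, threadBlock.x),
               iDivUp(height_*depth_, threadBlock.y));
testKernel<<<gridBlock, threadBlock>>>(vGPU_, width_, height_, depth_);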

I changed platform, so now I’m using a Quadro 5000 (which should be compute capability 2.0) on Win7 64-bit. I left the structure of the code unchanged; however, now I get cudaErrorUnknown immediately after the kernel call, despite the 3D sizes of gridBlock and threadBlock being correct, (48,36,4) and (8,8,8) respectively. Is there any way of finding out what the problem might be?