cudaMallocPitch() + cudaMemcpy2D() Want to check if the copy of 2D data between host and dev is work

Greetings,

I’m having some trouble to understand if I got something wrong in my programming or if there’s an unclear issue (to me) on copying 2D data between host and device. I’m using cudaMallocPitch() to allocate memory on device side. I want to check if the copied data using cudaMemcpy2D() is actually there.

Here is the example code (running in my machine):

#include <iostream>

using namespace std;

#define CUDA_SAFE(x) do { if (( x ) != cudaSuccess) { \

						 cout << "Error at " << __FILE__ << " : " \

							  << __LINE__ << endl \

							  << cudaGetErrorString( x ) << endl;\

						 return -1 ;}} while (0)

			 

int main( int argc, char** argv) 

{

	size_t i, j, Row, Col, pitch;

	float **devP, **hostP;

	

	Row = 8;

	Col = 16;

	hostP = new float*[Row];

	for (i = 0; i < Row; i++) hostP[i] = new float[Col];

	//-- allocate device memory

	CUDA_SAFE(

	 cudaMallocPitch(&devP, &pitch, Col*sizeof(float), Row)

	);

 	

 	//-- initialize host matrix (Row X Col)

	for (i = 0; i < Row; i++)

		for (j = 0; j < Col; j++)

			hostP[i][j] = (float)i+j;

	

	//-- print host information

	cout << " Before ========= " << endl;

	for (i = 0; i < Row; i++)

	{

		cout << "[" << i << "] ";

		for (j = 0; j < Col; j++)

			cout << hostP[i][j] << " ";

		cout << endl;

	}

	//-- copy host matrix to device

	CUDA_SAFE(

	 cudaMemcpy2D(devP, pitch, hostP, Col*sizeof(float), 

				  Col*sizeof(float), Row, cudaMemcpyHostToDevice)

	);

	

	//-- destroy host information

	for (i = 0; i < Row; i++)

		for (j = 0; j < Col; j++)

			hostP[i][j] = 99.0f;

	//-- copy back device to host matrix

	CUDA_SAFE(

	 cudaMemcpy2D(hostP, Col*sizeof(float), devP, pitch, 

				   Col*sizeof(float), Row, cudaMemcpyDeviceToHost)

	);

	//-- print updated host information

	cout << " After ========= " << endl;

	for (i = 0; i < Row; i++)

	{

		cout << "[" << i << "] ";

		for (j = 0; j < Col; j++)

			cout << hostP[i][j] << " ";

		cout << endl;

	}

	//-- free memory (device and host)

	cudaFree(devP);

	delete [] hostP;

return 0;

}

This is what I get:

Before sending to device

Before ========= 

[0] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 

[2] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 

[3] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 

[4] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

[5] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 

[6] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 

[7] 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

After destroying host side information and copying device’s backup

After ========= 

[0] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 

[2] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 

[3] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 

[4] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

[5] 5 6 7 8 9 10 11 12 99 99 99 99 99 99 99 99 

[6] 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 

[7] 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99

If one changes Row and/or Col values, he’ll see that there is no visible pattern to this behavior. At least no pattern I could find… :-(

If somebody just could help me, I think I’m blind to the reason my code gets this behavior…

At least, could somebody run this code in his machine and see if it presents the same behavior?

I’m using:

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2012 NVIDIA Corporation

Built on Thu_Apr__5_00:24:31_PDT_2012

Cuda compilation tools, release 4.2, V0.2.1221

Your host storage is not contiguous, being comprised of individually allocated column vectors, each one pointed to by an element of the hostP vector, which is a vector of pointers.

cudaMemcpy2D() expects contiguously stored data, and it expects a pointer to the start of the data to be copied. Your code passes a pointer to the array of column pointers (namely hostP), which is not the same thing.

If you want to retain the host-side storage as is, you will have to copy column vectors individually with a 1D copy. I would suggest changing the host-side allocation to a contiguously stored 2D matrix.

1 Like

Thank you, njuffa!! :rofl:

I’ve never though about contiguously stored data while creating dynamic 2d arrays with the C++ operator new (I was already aware that vector class couldn’t deal with contiguously stored data)… This way I’d never found the error in my code… Just in case, I’ve searched a little bit about this subject and I came out with this solution:

//-- dynamic 2d arrays with contiguously stored data

//-- this is a simple template class, it could be improved :-)

template <class T> class array2d 

{

    T *data;

    size_t R, C;

  public:

    array2d (size_t row, size_t col)

    {

      data = new T [row*col];

      R = row;

      C = col;

    }

    T* operator[] (size_t row) { return &(data[row*C]);	}

    T* operator& () { return &(data[0]); }

    ~array2d () { delete [] data; }

};

Inside main():

//-- define and allocate host memory 2D matrix

    array2d<float> hostP(Row,Col);

Using like this, one could refer to hostP elements in the two dimension address way, as [font=“Courier New”]hostP[i][j][/font] already done in my code.

To use as an argument to cudaMemcpy2D(), update to:

//-- copy host matrix to device

    CUDA_SAFE(

     cudaMemcpy2D(devP, pitch, &hostP, Col*sizeof(float), 

                  Col*sizeof(float), Row, cudaMemcpyHostToDevice)

    );

and…

//-- copy back device to host matrix

    CUDA_SAFE(

     cudaMemcpy2D(&hostP, Col*sizeof(float), devP, pitch, 

	          Col*sizeof(float), Row, cudaMemcpyDeviceToHost)

    );

Those are all modifications I needed to do. Now it’s working 100%!

Thank you once again! It was great to get in touch with you! :thumbup: