cudaMemcpy2D: What's wrong in this code?

jarepau · December 20, 2011, 9:30pm

Hi everybody

I’m trying to pass a 2D matrix of zeros to the device and fill it on the GPU with a simple operation. For example, for a matrix[height][width] with height = 2 and width = 2:

matrix[0][0] = 0
matrix[0][1] = 1
matrix[1][0] = 2
matrix[1][1] = 3

Finally, I want to copy the results from the device back to the host and print them. The code works fine if height == width, but it doesn’t work if height != width.

Any ideas?

Thank you. I’m desperate :(

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include <curand_kernel.h>

__global__ void MyKernel(float** dev_matrix, size_t pitch, int width, int height)
{
    int number = 0;
    for (int i = 0; i < height; ++i)
    {
        float* row = (float*)((char*)dev_matrix + i*pitch);
        for (int j = 0; j < width; ++j)
        {
            row[j] = number;
            number++;
        }
    }
}

int main (int argc , char * argv [])
{
    int width = 4, height = 2, i, j;
    float matrix[width][height];
    float **dev_matrix;
    size_t pitch;

    printf("\nMATRIX MANIPULATION\n");

    for (i = 0; i < height; i++)
        for (j = 0; j < width; j++)
            matrix[i][j] = 0.0;

    printf("Matrix in host memory\n");
    for (i = 0; i < height; i++)
    {
        for (j = 0; j < width; j++)
            printf("%f   ", matrix[i][j]);
        printf("\n");
    }

    cudaMallocPitch(&dev_matrix, &pitch, width * sizeof(float), height);
    cudaMemcpy2D(dev_matrix, pitch, matrix, width * sizeof(float), width * sizeof(float), height, cudaMemcpyHostToDevice);

    MyKernel<<<1, 1>>>(dev_matrix, pitch, width, height);

    cudaMemcpy2D(matrix, width * sizeof(float), dev_matrix, pitch, width * sizeof(float), height, cudaMemcpyDeviceToHost);

    printf("Matrix after calculate elements in the gpu\n");
    for (i = 0; i < height; i++)
    {
        for (j = 0; j < width; j++)
            printf("%f   ", matrix[i][j]);
        printf("\n");
    }

    cudaFree(dev_matrix);
    return 0;
}

jarepau · December 20, 2011, 10:40pm

any ideas?

djmj1000 · December 21, 2011, 4:16am

If you only want to move zeros to the device, you do not need a memory copy operation. A memory set operation is enough, and it is much faster.

If you are 100% sure that every element is written by the kernel, you do not even need the memory set. The allocation alone is enough, since the kernel writes the output of every element.

About cudaMemcpy2D I cannot help, because I have never used the 2D and 3D variants. The same can be done with a plain cudaMemcpy and an array of pointers to the individual rows.
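For pitched allocations, the CUDA runtime call that matches the "memory set" idea is cudaMemset2D(devPtr, pitch, value, width, height), with width given in bytes. The sketch below is only a host-side stand-in in plain C, assuming a hypothetical helper named set_2d, to show what such a call touches: the first width bytes of every row, leaving the padding up to the pitch alone.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for a pitched 2D memset.
   Sets width_bytes bytes to `value` at the start of each of `height` rows.
   Rows start `pitch` bytes apart; padding bytes are not written. */
void set_2d(void *base, size_t pitch, int value, size_t width_bytes, size_t height)
{
    for (size_t row = 0; row < height; ++row) {
        memset((char *)base + row * pitch, value, width_bytes);
    }
}
```

Setting every byte to 0 yields 0.0f for floats on IEEE 754 hardware, which is why a byte-wise set is enough to zero a float matrix.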

jarepau · December 21, 2011, 9:40am

Thank you for the reply.

This is just an example. In the real code I move random numbers from the host to the device.

I think the problem is in cudaMemcpy2D. Any more ideas, please?
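To make the cudaMemcpy2D arguments easier to check, here is a minimal host-side sketch in plain C of the row-by-row copy it describes: height rows of width bytes each, where destination rows start dpitch bytes apart and source rows start spitch bytes apart. The names copy_2d and row_ptr are hypothetical helpers for illustration, not CUDA functions, and the real call also handles the host/device transfer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for the copy pattern of cudaMemcpy2D.
   Copies `height` rows of `width_bytes` bytes each.
   dpitch and spitch are the byte distances between consecutive rows
   in the destination and the source, respectively. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    for (size_t row = 0; row < height; ++row) {
        memcpy((char *)dst + row * dpitch,
               (const char *)src + row * spitch,
               width_bytes);
    }
}

/* Same row addressing as in the posted kernel: base + row * pitch, in bytes. */
float *row_ptr(void *base, size_t pitch, size_t row)
{
    return (float *)((char *)base + row * pitch);
}
```

The key assumption on the host side is that spitch matches the real row length of the host array. Passing width * sizeof(float) is only correct if each host row actually holds width floats.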

ikidntu · December 21, 2011, 6:21pm

float matrix[width][height];

but

for (i = 0; i < height; i++)
    for (j = 0; j < width; j++)
        matrix[i][j] = 0.0;

printf("Matrix in host memory\n");
for (i = 0; i < height; i++)
{
    for (j = 0; j < width; j++)
        printf("%f   ", matrix[i][j]);
    printf("\n");
}

I think you need to reverse your width and height in the loop or in the declaration.
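A small plain-C sketch of why the swap only shows up for non-square sizes, as far as can be told from the posted code. In a row-major C array, element [row][col] lives at flat offset row * row_len + col, where row_len is the second dimension of the declaration. The helpers flat_offset and read_element are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Flat offset of element [row][col] in a row-major array
   whose rows hold row_len elements each. */
size_t flat_offset(size_t row, size_t col, size_t row_len)
{
    return row * row_len + col;
}

/* Hypothetical helper: read element (row, col) from a flat buffer,
   assuming rows of row_len elements. */
float read_element(const float *flat, size_t row, size_t col, size_t row_len)
{
    return flat[flat_offset(row, col, row_len)];
}
```

With width = 4 and height = 2, cudaMemcpy2D is told that host rows are width floats long, so it places element (1, 3) at flat offset 7. The declaration float matrix[width][height] has rows of only height = 2 floats, so matrix[1][3] in the print loop resolves to flat offset 5 instead (and indexing past the end of a row is not valid C in the first place). When width == height, both row lengths agree, which is why the square case appeared to work.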

jarepau · December 21, 2011, 7:30pm

Oh my god… thank you so much.
English is not my first language, and I confused width with height.
Thanks again!
