Help with cudaMemcpy2D: trying to modify a 2D array on a CUDA device

Hello!

I’m trying to create a 2D array, copy it to the CUDA device, increase every element by 1.0, and copy it back to host memory, but the code dies in cudaMemcpy2D. What did I do wrong?

[codebox]// example1.cpp : Defines the entry point for the console application.
//
//#include "stdafx.h"
#include <stdio.h>
#include <cuda.h>
#include <cutil.h>

// Kernel that executes on the CUDA device
__global__ void mod_array(float **d, int x, int y, int dimA)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int row = idx/x;
    int column = idx%x;
    if (idx<dimA)
        d[row][column] = d[row][column] +1.f;
}

// main routine that executes on the host
int main(void)
{
    float **host_array2d, **device_array2d;
    int size_X = 160;
    int size_Y = 120;
    int dim = size_X * size_Y;
    int numThreadsPerBlock = 4;
    int n_blocks = dim/numThreadsPerBlock + (dim%numThreadsPerBlock == 0?0:1);
    size_t memSize = size_X * sizeof(float);
    host_array2d = (float **) malloc(memSize);

    // Initialize host array
    for (int i=1; i<size_X-1; i++){
        host_array2d[i] =(float*) malloc(size_Y * sizeof(float));
        for (int j=1; j<size_Y-1; j++){
            host_array2d[i][j]=((float)i * (float)size_X)+(float)j;
        }
    }

    size_t d_pitch ;
    cudaMallocPitch( (void **) &device_array2d, &d_pitch, size_Y * sizeof(float), size_X);
    printf(" d_pitch = %d \n", d_pitch);

    //copy host_array to device_memory
    cudaMemcpy2D( device_array2d, d_pitch, host_array2d, size_Y*sizeof(float), size_Y*sizeof(float), size_X, cudaMemcpyHostToDevice );

    // Do calculation on device:
    mod_array <<< n_blocks, numThreadsPerBlock >>> (device_array2d, size_Y, size_X, dim);

    // Retrieve result from device and store it in host array
    cudaMemcpy2D( host_array2d, size_Y*sizeof(float), device_array2d, d_pitch, size_Y*sizeof(float), size_X, cudaMemcpyDeviceToHost );

    // Cleanup
    free(host_array2d);
    cudaFree(device_array2d);
}
[/codebox]

2D arrays are not created that way in CUDA. For one, on the following line

cudaMallocPitch( (void **) &device_array2d,  &d_pitch, size_Y * sizeof(float), size_X);

the function expects a void** that points at a single device pointer, but device_array2d is already a double pointer (float**), so taking its address with ‘&’ gives a pointer to a pointer to a pointer (float***), which is the shape of a 3D array. It only compiles without a warning because the explicit (void **) cast hides the mismatch. cudaMallocPitch() returns one pitched block of memory, not a table of row pointers, so the kernel cannot index it as d[row][column]. There are quite a few other problems with your code as far as 2D arrays go, and if you want a true 2D device array (without using cudaMallocPitch() and the like) then there is a lot more involved.

If all you want is an easy way to make, transfer, and access 2D arrays on the host, I made a class that deals with one- and two-dimensional array memory for both host and device. Here is the link, CUDAMemory. It includes an example main.cpp file, so I won’t show an example here unless you need it.
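
To make the pitched layout concrete, here is a minimal host-only sketch in plain C++ of how a block returned by cudaMallocPitch() is meant to be addressed: one base pointer, with consecutive rows `pitch` bytes apart. The helper names (pitched_row, add_one_pitched, demo_pitched) and the 512-byte pitch are assumptions made up for illustration; a real pitch is whatever cudaMallocPitch() reports, and on the device the same arithmetic would sit inside the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Address of row `row` in a pitched allocation. Rows are `pitch` BYTES apart,
// so the offset is applied to a char pointer before casting back to float*.
inline float* pitched_row(void* base, std::size_t pitch, std::size_t row) {
    return reinterpret_cast<float*>(static_cast<char*>(base) + row * pitch);
}

// Host-side stand-in for the kernel body: add 1.0f to every valid element.
// Only the first `cols` floats of each row are data; the rest is padding.
inline void add_one_pitched(void* base, std::size_t pitch,
                            std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        float* row = pitched_row(base, pitch, r);
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] += 1.0f;
        }
    }
}

// Demo using the sizes from the question: 160 rows of 120 floats each.
inline float demo_pitched() {
    const std::size_t rows = 160, cols = 120;
    const std::size_t pitch = 512;  // ASSUMED pitch in bytes (>= 120 * 4 = 480)
    // Zero-filled host buffer standing in for the device allocation.
    std::vector<float> buffer(rows * pitch / sizeof(float), 0.0f);
    add_one_pitched(buffer.data(), pitch, rows, cols);
    return pitched_row(buffer.data(), pitch, rows - 1)[cols - 1];
}
```

With this layout the device pointer would be a plain float* rather than a float**, and the kernel would take the pitch as an extra argument.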

Of course no & is needed; I typed my code in rather than copying it. Sorry. But the rest matches my source, I checked.

About the memcpy: I have used one-dimensional arrays in my programs without problems, thanks, and I do not wish to flatten my 2D array. The code still does not get past the cudaMallocPitch line. Can you tell me what comes next when using the 2D memcpy?

Basically, to make a true two-dimensional array on the device you have to allocate a single array of row pointers on the device, then set up each row of the second dimension from the host (allocate it on the device and record its pointer in a host-side table), and finally transfer that table to the device. I suggest you take a look at the link I provided. It does all the 2D memory management for you between the host and device, and you can get a raw 2D device pointer to use in kernels. It is very similar to Thrust, except that it supports two-dimensional arrays.
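
If the main goal is to keep the a[i][j] syntax on the host, one possible alternative to a device-side pointer table is sketched below in plain C++: store the rows in one contiguous block and keep a separate table of row pointers into it. The contiguous block is then the kind of source cudaMemcpy2D() expects (rows a fixed number of bytes apart), which separately malloc'ed rows are not. Host2D, copy2d, and demo_roundtrip are hypothetical names, copy2d only emulates the row-by-row copy on the host, and the 512-byte pitch is an assumed value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: one contiguous block plus a table of row pointers
// into it, so host code can still write a.rows[i][j].
struct Host2D {
    std::size_t height, width;
    std::vector<float> data;   // height * width floats, contiguous
    std::vector<float*> rows;  // rows[i] points at data[i * width]

    Host2D(std::size_t h, std::size_t w)
        : height(h), width(w), data(h * w, 0.0f), rows(h) {
        for (std::size_t i = 0; i < h; ++i) rows[i] = data.data() + i * w;
    }
    // Copying would leave `rows` pointing into the old block, so forbid it.
    Host2D(const Host2D&) = delete;
    Host2D& operator=(const Host2D&) = delete;
};

// Host-only emulation of a pitched 2D copy: `height` rows of `width_bytes`
// each, source rows `spitch` bytes apart, destination rows `dpitch` apart.
inline void copy2d(void* dst, std::size_t dpitch,
                   const void* src, std::size_t spitch,
                   std::size_t width_bytes, std::size_t height) {
    for (std::size_t r = 0; r < height; ++r) {
        std::memcpy(static_cast<char*>(dst) + r * dpitch,
                    static_cast<const char*>(src) + r * spitch,
                    width_bytes);
    }
}

// Round trip with the sizes from the question: 160 rows of 120 floats.
inline float demo_roundtrip() {
    const std::size_t h = 160, w = 120;
    const std::size_t pitch = 512;  // ASSUMED device pitch in bytes
    Host2D a(h, w);
    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < w; ++j)
            a.rows[i][j] = static_cast<float>(i * w + j);

    // Plain host buffer standing in for the pitched device allocation.
    std::vector<float> device(h * pitch / sizeof(float), 0.0f);
    copy2d(device.data(), pitch, a.data.data(), w * sizeof(float),
           w * sizeof(float), h);  // "host to device"

    Host2D b(h, w);
    copy2d(b.data.data(), w * sizeof(float), device.data(), pitch,
           w * sizeof(float), h);  // "device to host"
    return b.rows[h - 1][w - 1];
}
```

In real CUDA code the two copy2d calls would become cudaMemcpy2D() calls with the same pitch and width arguments, passing a.data.data() (not the row-pointer table) as the host pointer.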

Thank you for your help.