copy pointer to pointer on device

I’ve got a question about memory copies:

Can a matrix defined on the host as, for example, float** be copied to the device as a contiguous float*?
I’m not sure how to use cudaMemcpy to do so…

Thanks !

So do you mean that you have something like a 2D array on your host side, but you want it flattened out as a 1D array on your Device side?

I have never tried it before, but I think the copy should work as long as the pointer address and the copy size are correct.

Let me know if you have tried.

Thank you very much.

:)

That’s it! Do you have any idea how to do it?

Do I understand correctly?

:)
test.zip (544 Bytes)

That’s it, but I’d like to do it dynamically.

In your test code, if I replace:

[codebox]int host[2][10];[/codebox]

by

[codebox]int **host;
host = (int**)malloc(2 * sizeof(int*));
for (int i = 0; i < 2; i++)
    host[i] = (int*)malloc(10 * sizeof(int));[/codebox]

it throws an error when it tries to access host after the kernel and the cudaMemcpy…

Let’s try this:

[codebox]
#include <stdio.h>
#include <stdlib.h>

__global__ void Kernel(int* dev)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid is rounded up to 32 threads, but dev holds only 2 * 10 ints.
    if (i < 2 * 10) dev[i] = 1;
}

int main(int argc, char **argv)
{
    int **host;
    // Table of two row pointers, then one separate allocation per row.
    host = (int**)malloc(2 * sizeof(int*));
    host[0] = (int*)malloc(10 * sizeof(int));
    host[1] = (int*)malloc(10 * sizeof(int));
    for (int i = 0; i < 10; i++) {
        host[0][i] = i;
        host[1][i] = i;
    }

    int *dev = NULL;
    cudaMalloc((void**)&dev, sizeof(int) * 10 * 2);
    cudaMemcpy(dev, host[0], sizeof(int) * 10 * 2, cudaMemcpyHostToDevice);

    dim3 block(16, 1, 1);
    dim3 grid((10 * 2 + block.x - 1) / block.x, 1, 1);
    Kernel<<<grid, block>>>(dev);

    cudaMemcpy(host[0], dev, sizeof(int) * 10 * 2, cudaMemcpyDeviceToHost);
    for (int j = 0; j < 2; j++) {
        for (int i = 0; i < 10; i++) printf("%d\n", host[j][i]);
    }
    printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dev);
    free(host[0]);
    free(host[1]);
    free(host);
    getchar();
    return 0;
}
[/codebox]

Hope that is useful.

:)

Thanks for your help, but your code does not work properly >.<

When you do:

cudaMemcpy(dev, host[0], sizeof(int) * 10 * 2, cudaMemcpyHostToDevice);

only the first row of host is copied into dev.

If you want to be sure about it, try:

[codebox]cudaMemcpy(dev, host[0], sizeof(int) * 10 * 2, cudaMemcpyHostToDevice);

// Clear the host copy so anything printed below must have come from dev.
for (int i = 0; i < 10; i++) {
    host[0][i] = 0;
    host[1][i] = 0;
}

cudaMemcpy(host[0], dev, sizeof(int) * 10 * 2, cudaMemcpyDeviceToHost);

for (int j = 0; j < 2; j++) {
    for (int i = 0; i < 10; i++) printf("%d\n", host[j][i]);
}
printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));[/codebox]

The output will be:

[codebox]0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0[/codebox]

instead of:

[codebox]0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9[/codebox]

And if you replace host[0] by host, that doesn’t work either…

Any other ideas?

I am so sorry about my carelessness :(
I didn’t test it carefully before posting.

To find out why, I changed the kernel so that each thread writes its own index:

[codebox]__global__ void Kernel(int* dev)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Guard against the extra threads of the rounded-up grid.
    if (i < 2 * 10) dev[i] = i;
}[/codebox]

and kept the host initialization:

[codebox]for (int i = 0; i < 10; i++) {
    host[0][i] = i;
    host[1][i] = i;
}[/codebox]

The expected values are
host[0][i] = {0,1,2,3,4,5,6,7,8,9}
host[1][i] = {10,11,12,13,14,15,16,17,18,19}
but the output is
host[0][i] = {0,1,2,3,4,5,6,7,8,9}
host[1][i] = {16,17,18,19,4,5,6,7,8,9}

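This means that when you allocate a 2D array dynamically in host memory (like int **host), there is no guarantee that host[0] and host[1] are allocated contiguously.
Each malloc call returns its own block, so copying 2 * 10 ints to or from host[0] runs past the end of the first row instead of continuing into host[1].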
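Since the rows are separate blocks, one way around this is to copy them one at a time into a single flat buffer before the transfer. The sketch below is host-only C under that assumption; pack_rows and unpack_rows are hypothetical helper names, not part of CUDA, and the casts on malloc are only there so the code also compiles as CUDA host code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `rows` separately allocated rows of `cols` ints
   into one newly allocated contiguous buffer. Returns NULL on failure.
   The caller owns the returned buffer and must free() it. */
int *pack_rows(int *const *host, size_t rows, size_t cols)
{
    assert(host != NULL);
    int *flat = (int *)malloc(rows * cols * sizeof *flat);
    if (flat == NULL) return NULL;
    for (size_t r = 0; r < rows; r++) {
        /* Each row is its own malloc block, so it is copied on its own. */
        memcpy(flat + r * cols, host[r], cols * sizeof *flat);
    }
    return flat;
}

/* Hypothetical helper: the inverse, scattering a flat buffer back into rows. */
void unpack_rows(int **host, const int *flat, size_t rows, size_t cols)
{
    assert(host != NULL && flat != NULL);
    for (size_t r = 0; r < rows; r++) {
        memcpy(host[r], flat + r * cols, cols * sizeof *flat);
    }
}
```

The flat buffer can then go to the device in a single cudaMemcpy of rows * cols * sizeof(int) bytes. The same per-row loop could also target the device directly, with one cudaMemcpy per row into dev + r * cols, which avoids the staging buffer at the cost of more transfers.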

My suggestion: use 2D global memory allocated with cudaMallocPitch.
It is a good pattern for accessing 2D data stored in global memory.
If you would rather keep a 2D array on the host and a 1D array on the device,
I think a statically allocated 2D array on the host (like int host[2][10]) should let your program work properly, since its rows are contiguous.
:)
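If the sizes are only known at run time, a dynamic layout can mimic the static array: allocate all elements as one block and point each row into it. This is a minimal host-only sketch, assuming int elements; alloc_matrix and free_matrix are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols int matrix whose elements sit
   in ONE contiguous block, plus a table of row pointers into that block.
   Returns NULL on failure. */
int **alloc_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    int **m = (int **)malloc(rows * sizeof *m);
    if (m == NULL) return NULL;
    m[0] = (int *)malloc(rows * cols * sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    /* Row r starts r * cols elements after the start of the block. */
    for (size_t r = 1; r < rows; r++) {
        m[r] = m[0] + r * cols;
    }
    return m;
}

/* Release the element block first, then the row-pointer table. */
void free_matrix(int **m)
{
    if (m != NULL) {
        free(m[0]);
        free(m);
    }
}
```

With this layout host[j][i] still works as before, and m[0] points at rows * cols contiguous ints, so a single cudaMemcpy(dev, m[0], rows * cols * sizeof(int), cudaMemcpyHostToDevice) stays inside one valid allocation.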