I have a question about memory copies:
Can a matrix defined on the host as, for example, float** be copied to the device as a contiguous float*?
I’m not sure how to use cudaMemcpy to do that…
Thanks!
So do you mean that you have something like a 2D array on your host side, but you want it flattened out as a 1D array on your Device side?
I have never tried it, but I think it can be copied as long as the source pointer and the copy size are correct.
Let me know if you have tried.
Thank you very much.
:)
That’s it ! Do you have any idea of how to do it ?
Do I understand correctly?
:)
test.zip (544 Bytes)
That’s it, but I’d like to do it dynamically.
In your test code, if I replace:
[codebox]int host[2][10][/codebox]
with
[codebox]int **host;
host = (int**)malloc(2 * sizeof(int*));
for (int i = 0; i < 2; i++)
    host[i] = (int*)malloc(10 * sizeof(int));[/codebox]
it raises an error when it tries to access host after the kernel and the cudaMemcpy…
Let's try this:
[codebox]
#include <stdio.h>
#include <stdlib.h>

__global__ void Kernel(int* dev)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid launches 32 threads for 20 elements, so guard the write
    if (i < 10 * 2) dev[i] = 1;
}

int main(int argc, char **argv)
{
    // Table of 2 row pointers, each row allocated separately
    int **host;
    host = (int**)malloc(2 * sizeof(int*));
    host[0] = (int*)malloc(10 * sizeof(int));
    host[1] = (int*)malloc(10 * sizeof(int));
    for (int i = 0; i < 10; i++) {
        host[0][i] = i;
        host[1][i] = i;
    }

    int *dev = NULL;
    cudaMalloc((void**)&dev, sizeof(int) * 10 * 2);
    // Copies 2 * 10 ints starting at host[0]
    cudaMemcpy(dev, host[0], sizeof(int) * 10 * 2, cudaMemcpyHostToDevice);

    dim3 block(16, 1, 1);
    dim3 grid((10 * 2 + block.x - 1) / block.x, 1, 1);
    Kernel<<<grid, block>>>(dev);

    cudaMemcpy(host[0], dev, sizeof(int) * 10 * 2, cudaMemcpyDeviceToHost);
    for (int j = 0; j < 2; j++) {
        for (int i = 0; i < 10; i++) printf("%d\n", host[j][i]);
    }
    printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(dev);
    free(host);
    getchar();
    return 0;
}
[/codebox]
Hope that's useful.
:)
Thanks for your help, but your code doesn't work properly >.<
When you do:
cudaMemcpy(dev, host[0], sizeof(int) * 10 * 2, cudaMemcpyHostToDevice);
only the first row of host is copied into dev.
To check this, try:
[codebox]cudaMemcpy(dev, host[0], sizeof(int) * 10 * 2, cudaMemcpyHostToDevice);
for (int i = 0; i < 10; i++) {
    host[0][i] = 0;
    host[1][i] = 0;
}
cudaMemcpy(host[0], dev, sizeof(int) * 10 * 2, cudaMemcpyDeviceToHost);
for (int j = 0; j < 2; j++) {
    for (int i = 0; i < 10; i++) printf("%d\n", host[j][i]);
}
printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));[/codebox]
The output will be:
[codebox]0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0[/codebox]
instead of:
[codebox]0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9[/codebox]
And if you replace host[0] with host, that doesn't work either…
Any other ideas?
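One possible way around this, sketched here on the assumption that each row is a separate malloc block, is to copy row by row into the flat buffer at offset j * cols. The helper name flatten_rows is hypothetical, and plain memcpy stands in for the device copy so the sketch runs on the host alone.

```c
#include <string.h>

/* Hypothetical helper: copy a rows x cols matrix stored as separately
   allocated rows (int **) into one contiguous buffer, row by row.
   memcpy stands in for cudaMemcpy in this host-only sketch. */
void flatten_rows(int **src, int rows, int cols, int *dst)
{
    for (int j = 0; j < rows; j++) {
        /* Row j lands at offset j * cols in the flat buffer. */
        memcpy(dst + (size_t)j * cols, src[j], (size_t)cols * sizeof(int));
    }
}
```

In CUDA, each memcpy would become cudaMemcpy(dev + j * cols, host[j], cols * sizeof(int), cudaMemcpyHostToDevice), so every transfer stays inside one row's allocation.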
I'm so sorry about my carelessness :(
I didn’t test it carefully before posting.
To find out why, I changed the kernel to
__global__ void Kernel(int* dev)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // 32 threads are launched for 20 elements, so guard the write
    if (i < 10 * 2) dev[i] = i;
}
and kept the host initialization
for (int i = 0; i < 10; i++) {
    host[0][i] = i;
    host[1][i] = i;
}
The expected output is
host[0][i] = {0,1,2,3,4,5,6,7,8,9}
host[1][i] = {10,11,12,13,14,15,16,17,18,19}
but the actual output is
host[0][i] = {0,1,2,3,4,5,6,7,8,9}
host[1][i] = {16,17,18,19,4,5,6,7,8,9}
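This means that when you allocate a dynamic 2D array in host memory (like int **host), there is no guarantee that host[0] and host[1] are contiguous in memory.
In this run host[1] apparently started 16 ints after host[0], so the 20-int copy back through host[0] overwrote only host[1][0] to host[1][3].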
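If the host[j][i] syntax has to stay, one option is to allocate all the data as a single block and point the rows into it. This is only a sketch: the names alloc_matrix_contiguous and free_matrix_contiguous are hypothetical, and rows and cols are assumed to be positive.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols int matrix as ONE data
   block plus a table of row pointers into it, so m[j][i] still works
   and m[0] points at rows * cols contiguous ints.
   Assumes rows > 0 and cols > 0. Returns NULL on allocation failure. */
int **alloc_matrix_contiguous(int rows, int cols)
{
    int **m = (int **)malloc((size_t)rows * sizeof(int *));
    if (m == NULL) return NULL;

    int *data = (int *)malloc((size_t)rows * cols * sizeof(int));
    if (data == NULL) {
        free(m);
        return NULL;
    }

    /* Row j starts j * cols ints into the single data block. */
    for (int j = 0; j < rows; j++)
        m[j] = data + (size_t)j * cols;
    return m;
}

void free_matrix_contiguous(int **m)
{
    if (m != NULL) {
        free(m[0]);  /* the single data block */
        free(m);     /* the row-pointer table */
    }
}
```

With this layout, a single cudaMemcpy(dev, m[0], rows * cols * sizeof(int), cudaMemcpyHostToDevice) reads from one valid block, because m[1] is exactly m[0] + cols.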
My suggestion:
use 2D global memory allocated with cudaMallocPitch.
It is a good pattern for accessing 2D data stored in global memory.
Or, if you want to keep a 2D array on the host and a 1D array on the device,
I think a statically allocated 2D array on the host, which is contiguous, should let your program work properly.
:)
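To illustrate the pitched layout mentioned above, here is a host-only sketch of what a pitched 2D copy does. The names copy_2d and pitched_elem are hypothetical; the parameter order of copy_2d mirrors cudaMemcpy2D (dst, dpitch, src, spitch, width in bytes, height), and memcpy stands in for the real transfer. The pitch is assumed to be a multiple of sizeof(int).

```c
#include <string.h>

/* Host-only stand-in for the row-wise behaviour of a pitched 2D copy:
   copies `height` rows of `width_bytes` bytes each, where consecutive
   rows are `spitch` bytes apart in src and `dpitch` bytes apart in dst. */
void copy_2d(void *dst, size_t dpitch, const void *src, size_t spitch,
             size_t width_bytes, size_t height)
{
    for (size_t r = 0; r < height; r++)
        memcpy((char *)dst + r * dpitch,
               (const char *)src + r * spitch, width_bytes);
}

/* Address of element (row, col) in a pitched buffer of ints.
   `pitch` is the row stride in bytes, as returned by cudaMallocPitch. */
int *pitched_elem(void *base, size_t pitch, size_t row, size_t col)
{
    return (int *)((char *)base + row * pitch) + col;
}
```

On the device, the buffer and its pitch would come from cudaMallocPitch, and a kernel would index rows with the same byte arithmetic as pitched_elem. For a host matrix whose rows are separate allocations, each row would still need its own copy into row j of the pitched buffer.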