How to copy chunks of large host memory to device

I have a very large set of very large arrays stored on the host. I want to partition these arrays and copy the pieces onto devices so that kernels can process them, but I’m having trouble striding along the host arrays.

When I use a stride counter as below to stride along the host arrays, I get an “invalid device pointer” run-time error, i.e. the compiler says the syntax is OK:

[codebox]cudaMemcpy(d_xi, h_x[index], DATABLOCKSIZE*sizeof(float4), cudaMemcpyHostToDevice);[/codebox]

where index = blockid*DATABLOCKSIZE

And when I use the following addressing mode (which has worked before, but in a different context), I get a “cannot convert float4 to const void*” compile-time error:

[codebox]cudaMemcpy(&d_xi, &h_x[index], DATABLOCKSIZE*sizeof(float4), cudaMemcpyHostToDevice);[/codebox]

So what is the precise syntax required?



h_x[index]

will dereference h_x at index. Try either

&h_x[index]

or naturally

h_x + index


In both cases you’ll get the address of the value at h_x[index].


d_xi

is correct, but

&d_xi

is the address of the pointer itself, not the device address it holds.

BTW: &d_xi[0] is the address of d_xi at index 0 and therefore the same as d_xi.
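
To put the pieces together, here is a minimal sketch of a chunked copy loop. It assumes h_x is a plain float4* on the host and that one device buffer of chunk size is reused for every block. The names scaleChunk and processInChunks are made up for illustration, and the kernel just doubles each component as a stand-in for whatever your real kernel does.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: doubles every component of each float4 in the chunk.
__global__ void scaleChunk(float4* d_xi, size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        float4 v = d_xi[i];
        d_xi[i] = make_float4(2.0f * v.x, 2.0f * v.y, 2.0f * v.z, 2.0f * v.w);
    }
}

// Processes h_x (total elements) in place, chunkSize elements at a time.
// Assumption: h_x points to ordinary host memory holding `total` float4 values.
cudaError_t processInChunks(float4* h_x, size_t total, size_t chunkSize)
{
    if (chunkSize == 0) return cudaErrorInvalidValue;

    float4* d_xi = nullptr;
    // cudaMalloc writes into the pointer, so here &d_xi is what it needs.
    cudaError_t err = cudaMalloc(&d_xi, chunkSize * sizeof(float4));
    if (err != cudaSuccess) return err;

    for (size_t index = 0; index < total; index += chunkSize) {
        // The last chunk may be shorter than chunkSize.
        size_t n = (total - index < chunkSize) ? (total - index) : chunkSize;

        // h_x + index (same as &h_x[index]) is the host address of the chunk.
        // d_xi already is the device address, so no & in front of it.
        err = cudaMemcpy(d_xi, h_x + index, n * sizeof(float4),
                         cudaMemcpyHostToDevice);
        if (err != cudaSuccess) break;

        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        scaleChunk<<<blocks, threads>>>(d_xi, n);
        err = cudaGetLastError();
        if (err != cudaSuccess) break;

        // Blocking copy back; it waits for the kernel on the default stream.
        err = cudaMemcpy(h_x + index, d_xi, n * sizeof(float4),
                         cudaMemcpyDeviceToHost);
        if (err != cudaSuccess) break;
    }

    cudaFree(d_xi);
    return err;
}
```

You would call it as processInChunks(h_x, totalElements, DATABLOCKSIZE) and check that the result is cudaSuccess. Note that index here advances in elements, not bytes; the pointer arithmetic on a float4* takes care of the sizeof(float4) scaling.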