cudaMemcpy for very large float arrays


For my application I need to pass a very large float array to the device. What is the maximum number of float elements I can copy?

I know that the kernel parameter list is currently limited to 256 bytes.

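size_t array_size = <very_large_number>;   // number of float elements
size_t bytes = sizeof(float) * array_size; // cudaMalloc/cudaMemcpy sizes are in bytes

float *h_array1 = (float*)malloc(bytes);

// Initialize h_array1 here

float *d_array1;

// Allocate the device buffer first, then copy the host array into it
CUDA_SAFE_CALL(cudaMalloc((void**)&d_array1, bytes));
CUDA_SAFE_CALL(cudaMemcpy(d_array1, h_array1, bytes, cudaMemcpyHostToDevice));

// Execute kernel etc.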

What can be the maximum value of array_size above?

Secondly, is there a way to determine this number at runtime, depending on the type of card and the size of its DRAM?

Any help on this would be appreciated.

Check out Section E.8.1 of the Programming Guide, cuMemGetInfo(), which reports how much device memory is free and how much there is in total, in bytes.

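Also see Section D.1.4, cudaGetDeviceProperties(): the totalGlobalMem field of the returned cudaDeviceProp gives the card's total global memory in bytes.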