cudaMemcpy for very large float arrays

Hi,

For my application I need to pass a very large float array to the device. What is the maximum number of float elements that I can copy?

I know that the parameter list of a kernel is currently limited to 256 bytes.

unsigned int array_size = <very_large_number>;

float *h_array1 = (float*)malloc(sizeof(float)*array_size);

//Initialize the h_array1 here

..

..

..

float *d_array1;

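//Allocate device memory first; cudaMemcpy does not allocate it

CUDA_SAFE_CALL(cudaMalloc((void**)&d_array1, sizeof(float)*array_size));

//Copy from host to device; the size argument is in bytes, not elements

CUDA_SAFE_CALL(cudaMemcpy(d_array1, h_array1, sizeof(float)*array_size, cudaMemcpyHostToDevice));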

//Execute kernel etc.

..

..

What can be the maximum value of array_size above?

Secondly, is there a way to determine this number at runtime, depending on the type of card and its DRAM size?

Any help on this would be appreciated.

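Check out cuMemGetInfo() in Section E.8.1 of the Programming Guide; it reports the free and total memory on the device.

Also see cudaGetDeviceProperties() in Section D.1.4; its totalGlobalMem field gives the total global memory of the card.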
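
To make the runtime query concrete, here is a rough sketch of how the two calls could be combined. It assumes a toolkit recent enough to provide cudaMemGetInfo(), the runtime-API counterpart of cuMemGetInfo(). The helper names max_float_elements and estimate_float_capacity, and the headroom parameter, are made up for illustration and are not part of CUDA.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): number of floats that fit in
// free_bytes once headroom_bytes is set aside for other allocations.
static size_t max_float_elements(size_t free_bytes, size_t headroom_bytes)
{
    if (free_bytes <= headroom_bytes) {
        return 0;
    }
    return (free_bytes - headroom_bytes) / sizeof(float);
}

// Hypothetical helper: estimates how many floats could currently be
// allocated on `device`. Writes the estimate to *out_elements and
// returns cudaSuccess, or returns the first CUDA error encountered.
static cudaError_t estimate_float_capacity(int device,
                                           size_t headroom_bytes,
                                           size_t *out_elements)
{
    // Static properties of the card, including its total global memory.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        return err;
    }

    // cudaMemGetInfo reports on the current device, so select it first.
    err = cudaSetDevice(device);
    if (err != cudaSuccess) {
        return err;
    }

    // Memory that is free right now; this is usually less than the total.
    size_t free_bytes = 0;
    size_t total_bytes = 0;
    err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        return err;
    }

    printf("%s: %zu bytes total, %zu bytes free\n",
           prop.name, (size_t)prop.totalGlobalMem, free_bytes);

    *out_elements = max_float_elements(free_bytes, headroom_bytes);
    return cudaSuccess;
}
```

Calling estimate_float_capacity(0, 64u << 20, &n) would give an upper estimate for device 0 with 64 MB held back. The figure is only an estimate: a single large cudaMalloc can still fail below the reported free size, so the return value of cudaMalloc should be checked regardless. Note also that the 256-byte kernel parameter limit does not restrict the array size, because only the device pointer is passed to the kernel.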