cudaMemcpy max size in bytes?

When I have 0xffffffff floats in an array, cudaMemcpy fails. Does it have a maximum size that it's allowed to copy? cudaMalloc is able to handle that size but not cudaMemcpy?
Thanks in advance

int main(void)
{
float *a_h, *b_h; // pointers to host memory
float *a_d;       // pointer to device memory
int i, N = 0xffffffff;

// allocate arrays on host
a_h = (float*)malloc(sizeof(float)*N);
b_h = (float*)malloc(sizeof(float)*N);

//allocate array on device
CUDA_SAFE_CALL(cudaMalloc((void**)&a_d, sizeof(float)*N));

// initialization of host data
for (i=0; i<N; i++) a_h[i] = (float)i;

//copy data from host to device
CUDA_SAFE_CALL(cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice));

// do calculation on host
incrementArrayOnHost(a_h, N);

//check assert to see if we get results expected
for (i=0; i<N; i++) assert(a_h[i] == i+1);

/*
// do calculation on device:
// Part 1 of 2. Compute execution configuration
int blockSize = 4;
int nBlocks = N/blockSize + (N%blockSize == 0?0:1);

// Part 2 of 2. Call incrementArrayOnDevice kernel
incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);

// Retrieve result from device and store in b_h
cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

// check results
for (i=0; i<N; i++) assert(a_h[i] == b_h[i]);

*/

// cleanup
free(a_h); free(b_h); CUDA_SAFE_CALL(cudaFree(a_d));

system("pause");

}

wait, how are you going to copy 16 gigs worth of floats anywhere…?

Even if you wanted to copy 16 GB of data, the expression sizeof(float)*N will not give you that number of bytes, because N is really a signed integer holding -1.
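
For comparison, here's a rough sketch (untested, and the element count is just a placeholder I picked, not your 0xffffffff) of doing the size math in size_t and asking the runtime how much device memory is actually free before allocating:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t N = 100 * 1000 * 1000;          // placeholder element count, not 0xffffffff
    size_t bytes = N * sizeof(float);      // size_t arithmetic, no signed overflow

    size_t freeMem = 0, totalMem = 0;
    cudaMemGetInfo(&freeMem, &totalMem);   // how much device memory is actually available
    printf("need %zu bytes, device has %zu of %zu bytes free\n",
           bytes, freeMem, totalMem);

    float *a_d = NULL;
    if (bytes <= freeMem && cudaMalloc((void**)&a_d, bytes) == cudaSuccess) {
        // ... use a_d ...
        cudaFree(a_d);
    }
    return 0;
}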

Are you sure? What super GPU do you have with more than 16 GiB of memory?

Perhaps you are compiling in release mode, so the CUDA_SAFE_CALL you have around the cudaMalloc is silently ignoring the error it is almost certainly generating.
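
If you don't want to depend on CUDA_SAFE_CALL at all, a minimal sketch of checking the return codes yourself looks something like this (the check helper is just a name I made up for illustration):

#include <cstdio>
#include <cuda_runtime.h>

// print the error instead of silently dropping it
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
}

int main()
{
    float *a_d = NULL;
    size_t bytes = (size_t)1 << 20;                 // 1 MB, just for the demo
    check(cudaMalloc((void**)&a_d, bytes), "cudaMalloc");
    check(cudaMemset(a_d, 0, bytes), "cudaMemset");
    check(cudaFree(a_d), "cudaFree");
    return 0;
}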

Sorry guys, I meant to say cudaMalloc isn't able to work. Also, is it really trying to allocate 16 GB?

I think size_t is unsigned int on 32-bit platforms and unsigned long long int on 64-bit platforms. You're asking for roughly 4*2^30 floats multiplied by sizeof(float), i.e. about 4 billion times 4 bytes, so yeah, 16 gigs. Of course it doesn't work.

(ps I doubt your malloc calls are working either…)
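
A quick sketch to confirm that, using the same expression as in the original code:

#include <cstdio>
#include <cstdlib>

int main()
{
    int N = 0xffffffff;                      // same value as in the original code
    size_t bytes = sizeof(float) * N;        // becomes huge after the signed -> unsigned conversion

    float *a_h = (float*)malloc(bytes);
    if (a_h == NULL)
        printf("malloc of %zu bytes failed\n", bytes);   // expect to land here
    else
        free(a_h);
    return 0;
}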

64-bit compile:

#include <iostream>

int main()
{
    int x = 0xffffffff;
    std::cerr << (sizeof(float)*x) << "\n";
    return 0;
}

Output:

18446744073709551612

Imagine a GPU with so much memory, the decimal system is useless upon it…