Allocating arrays bigger than 2 GB in size

The system I have access to is 64-bit Ubuntu, with CUDA Toolkit 5.5 and a Tesla C2075/C2050.
What do I need to do in order to allow something like this:

__device__ int v[1250000000]; // 5 GB

__global__ void kernel() {
    do_something_with(v);
}

A cudaMalloc into a global-scope device pointer could work too.
Are there any nvcc arguments for this?
Thanks.

Is there a specific reason not to use this:

int *v = 0;
cudaMalloc ((void**)&v, 1250000000*sizeof(v[0]));

__global__ void kernel (int *v)
{
    do_something_with (v);
}

But it’s not possible to call cudaMalloc on global-scope device variables; you’d need to copy the resulting pointer into them.
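Something along these lines should work for that pattern, using cudaMemcpyToSymbol to copy the allocated pointer into the __device__ symbol (just a sketch; the name d_v is a placeholder, and a ~5 GB request will only succeed if the card actually has that much free memory):

#include <cuda_runtime.h>

__device__ int *d_v;   // global-scope device pointer, filled in from the host

__global__ void kernel()
{
    d_v[0] = 42;       // use d_v like any other device pointer
}

int main()
{
    int *p = 0;
    cudaMalloc((void**)&p, 1250000000ULL * sizeof(int));  // ~5 GB; fails if the device has less free memory
    cudaMemcpyToSymbol(d_v, &p, sizeof(p));               // copy the pointer value into the __device__ symbol
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    cudaFree(p);
    return 0;
}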
Anyway, so normally I can just call cudaMalloc with the size that I want, without any special compiler flag?

Obviously the call to cudaMalloc() would occur inside some function, such as main(). cudaMalloc() can allocate any size memory block as long as there is sufficient physical memory available and there is a contiguous piece of heap memory of at least the requested size. Heap fragmentation after repeated malloc/free cycles could reduce the maximum allocatable size, but if you allocate right at the start of the application, there is no fragmentation. Obviously use of a 32-bit platform would preclude using the full amount of memory on a C2050, but since you are on 64-bit Linux, you are fine.

Note that the CUDA driver and CUDA runtime require a certain amount of GPU memory for themselves, and ECC also requires additional memory. If I recall correctly, on a C2050, 12.5% of the memory is needed for ECC, and about 100 MB for the CUDA stack. The balance should be allocatable through cudaMalloc().
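You can verify what is actually allocatable on your own card with cudaMemGetInfo() and by checking the return value of cudaMalloc(), roughly like this (a minimal sketch, error handling kept to the essentials; the 5 GB figure is just the size from your question):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);   // free vs. total memory on the current device
    printf("free: %zu bytes, total: %zu bytes\n", free_bytes, total_bytes);

    int *v = 0;
    cudaError_t err = cudaMalloc((void**)&v, 1250000000ULL * sizeof(int)); // ~5 GB
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(v);
    return 0;
}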

The use of global memory objects runs counter to any modern software engineering best practices I am aware of.

OK, thanks.
You’re saying that the best practice would be to pass the globally used object as a parameter to the kernel and every device function that would use it?

Yes. This approach gives each function an explicit comprehensive interface to the outside world. I also advocate the use of the const qualifier for pointers to read-only objects [this is consistent with the approach taken by the C/C++ standard library functions]. For performance reasons, I further recommend use of the restrict qualifier for pointer arguments where applicable.
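For example, a kernel that only reads from one array and writes to another could be declared like this (just an illustrative sketch; the names are arbitrary):

__global__ void scale(float *__restrict__ dst,
                      const float *__restrict__ src,
                      float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = factor * src[i];   // src is read-only; dst and src are declared non-aliasing
}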

The bottom line is one would want to give the compiler as much information as possible about data objects to facilitate both correctness and performance.