Maximum data size of cudaMalloc?

cudaMalloc threw a stack overflow error when I tried to allocate video RAM for a 70000-dimensional float array (roughly 2.6M of data). How could this happen? My GF310M video card has 1G of video RAM. I know it is much harder to allocate a contiguous block of RAM than scattered fragments, but this gap is far too large to be acceptable (2.6M vs 1G). I’m 80% sure that someone has reported this problem before, but I can’t find any related old post anywhere.

I’m using VS 2008 SP1 with CUDA 4.0 for my current work, which requires storing massive vectors. I only started two days ago, so sorry if this is a stupid question.