cudaMalloc error

I want to transfer some image data to the GPU, process it, and copy it back. I’m using the GPU VSIPL library and get this error:

wrap_cudaMalloc() failed to allocate block of 99532800 bytes on device

It happens in the following code, after the data has been copied to the GPU with vsip_mcopyfrom_user_f:

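	#include <stdlib.h>
	#include <vsip.h>

	vsip_mview_f *imgc;
	float *imgm = malloc(height * width * sizeof(float)); /* host buffer; must be allocated before use */

	vsip_init(NULL);                                   /* VSIPL must be initialised before any other call */
	imgc = vsip_mcreate_f(height, width, VSIP_ROW, VSIP_MEM_NONE);
	if (imgm == NULL || imgc == NULL) { /* handle allocation failure */ }

	vsip_mcopyfrom_user_f(imgm, VSIP_ROW, imgc);       /* host -> GPU */
	vsip_mcopyto_user_f(imgc, VSIP_ROW, imgm);         /* GPU -> host */

	vsip_malldestroy_f(imgc);                          /* frees the view and its block */
	vsip_finalize(NULL);
	free(imgm);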
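Because the copies use VSIP_ROW, the user buffer is treated as row-major (assumed here from the usual meaning of VSIP_ROW), so element (r, c) of a height-by-width matrix sits at imgm[r * width + c]. The helpers below are hypothetical illustrations, not GPU VSIPL functions; the planar RGB layout (three channel planes stacked into a (3 * height)-by-width matrix) is likewise only an assumption about how the 3x factor in the allocation size could arise.

```c
#include <stddef.h>

/* Offset of element (r, c) in a row-major (VSIP_ROW) buffer with `width` columns. */
size_t row_major_index(size_t r, size_t c, size_t width)
{
    return r * width + c;
}

/* Hypothetical planar RGB layout: channel ch (0 = R, 1 = G, 2 = B) occupies
 * rows [ch * height, (ch + 1) * height) of a (3 * height)-by-width matrix. */
size_t planar_rgb_index(size_t ch, size_t r, size_t c,
                        size_t height, size_t width)
{
    return row_major_index(ch * height + r, c, width);
}
```

With this layout the last element of the blue plane of a 2160 x 3840 image lands at offset 3 * 2160 * 3840 - 1, so the whole image fits in one contiguous buffer of 3 * height * width floats.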

The size of the image data is 3 * height * width * sizeof(float) = 3 (RGB channels) * 2160 (height) * 3840 (width) * 4 bytes = 99532800 bytes.
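As a quick check of that arithmetic, the hypothetical helper below (not part of VSIPL) reproduces the figure, assuming a 4-byte float. The total is about 94.9 MiB, which is under a fifth of the card's 512 MB.

```c
#include <stddef.h>

/* Bytes needed for a float image with `channels` channels.
 * Interleaved and planar layouts need the same total. */
size_t image_bytes(size_t channels, size_t height, size_t width)
{
    return channels * height * width * sizeof(float);
}
```

For example, image_bytes(3, 2160, 3840) gives the 99532800 bytes reported in the error message.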

Smaller sizes transfer correctly; I think the limit is about 3 000 000 bytes. The same error also occurs when I use plain CUDA calls instead of VSIPL.
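To pin down the limit more precisely than "about 3 000 000 bytes", one option is a binary search over allocation sizes. This is a sketch under stated assumptions: `try_alloc` is a hypothetical callback (in a real program it might call cudaMalloc(&p, bytes), check for cudaSuccess, then cudaFree(p)), and the search assumes that if N bytes can be allocated then any smaller size can too, which fragmentation may violate. A simulated allocator is included so the search can be exercised without a GPU.

```c
#include <stddef.h>

/* Hypothetical callback: returns 1 if a block of `bytes` bytes can be
 * allocated (and freed again), 0 otherwise. */
typedef int (*try_alloc_fn)(size_t bytes);

/* Largest size in [0, hi] for which try_alloc succeeds, assuming success is
 * monotonic in size. Size 0 is treated as always succeeding. */
size_t largest_allocatable(try_alloc_fn try_alloc, size_t hi)
{
    size_t lo = 0;
    if (try_alloc(hi))
        return hi;
    /* Invariant: lo succeeds, hi fails. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (try_alloc(mid))
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Simulated allocator for testing: pretends anything up to sim_limit bytes fits. */
static size_t sim_limit = 0;

static int simulated_try_alloc(size_t bytes)
{
    return bytes <= sim_limit;
}
```

Each probe halves the remaining range, so starting from the 99532800-byte request the search needs only about 27 allocation attempts.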

Also, I sometimes get the error “no CUDA-capable device is available” and have to reboot the OS.

Setup: Kubuntu 9.04, CUDA 2.3, GeForce 9600 GT (512 MB), driver version 190.18.