Assuming this is a Windows 10 system using the default WDDM driver, the maximum single allocation via
cudaMalloc() will be about 81% of total GPU memory. With 3 GB of GPU memory, that comes to roughly 2,600,000,000 bytes.
For GPUs with larger memory, this percentage can be a tad higher. For example, on my Quadro RTX 4000 with 8 GB of GPU memory, the maximum allocation via
cudaMalloc() is 7,060,320,000 bytes, or 82.2% of total physical memory.
If the programmer requests more memory than can be allocated,
cudaMalloc() informs the programmer of that fact via the returned status code (cudaErrorMemoryAllocation). That is different from silently ignoring the request, and it is the best the API can do.
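A minimal sketch of handling that status code; it first queries the free/total device memory with cudaMemGetInfo(), then deliberately requests an oversized buffer to exercise the failure path (the doubled size is just an illustration, not a recommendation):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query how much device memory is currently free / total.
    size_t freeBytes = 0, totalBytes = 0;
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("free: %zu bytes, total: %zu bytes\n", freeBytes, totalBytes);

    // Deliberately request more than the device can possibly provide.
    void* p = nullptr;
    err = cudaMalloc(&p, totalBytes * 2);
    if (err != cudaSuccess) {
        // cudaMalloc reports the failure via its status code rather
        // than silently ignoring the request.
        printf("cudaMalloc failed as expected: %s\n", cudaGetErrorString(err));
    } else {
        cudaFree(p);  // would only happen if the request somehow succeeded
    }
    return 0;
}
```

Note that in real code you would check the status of every CUDA API call this way, not just the ones you expect to fail.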
NVIDIA offers GPUs with all kinds of memory sizes up to 80 GB, so you might want to use a different one. I read that cloud services offer large GPU instances at reasonable prices, or you could look into buying more capable hardware, possibly previously owned if your budget is tight.
You might also want to ponder whether your data could be stored more efficiently, e.g. by using half precision (FP16) instead of single precision (FP32), which halves the memory footprint.
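To illustrate the savings, here is a small sketch using the __half type from cuda_fp16.h (the element count is hypothetical; the caveat is that your data's dynamic range must fit FP16, roughly 6e-5 to 65504 with about three decimal digits of precision):

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

int main() {
    const size_t n = 100 * 1000 * 1000;  // hypothetical element count

    // Single precision: 4 bytes per element.
    printf("float buffer:  %zu bytes\n", n * sizeof(float));
    // Half precision: 2 bytes per element -- half the footprint.
    printf("__half buffer: %zu bytes\n", n * sizeof(__half));

    // Converting a value on the host before uploading it to the device.
    float f = 3.14159f;
    __half h = __float2half(f);
    printf("round trip: %f -> %f\n", f, __half2float(h));
    return 0;
}
```

Whether FP16 storage is acceptable depends entirely on the numerical requirements of your application; a common compromise is to store in half precision but accumulate in single precision.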