Is storing large data (about 2 GB) in GPU memory feasible?

Hi, I have written an algorithm to process some image data. About 2 GB of data is stored in GPU global memory until the entire processing finishes. I ran this program on several GPU devices, e.g. a GTX 1060 (6 GB) and a Tesla M4 (4 GB), and the output data was normal. But when I ran the same code on relatively new cards, e.g. a GTX 1660 (6 GB) and an RTX 2060 (8 GB), the output images were blurred.
What is wrong here? Does the newer GPU architecture design object to the user taking up that much space in global memory?


Yes, storing 2 GB of data in GPU memory is feasible.

Your problem lies somewhere else.

But why does the program execute correctly on some GPU cards and have problems when run on others?
What direction should I go in to find the problem?

In my program, the space for the 3D image is allocated first, and then the data is transferred to GPU memory while the GPU launches kernels concurrently. After all the data is processed, image rotation is performed, and finally the 3D image is transferred back to host memory.
Registers are used in the kernels. Could that cause problems?
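Roughly, the host side looks like the sketch below (simplified, with illustrative names rather than my real code). My understanding is that issuing the copy and the kernel in the same stream orders the kernel after the copy; if they were in different streams without synchronization, a kernel could read incomplete data:

```cpp
#include <cuda_runtime.h>

// Simplified sketch of the pipeline described above.
// Identifiers (d_volume, processSlice, rotateVolume, etc.) are
// illustrative placeholders, not the actual code.
void runPipeline(float **h_slices, float *h_result,
                 size_t volumeBytes, size_t sliceBytes,
                 size_t sliceElems, int numSlices,
                 dim3 grid, dim3 block)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    float *d_volume;
    cudaMalloc(&d_volume, volumeBytes);          // ~2 GB 3D image

    for (int s = 0; s < numSlices; ++s) {
        // Asynchronous copy of one slice into its place in the volume
        cudaMemcpyAsync(d_volume + s * sliceElems,
                        h_slices[s], sliceBytes,
                        cudaMemcpyHostToDevice, stream);
        // Kernel in the same stream: ordered after the copy above
        processSlice<<<grid, block, 0, stream>>>(d_volume, s);
    }

    // Rotation runs after all slices have been processed (same stream)
    rotateVolume<<<grid, block, 0, stream>>>(d_volume);

    // Result copy is also ordered after the kernels
    cudaMemcpyAsync(h_result, d_volume, volumeBytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);               // wait for everything

    cudaFree(d_volume);
    cudaStreamDestroy(stream);
}
```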

sorry, I can’t explain the behavior of code you haven’t shown

I have not yet written the canonical “here’s how to debug any CUDA program/issue”, but you will find lots of good debugging tips with a bit of searching. I usually suggest people start with using proper CUDA error checking, and run their codes with compute-sanitizer.
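By "proper CUDA error checking" I mean checking the return value of every runtime API call and checking after every kernel launch. A common macro pattern (not an official API, just a widely used idiom) looks like this:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Common error-checking macro: wrap every CUDA runtime call in it,
// and it will report the file/line of the first failure.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage (d_buf, myKernel, grid, block are placeholders):
//   CUDA_CHECK(cudaMalloc(&d_buf, bytes));
//   myKernel<<<grid, block>>>(d_buf);
//   CUDA_CHECK(cudaGetLastError());        // catches launch errors
//   CUDA_CHECK(cudaDeviceSynchronize());   // catches async execution errors
```

For compute-sanitizer, running the app under it with no extra options (`compute-sanitizer ./my_app`) enables the default memcheck tool, which reports out-of-bounds and misaligned global memory accesses.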

Yes, mapped memory could be an issue, if you start to check the results in host code before device code has finished executing. This can usually be solved with proper placement of a cudaDeviceSynchronize() call, in host code, after launching all your CUDA kernels.
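Concretely, the placement would look something like this (a sketch with made-up names; kernel launches are asynchronous with respect to the host, which is why the synchronize call matters):

```cpp
#include <cuda_runtime.h>

// Sketch: host code that launches kernels and then reads results
// through a mapped (host-visible) pointer. Names are illustrative.
void processAndInspect(float *d_data, float *h_mapped,
                       dim3 grid, dim3 block)
{
    // Kernel launches return immediately on the host
    processKernel<<<grid, block>>>(d_data);
    rotateKernel<<<grid, block>>>(d_data);

    // Block the host until all previously issued device work finishes
    cudaDeviceSynchronize();

    // Only now is it safe to read mapped memory from host code
    inspectResults(h_mapped);
}
```

Without the `cudaDeviceSynchronize()`, the host-side read can race with the still-running kernels and observe partially written data.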