Ways to reduce GPU memory usage

Huang_Wei · February 8, 2016, 7:27pm

Hello,

I am working on a code that needs a lot of memory.
Since I am running with MPI, I know I can increase the number of MPI tasks
to reduce the per-task memory footprint. But since I am limited in the number of MPI tasks I can use, I am wondering if there are other ways to reduce memory usage.

Below is part of the error message I get when I run my code.

Thanks,

Wei

Out of memory allocating 4596144 bytes of device memory
total/free CUDA memory: 2147155968/102400
Present table dump for device[1]: NVIDIA Tesla GPU 0, compute capability 2.1
host:0x1639ff8 device:0x4203a8d000 size:4 presentcount:5 line:10766 name:f_qg
host:0x1651180 device:0x4201d20700 size:400 presentcount:1 line:-1 name:_rrtmg_lw_rtrnmc_21
host:0x165e640 device:0x4201d20400 size:760 presentcount:1 line:-1 name:_module_ra_rrtmg_lw_21
host:0x1e08f88 device:0x4203a8be00 size:8 presentcount:1 line:10766 name:asdir
host:0x1e08f90 device:0x4203a8bc00 size:8 presentcount:1 line:10766 name:asdif
host:0x1e08f98 device:0x4203a8ba00 size:8 presentcount:1 line:10766 name:aldir
…

MatColgrove · February 8, 2016, 10:12pm

Hi Wei,

First, make sure that there isn't an error such as an uninitialized variable or an incorrect array size in an OpenACC data clause.
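To illustrate what "wrong size in a data clause" means, here is a minimal sketch in C with OpenACC directives (the function name `scale` is hypothetical). The array section tells the runtime how many elements to allocate on the device, so a garbage length there can turn into a bogus device allocation. In C a section is written `a[start:length]`; the Fortran `!$acc` equivalent uses `a(lo:hi)` bounds instead. Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Multiplies the first n elements of a by s, on the GPU when compiled
   with OpenACC (otherwise the pragmas are ignored and it runs on the host). */
void scale(double *a, size_t n, double s)
{
    assert(a != NULL || n == 0);

    /* a[0:n] (C syntax: start, length) is what the runtime allocates and
       copies on the device. If n were uninitialized or larger than the host
       allocation, the runtime would try to create a device copy of that
       bogus size instead of the real array. */
    #pragma acc data copy(a[0:n])
    {
        #pragma acc parallel loop present(a[0:n])
        for (size_t i = 0; i < n; ++i)
            a[i] *= s;
    }
}
```

The same check applies to every `copy`, `copyin`, `copyout` and `create` clause: the length should come from a variable that is known to hold the real allocation size at that point.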

If you really are running out of memory, then the easiest thing to do is get another card with more memory. Yours has only 2GB, which is relatively small. A Tesla K40 has 12GB, and a K80 has 24GB, split between two GPUs (12GB each).
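To put the numbers from the error message in perspective, this small C sketch converts the three byte counts in the log to MiB. The constant and function names are illustrative only. The figures show that the card reports roughly 2047.7 MiB total but only 100 KiB free, while the failing request was about 4.4 MiB.

```c
#include <assert.h>
#include <stdio.h>

/* Byte counts copied from the "Out of memory" message in the question. */
static const unsigned long long request_bytes = 4596144ULL;
static const unsigned long long total_bytes   = 2147155968ULL;
static const unsigned long long free_bytes    = 102400ULL;

/* Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes). */
static double to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}

/* Prints the three figures; returns 1 if the request fits in free memory. */
static int summarize(void)
{
    assert(free_bytes <= total_bytes);
    printf("total: %.1f MiB, free: %.3f MiB, requested: %.2f MiB\n",
           to_mib(total_bytes), to_mib(free_bytes), to_mib(request_bytes));
    return request_bytes <= free_bytes;
}
```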

Short of that or adding more MPI ranks, you'll need to block your code so that only a portion of the computation is performed on the device at any given time. The block size would be proportional to the amount of available device memory. Besides the extra programming effort, you will most likely take a performance hit.
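A minimal sketch of that blocking approach, in C with OpenACC directives, using a simple `y = a*x + y` loop as a stand-in for the real computation. `block_elems`, `blocked_axpy` and the `budget_bytes` parameter are hypothetical names; in practice the budget would be the device memory you can spare after accounting for everything else that stays resident on the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many elements fit in a device-memory budget when
   each element needs arrays_per_elem doubles resident on the GPU. */
size_t block_elems(size_t budget_bytes, size_t arrays_per_elem)
{
    assert(arrays_per_elem > 0);
    size_t n = budget_bytes / (arrays_per_elem * sizeof(double));
    return n > 0 ? n : 1; /* always make progress, even with a tiny budget */
}

/* y[i] += a * x[i] for i in [0, n), with at most one block of x and y
   resident on the device at any time. */
void blocked_axpy(size_t n, double a, const double *x, double *y,
                  size_t budget_bytes)
{
    size_t bs = block_elems(budget_bytes, 2); /* x and y per element */

    for (size_t start = 0; start < n; start += bs) {
        size_t len = (n - start < bs) ? n - start : bs;

        /* Allocate and transfer only this block; the device copies are
           released (and y copied back) when the data region ends. */
        #pragma acc data copyin(x[start:len]) copy(y[start:len])
        {
            #pragma acc parallel loop present(x[start:len], y[start:len])
            for (size_t i = start; i < start + len; ++i)
                y[i] += a * x[i];
        }
    }
}
```

The performance cost comes from the extra host/device transfers: every block pays its own copy-in and copy-out, so a smaller budget means more, smaller transfers. Overlapping transfers of one block with computation on another (for example with `async` queues) can hide some of that cost, at the price of keeping two blocks resident at once.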

Mat