I am working on code that needs a lot of memory.
As I am running with MPI, I know I can increase the number of MPI tasks
to reduce the memory footprint per task. But since I am limited in the number of MPI tasks I can use, I am wondering whether there are other ways to reduce the memory usage.
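Here is a rough illustration of why adding MPI tasks shrinks the per-task footprint. This minimal Python sketch assumes a hypothetical 2D (x, y) domain decomposition in which every rank stores `nvars` 3D fields on its local patch plus `halo` ghost cells on each side. The grid sizes, field count, halo width, and the `process_grid`/`per_rank_bytes` helpers are all made-up assumptions, not taken from the actual code.

```python
import math


def process_grid(ranks):
    """Split `ranks` into the most nearly square px * py process grid."""
    # Largest divisor of `ranks` that does not exceed sqrt(ranks).
    px = max(d for d in range(1, math.isqrt(ranks) + 1) if ranks % d == 0)
    return px, ranks // px


def per_rank_bytes(nx, ny, nz, nvars, ranks, halo=3, bytes_per_value=4):
    """Approximate field memory held by one rank (hypothetical model).

    Each rank owns a ceil(nx/px) x ceil(ny/py) x nz patch, padded by
    `halo` ghost cells on both sides in x and y, for `nvars` fields of
    `bytes_per_value` bytes each (4 = single precision).
    """
    px, py = process_grid(ranks)
    local_x = math.ceil(nx / px) + 2 * halo
    local_y = math.ceil(ny / py) + 2 * halo
    return local_x * local_y * nz * nvars * bytes_per_value


if __name__ == "__main__":
    # Hypothetical 400 x 400 x 50 grid with 100 single-precision fields.
    for ranks in (1, 4, 16):
        mib = per_rank_bytes(400, 400, 50, 100, ranks) / 2**20
        print(f"{ranks:3d} ranks: {mib:8.1f} MiB per rank")
```

Because of the halo cells, four ranks each need slightly more than a quarter of the single-rank figure. Adding tasks therefore lowers the per-task footprint but never by a full factor of N.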
Below is the error message I get when I run my code:
Out of memory allocating 4596144 bytes of device memory
total/free CUDA memory: 2147155968/102400
Present table dump for device: NVIDIA Tesla GPU 0, compute capability 2.1
host:0x1639ff8 device:0x4203a8d000 size:4 presentcount:5 line:10766 name:f_qg
host:0x1651180 device:0x4201d20700 size:400 presentcount:1 line:-1 name:_rrtmg_lw_rtrnmc_21
host:0x165e640 device:0x4201d20400 size:760 presentcount:1 line:-1 name:_module_ra_rrtmg_lw_21
host:0x1e08f88 device:0x4203a8be00 size:8 presentcount:1 line:10766 name:asdir
host:0x1e08f90 device:0x4203a8bc00 size:8 presentcount:1 line:10766 name:asdif
host:0x1e08f98 device:0x4203a8ba00 size:8 presentcount:1 line:10766 name:aldir
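A small Python sketch can help read the numbers in a dump like this. It pulls out the failed request, the total/free device memory, and the present-table entries sorted by size. The regular expressions assume the exact line format shown above, and `summarize` is a hypothetical helper. Only an excerpt of the dump is shown here, so the largest entries in the full output may be different.

```python
import re

# Excerpt of the log from the post (present table truncated as shown).
LOG = """\
Out of memory allocating 4596144 bytes of device memory
total/free CUDA memory: 2147155968/102400
host:0x1639ff8 device:0x4203a8d000 size:4 presentcount:5 line:10766 name:f_qg
host:0x1651180 device:0x4201d20700 size:400 presentcount:1 line:-1 name:_rrtmg_lw_rtrnmc_21
host:0x165e640 device:0x4201d20400 size:760 presentcount:1 line:-1 name:_module_ra_rrtmg_lw_21
host:0x1e08f88 device:0x4203a8be00 size:8 presentcount:1 line:10766 name:asdir
host:0x1e08f90 device:0x4203a8bc00 size:8 presentcount:1 line:10766 name:asdif
host:0x1e08f98 device:0x4203a8ba00 size:8 presentcount:1 line:10766 name:aldir
"""


def summarize(log, top=3):
    """Extract the failed request, device totals, and largest present entries."""
    requested = int(re.search(r"allocating (\d+) bytes", log).group(1))
    total, free = map(
        int, re.search(r"total/free CUDA memory: (\d+)/(\d+)", log).groups()
    )
    # Each present-table line contains "size:<bytes> ... name:<symbol>".
    entries = [
        (int(size), name) for size, name in re.findall(r"size:(\d+).*?name:(\S+)", log)
    ]
    entries.sort(reverse=True)  # largest allocations first
    return {
        "requested": requested,
        "total": total,
        "free": free,
        "used": total - free,
        "shortfall": requested - free,
        "largest": entries[:top],
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

On the excerpt above it reports a shortfall of 4,493,744 bytes. The runtime wanted about 4.4 MiB, but only 100 KiB of the roughly 2 GiB card was still free. The listed entries account for only a few hundred bytes each.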