I am running a CUDA program that performs FFTs on large matrices. The total memory used on the GPU is 1.4 GB, but on the CPU side the process shows 28 GB. My host has only 12 GB of RAM, so I do not understand how the process can use that much virtual memory.
My program contains lots of FFTs, some simple kernels, and from time to time a memcpy between CPU and GPU to check whether convergence has been reached.
The biggest arrays on the host are allocated statically, like this:

static cufftDoubleReal hbff[totsize_pad], hpsi[totsize_pad];
static double hqq[totsize_invspa], hqx[totsize_invspa], hqy[totsize_invspa],
              hqz[totsize_invspa], hccc[totsize_invspa];
Is my program slower because of this? Here is an extract from top:
  PID USER   PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
19069 aaa    20  0 28.7g 684m  84m R 100.0  5.8 19:10.59 a.out
14463 ggggg  20  0 28.1g  24m  22m R  99.7  0.2  1254:19 BD-Simu-GPU1
I am using Fedora Linux. An extract from /proc/meminfo:
MemTotal: 12186512 kB
MemFree: 228496 kB
Buffers: 560720 kB
Cached: 9847316 kB
SwapCached: 0 kB
SwapTotal: 8191992 kB
SwapFree: 8191992 kB
CommitLimit: 14285248 kB
Committed_AS: 853188 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 421616 kB
VmallocChunk: 34359311356 kB