Hey,
we usually disable overcommitment by using prlimit64 to limit the memory to the available physical memory. This does not work when we compile our program with CUDA support enabled: right after startup it already uses 16 GB of virtual memory on our TX2 (which has only 8 GB of RAM in total).
Where does this huge usage come from? Any ideas on how to disable overcommitment?
Sorry for the late response. Have you managed to get the issue resolved, or do you still need support? Thanks
Hi,
Could you share more details about your use case?
A sample to reproduce the issue will help a lot.
Thanks.
It’s just that as soon as CUDA is linked into our application, the application performs an enormous overcommit (i.e. larger than the total amount of DDR memory). I will see if I can come up with an example.
So, here is one example. I run this simple application with CUDA 10.2:
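#include <stdio.h>

__global__ void cuda_hello() {
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1, 1>>>();
    cudaDeviceSynchronize(); // wait for the kernel so its printf output is flushed
    getchar();               // keep the process alive so it can be inspected with pmap
    return 0;
}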
When running, it uses over 20 GB of virtual memory:
➜ ~ ./a.out &
[1] 13954
➜ ~
[1] + 13954 suspended (tty input) ./a.out
➜ ~ pmap 13954
13954: ./a.out
0000000000400000 4K r---- a.out
0000000000401000 4K r-x-- a.out
0000000000402000 28K r---- a.out
0000000000409000 4K r---- a.out
000000000040a000 4K rw--- a.out
000000000213d000 9860K rw--- [ anon ]
0000000200000000 2048K rw-s- zero (deleted)
0000000200200000 38912K ----- [ anon ]
0000000202800000 2048K rw-s- zero (deleted)
0000000202a00000 18432K ----- [ anon ]
0000000203c00000 2048K rw-s- zero (deleted)
0000000203e00000 2048K ----- [ anon ]
0000000204000000 2048K rw-s- zero (deleted)
0000000204200000 2048K rw-s- zero (deleted)
0000000204400000 2048K rw-s- zero (deleted)
0000000204600000 4096K ----- [ anon ]
0000000204a00000 2048K rw-s- zero (deleted)
0000000204c00000 2048K rw-s- zero (deleted)
0000000204e00000 2048K rw-s- zero (deleted)
0000000205000000 2048K rw-s- zero (deleted)
0000000205200000 2048K rw-s- zero (deleted)
0000000205400000 25079808K ----- [ anon ]
00007f8280000000 262144K ----- [ anon ]
00007f8296a00000 2048K rw-s- zero (deleted)
00007f8296c00000 2048K rw-s- zero (deleted)
00007f8296e00000 2048K rw-s- zero (deleted)
Any idea where these 25079808K (about 24 GiB) are coming from?