How to disable memory overcommitment when CUDA is enabled?

Hey,
we usually disable overcommitment by using prlimit64 to limit the memory to the available physical memory. This does not work when we compile our program with CUDA support enabled. Right after startup it already uses 16 GB of virtual memory on our TX2 (which has only 8 GB of RAM in total).
Where is this huge usage coming from? Any ideas on how to disable overcommitment?

Sorry for the late response. Have you managed to get the issue resolved, or do you still need support? Thanks.

Hi,

Could you share more details about your use case?
A sample to reproduce the issue will help a lot.

Thanks.

It’s just that as soon as CUDA is linked into our application, the application performs an enormous overcommit (i.e. larger than the total amount of DDR memory). I will see if I can come up with an example.

So, one example. I run this simple application with CUDA 10.2:

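#include <stdio.h>

// Trivial kernel: a single thread prints one line from the device.
__global__ void cuda_hello(){
    printf("Hello World from GPU!\n");
}

int main() {
    // The first CUDA call initializes the runtime and creates the context.
    cuda_hello<<<1,1>>>();

    // Wait for the kernel so that the device-side printf output is flushed.
    cudaDeviceSynchronize();

    // Keep the process alive so its mappings can be inspected with pmap.
    getchar();

    return 0;
}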

When running, it uses over 20 GB of virtual memory:

➜  ~ ./a.out &
[1] 13954
➜  ~
[1]  + 13954 suspended (tty input)  ./a.out
➜  ~ pmap 13954
13954:   ./a.out
0000000000400000      4K r---- a.out
0000000000401000      4K r-x-- a.out
0000000000402000     28K r---- a.out
0000000000409000      4K r---- a.out
000000000040a000      4K rw--- a.out
000000000213d000   9860K rw---   [ anon ]
0000000200000000   2048K rw-s- zero (deleted)
0000000200200000  38912K -----   [ anon ]
0000000202800000   2048K rw-s- zero (deleted)
0000000202a00000  18432K -----   [ anon ]
0000000203c00000   2048K rw-s- zero (deleted)
0000000203e00000   2048K -----   [ anon ]
0000000204000000   2048K rw-s- zero (deleted)
0000000204200000   2048K rw-s- zero (deleted)
0000000204400000   2048K rw-s- zero (deleted)
0000000204600000   4096K -----   [ anon ]
0000000204a00000   2048K rw-s- zero (deleted)
0000000204c00000   2048K rw-s- zero (deleted)
0000000204e00000   2048K rw-s- zero (deleted)
0000000205000000   2048K rw-s- zero (deleted)
0000000205200000   2048K rw-s- zero (deleted)
0000000205400000 25079808K -----   [ anon ]
00007f8280000000 262144K -----   [ anon ]
00007f8296a00000   2048K rw-s- zero (deleted)
00007f8296c00000   2048K rw-s- zero (deleted)
00007f8296e00000   2048K rw-s- zero (deleted)

Any idea where these 25079808K are coming from?
