I have a few questions I couldn't find answers to:
- Why is there a memory spike of 130MB when calling cudaDeviceSynchronize right after cudaSetDevice, in a debug build running under gdb?
- Why does my process use 169MB more memory when I run the debug build under gdb than when I run the same debug build without gdb?
- I have a large piece of code. Compiled in debug mode, it builds in under 20 seconds and easily fits on the card, using less than 1GB of memory. Compiling it in release mode is much slower:
  - toolkit 6.5: takes more than 13 minutes to compile (about 40x slower)
  - toolkit 7.5: takes more than 3.5 minutes to compile (about 10x slower)
The main problem: in release, it uses 2GB more memory. What is the reason for the extra 2GB of memory usage?
- I monitored the memory usage with NVIDIA-SMI 352.68
- Linux 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- 4 x GeForce GTX 780 Ti
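
For the first two questions, a minimal reproducer along these lines could be run with and without gdb and compared against nvidia-smi. This is only a sketch, not my actual code: device index 0 is an assumption, and cudaMemGetInfo reports usage for the whole device, so other processes on the same card are included in the number.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print a CUDA error and exit if a runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(1);
    }
}

int main() {
    // Device index 0 is an assumption; any of the cards should behave the same.
    check(cudaSetDevice(0), "cudaSetDevice");

    // The synchronize call from the first question.
    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    // Query free and total device memory (device-wide, not per process).
    size_t free_bytes = 0, total_bytes = 0;
    check(cudaMemGetInfo(&free_bytes, &total_bytes), "cudaMemGetInfo");

    const double mib = 1024.0 * 1024.0;
    printf("device memory in use: %.1f MiB of %.1f MiB\n",
           (total_bytes - free_bytes) / mib, total_bytes / mib);

    // Wait for Enter so nvidia-smi can be checked while the process is alive.
    getchar();
    return 0;
}
```

Building this once with debug flags (for example nvcc -g -G) and once without, and running each both inside and outside gdb, should show whether the extra memory appears even when no kernels or allocations of my own are involved.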