I had a rogue "for loop" allocating >500 MB of device memory on each iteration. The program crashed before it could call cudaFree(), and the memory appears to remain locked up on the device. I am compiling with CUDART 2000 on CentOS with a Tesla C870 (soon migrating to a C1060). Obviously I can just reset the system, but my concern is for when this gets deployed: is it possible to free the GPU memory programmatically prior to initializing the main computations, to ensure both the state of the device and the amount of memory available for computation?
Additionally, why does the totalGlobalMem field returned by cudaGetDeviceProperties() report a different value than cuMemGetInfo()?
Device: Tesla C870, 1350 MHz clock, 1536 MB memory.
^^^^ Free : 42 bytes (0 KB) (0 MB)
^^^^ Total: 2513032896 bytes (2454133 KB) (2396 MB)
^^^^ 0.000002% free, 99.999998% used
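For reference, here is a minimal sketch of how I am querying the two values (error checking stripped; the exact cuMemGetInfo() parameter types vary between toolkit versions, and this assumes a recent enough toolkit that the runtime call implicitly creates the context the driver-API call needs):

```cpp
#include <cstdio>
#include <cuda.h>          // driver API: cuMemGetInfo
#include <cuda_runtime.h>  // runtime API: cudaGetDeviceProperties

int main() {
    // Runtime API: totalGlobalMem is a static figure from the device properties.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("totalGlobalMem: %lu bytes\n", (unsigned long)prop.totalGlobalMem);

    // Driver API: free/total memory at this instant. Requires an active
    // context, which the runtime call above created implicitly.
    size_t free_b = 0, total_b = 0;
    cuMemGetInfo(&free_b, &total_b);
    printf("free: %lu bytes, total: %lu bytes\n",
           (unsigned long)free_b, (unsigned long)total_b);
    return 0;
}
```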