A CUDA program died horribly whilst using a K20. When using the same K20 afterwards,
cudaMemGetInfo() returns a number much lower than expected (about 1611661312 bytes smaller),
and so a correctly behaving application fails, saying “Not enough memory to perform alignment”.
Something similar has come up recently,
but I think txbob suggests cudaDeviceReset() will not cure the problem.
Has anyone experienced similar problems?
Is there a solution?
Any help or suggestions would be most welcome.
Bill
If you have root privilege, you could try unloading and reloading the nvidia driver:
sudo rmmod nvidia
(after that, any CUDA operation will reload the driver.)
If you have root privilege, you could try a device reset command from nvidia-smi (please use nvidia-smi --help to learn about the available commands, or refer to its man page).
The last two methods probably will not work if the GPU is currently supporting a display, or has X attached to it.
Dear txbob,
Thank you for the pointer to nvidia-smi.
Using this as a diagnostic showed my reading of the problem was entirely wrong!
nvidia-smi shows 1532MiB of GPU memory is being used by another user!!
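
For anyone hitting the same symptom, a quick sanity check is to convert the byte shortfall into MiB and compare it with what nvidia-smi reports per process. 1611661312 bytes is exactly 1537 MiB, which is close to the 1532MiB held by the other user. Below is a minimal Python sketch of that check. The helper names (bytes_to_mib, parse_compute_apps, query_compute_apps) are made up for illustration, and it assumes a driver whose nvidia-smi supports the --query-compute-apps option, which older drivers may not.

```python
import subprocess

MIB = 1024 * 1024  # cudaMemGetInfo() reports bytes; nvidia-smi reports MiB


def bytes_to_mib(n_bytes):
    """Convert a byte count (as returned by cudaMemGetInfo) to MiB."""
    return n_bytes / MIB


def parse_compute_apps(csv_text):
    """Parse 'pid, used_memory' rows into a list of (pid, used_mib) tuples.

    Rows whose fields are not plain integers (for example "[N/A]")
    are skipped rather than guessed at.
    """
    apps = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2:
            continue
        try:
            apps.append((int(fields[0]), int(fields[1])))
        except ValueError:
            continue
    return apps


def query_compute_apps():
    """Ask nvidia-smi which processes currently hold GPU memory.

    Assumption: nvidia-smi is on PATH and supports these query flags.
    """
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-compute-apps=pid,used_memory",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_apps(result.stdout)
```

Calling query_compute_apps() on the affected machine and summing the second element of each tuple gives the memory held by running processes. If that sum roughly matches bytes_to_mib() of the shortfall, the memory is simply in use by another process rather than leaked by the crashed program.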