I am trying to troubleshoot an error on a virtual server running Ubuntu 14.04. What happens, seemingly at random, is that the GPU stops processing and the job terminates. By seemingly random I mean that three runs complete without error and then the error appears on the fourth. It has happened 4 times now, and the only consistency is that it appears to fail at the same point: cycle 21 (as indicated by the log, not included here). If I reboot, the GPU starts up again and processes normally.
Are there any commands or recommendations that might help me figure out what is going on? I also cannot find what error code 46 refers to. Thank you :).
Error:
CUDA: gpuDeviceConfig: device added for evaluation: 0:GeForce GTX 970 v5.2 3.99982GB
CUDA: gpuDeviceConfig: minimum compute version used for pipeline: 2.0
CUDA 0: gpuDeviceConfig::initDeviceContexts: Creating Context and Constant memory on device with id: 0
terminate called after throwing an instance of 'cudaExecutionException'
+---------------------------------------
| ** CUDA ERROR! **
| Error: 46
| Msg: all CUDA-capable devices are busy or unavailable
| File: cudaWrapper.cpp
| Line: 127
+---------------------------------------
what(): CUDA EXCEPTION: Error occurred during job Execution!
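
Since the message says the device is "busy or unavailable", one thing I could do is record the GPU state right after a failure, before rebooting. Below is a minimal sketch of that idea, not something I have run on the failing server. It assumes `nvidia-smi` is installed with the driver and on the PATH; the helper names (`parse_csv_rows`, `query`, `snapshot`) are my own and purely illustrative. It asks `nvidia-smi` for each GPU's compute mode and memory use, and for the list of processes currently holding a compute context.

```python
import shutil
import subprocess


def parse_csv_rows(text):
    """Split nvidia-smi CSV output (no header) into lists of stripped fields."""
    return [
        [field.strip() for field in line.split(",")]
        for line in text.splitlines()
        if line.strip()
    ]


def query(args):
    """Run nvidia-smi with the given arguments.

    Returns parsed rows, or None if nvidia-smi is missing or exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.Popen(
        ["nvidia-smi"] + args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    if proc.returncode != 0:
        return None
    return parse_csv_rows(out)


def snapshot():
    """Collect per-GPU state and the processes holding compute contexts."""
    return {
        "gpus": query([
            "--query-gpu=name,compute_mode,memory.used,memory.total",
            "--format=csv,noheader",
        ]),
        "apps": query([
            "--query-compute-apps=pid,process_name,used_memory",
            "--format=csv,noheader",
        ]),
    }


if __name__ == "__main__":
    state = snapshot()
    print("GPUs:", state["gpus"])
    print("Compute processes:", state["apps"])
```

If the "apps" list still shows a process from an earlier run, or the compute mode is not "Default", that would at least narrow down why a new context cannot be created on device 0.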