Training a deep network on a GeForce Titan X fails with "Check failed: error == cudaSuccess"

device: GeForce Titan X
cuda: V7.5.17
nvidia driver: 352.39
CentOS release 6.8 (Final)

I installed the NVIDIA driver and CUDA 7.5 successfully, and I compiled Caffe in GPU mode successfully. I can run the simple MNIST demo. But when I train a larger network model, it starts properly and then, after some iterations, reports “Check failed: error == cudaSuccess (38 vs. 0) no CUDA-capable device is detected”. After that, I must reboot before I can run any Caffe model again. Because training is always interrupted, I have not been able to train a complete network so far.
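One way to see what the card is doing just before the device disappears is to log its state while training runs. Below is a minimal sketch, not part of Caffe: the function names `parse_gpu_csv`, `query_gpus` and `monitor` are hypothetical, and it assumes `nvidia-smi` is on the PATH and a Python 3.7+ interpreter is installed (the stock CentOS 6 Python is 2.6). It polls `nvidia-smi --query-gpu` for temperature and power draw and prints one timestamped line per GPU.

```python
import csv
import io
import subprocess
import time

# Fields requested from `nvidia-smi --query-gpu` (list them with `nvidia-smi --help-query-gpu`).
QUERY_FIELDS = ["index", "name", "temperature.gpu", "power.draw"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        values = [field.strip() for field in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus(timeout_s=30):
    """Run nvidia-smi once; return parsed rows, or None if the query fails."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                check=True, timeout=timeout_s)
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # nvidia-smi missing, exited with an error, or did not return in time
        return None
    return parse_gpu_csv(result.stdout)


def monitor(interval_s=10.0):
    """Print one timestamped line per GPU every `interval_s` seconds, forever."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        rows = query_gpus()
        if not rows:
            print(f"{stamp} nvidia-smi failed or reported no GPU", flush=True)
        else:
            for r in rows:
                print(f"{stamp} gpu{r['index']} {r['name']} "
                      f"temp={r['temperature.gpu']}C power={r['power.draw']}W",
                      flush=True)
        time.sleep(interval_s)
```

For example, save it as `gpu_watch.py` (a hypothetical name) and, in a second terminal, run `python3 -c "import gpu_watch; gpu_watch.monitor(10)" >> gpu_watch.log` before starting training. Because every line is flushed to the file, the last readings survive the reboot. The readings only show what the card reported before the failure; they do not by themselves identify the cause.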

I checked /var/log/messages and found:

abrtd: Executable '/home/caffe/.build_release/tools/caffe.bin' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Saved core dump of pid 27875 (/home/caffe/.build_release/tools/caffe.bin) to /var/spool/abrt/ccpp-2016-08-25-15:23:50-27875 (30154752 bytes)
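The abrtd lines above only record that caffe.bin crashed and dumped core. A message from the NVIDIA kernel driver around the time the device vanished would say more about why. Below is a minimal sketch, assuming the driver prefixes its syslog lines with `NVRM:` (as its `Xid` error reports do); the function names are hypothetical, and reading /var/log/messages usually requires root.

```python
# Assumption: NVIDIA kernel-driver messages in syslog contain the "NVRM:" prefix.
DRIVER_MARKERS = ("NVRM:",)


def find_driver_messages(lines):
    """Return (line_number, text) for every line that looks like an NVIDIA driver message."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if any(marker in line for marker in DRIVER_MARKERS):
            hits.append((lineno, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Print matching driver messages from a syslog file with their line numbers."""
    # errors="replace" keeps the scan going if the log contains undecodable bytes
    with open(path, errors="replace") as f:
        for lineno, text in find_driver_messages(f):
            print(f"{path}:{lineno}: {text}")
```

Running `scan_log()` as root prints every matching line; comparing their timestamps with the crash time in the abrtd entry shows whether the driver reported anything when training failed.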

I edited /etc/abrt/abrt-action-save-package-data.conf and changed ProcessUnpackaged = no to ProcessUnpackaged = yes, but it didn't help.

When I train the deep network model, the full error output is:

Check failed: error == cudaSuccess (38 vs. 0)  no CUDA-capable device is detected
*** Check failure stack trace: ***
    @     0x7f238740fb5d  google::LogMessage::Fail()
    @     0x7f2387413b77  google::LogMessage::SendToLog()
    @     0x7f23874119f9  google::LogMessage::Flush()
    @     0x7f2387411cfd  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f238cb63e4e  caffe::Caffe::SetDevice()
    @           0x40ba7f  train()
    @           0x407d5f  main
    @     0x7f237feccd1d  __libc_start_main
    @           0x406f49  (unknown)
./xxl/test_googlenet/ line 1: 15057 Aborted                 (core dumped) ./build/tools/caffe train -solver xxl/test_googlenet/solver.prototxt -weights xxl/test_googlenet/bvlc_googlenet.caffemodel

After a reboot, it starts properly again.

Has anyone run into this situation? Could you help me? Thank you very much!