Caffe crashes with CUDNN_STATUS_EXECUTION_FAILED when using multiple GPUs

Setup: two 1080 Ti GPUs on Ubuntu 14.04, with CUDA 8.0 and cuDNN 5.1.

I created two Caffe instances, one on GPU 0 and one on GPU 1.

When I use one instance to run a forward pass, it crashes right away, before I even switch to the other instance.

I also tried other versions of the driver, CUDA, and cuDNN, such as CUDA 10 with cuDNN 7.4, but the problem is still there.

Any help would be appreciated, thank you!

Here is the log:

F0128 15:48:18.079859 17672 cudnn_conv_layer.cu:28] Check failed: status == CUDNN_STATUS_SUCCESS (8 vs. 0) CUDNN_STATUS_EXECUTION_FAILED

#0 0x00007ffff5d44c37 in __GI_raise (sig=sig@entry=6) at …/nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff5d48028 in __GI_abort () at abort.c:89
#2 0x00007fffebc3cd81 in ?? () from /usr/lib/x86_64-linux-gnu/libglog.so.0
#3 0x00007fffebc3cdaa in google::LogMessage::Fail() () from /usr/lib/x86_64-linux-gnu/libglog.so.0
#4 0x00007fffebc3cce4 in google::LogMessage::SendToLog() () from /usr/lib/x86_64-linux-gnu/libglog.so.0
#5 0x00007fffebc3c6e6 in google::LogMessage::Flush() () from /usr/lib/x86_64-linux-gnu/libglog.so.0
#6 0x00007fffebc3f687 in google::LogMessageFatal::~LogMessageFatal() () from /usr/lib/x86_64-linux-gnu/libglog.so.0
#7 0x00007fffec2f8960 in caffe::CuDNNConvolutionLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&) () from /home/user/codes/caffe/lib/libcaffe.so.1.0.0-rc3
#8 0x00007fffec0a6605 in caffe::Net::ForwardFromTo(int, int) () from /home/user/codes/caffe/lib/libcaffe.so.1.0.0-rc3
#9 0x00007fffec0a6995 in caffe::Net::Forward(float*) () from /home/user/codes/caffe/lib/libcaffe.so.1.0.0-rc3

Can anyone help?

Hi,

I am not an expert on this subject, but check out this topic that might provide a solution for you.

https://github.com/NVIDIA/DIGITS/issues/232
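
Separately from that thread, one thing worth ruling out with two instances: Caffe keeps the current GPU as per-thread state, so a net created on GPU 0 can fail if forward runs while GPU 1 is selected. I cannot say this is the cause here; it is only a rough sketch of how to pin each net to its device. `DeviceBoundNet` is a made-up helper, not part of Caffe, and the file names in the comments are placeholders. In pycaffe the selector would be `caffe.set_device`; in C++ it is `Caffe::SetDevice`.

```python
class DeviceBoundNet:
    """Pairs a net with the GPU it was created on.

    Hypothetical helper for illustration; not part of Caffe.
    """

    def __init__(self, device_id, make_net, set_device):
        self.device_id = device_id
        self._set_device = set_device
        # Select the GPU before building the net so that its buffers
        # and cuDNN handles are created on that device.
        self._set_device(device_id)
        self.net = make_net()

    def forward(self, *args, **kwargs):
        # Re-select the GPU on every call: the other instance may have
        # changed the current device since this net last ran.
        self._set_device(self.device_id)
        return self.net.forward(*args, **kwargs)


# Assumed pycaffe usage (needs a GPU build of Caffe; paths are placeholders):
#
#   import caffe
#   caffe.set_mode_gpu()
#   load = lambda: caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)
#   net0 = DeviceBoundNet(0, load, caffe.set_device)
#   net1 = DeviceBoundNet(1, load, caffe.set_device)
#   net0.forward()
#   net1.forward()
```

If the two instances live in different threads, the device (and GPU mode) would need to be set in each thread, since that state is not shared between threads.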

Best,
Tom

That really helped! The problem is solved.

Thank you so much!