I have a distributed CUDA program with multiple host threads. Each thread launches kernels and issues cudaMemcpyAsync() calls in its own stream. The program usually runs fine, but occasionally I get “unspecified launch failure” errors. Since the error is intermittent, I suspect a concurrency issue.
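Roughly, each thread does something like the sketch below. This is a simplified, hypothetical version, not my actual code: the kernel `scale`, the helpers `worker` and `run_workers`, the `CUDA_CHECK` macro, and the buffer sizes are placeholders I made up for illustration. Each thread owns its stream and its buffers, and every runtime call is checked.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line information if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                 \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s:%d: %s -> %s\n", __FILE__, __LINE__,      \
                   #call, cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                           \
    }                                                                    \
  } while (0)

// Hypothetical kernel; the bounds check guards against out-of-range indices.
__global__ void scale(float* data, int n, float factor) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// One host thread's work: its own stream and its own host/device buffers.
void worker(int n, float factor, float* result) {
  assert(n > 0);
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  float* h = nullptr;
  float* d = nullptr;
  CUDA_CHECK(cudaMallocHost(&h, n * sizeof(float)));  // pinned memory for async copies
  CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
  for (int i = 0; i < n; ++i) h[i] = 1.0f;

  CUDA_CHECK(cudaMemcpyAsync(d, h, n * sizeof(float), cudaMemcpyHostToDevice, stream));
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;
  scale<<<blocks, threads, 0, stream>>>(d, n, factor);
  CUDA_CHECK(cudaGetLastError());  // only catches errors detected when the launch is issued
  CUDA_CHECK(cudaMemcpyAsync(h, d, n * sizeof(float), cudaMemcpyDeviceToHost, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));  // faults during kernel execution are reported here

  *result = h[0];
  CUDA_CHECK(cudaFree(d));
  CUDA_CHECK(cudaFreeHost(h));
  CUDA_CHECK(cudaStreamDestroy(stream));
}

// Run num_threads workers concurrently; worker t scales its buffer by (t + 1).
std::vector<float> run_workers(int num_threads, int n) {
  std::vector<float> results(num_threads, 0.0f);  // not resized later, so pointers stay valid
  std::vector<std::thread> pool;
  for (int t = 0; t < num_threads; ++t)
    pool.emplace_back(worker, n, static_cast<float>(t + 1), &results[t]);
  for (auto& th : pool) th.join();
  return results;
}
```

Calling `run_workers(4, 1 << 16)` should return `{1, 2, 3, 4}`. As the comments note, `cudaGetLastError()` right after the launch only reports problems detected when the launch is issued; a fault while the kernel runs shows up at a later call such as `cudaStreamSynchronize()`, which is where the error appears in my log.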
I googled the error, and almost every reported case of “unspecified launch failure” turned out to be caused by an out-of-bounds array access, with people suggesting cuda-memcheck. When I run my program under cuda-memcheck, the problem happens less often (probably because the program runs much more slowly under cuda-memcheck). I finally got a run in which the problem occurred. However, cuda-memcheck says “No CUDA-MEMCHECK results found”. I’m completely confused. Does “No CUDA-MEMCHECK results found” mean there are no memory problems in my program? If so, why do I get an “unspecified launch failure”?
Also, I saw someone claim that “unspecified launch failure” can only be caused by kernel launches, and that cudaMemcpyAsync() will never cause it. Is that true?
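My understanding is that kernel launches and cudaMemcpyAsync() are asynchronous, so an error raised while a kernel runs is returned by whichever runtime call happens to come next, and that an “unspecified launch failure” is sticky: once it occurs, later calls in the same process keep returning it, even from other threads. If that is right, a cudaMemcpyAsync() (or the cudaStreamSynchronize() in my log) may only be reporting an earlier kernel's failure. To narrow this down, I am considering a debug-only helper like the sketch below (`check_op` and `g_debug_sync` are hypothetical names of my own) that waits on the stream right after each operation. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` is another way to make kernel launches synchronous while debugging.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical debug switch: when true, check_op() waits for the stream so an
// asynchronous fault is reported next to the operation that was just issued.
// This serializes the stream and removes overlap, so it is for debugging only.
static bool g_debug_sync = true;

// Call right after enqueuing a kernel launch or cudaMemcpyAsync() on `stream`.
// Returns the first error observed (cudaSuccess if none) and logs it with `label`.
cudaError_t check_op(cudaStream_t stream, const char* label) {
  assert(label != nullptr);
  cudaError_t err = cudaGetLastError();  // errors detected when the work was issued
  if (err == cudaSuccess && g_debug_sync)
    err = cudaStreamSynchronize(stream);  // errors raised while the work executed
  if (err != cudaSuccess)
    std::fprintf(stderr, "[%s] error %d: %s\n", label, static_cast<int>(err),
                 cudaGetErrorString(err));
  return err;
}
```

Because a sticky error is visible to every thread, even this could blame the wrong operation when several threads run at once; running one worker thread at a time would make the label more trustworthy.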
Thank you so much!
My output looks something like this:
h5: ========= CUDA-MEMCHECK
h5: ========= Program hit cudaErrorLaunchFailure (error 4) due to “unspecified launch failure” on CUDA API call to cudaStreamSynchronize.
h5: ========= Saved host backtrace up to driver entry point at error
h5: ========= Host Frame:/usr/lib/libcuda.so.1 [0x2ef613]
h5: ========= Host Frame:/usr/local/cuda/lib64/libcudart.so.6.5 (cudaStreamSynchronize + 0x15e) [0x3773e]
h5: ========= Host Frame:libmyproj.so (_ZN19MyClassl25send_updatesEjij + 0x225) [0x3f355]
h5: ========= Host Frame:/usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0 [0xba4a]
h5: ========= Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x8182]
h5: ========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x6d) [0xfb38d]
h5: ========= Error: process didn’t terminate successfully
h5: ========= Internal error (20)
h5: ========= No CUDA-MEMCHECK results found