Posting after 7 man-days of debugging this issue with no insight into a solution. We get an "unspecified launch failure" (ULF) in a program that uses only cudaMalloc and cudaMemcpy. The program runs indefinitely without error until a second thread interacts with the thread making the CUDA calls. After this interaction, even if it does nothing, every CUDA function fails with ULF.
The arguments to cudaMemcpy are always the same; I print the addresses and length on every call. I have double-checked all the synchronization. After each call I call cudaThreadSynchronize() and cudaGetLastError().
valgrind and helgrind report no issues.
We’re using CUDA 2.2 on Fedora 10, with Fedora Eclipse.
Our best guess is that our process memory heap is being corrupted, and the CUDA libraries require some state in our process or thread that is damaged by the interaction with another thread, though I cannot see how this occurs. The program is basically as follows:
- start cuda thread (cuda thread starts and runs successfully for thousands of iterations)
- sleep( 20 secs )
- call size() on a std::map that contains pointers to cuda-allocated memory
- ULFs start to occur.
- cudaMemcpy host to device
- cudaMemcpy device to host
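A rough sketch of that sequence, written so it builds without the CUDA toolkit. This is not the actual program: fake_device_alloc and fake_copy are hypothetical host-side stand-ins for cudaMalloc and cudaMemcpy, and Shared, worker and run_sketch are names made up for illustration. It also assumes the two copies are what the CUDA thread does on each iteration, and that the map is guarded by a mutex.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for cudaMalloc / cudaMemcpy (plain host memory).
static void* fake_device_alloc(std::size_t bytes) { return std::malloc(bytes); }
static int fake_copy(void* dst, const void* src, std::size_t bytes) {
    std::memcpy(dst, src, bytes);
    return 0;  // 0 plays the role of cudaSuccess
}

struct Shared {
    std::mutex lock;               // assumed guard for the map
    std::map<int, void*> buffers;  // id -> "device" pointer
    std::atomic<bool> stop{false};
    std::atomic<long> iterations{0};
    std::atomic<long> failures{0};
};

// "CUDA thread": allocate once, then copy host -> device -> host in a loop.
static void worker(Shared& s, std::size_t bytes) {
    void* dev = fake_device_alloc(bytes);
    assert(dev != nullptr);
    {
        std::lock_guard<std::mutex> g(s.lock);
        s.buffers[0] = dev;
    }
    std::vector<char> in(bytes, 'x'), out(bytes, '\0');
    while (!s.stop.load()) {
        if (fake_copy(dev, in.data(), bytes) != 0) ++s.failures;   // host to device
        if (fake_copy(out.data(), dev, bytes) != 0) ++s.failures;  // device to host
        if (out != in) ++s.failures;  // round trip should preserve the data
        ++s.iterations;
    }
}

// Main thread: start the worker, wait, then call size() on the shared map.
static std::size_t run_sketch(Shared& s, std::chrono::milliseconds delay) {
    std::thread t(worker, std::ref(s), std::size_t(1024));
    while (s.iterations.load() == 0) std::this_thread::yield();  // worker is running
    std::this_thread::sleep_for(delay);  // 20 seconds in the real program
    std::size_t n;
    {
        std::lock_guard<std::mutex> g(s.lock);
        n = s.buffers.size();  // the interaction from the second thread
    }
    std::this_thread::sleep_for(delay);  // let the worker keep going afterwards
    s.stop = true;
    t.join();
    // Release the stand-in allocations only after the worker has finished.
    for (auto& kv : s.buffers) std::free(kv.second);
    return n;
}
```

With the stand-ins, a call such as run_sketch(s, std::chrono::milliseconds(20)) leaves s.failures at zero. In the real program, with the CUDA calls in place, the failures begin after the size() call.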
We have done our best to rule out corruption of the container and have traced all the memory addresses through the program. They do not change, and they work fine in CUDA calls. We never delete the memory passed to CUDA calls.
Any ideas what we could be missing?