I am still trying to pinpoint the exact problem so that I can shrink the test case. Here is what’s going on:
I have a program whose CUDA portion is compiled with nvcc; everything else is compiled, linked, and run as a C program (not C++) with clang. Since upgrading to CUDA 11.0, my program hangs every time the computer has been restarted. The hang goes away once I run any of the regular CUDA programs in the NVIDIA_CUDA-11.0_Samples directory (all of which are compiled and linked with nvcc).
Here is the stack trace when the hang happens. It seems to be a deadlock: cuInit, called from inside a pthread_once init routine in libcudart, ends up blocked in wait4. Could it be related to the libc.so I use? (I am not sure which one nvcc links against.)
* thread #1, name = 'cublas.tests', stop reason = signal SIGSTOP
* frame #0: 0x00007fffe174cc6f libc.so.6`wait4 + 95
frame #1: 0x00007ffff405816e libcuda.so.1`___lldb_unnamed_symbol8125$$libcuda.so.1 + 270
frame #2: 0x00007ffff4063f87 libcuda.so.1`___lldb_unnamed_symbol8324$$libcuda.so.1 + 215
frame #3: 0x00007ffff3f2537e libcuda.so.1`___lldb_unnamed_symbol3781$$libcuda.so.1 + 174
frame #4: 0x00007ffff3efa192 libcuda.so.1`___lldb_unnamed_symbol3073$$libcuda.so.1 + 130
frame #5: 0x00007ffff3efa9ff libcuda.so.1`___lldb_unnamed_symbol3087$$libcuda.so.1 + 399
frame #6: 0x00007ffff3f91be4 libcuda.so.1`cuInit + 84
frame #7: 0x00007ffff3a8b155 libcudart.so.11.0`___lldb_unnamed_symbol623$$libcudart.so.11.0 + 133
frame #8: 0x00007ffff3a8d1c1 libcudart.so.11.0`___lldb_unnamed_symbol645$$libcudart.so.11.0 + 17
frame #9: 0x00007ffff75ba47f libpthread.so.0`__pthread_once_slow(once_control=0x00007ffff3cead30, init_routine=(libcudart.so.11.0`___lldb_unnamed_symbol645$$libcudart.so.11.0)) at pthread_once.c:116:7
frame #10: 0x00007ffff3ac6a29 libcudart.so.11.0`___lldb_unnamed_symbol934$$libcudart.so.11.0 + 9
frame #11: 0x00007ffff3a880c0 libcudart.so.11.0`___lldb_unnamed_symbol621$$libcudart.so.11.0 + 64
frame #12: 0x00007ffff3a8e33e libcudart.so.11.0`___lldb_unnamed_symbol648$$libcudart.so.11.0 + 14
frame #13: 0x00007ffff3aa4b83 libcudart.so.11.0`cudaGetDeviceCount + 51
frame #14: 0x00000000005fbada cublas.tests`ccv_nnc_gpu_device_count + 42