I am writing a multi-threaded program on Linux that uses cuSOLVER APIs such as cusolverDnSgetrf() and cusolverDnSgetrs(). The main thread creates one child POSIX thread per GPU; each child thread uses the cuSOLVER APIs and executes independently. All threads are host threads.
There are three possible types of termination:
T1). Normal termination. The main thread waits for the per-GPU child threads to finish and then calls exit().
T2). Timed termination. If the program is given an execution time, the main thread sleeps for that time, wakes up and calls exit().
T3). Ctrl-C termination. The program has a handler for SIGINT which calls exit().
There are no problems with T1. However, there are failures with T2 and T3 if exit() is used. No problems have been seen so far if _exit() is used instead.
Different failures are seen with exit() in T2 and T3 from one execution to another:
- cudaDeviceSynchronize() fails with error code 4 (cudaErrorCudartUnloading)
- cudaMallocManaged() fails with error code 4
- cuSOLVER initialization fails with error code 7
In both T2 and T3, the main thread does not inform the child threads about the termination.
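To make the T2 path concrete, here is a minimal sketch of the structure described above, using plain POSIX threads only. The names per_gpu_thread, timed_termination, run_t2_in_child and NUM_GPUS are hypothetical, and the CUDA/cuSOLVER calls are replaced by a stand-in loop, so this sketch only shows the shape of the shutdown and is not expected to reproduce the failures.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define NUM_GPUS 2  /* hypothetical GPU count, for illustration only */

/* Stand-in for one per-GPU thread. In the real program this loop would call
 * cudaMallocManaged(), cusolverDnSgetrf(), cusolverDnSgetrs(),
 * cudaDeviceSynchronize() and so on. */
static void *per_gpu_thread(void *arg)
{
    (void)arg;
    const struct timespec one_ms = { 0, 1000000L };
    for (;;) {
        nanosleep(&one_ms, NULL);   /* pretend to do one unit of GPU work */
    }
}

/* T2: start the workers, sleep for the run time, then call exit() without
 * telling the workers anything. This function does not return. */
static void timed_termination(long run_ms)
{
    pthread_t tid[NUM_GPUS];
    for (int i = 0; i < NUM_GPUS; i++) {
        if (pthread_create(&tid[i], NULL, per_gpu_thread, NULL) != 0)
            exit(EXIT_FAILURE);
    }
    const struct timespec run_for = { run_ms / 1000, (run_ms % 1000) * 1000000L };
    nanosleep(&run_for, NULL);
    exit(0);   /* exit-time handlers run here while the workers are still looping */
}

/* Run the T2 scenario in a forked child so the caller survives.
 * Returns the child's exit status, or -1 on error. */
static int run_t2_in_child(long run_ms)
{
    fflush(NULL);               /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        timed_termination(run_ms);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_t2_in_child(20) runs the scenario for about 20 ms. The point of interest is the exit(0) line: the worker threads are still inside their loop when the process starts its exit-time cleanup.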
I have the following questions:
Q1). Why are failures seen with exit() but not with _exit()?
I am guessing that CUDA is registering some exit handlers which are executed as part of exit() and not with _exit(). I am not sure how that leads to failures. Perhaps the exit handlers are deleting stuff that the GPU code is using. But then, _exit() too will stop the host code while the GPU code is still running.
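The difference between the two calls can be checked in isolation with a small sketch like the one below. It registers a handler with atexit() in a forked child and reports whether that handler ran. The handler fake_library_teardown is a hypothetical stand-in; whether and how the CUDA runtime registers exit-time cleanup is an assumption here, not something this code demonstrates.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int pipe_wr = -1;   /* write end of the pipe, used by the handler */

/* Hypothetical stand-in for a library's exit-time cleanup. It writes one
 * byte to the pipe so the parent can tell that it ran. */
static void fake_library_teardown(void)
{
    const char mark = 'x';
    ssize_t n = write(pipe_wr, &mark, 1);
    (void)n;
}

/* Fork a child that registers the handler and then terminates with exit()
 * (use_plain_exit != 0) or _exit() (use_plain_exit == 0).
 * Returns 1 if the handler ran, 0 if it did not, -1 on error. */
static int handler_runs_with(int use_plain_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                       /* keep buffered output out of the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        pipe_wr = fds[1];
        atexit(fake_library_teardown);
        if (use_plain_exit)
            exit(0);                    /* runs atexit handlers */
        _exit(0);                       /* skips atexit handlers */
    }
    close(fds[1]);
    char mark;
    ssize_t n = read(fds[0], &mark, 1); /* 0 means EOF: nothing was written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1 ? 1 : 0;
}
```

Here handler_runs_with(1) exercises the exit() path and handler_runs_with(0) the _exit() path. This only confirms the standard C/POSIX behaviour of the two calls; it says nothing about what state the per-GPU threads are in when such a handler runs.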
Q2). Is it safe to use _exit() instead of exit()?
Are there any issues with using _exit()? For example, will a CUDA kernel launched by the process be left running when the process terminates using _exit(), or does the CUDA runtime (or the driver) ensure that all kernels associated with a terminating process are also terminated?
Q3). Is it mandatory/better for the main thread to ask the child threads to stop before terminating the process?
I have tried a method in which the main thread sends a signal to the child threads, which catch it and execute pthread_exit(). No failures have been seen with this so far, but it is still under testing.
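For comparison, a flag-based variant of the same idea can be sketched as follows. This is not the signal-plus-pthread_exit() method described above: it swaps the signal for a shared atomic flag that each worker polls between units of work, and the main thread joins the workers before it terminates. The names gpu_worker, run_then_stop, stop_requested and NUM_WORKERS are hypothetical, and the GPU work is again a stand-in.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define NUM_WORKERS 2   /* hypothetical: one worker per GPU */

static atomic_int stop_requested;   /* set to 1 by the main thread to ask workers to stop */

/* Stand-in for one per-GPU thread: does one unit of "work" per iteration
 * and checks the stop flag in between. */
static void *gpu_worker(void *arg)
{
    (void)arg;
    const struct timespec one_ms = { 0, 1000000L };
    while (!atomic_load(&stop_requested)) {
        /* real program: issue cuSOLVER calls and synchronize here */
        nanosleep(&one_ms, NULL);
    }
    /* per-thread cleanup (freeing buffers, destroying handles) would go here */
    return NULL;
}

/* Start the workers, let them run for run_ms milliseconds, ask them to stop
 * and wait for all of them. Returns 0 on success, -1 on error. */
static int run_then_stop(long run_ms)
{
    pthread_t tid[NUM_WORKERS];
    int started = 0;
    int rc = 0;

    atomic_store(&stop_requested, 0);
    for (; started < NUM_WORKERS; started++) {
        if (pthread_create(&tid[started], NULL, gpu_worker, NULL) != 0) {
            rc = -1;
            break;
        }
    }
    if (rc == 0) {
        const struct timespec run_for = { run_ms / 1000, (run_ms % 1000) * 1000000L };
        nanosleep(&run_for, NULL);
    }
    atomic_store(&stop_requested, 1);       /* tell every worker to finish */
    for (int i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) != 0)
            rc = -1;
    }
    return rc;
}
```

After run_then_stop() returns, no worker is still running, so the main thread would reach exit() in the same state as in T1. The trade-off is that a worker only notices the request between units of work, so the delay before shutdown depends on how long one unit takes.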
Thanks
Karthik