I am using a Tesla K40 (Kepler) card. I am trying to share the GPU among multiple processes and want to be able to kill one process (and consequently its active kernels) without affecting the others.
I ran two experiments with two processes, proc1 and proc2. proc1 launches a long-running kernel k1 (> 100 sec) and proc2 launches a lightweight kernel k2 (1 sec).
In the first experiment, I ran the two CUDA processes (and thus two contexts) simultaneously on the same GPU. I observed that if the kernels k1 and k2 are both loaded on the GPU and the long-running kernel k1 is killed (by killing its process proc1), k2 still takes > 100 sec to return. This suggests that k1 keeps running even after proc1 is killed.
I also observed that cudaEventRecord() in proc2 gives nonsensical results after k2 completes.
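The setup above could be reproduced with something like the following sketch. It is not the exact code from the experiments: spin_kernel, the CUDA_CHECK macro, and the command-line argument are hypothetical stand-ins, and it assumes the "results" in question are elapsed times read back with cudaEventElapsedTime(). The spin duration is only approximate, since it is derived from the peak clock rate.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical stand-in for k1/k2: busy-waits for about `cycles` clock ticks.
__global__ void spin_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main(int argc, char **argv)
{
    // Requested kernel duration in seconds (e.g. 120 for proc1, 1 for proc2).
    double seconds = (argc > 1) ? atof(argv[1]) : 1.0;

    // Peak clock rate of device 0, reported in kHz.
    int clock_khz = 0;
    CUDA_CHECK(cudaDeviceGetAttribute(&clock_khz, cudaDevAttrClockRate, 0));
    long long cycles = (long long)(seconds * clock_khz * 1000.0);

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    CUDA_CHECK(cudaEventRecord(start, 0));
    spin_kernel<<<1, 1>>>(cycles);
    CUDA_CHECK(cudaGetLastError());          // catches launch errors
    CUDA_CHECK(cudaEventRecord(stop, 0));
    CUDA_CHECK(cudaEventSynchronize(stop));  // waits for the kernel to finish

    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("kernel time: %.1f ms\n", ms);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    return 0;
}
```

Built with nvcc, this could be started with an argument of 120 in one terminal (proc1) and 1 in another (proc2). Checking every return code should make it visible whether proc2 receives an error after proc1 is killed, rather than just printing a meaningless time.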
In the second experiment, I ran the same processes, proc1 and proc2, serially. I started proc1 and killed it after its long-running kernel had been loaded on the GPU. Immediately afterwards I started proc2, and this time k2 completed in 1 sec. So it seems that in this case the proc1 kernel was cleaned up successfully.
Is this discrepancy between the outcomes of the two experiments expected? Does the context of a killed process not get cleaned up if another context is running on the GPU at the same time?
Can killing a kernel while another kernel is running simultaneously result in undesirable behavior in the other one?