When I use multithreading to launch multiple kernels on the same device (by default, all threads of the same process share the same
CUcontext), I find that the kernels are processed serially. However, when I launch the same number of kernels from different processes, the kernels are processed in parallel. Is this because multiple kernel launches to the same GPU from the same context are processed serially? Please share any details you know regarding
CUcontext and parallel kernel launches.
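
For concreteness, here is a minimal sketch of the multithreaded case described above. It is not my actual code: the kernel name `busyKernel`, the thread count, the spin duration, and the use of the CUDA runtime API on device 0 are all placeholder assumptions.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: spins for roughly `cycles` GPU clock ticks so that
// any overlap (or lack of it) is easy to see on a profiler timeline.
__global__ void busyKernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Each host thread launches one kernel on the same device.
void worker(int id) {
    // With the runtime API, all threads of the process share the
    // device's primary context.
    cudaSetDevice(0);

    // No stream argument is given, so the launch goes to the default stream.
    busyKernel<<<1, 1>>>(100000000LL);

    // Wait for the device to finish and report any error.
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("thread %d: %s\n", id, cudaGetErrorString(err));
}

int main() {
    const int kThreads = 4;  // placeholder thread count
    std::vector<std::thread> threads;
    for (int i = 0; i < kThreads; ++i) {
        threads.emplace_back(worker, i);
    }
    for (auto& t : threads) {
        t.join();
    }
    return 0;
}
```

This is the launch pattern in which I see the kernels run one after another; running the same kernel from several separate processes instead is the case where I see them overlap.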