Asynchronous Concurrent Compute in Pascal P100

NVIDIA says that the new Pascal P100 cards improve the overlapping of workloads with Asynchronous Concurrent Compute.

But is it truly possible to share the GPU among different workloads on the P100?

I’ve read in the NVIDIA manual for Kepler GPUs that “The GPU has a time sliced scheduler to schedule work from work queues belonging to different CUDA contexts. However, work launched to the compute engine from work queues belonging to different CUDA contexts cannot execute concurrently.”

Is it possible to execute different CUDA contexts concurrently on the P100?

You can have multiple CUDA contexts resident on a single GPU.

You cannot have kernels from two different contexts executing at the same instant on a single GPU, and this includes the Pascal P100 on CUDA 8.
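
To make the "multiple contexts resident" point concrete, here is a minimal driver-API sketch (device 0, no error checking, purely illustrative) that creates two contexts on the same GPU. Work submitted from the two contexts is time-sliced by the GPU scheduler rather than run concurrently:

```cuda
#include <cuda.h>
#include <cstdio>

int main() {
    CUdevice dev;
    CUcontext ctxA, ctxB;

    cuInit(0);
    cuDeviceGet(&dev, 0);

    // Both contexts can be resident on the same device at once.
    cuCtxCreate(&ctxA, 0, dev);
    cuCtxCreate(&ctxB, 0, dev);

    // Kernels launched while ctxA is current and kernels launched while
    // ctxB is current will not execute at the same instant; the GPU
    // scheduler switches between the two contexts.

    cuCtxDestroy(ctxB);
    cuCtxDestroy(ctxA);
    printf("created and destroyed two contexts on device 0\n");
    return 0;
}
```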

However, this is essentially a “concurrent kernel” scenario, which is notoriously difficult to witness even when the kernels belong to the same context. GPU kernels of sufficient size will occupy all available compute resources, forcing such kernels to serialize, whether they are from the same or different contexts.
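
To see why overlap is hard to witness even within a single context, a sketch along these lines (the spin kernel and grid sizes are just illustrative assumptions) can be inspected under a profiler such as nvprof or Nsight Systems. With tiny one-block kernels the two streams may overlap; with grids large enough to fill the GPU, the same launches serialize:

```cuda
#include <cstdio>

// Kernel that spins for roughly the given number of clock cycles, so any
// overlap (or lack of it) is visible on a profiler timeline.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // One block per kernel: the GPU has spare resources, so within a single
    // context these launches *may* run concurrently.
    spin<<<1, 32, 0, s1>>>(1000000000LL);
    spin<<<1, 32, 0, s2>>>(1000000000LL);

    // With a grid large enough to fill the machine (e.g. <<<1024, 256>>>),
    // each kernel occupies all compute resources and the launches serialize,
    // whether they come from the same context or different ones.

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    printf("done\n");
    return 0;
}
```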