How to utilize Compute Preemption in the new Pascal architecture (Tesla P100 and GTX 1080)?


I have just gotten a new Tesla P100, and I want to try its new feature, Compute Preemption, on some applications with real-time requirements.

Currently I cannot find any documentation from NVIDIA on how to use this new feature. Some questions are as follows: 1) Can I control when a kernel is preempted? 2) Can I disable this feature? 3) How many kernels can have their contexts saved? 4) What is the overhead of a context switch? …

Are there any materials, such as a programming guide, for Compute Preemption? Thanks a lot!

Hi kay21s - while waiting for feedback about new P100 features, there may be a few folks interested in P100 performance on familiar old benchmarks… any news you can share?

Hi, we have just gotten the P100 and have not evaluated it yet. I will let you know when we have some results.

Hi, have you discovered some way to request preemption in the Pascal architecture?

Compute preemption isn’t exposed as a programmer-visible control at this time.

In the future, it may be, but what is also likely is that compute preemption will enable other features.

One example of where compute preemption may be used today is in dynamic parallelism.

Thank you txbob for replying.

I am wondering how the driver defines the priorities…

Is it expected that multiple processes will have the same priority? In that case, will preemption happen only when the timeslice expires? By the way, do you know what the default timeslice is?

Additionally, do you know whether the stream priority parameters make any difference with regard to preemption?
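For reference, stream priorities are the one scheduling knob the CUDA runtime does expose today; whether the driver honors them by preempting running work, or only when picking which blocks to schedule next, is exactly the open question. A minimal sketch of querying and using them (standard CUDA runtime API; note that a *lower* numerical value means a *higher* priority):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the priority range supported by the current device.
    // leastPriority is the largest numerical value (lowest priority),
    // greatestPriority the smallest (highest priority).
    int leastPriority, greatestPriority;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    printf("least (lowest) priority: %d, greatest (highest) priority: %d\n",
           leastPriority, greatestPriority);

    // Create one stream at each end of the range.
    cudaStream_t lowStream, highStream;
    cudaStreamCreateWithPriority(&lowStream,  cudaStreamNonBlocking, leastPriority);
    cudaStreamCreateWithPriority(&highStream, cudaStreamNonBlocking, greatestPriority);

    // ... launch kernels into highStream / lowStream ...

    cudaStreamDestroy(lowStream);
    cudaStreamDestroy(highStream);
    return 0;
}
```

On devices that don't support priorities, the query returns 0 for both ends, so it doubles as a capability check.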

The biggest killer CUDA feature of Pascal’s compute preemption is the final, glorious elimination of kernel time limits on devices also used for display. The GP100 white paper promises this and the GTX 1080 white paper strongly implies it. So I keep hoping a new driver will be released one day with this finally implemented, and the timeout beast will at last be slain. Then I can stop having to use single-slot, display-only GTX 750 Tis in each machine. Since the feature still hasn’t been exposed (after almost a year), I have to think it’s not a minor driver change, no matter how capable the hardware is.

The second cool feature will be robust transparent single-GPU debugging with NSight… especially useful on a laptop.

Neither of these will need any change to CUDA or user code. As txbob says, dynamic parallelism is another feature that will be improved and would also probably benefit from some (minor) CUDA runtime API extensions.

This is the other currently employed scenario for compute preemption. This capability should be available today on cc3.5 and higher devices, and it does use preemption.
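For completeness, here is a minimal dynamic-parallelism sketch, since that is the cc3.5+ capability mentioned above where the scheduler has to juggle parent and child grids. This is a hedged illustration, not an official example; it must be compiled with relocatable device code, e.g. `nvcc -arch=sm_35 -rdc=true` (or `sm_60` for Pascal):

```cuda
#include <cstdio>

__global__ void child(int depth) {
    if (threadIdx.x == 0)
        printf("child kernel at depth %d\n", depth);
}

__global__ void parent() {
    // Device-side kernel launch (dynamic parallelism, cc3.5+).
    if (threadIdx.x == 0)
        child<<<1, 32>>>(1);
    // Device-side synchronization waits for the child grid; this is
    // the point where the scheduler may need to suspend the parent.
    cudaDeviceSynchronize();
}

int main() {
    parent<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```

The device-side `cudaDeviceSynchronize()` at the end of the parent is where a parent grid can be swapped out while its children run, which is presumably where preemption helps.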

Hello, I am a college student learning CUDA.
I have some questions about Pascal GPU preemption.
Could I get answers to the questions below?

  1. When does preemption occur?
    I know that in modern CPU computing, when a process’s time slice expires or an interrupt occurs, the scheduler stops that process and starts the process with the highest priority. Is it the same on the GPU, i.e., are (1) and (2) correct?
    (1) The currently executing kernel is allocated a time slice, and when it expires, the GPU scheduler swaps in another kernel.
    (2) If another kernel appears with a higher priority than the one currently executing, the lower-priority kernel automatically stops and the higher-priority one starts?

  2. How is a kernel’s priority decided for preemption?
    I found some CUDA preemption example code in which people used CUDA stream priorities to test preemption. Does the stream priority alone decide a kernel’s priority, or are there other mechanisms in the CUDA runtime that decide which kernel has the higher priority for preemption?
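The example codes you describe usually look something like the sketch below: a long-running kernel is launched into a low-priority stream, then a short one into a high-priority stream, and you observe which finishes first. This is only an illustrative experiment under my own assumptions (the `spin` kernel and cycle counts are made up); whether the high-priority kernel actually preempts the running one, rather than simply being scheduled onto free SMs first, depends on the architecture and driver:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A simple busy-wait kernel so both streams have long-lived work.
__global__ void spin(long long cycles, int id) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
    if (threadIdx.x == 0) printf("stream %d done\n", id);
}

int main() {
    int least, greatest;  // lower numerical value = higher priority
    cudaDeviceGetStreamPriorityRange(&least, &greatest);

    cudaStream_t low, high;
    cudaStreamCreateWithPriority(&low,  cudaStreamNonBlocking, least);
    cudaStreamCreateWithPriority(&high, cudaStreamNonBlocking, greatest);

    // Long-running work at low priority, then short work at high priority.
    spin<<<1, 32, 0, low>>>(1000000000LL, 0);
    spin<<<1, 32, 0, high>>>(1000000LL, 1);

    cudaDeviceSynchronize();
    cudaStreamDestroy(low);
    cudaStreamDestroy(high);
    return 0;
}
```

As far as I know, stream priority (fixed at stream creation) is the only priority mechanism the runtime exposes; there is no per-kernel priority API.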

Thank you for reading a newbie’s questions.
I will wait for replies.