CUDA Stream priorities, WDDM and TCC behavior

Hi!

We have tested CUDA Stream priorities with a complex code that uses many CUDA Streams and Events, and CPU multithreading.

Let me explain the code:

  • We have one CPU thread, threadA, iteratively enqueueing work on the GPU through streamA and calling cudaEventSynchronize at the end of each iteration.
  • We have another CPU thread, threadB, iteratively enqueueing other work on the same GPU through a different stream, streamB, and also calling cudaEventSynchronize at the end of each iteration.
  • We measure the execution time of each iteration.
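
To make the setup concrete, here is a minimal sketch of the pattern described above. It is not our actual code: busyKernel, the buffer size, the iteration count and the host-side timing with std::chrono are placeholders chosen for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical placeholder kernel standing in for the real per-iteration work.
__global__ void busyKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// One CPU thread: enqueue work on its own stream, record an event,
// block on it with cudaEventSynchronize, and time the whole iteration.
void worker(const char* name, cudaStream_t stream, int iterations) {
    const int n = 1 << 20;  // assumed problem size
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemsetAsync(d_data, 0, n * sizeof(float), stream);

    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

    for (int it = 0; it < iterations; ++it) {
        auto t0 = std::chrono::steady_clock::now();

        busyKernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);
        cudaEventRecord(done, stream);
        cudaEventSynchronize(done);  // end-of-iteration synchronization

        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("%s iteration %d: %.3f ms\n", name, it, ms);
    }

    cudaEventDestroy(done);
    cudaFree(d_data);
}

int main() {
    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    // Two CPU threads, each driving its own stream on the same GPU.
    std::thread threadA(worker, "threadA", streamA, 10);
    std::thread threadB(worker, "threadB", streamB, 10);
    threadA.join();
    threadB.join();

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    return 0;
}
```

The timing here is measured on the host around the enqueue and the synchronization, so it includes launch and driver overhead, not only GPU execution time.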

Now, if we create streamA with the highest priority and create streamB without setting any priority:

  • In WDDM we don’t see any difference in the iteration execution times.
  • In TCC we see a large difference: threadA takes less time per iteration, and threadB takes more time per iteration.
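
For reference, this is roughly how the priority configuration above can be expressed and checked. It is a sketch, not our exact code: it assumes device 0 (whatever cudaGetDevice returns) and default stream flags. It only prints what the runtime reports (driver mode, supported priority range, and the priority of each stream); it says nothing about how the WDDM scheduler actually orders the work.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int device = 0;
    cudaGetDevice(&device);

    // 1 if the device is running under the TCC driver, 0 otherwise (e.g. WDDM).
    int isTcc = 0;
    cudaDeviceGetAttribute(&isTcc, cudaDevAttrTccDriver, device);

    // Lower numbers mean higher priority, so "greatest" is numerically smallest.
    // If both values come back as 0, the device does not support priorities.
    int leastPriority = 0, greatestPriority = 0;
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);

    cudaStream_t streamA, streamB;
    // streamA: highest priority the device reports.
    cudaStreamCreateWithPriority(&streamA, cudaStreamDefault, greatestPriority);
    // streamB: no priority set explicitly, so it gets the default priority.
    cudaStreamCreate(&streamB);

    int prioA = 0, prioB = 0;
    cudaStreamGetPriority(streamA, &prioA);
    cudaStreamGetPriority(streamB, &prioB);

    std::printf("TCC driver: %d\n", isTcc);
    std::printf("priority range: least=%d greatest=%d\n",
                leastPriority, greatestPriority);
    std::printf("streamA priority=%d, streamB priority=%d\n", prioA, prioB);

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    return 0;
}
```

Running this under both driver models would at least show whether the runtime reports the same priority range and stream priorities in WDDM and in TCC.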

Questions:

  • Why is this happening? Is it because stream priorities are not supported in WDDM? Or are they supported, but WDDM prevents them from having the expected effect?
  • Will newer versions of WDDM help with that?
  • Will “hardware scheduling” help with that?

Thanks!