We have tested CUDA Stream priorities with a complex code that uses many CUDA Streams and Events, and CPU multithreading.
Let me explain the code:
- We have one CPU thread, threadA, enqueueing some work on the GPU in an iterative manner on streamA, and calling cudaEventSynchronize at the end of each iteration.
- We have another CPU thread, threadB, enqueueing some other work on the same GPU, also in an iterative manner, on a different stream, streamB, and also calling cudaEventSynchronize at the end of each iteration.
- We measure the execution time of each iteration.
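
To make the structure concrete, here is a minimal sketch of the setup described above. It is not our real code: busyKernel, iterate, runBothThreads, the buffer size and the spin count are all hypothetical placeholders, and both streams are created without a priority here. It also assumes both threads use the default device.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical stand-in for the real per-iteration GPU work.
__global__ void busyKernel(float* data, int n, int spins) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < spins; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Body of one CPU thread: enqueue work on its own stream, record an
// event, block on it with cudaEventSynchronize, and time each iteration
// on the host. Writes the average iteration time in milliseconds.
// Assumes iterations > 0.
void iterate(cudaStream_t stream, int iterations, double* avgMs) {
    const int n = 1 << 20;
    float* d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, n * sizeof(float)));
    CUDA_CHECK(cudaMemsetAsync(d, 0, n * sizeof(float), stream));

    cudaEvent_t done;
    CUDA_CHECK(cudaEventCreateWithFlags(&done, cudaEventDisableTiming));

    double totalMs = 0.0;
    for (int it = 0; it < iterations; ++it) {
        const auto t0 = std::chrono::steady_clock::now();
        busyKernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2000);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaEventRecord(done, stream));
        CUDA_CHECK(cudaEventSynchronize(done));
        const auto t1 = std::chrono::steady_clock::now();
        totalMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    CUDA_CHECK(cudaEventDestroy(done));
    CUDA_CHECK(cudaFree(d));
    *avgMs = totalMs / iterations;
}

// Run threadA and threadB concurrently, each on its own stream.
void runBothThreads(int iterations, double* avgMsA, double* avgMsB) {
    cudaStream_t streamA, streamB;
    CUDA_CHECK(cudaStreamCreate(&streamA));
    CUDA_CHECK(cudaStreamCreate(&streamB));

    std::thread threadA(iterate, streamA, iterations, avgMsA);
    std::thread threadB(iterate, streamB, iterations, avgMsB);
    threadA.join();
    threadB.join();

    CUDA_CHECK(cudaStreamDestroy(streamA));
    CUDA_CHECK(cudaStreamDestroy(streamB));
}
```

Calling runBothThreads(100, &msA, &msB) from main and printing msA and msB gives the two per-iteration averages we compare.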
Now, if we set streamA to the highest priority, and create streamB without setting any priority at all:
- Under WDDM we don’t see any difference in the iteration execution times.
- Under TCC we see a clear difference: threadA takes less time per iteration, and threadB takes more time per iteration.
- Why is this happening? Is it because stream priorities are not supported under WDDM? Or are they supported, but because of WDDM the final effect may not be the expected one?
- Will newer versions of WDDM help with that?
- Will “hardware scheduling” help with that?
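
For reference, this is roughly how the two streams can be created and how the driver model and priority range can be printed for comparison between the WDDM and TCC runs. It is only a sketch: StreamPair and createStreams are hypothetical names, and the stream flags are assumed to be the defaults. The runtime calls themselves (cudaDeviceGetStreamPriorityRange, cudaStreamCreateWithPriority, cudaStreamGetPriority, and cudaDeviceGetAttribute with cudaDevAttrTccDriver) are standard CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical holder for the two streams used by threadA and threadB.
struct StreamPair {
    cudaStream_t a;  // created with the greatest priority
    cudaStream_t b;  // created without an explicit priority
};

// Creates both streams and prints what the runtime reports.
// Returns false if any CUDA call fails.
bool createStreams(StreamPair* out) {
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return false;

    // 1 if the device runs under the TCC driver, 0 otherwise
    // (0 covers WDDM on Windows, and also non-Windows systems).
    int isTcc = 0;
    if (cudaDeviceGetAttribute(&isTcc, cudaDevAttrTccDriver, device) !=
        cudaSuccess) return false;

    // Lower numbers mean higher priority, so "greatest" is the
    // numerically smallest value. A range of [0, 0] means the device
    // reports no priority support.
    int least = 0, greatest = 0;
    if (cudaDeviceGetStreamPriorityRange(&least, &greatest) != cudaSuccess)
        return false;
    std::printf("TCC driver: %d, priority range: least=%d greatest=%d\n",
                isTcc, least, greatest);

    if (cudaStreamCreateWithPriority(&out->a, cudaStreamDefault, greatest) !=
        cudaSuccess) return false;
    if (cudaStreamCreate(&out->b) != cudaSuccess) {
        cudaStreamDestroy(out->a);
        return false;
    }

    // Read back the priorities the runtime actually assigned.
    int prioA = 0, prioB = 0;
    if (cudaStreamGetPriority(out->a, &prioA) != cudaSuccess ||
        cudaStreamGetPriority(out->b, &prioB) != cudaSuccess) {
        cudaStreamDestroy(out->a);
        cudaStreamDestroy(out->b);
        return false;
    }
    std::printf("streamA priority: %d, streamB priority: %d\n", prioA, prioB);
    return true;
}
```

If the values printed by this check are the same under both driver models, then the priorities are at least accepted by the runtime under WDDM, and the question is only about how they are honored at scheduling time.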