What happens on a driver context switch

I want the same host process to control multiple devices simultaneously using the driver-level API, and I’m trying to decide whether I need multiple threads or whether a single thread can do the same thing.

Let’s say that I launch a kernel asynchronously, then immediately perform a context switch via cuCtxPopCurrent and cuCtxPushCurrent and launch another kernel asynchronously on a different GPU. Can these two kernels run concurrently?

Yes, but there will be significant overhead to doing so on WDDM.

Other than that, it will work fine. It is still functional on WDDM, just with more driver overhead than you’d probably like.

Thanks, that’s good to know. It significantly simplifies what I am doing. Do you have any sense of how much overhead there will be on Linux?

Very little; switching contexts is extremely cheap on every platform except WDDM.


How does Fermi’s faster context switching factor into this?

Hi all,

I’m having a lot of problems running my GPU code under Windows 7. Under Linux I can easily get 45 GFLOPS on my Tesla (which is good for this application...), but on Windows I’m stuck at 7 GFLOPS.

I use a lot of small kernels (I know this is bad...), and I read something about WDDM increasing the latency of each kernel launch (I read 40 µs instead of 3 µs!).
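A back-of-the-envelope check of whether launch latency alone could explain the gap. It assumes, hypothetically, that each small kernel does the same work $W$ and spends the same execution time $t_{\mathrm{exec}}$ on both systems, so that only the per-launch overhead differs (3 µs vs 40 µs, the figures quoted above):

```latex
\begin{aligned}
\frac{W}{t_{\mathrm{exec}} + 3\,\mu\mathrm{s}} &= 45\ \mathrm{GFLOP/s},
\qquad
\frac{W}{t_{\mathrm{exec}} + 40\,\mu\mathrm{s}} = 7\ \mathrm{GFLOP/s} \\[4pt]
\frac{W}{7\ \mathrm{GFLOP/s}} - \frac{W}{45\ \mathrm{GFLOP/s}}
  &= 40\,\mu\mathrm{s} - 3\,\mu\mathrm{s} = 37\,\mu\mathrm{s} \\[4pt]
W &= 37\,\mu\mathrm{s} \times \frac{315}{38}\ \mathrm{GFLOP/s}
  \approx 3.1 \times 10^{5}\ \mathrm{FLOP} \\[4pt]
t_{\mathrm{exec}} &= \frac{W}{45\ \mathrm{GFLOP/s}} - 3\,\mu\mathrm{s}
  \approx 6.8\,\mu\mathrm{s} - 3\,\mu\mathrm{s} \approx 3.8\,\mu\mathrm{s}
\end{aligned}
```

Under that assumption the two throughput numbers are consistent with kernels that each run for only a few microseconds, which is exactly the regime where a fixed 40 µs launch cost dominates. This does not prove launch overhead is the cause, but it shows the reported figures are compatible with it.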

Any clues? Is there anything I can do to speed up execution on Windows?

Thanks a lot!

If you have Aero enabled, try disabling it.