CUDA vs. win32 Fibers: Can a fiber store a CUDA device context?

I’ve been attempting to get multiple devices working well together, but it has turned into an all-out war against the Windows thread scheduler, particularly since I have more devices than CPU cores.

I’ve managed to get fairly high efficiency by using cudaDeviceScheduleYield and juggling thread priorities, but it is still a battle to get Windows to schedule the right threads rather than constantly rescheduling the thread that just yielded. The result is starved GPUs whenever the associated control thread is not scheduled in a timely fashion.

This brings me to my question: does a win32 fiber retain enough context to maintain a CUDA device context? Alternatively, is there a way to control thread scheduling in a cooperative fashion? 30ms latency for threads to be scheduled just doesn’t cut it…
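For the second half of that question, here is a minimal sketch of one way to get cooperative, round-robin hand-off between control threads using only standard C++. It does not use fibers and says nothing about whether a fiber can hold a CUDA device context. The names TurnScheduler, run_round_robin and service are hypothetical, and service is only a placeholder for whatever per-device work a control thread would do. The wake-up itself still goes through the OS scheduler, so this is an assumption-laden illustration, not a measured fix for the 30ms latency.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: passes a single "turn" token around a fixed set of
// worker threads, so only the thread holding the turn is runnable.
class TurnScheduler {
public:
    explicit TurnScheduler(std::size_t workers) : workers_(workers) {}

    // Block the calling thread until it is worker `id`'s turn.
    void wait_turn(std::size_t id) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return current_ == id; });
    }

    // Hand the turn to the next worker in round-robin order.
    void pass_turn() {
        {
            std::lock_guard<std::mutex> lock(m_);
            current_ = (current_ + 1) % workers_;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t workers_;
    std::size_t current_ = 0;  // id of the worker that currently holds the turn
};

// Starts one control thread per worker, lets each take `rounds` turns, and
// returns the order in which workers were serviced.
// `service` is a placeholder for the per-device work (an assumption here).
std::vector<std::size_t> run_round_robin(
    std::size_t workers, std::size_t rounds,
    const std::function<void(std::size_t)>& service) {
    TurnScheduler sched(workers);
    std::vector<std::size_t> order;  // written only by the thread holding the turn
    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < workers; ++id) {
        threads.emplace_back([&, id] {
            for (std::size_t r = 0; r < rounds; ++r) {
                sched.wait_turn(id);   // sleep until explicitly handed the turn
                service(id);           // do this device's work
                order.push_back(id);
                sched.pass_turn();     // cooperatively yield to the next worker
            }
        });
    }
    for (auto& t : threads) t.join();
    assert(order.size() == workers * rounds);
    return order;
}
```

Because every thread except the turn holder is blocked on the condition variable, the OS has only one candidate to run, which is the opposite of the situation where the thread that just yielded keeps being rescheduled. The cost is that the control threads are fully serialized, so this only makes sense if each turn is short.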

Have you tried the blocking sync instead of cudaDeviceScheduleYield?

Yes, blocking sync is a good deal slower.

Maybe you should ask Microsoft to release a patch…

See whether any Service Pack gives you a better scheduler (check the release notes).

I have seen XP SP3 handle large VM applications without segfaulting; the same app would segfault on XP SP2.

You may want to check the Microsoft KB (Knowledge Base) for any improvements to scheduling.

Try using pthreads-win32. You can set the scheduler to round-robin mode.