CUDA CPU Utilization

So CPU utilization while the GPU is working can be reduced by sleeping between polling calls that check whether the GPU is still busy.

My question is this:
In the future, will we be able to yield the thread while the GPU processes and then have the GPU signal back to the process to wake up when the computations are finished? That would avoid the polling issue altogether, and the CPU could sleep nicely while the GPU works.

You don’t need to have the CPU poll constantly; you can add a microsleep between tests. Yes, this is still polling, but it works and reduces your CPU use to effectively 0%. What you lose is roughly half a millisecond of latency (or whatever sleep quantum you use).
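For concreteness, here is roughly what that looks like (a sketch only: the kernel, buffer name, and the 0.5 ms quantum are just examples, not anything prescribed):

```
#include <cuda_runtime.h>
#include <unistd.h>          // usleep

// Hypothetical kernel standing in for whatever work you launch.
__global__ void myKernel(float *data) { /* ... */ }

// Wait for the GPU without pegging a CPU core: record an event after the
// launch, then poll it with a microsleep between tests.
void launchAndWait(float *d_data, int n)
{
    cudaEvent_t done;
    cudaEventCreate(&done);

    myKernel<<<(n + 255) / 256, 256>>>(d_data);
    cudaEventRecord(done, 0);

    while (cudaEventQuery(done) == cudaErrorNotReady)
        usleep(500);         // ~0.5 ms sleep quantum; CPU stays essentially idle

    cudaEventDestroy(done);
}
```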

I do understand what you want: an OS-level event that a thread can wait on with NO latency and NO polling, and we don’t have that. I don’t know if that’s an OS, a GPU driver, or a CUDA framework limitation.

My first statement reflects your first statement. It seems like current GPU hardware designs are why we have to poll, sleep, etc. in the first place. What I am asking is whether, in the future, we will be able to call our CUDA function, have our thread yield, and then have the GPU wake the thread back up when it finishes.

Well, you’d still get significant overhead from the OS, so you’d probably be better off with the sleep-and-poll behavior anyway. Unless you just want some syntactic sugar to hide that behavior from you?

(Pretty sure it’s an OS problem more than anything else: the GPU fires an interrupt, the CPU catches the interrupt, switches to kernel mode, checks what that interrupt should do, loads thread state… yeah, that’s not free.)

I have an idea: the blocking functions in CUDA, especially the sync one, should do the sleep-polling themselves. That way you get low CPU usage and simple syntax. I guess NVIDIA did it the way they did to keep latency low. Well, in that case NVIDIA itself should do the fancy wake-on-interrupt that the OP is talking about. Then everyone is happy. It’s a pretty straightforward fix to the way things are now.
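Something along these lines, say (just a sketch; sleepSynchronize and the 0.5 ms quantum are my own name and number, not anything NVIDIA provides):

```
#include <cuda_runtime.h>
#include <unistd.h>

// Hypothetical drop-in for the blocking sync call: record an event on the
// default stream and sleep-poll it instead of spinning.
cudaError_t sleepSynchronize(void)
{
    cudaEvent_t done;
    cudaError_t err = cudaEventCreate(&done);
    if (err != cudaSuccess) return err;

    cudaEventRecord(done, 0);   // completes after all prior work on the default stream
    while ((err = cudaEventQuery(done)) == cudaErrorNotReady)
        usleep(500);

    cudaEventDestroy(done);
    return err;
}
```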

P.S. tmurray, I’m not sure what you’re saying. Yes, you have to process the interrupt, but all of that is happening anyway. Modifying the event handler to wake the CUDA runtime thread shouldn’t add much to it.

I think “fix” is not the right word. If the default behaviour gets modified, the HPC people will scream. Maybe something could be done with streams and a standard polling function that gets fed the event to wait for? Maybe with some macro?

There is already a setting to make the driver yield() instead of spin-looping, but that does not reduce CPU usage. Low CPU usage matters on e.g. laptops (yes, I know, few will use CUDA on those, but IMO it will be essential if CUDA is intended for truly general-purpose use), and it also makes it easier to see how much CPU the program actually needs.
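(Assuming the setting meant here is the cudaDeviceScheduleYield flag of cudaSetDeviceFlags(), it is selected like this, before anything that creates the context:)

```
#include <cuda_runtime.h>

int main(void)
{
    // Must be called before the context exists, i.e. before the first
    // kernel launch or device allocation.
    cudaSetDeviceFlags(cudaDeviceScheduleYield);   // yield instead of spin on sync

    // ... normal CUDA work; sync calls now yield the CPU rather than busy-wait,
    // which helps other threads but does not by itself drop CPU usage to ~0%.
    return 0;
}
```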

They could add a flag/option that changes this behaviour. The way it is now, I have to carefully tune the sleep time to make it work optimally; ideally the choice would also depend on whether you have a tickless kernel, etc., and on Windows it would need a completely different implementation because of the usually inaccurate timers.
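For illustration, a rough cross-platform microsleep wrapper might look like this (microSleep is my own name; on Windows the default timer granularity is around 15 ms unless it is raised with timeBeginPeriod()):

```
#ifdef _WIN32
#include <windows.h>
static void microSleep(unsigned int us)
{
    // Sleep() takes milliseconds and its granularity is coarse by default,
    // so a value tuned in microseconds on Linux does not translate directly.
    Sleep(us >= 1000 ? us / 1000 : 1);
}
#else
#include <unistd.h>
static void microSleep(unsigned int us)
{
    usleep(us);   // on a tickless kernel this can be close to the requested time
}
#endif
```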

I am almost certain NVIDIA could provide a generic solution that would at least work no worse; it would probably work much better, be portable across platforms, and spare every developer who needs this from reinventing the wheel.

We are working on a couple of possible solutions, but we don’t have a timeframe yet.

It is appreciated, and just to be clear: my comment was not meant so much as a complaint, but more as an opinion about the best solution to the problem.