100% CPU Usage - Linux

Hi there,

I wrote a kernel that takes 10 ms to execute in order to observe the CPU's behavior while a kernel is running on the GPU.
Thanks to some topics on this forum, I understood that by default the CPU spins while waiting for a GPU operation to finish, which is why it uses 100% of one core.
I learned that when cudaSetDeviceFlags is called with the cudaDeviceScheduleBlockingSync flag, the waiting thread blocks instead, freeing the CPU for other work. That means CPU usage should be lower than in spinning mode.

But when I apply this trick, CPU usage doesn’t change at all; it is still 100%.

In the link below, someone guessed that these flags might not be implemented on Linux.
https://devtalk.nvidia.com/default/topic/611939/cuda-programming-and-performance/cudasetdeviceflags-and-cudadevicescheduleyield-in-embedded-envirronment/post/3953666/#3953666

My question is: is this feature still not implemented in the Linux API, or am I doing something wrong?

Information about my test :

  • NVIDIA GTX 950
  • RHEL 6.0
  • CUDA 7.5
  • cudaSetDeviceFlags is called before any context creation
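
For reference, a minimal sketch of the setup described above (the kernel body and cycle count are placeholders I chose; 10 ms depends on the GPU's clock rate):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel that busy-loops on the GPU long enough
// to observe host CPU behavior during the wait.
__global__ void busyKernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { /* spin on the GPU */ }
}

int main()
{
    // Must run before the CUDA context is created, i.e. before any
    // other runtime call that touches the device.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    busyKernel<<<1, 1>>>(12000000LL);  // roughly 10 ms at ~1.2 GHz

    // With the blocking-sync flag in effect, this wait should block
    // the host thread instead of spinning at 100% CPU.
    cudaDeviceSynchronize();

    printf("last error: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```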

It’s definitely implemented. If I set

cudaDeviceScheduleBlockingSync

via cudaSetDeviceFlags at the start of a compute-heavy CUDA program, CPU usage drops from 100% to less than 1% (as reported by top).
This is on CUDA 8.0, driver 367.44 on openSUSE tumbleweed.

Thank you for your reply.

I have CUDA 7.5 and driver 352.39.
Do you think it doesn’t work because of my setup? Maybe there is a bug in CUDA 7.5, or my driver doesn’t implement the flag’s behavior.

When I call cudaGetDeviceFlags, it returns 8.
After calling cudaSetDeviceFlags with cudaDeviceScheduleBlockingSync, it returns 12.
As far as I can see, the flag is set correctly, but CPU usage isn’t any lower during my kernel execution.
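
Those raw values line up with the flag constants in driver_types.h: cudaDeviceMapHost is 0x08 and cudaDeviceScheduleBlockingSync is 0x04, so 8 means MapHost only and 12 means MapHost | BlockingSync. A small host-side check, sketched here:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    unsigned int flags = 0;
    cudaGetDeviceFlags(&flags);  // query the current device flags

    printf("raw flags: %u\n", flags);
    if (flags & cudaDeviceScheduleBlockingSync)  // 0x04
        printf("blocking sync is set\n");
    if (flags & cudaDeviceMapHost)               // 0x08
        printf("map host is set\n");
    return 0;
}
```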

I remember having had similar frustration while working on a background application for crypto mining (cudaminer).

What I ended up doing was inserting sleep calls into my code that put the thread to sleep for about 95% of the kernel execution time. A simple feedback control loop adjusted the sleep time to hit that target.

I ended up getting very low CPU usage and no notable performance hit.

Of course this strategy only works for kernels with a predictable execution time per iteration.
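
A rough sketch of such a feedback loop (the kernel, the 95% target, the 0.1 gain, and the nanosleep-based wait are my assumptions for illustration; cudaminer's actual code may differ):

```cuda
#include <time.h>
#include <cuda_runtime.h>

// Hypothetical kernel with a roughly constant execution time per launch.
__global__ void minerKernel() { /* ... */ }

int main()
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    double sleepMs = 0.0;        // adjusted by the feedback loop
    const double target = 0.95;  // sleep for ~95% of the kernel time

    for (int i = 0; i < 1000; ++i) {
        cudaEventRecord(start);
        minerKernel<<<256, 256>>>();
        cudaEventRecord(stop);

        // Sleep for most of the expected kernel time so the busy-wait in
        // cudaEventSynchronize only covers the tail end.
        struct timespec ts;
        ts.tv_sec  = (time_t)(sleepMs / 1000.0);
        ts.tv_nsec = (long)((sleepMs - ts.tv_sec * 1000.0) * 1e6);
        nanosleep(&ts, NULL);

        cudaEventSynchronize(stop);
        float kernelMs = 0.0f;
        cudaEventElapsedTime(&kernelMs, start, stop);

        // First-order feedback: move sleepMs toward 95% of the
        // measured kernel time.
        sleepMs += 0.1 * (target * kernelMs - sleepMs);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```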

Christian

I found you have to call

cudaSetDevice(0);

before calling cudaSetDeviceFlags(…), otherwise it has no effect whatsoever.

See the documentation http://docs.nvidia.com/cuda/cuda-runtime-api/index.html#ixzz56qIPnXa2 :
“If no device has been made current to the calling thread, then flags will be applied to the initialization of any device initialized by the calling host thread, unless that device has had its initialization flags set explicitly by this or any host thread.”

Which essentially means: “If no device has been made current to the calling thread [before calling cudaSetDeviceFlags()], undefined behavior results in the case of a multi-threaded application.”
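
In code, the ordering described above would look like this (device 0 chosen for illustration):

```cuda
#include <cuda_runtime.h>

int main()
{
    // Make a device current to this thread first...
    cudaSetDevice(0);

    // ...then set its flags. Reversing these two calls can leave the
    // flag without effect, per the documentation quoted above.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

    // Touch the runtime so context initialization happens with the
    // flags in place.
    cudaFree(0);
    return 0;
}
```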

Max