Any progress in getting the 100% CPU usage bug with 270.xx and later drivers fixed? I’m having to leave a core free to fully utilise Raistmer’s OpenCL app with the 290.53 drivers.
AMD fixed their 100% CPU usage bug after only 2 driver releases, with Cat 11.9; Nvidia have released ~15 drivers since this bug was introduced.
(I’ve not tried the 295.xx drivers yet; they have a nasty bug where the CUDA device disappears when the DVI-connected monitor goes to sleep.)
App sources are available from Berkeley’s SVN repository (anonymous read access is granted too): https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt
In short: there is no 100% CPU usage on older NV drivers, and no high CPU usage on current ATi drivers.
The bug was reported to NV more than 2 months ago and nothing has changed since (the bug was accepted for work, but there have been no results or bugfix so far).
Have a look in the CUDA Toolkit Reference Manual at the section about cudaSetDeviceFlags. AFAIK OpenCL doesn’t have a matching function set, so scheduling is probably set to auto, and the heuristics have probably changed:
cudaDeviceScheduleAuto: The default value if the flags parameter is zero, uses a heuristic based on the number of active CUDA contexts in the process C and the number of logical processors in the system P. If C > P, then CUDA will yield to other OS threads when waiting for the device, otherwise CUDA will not yield while waiting for results and actively spin on the processor.
cudaDeviceScheduleSpin: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
cudaDeviceScheduleYield: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
AMD probably always yields, while on modern processors it seems that CUDA (and I’m guessing OpenCL is the same) will almost always spin rather than yield.
You can try creating more contexts than cores and see if that changes behavior (assuming that OpenCL works the same and doesn’t just always spin or merge contexts).
Can the number of active contexts be increased by running many instances of the app? Because only 1 context is used per app instance.
We will try making the number of app instances bigger than the number of CPUs in the system and report results…
It’s a pity that NV did not expose control of scheduler behavior to OpenCL apps. They could at least use some environment variable to instruct the runtime to use one method or the other.
AMD uses such variables for ISA/IL dumping control, for example, so it’s a known practice…
Not sure how NVIDIA will treat that. It’s probably better to try creating several OpenCL contexts in the same app and just leave them unused. If your device is not in exclusive mode, you can create as many contexts as you want on it; as long as you don’t submit any kernels, they will not take any computing resources.
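The extra-contexts experiment could look roughly like this in OpenCL host code (a minimal sketch; error handling is trimmed, the EXTRA_CONTEXTS count is arbitrary, and whether idle contexts actually influence the driver’s spin/yield heuristic is exactly what the experiment is meant to find out):

```c
#include <stdio.h>
#include <CL/cl.h>

#define EXTRA_CONTEXTS 8  /* try to push context count above the CPU count */

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_context extra[EXTRA_CONTEXTS];
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* Create several contexts and deliberately leave them idle:
     * no command queues, no kernels, so they should consume no
     * computing resources (device must not be in exclusive mode). */
    for (int i = 0; i < EXTRA_CONTEXTS; i++) {
        extra[i] = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        if (err != CL_SUCCESS) {
            fprintf(stderr, "context %d failed: %d\n", i, err);
            return 1;
        }
    }

    /* ... create the real working context and run the app as usual ... */

    for (int i = 0; i < EXTRA_CONTEXTS; i++)
        clReleaseContext(extra[i]);
    return 0;
}
```

Watching the app’s CPU usage while varying EXTRA_CONTEXTS above and below the number of logical processors would show whether the CUDA-style C > P heuristic applies to OpenCL too.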