OpenCL busy wait still not fixed

The problem was brought up here years ago:

https://devtalk.nvidia.com/default/topic/494659/?comment=3541121

After many years, this problem is still not fixed. Currently we’re at NVIDIA driver 375.20.

Basically, what NVIDIA is missing here is that the driver uses “spinning” instead of “yielding” to find out whether an OpenCL kernel has finished. On CUDA there’s a way to turn that off; on OpenCL there is not. The problem is that this produces 100% CPU load on one core for each GPU involved. It would be good enough to have an environment variable, checked on startup, that configures the library to use “yielding”.
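To illustrate the difference between the two waiting strategies, here is a minimal Python sketch. It does not touch the NVIDIA driver or OpenCL at all: `FakeKernelEvent` is a hypothetical stand-in for a kernel that finishes after a fixed time, and sleeping between polls is only one possible low-CPU alternative to spinning, not necessarily how a driver would implement yielding.

```python
import time


class FakeKernelEvent:
    """Hypothetical stand-in for a GPU kernel that completes after `duration` seconds."""

    def __init__(self, duration):
        self._deadline = time.perf_counter() + duration

    def is_complete(self):
        return time.perf_counter() >= self._deadline


def wait_spinning(event):
    """Poll the event as fast as possible; keeps one CPU core fully busy."""
    polls = 0
    while not event.is_complete():
        polls += 1
    return polls


def wait_sleeping(event, interval=0.001):
    """Poll the event, but give the CPU back to the OS between polls."""
    polls = 0
    while not event.is_complete():
        polls += 1
        time.sleep(interval)
    return polls


def measure(wait_fn, duration=0.1):
    """Return (polls, cpu_seconds, wall_seconds) for one simulated kernel wait."""
    cpu_start = time.process_time()
    wall_start = time.perf_counter()
    event = FakeKernelEvent(duration)
    polls = wait_fn(event)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return polls, cpu, wall


if __name__ == "__main__":
    for name, fn in (("spinning", wait_spinning), ("sleeping", wait_sleeping)):
        polls, cpu, wall = measure(fn)
        print(f"{name}: polls={polls} cpu={cpu:.3f}s wall={wall:.3f}s")
```

With a simulated kernel time of ~100ms, both variants return at about the same wall-clock time, but the sleeping variant should need only a small fraction of the polls and CPU time.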

I can only guess at the reason. I think NVIDIA’s goal was to increase performance with this, but in my case there’s actually no gain from having a CPU-burning loop. My kernels run for ~100ms, and that’s quite a long time for an OpenCL kernel. At that scale, “spinning” has only disadvantages and actually makes the entire tool slower than it needs to be.

More importantly, power efficiency is a big topic today. A loop that burns 100% CPU is the opposite of power efficient: it wastes ~60W on my Intel 6700 CPU.

A more effective way to ask for something like this is to file a bug.

In my experience there’s no difference between filing a bug and posting it on the forum.

Either way, nothing happens.

TL;DR: Without a bug report, it is pretty much certain that nothing will happen. If a bug report is filed, something may happen.

There is a difference. These forums are not designed as a support channel, but as a platform for a user community. NVIDIA employees that participate are here on their own initiative.

If you post a problem here, nothing will happen unless you get extremely lucky and an NVIDIA employee happens to see it and finds the problem important and interesting enough to go through the trouble of filing a bug report (that they then have to follow up on).

If you file a bug report, it gets stored and tracked in a database. The issues in the database are collated and prioritized. If the prioritization is high enough (operative words!), a fix may well be in the next CUDA release; otherwise it may take longer, sometimes years. That’s similar to other large software products, including open source ones (I recall glibc bugs that took almost 10 years to get fixed).

Fairly straightforward assumptions are that bug fix prioritization includes the number of customers reporting the same issue (“the squeaky wheel gets the grease”) and an assessment of how many customers may be affected and how severely. One might conclude that OpenCL performance issues would rank low: few customers affected, and impact not severe, since it is not a functional bug.

You’re absolutely right again, but I think this makes it an even bigger problem.

My experience with filing bug reports over 6+ years of GPGPU work is a very frustrating one. TBH, that’s mostly the fault of AMD. Before NV introduced Maxwell and the LOP3 instruction, AMD was the more important factor for crypto in GPGPU. Back then I reported many bugs to AMD, which were of course never fixed. However, I had bad experiences with NV, too. It’s maybe not fair to compare AMD and NV at this point, but that’s my personal experience.

The experience was that filing bug reports didn’t change anything. Another problem is that such reports are not public. The advantage of an official NV forum like this is that it is public. That’s why I’ve posted it here. A link that everyone can click on, showing that bug X was reported to the vendor years ago, is strong evidence. Such a link can be worth more to developers like me who otherwise have no voice. The goal is to mobilize the users of my software, who have a lot of money and influence. My hope is that when I can prove that I reported bug X to the vendor, they might decide to step in and create more pressure than I can.

Anyway, I’ll file a bug report, even if this is not really a bug. It’s more like a missing (but important) feature.

You should feel free to post about any CUDA performance issues on this forum to share the information. I just wanted to point out that doing so very likely contributes nothing to get the issue resolved, because these forums are not set up as a bug-reporting venue.

In NVIDIA’s bug reporting system, bug reports are private because many bug reports contain information that the filers of those reports very much prefer be kept confidential (this may include the simple fact that they are using CUDA in their upcoming software!).

Performance issues are typically enhancement requests, unless they represent a significant performance regression. Enhancement requests should be marked with the prefix “RFE:” in the bug synopsis.

You can of course share the bug numbers of any bugs you filed with NVIDIA in this forum (or any other place for that matter) if you think this is advantageous somehow. As noted, knowing the bug number won’t allow anybody else to look up the bug report itself; only the filer and NVIDIA engineers working on the resolution of that issue will have access.

On the NVIDIA side, we use bug reports to track many feature requests as well. We generally call these internally RFE (Request For Enhancement).

Since you have a specific proposal around the use of an environment variable to modify behavior, it probably makes sense to file this as an RFE. The process is no different, just mention RFE somewhere in your description.

It would be best if you file an exact test case (the code, the commands to compile and run it, the platform, etc.) that demonstrates the busy-wait behavior, and also name the tool you use to measure/observe CPU activity during this period. If you simply assume that someone on the NVIDIA side will write a test case according to your description, that’s not likely to get much traction either.
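As one possible way to quantify the CPU activity for such a report, the sketch below compares process CPU time against wall-clock time around a blocking call. It is an illustration only: `busy_wait` and `idle_wait` are hypothetical stand-ins, and in a real repro the measured callable would be the actual blocking OpenCL call (for example a wrapper around `clFinish`).

```python
import time


def cpu_load_during(blocking_call):
    """Run blocking_call() and return (cpu_seconds, wall_seconds, load).

    load is CPU time divided by wall time: close to 1.0 means one core
    was kept busy for the whole wait, close to 0.0 means the process slept.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    blocking_call()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    load = cpu / wall if wall > 0 else 0.0
    return cpu, wall, load


def busy_wait(seconds):
    """Stand-in for a wait that spins until the deadline has passed."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def idle_wait(seconds):
    """Stand-in for a wait that blocks in the OS without using the CPU."""
    time.sleep(seconds)


if __name__ == "__main__":
    for name, fn in (("busy", busy_wait), ("idle", idle_wait)):
        cpu, wall, load = cpu_load_during(lambda: fn(0.1))
        print(f"{name}: cpu={cpu:.3f}s wall={wall:.3f}s load={load:.0%}")
```

Reporting the CPU/wall ratio together with the kernel run time makes the observation reproducible without relying on a screenshot of a system monitor.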

And having said all that, I offer no guarantees. The prioritization of work in a resource-constrained environment does not always satisfy everyone. If you give me the bug number that you received after filing the report, I will add myself to it and keep an eye on it, however.

I’ve collected and put together all the information in great detail, including sample code etc. It took me two hours.

Then, when I try to submit everything, I get this:

This is so frustrating…

In my experience, the bug reporting form is flaky, and has been for years. I believe this is partially due to the use of detectors for malicious HTML code and spam, false positives from those detectors, and likely some real bugs as well.

My recommendation is to start with a very simple report, noting at the end that you will add details in subsequent steps. Then add files or extended descriptions one at a time.

Yes, this is frustrating and one would wish the NVIDIA folks would get around to providing a first-class interface, but then the clunkiness of these forums is some indication how high on the list of priorities such things seem to be.

Me too. The bug reporting form rejected the few lines of code I submitted as a reproducer…

I’ve reported the problem with the broken bug tracker via email to NVIDIA. They said they know about the issue and will try to fix it. Since then I’ve gotten no update.

I believe the “broken” issue is now considered fixed on the NVIDIA side.

However, the bug reporting form can still be troublesome, because it attempts to detect malicious threats and prevent them from being entered.

My suggestion would be to start very simply with the initial bug entry. Once you have a bug created in the system, you can add information to it.

If you create a bug in the system, and post the bug number here, then provide the additional information you need for a full description and repro case here, I will do my best to get it added to the bug.

Indeed, bug submission works now. Thanks for your support txbob. Bug ID is: 1850558

Did you enter some sort of markup in the “steps to reproduce” field? I think that is supposed to be raw text.

Of course, it’s all files with all the information needed to reproduce, including the GitHub repository, the build steps, and other information like the versions used, etc.

I’m differentiating between “markup” and “raw text”.

What you entered shows up with lots of stuff like
in it.

Furthermore, my suggestion would be to include a self-contained example, not something that requires building a large project.

If you want to add that here, I’ll add it to the bug.

Not sure what you mean, I didn’t enter any HTML stuff. Is it possible that the view you have on the bug tracker is somehow not showing it the way I see it? For me it’s some simple text and that’s it. I can add the full text here if you want?

It’s been over a year now, and the problem still exists. Other projects such as ethminer (an Ethereum miner), and cryptocurrency miners using OpenCL in general, suffer from it, too.

The fix would be so simple: in CUDA one just needs to call cuDevicePrimaryCtxSetFlags (dev, CU_CTX_SCHED_YIELD) and that’s it. However, in OpenCL there’s no way to set the flags. The easiest way would be to let the user set an environment variable or, if that’s too much work, just set it statically.
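To make the proposal concrete, here is a rough Python sketch of how a library could pick its wait strategy from an environment variable at startup. Everything in it is an assumption for illustration: the variable name `EXAMPLE_OPENCL_WAIT_MODE` is made up and is not an existing NVIDIA setting, the mode names only loosely mirror CUDA’s CU_CTX_SCHED_SPIN, CU_CTX_SCHED_YIELD and CU_CTX_SCHED_BLOCKING_SYNC flags, and falling back to spinning simply reflects the behavior described in this thread.

```python
import os

# Hypothetical variable name, invented for this example only.
ENV_VAR = "EXAMPLE_OPENCL_WAIT_MODE"

# Mode names chosen to resemble the CUDA scheduling flags (assumption).
VALID_MODES = ("spin", "yield", "blocking")

# Assumed default: keep the spinning behavior unless the user opts out.
DEFAULT_MODE = "spin"


def wait_mode_from_env(environ=None):
    """Return the wait mode requested via the environment, or the default.

    Unknown or empty values fall back to DEFAULT_MODE, so a typo in the
    variable never changes behavior silently to something unexpected.
    """
    env = os.environ if environ is None else environ
    value = env.get(ENV_VAR, "").strip().lower()
    return value if value in VALID_MODES else DEFAULT_MODE


if __name__ == "__main__":
    print("selected wait mode:", wait_mode_from_env())
```

Reading the variable once at startup keeps the change invisible to existing applications while letting users opt in without recompiling anything.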

I opened this bug over 1.5 years ago. The problem still exists today.

Yesterday my bug https://developer.nvidia.com/nvidia_bug/1850558 was silently removed from the bug tracker. It no longer exists.

The bug (RFE) is still open in our system. I’m not sure why it has been silently removed from the bug tracker. It does still exist, although apparently you don’t have visibility. The request is understood, and has been discussed by the development team.

An RFE is evaluated against a number of criteria. The length of time that an RFE has been open is probably one of the least important criteria. So far this one hasn’t been implemented. I don’t have any further information to share.