GeForce Drivers 4xx.xx drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers

Let me start by saying that I have not created this information. This information was shared by an EVGA user that closely monitors his hardware, and he noticed an anomaly with driver changes that is very easily reproduced just by switching drivers.

Let's start with the information he initially posted:

The Geforce Game Ready Drivers 4xx.xx Drivers on the GTX 10 Series Drop more than 2/3 in CUDA Performance from the 3xx.xx Drivers
PG AP Runtimes running on the EVGA GTX 1080 Ti SC Black Edition Graphics Card. Smaller Numbers are Faster or Better.
Driver: 411.70 Run time (sec) 2,619.05 CPU time (sec) 45.12
Driver: 416.16 Run time (sec) 2,356.02 CPU time (sec) 43.10
Driver: 416.81 Run time (sec) 2,507.31 CPU time (sec) 44.83
Driver: 416.69 Run time (sec) 2,609.08 CPU time (sec) 46.11
Driver: 391.35 Run time (sec) 851.45 CPU time (sec) 40.19
Driver: 390.65 Run time (sec) 852.15 CPU time (sec) 38.39

The first question I asked was, “Where is this information coming from?” It comes from BCavanaugh’s PrimeGrid tracking. The numbers above are run times for tasks completed by PrimeGrid using CUDA. As you can see, from 411.70 onward the run times are roughly three times those of the 391.35 and earlier drivers.
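To put a number on the slowdown, here is a minimal Python sketch that works through the arithmetic on the run times posted above. Using the 391.35 run as the baseline is my own choice, not something BCavanaugh specified. With that baseline, the 4xx drivers take about 2.8 to 3.1 times as long per task, which is a throughput drop of roughly 64% to 67.5%.

```python
# Run times (seconds) as posted above for the GTX 1080 Ti; smaller is faster.
RUN_TIME_SEC = {
    "411.70": 2619.05,
    "416.16": 2356.02,
    "416.81": 2507.31,
    "416.69": 2609.08,
    "391.35": 851.45,
    "390.65": 852.15,
}

# Assumption: the fastest posted run (391.35) is used as the reference point.
BASELINE_DRIVER = "391.35"


def slowdown(run_time: float, baseline: float) -> float:
    """How many times longer a task takes than the baseline run."""
    return run_time / baseline


def throughput_drop(run_time: float, baseline: float) -> float:
    """Fraction of tasks-per-hour lost relative to the baseline run."""
    return 1.0 - baseline / run_time


if __name__ == "__main__":
    base = RUN_TIME_SEC[BASELINE_DRIVER]
    for driver, seconds in RUN_TIME_SEC.items():
        print(f"{driver}: {slowdown(seconds, base):.2f}x run time, "
              f"{throughput_drop(seconds, base):.1%} throughput drop")
```

Note that these are single runs per driver, so small differences between the 4xx drivers should not be read as meaningful.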

We are not trying to point fingers or blame NVidia for anything here, but we would all like to know what is causing the massive slowdown on the 1080 Ti cards when using the 410+ series drivers.

BCavanaugh has done extensive testing, switching back and forth between drivers. Reverting to the 300 series drivers yields large improvements in the program's run times.

I am not sure if I am allowed to post Links currently. I will try, and if it gets removed, I will transcribe what I am able to. <- Overall results currently running the latest driver, these will update as work units are completed. If the 300 series drivers need to be loaded to show testing, please let me know and I will try to talk to the original owner of the information, BCavanaugh, and get the 300 series drivers running for new results. <- 300 series driver directly compared to 400 series drivers. <- 300 series drivers only

I am not a BOINC user. I am not a PrimeGrid user. I am simply transcribing as much info as I can, in case someone else can test this and we can get NVidia to look into it and get it corrected.

This is a link to the original thread in the EVGA forums

Please let me know if you have any questions, and I will try to ask BCav and pass on any information.

My suggestion would be that you file a bug at

The information you’ve given here wouldn’t be sufficient to pursue analysis. I would recommend providing a complete set of instructions about how to reproduce the comparison data. That would include where to get the necessary software, how to download and install it, and how to run it. Also provide the specific driver versions tested and results/comparison data.

With that information, it should be possible for our team to analyze. Without it, for whatever reason, I’m not optimistic of positive results/outcome.

When clicking on any of the primegrid links, it gives instructions directly on that page how to set up primegrid. The specific drivers are listed, and the information requested would already be in the post. I will see if I can find more primegrid users to show this information, but as I stated, I am not the owner of this information. The owner of the information doesn’t want to jump through the hoops that are required.

The results have been reproducible across multiple systems, and they are directly tied to the driver and CUDA. It would seem that once the 20xx series launched, NVidia purposefully locked down CUDA to perform worse on the new drivers. I am fairly certain NVidia could easily research this issue and correct it. If the issue persisted after reverting to older drivers, it would be apparent that there was an issue with the overall system, but reverting to an older driver removes the issue, and updating causes it to return.

I will try forwarding this again. It would seem that each time it is posted, the NVidia response is to send it somewhere else, and I am not sure what the best course of action would be. I am just trying to bring attention to this.

My recommendation is for you or someone you know of, to file a bug at

You’re welcome to do what you wish, of course.

BUG ID 2450242

Submitted the bug report. I have also reached out to other Boinc/Prime Grid users to see if I can scavenge more information on the issue.

Looking at here to do this >My suggestion would be that you file a bug at
New to all this, so it may take me a little time.

IE 11 is also not supported here.

Is there anyone here that Runs BOINC and the Project PrimeGrid?
The Program that I am running on PrimeGrid is a CUDA Program called AP27 Search – “ap26”
The list below shows the Run Times for a Single Task using each listed Driver.

Driver: 417.01 Run time (sec) 2,222.86 CPU time (sec) 40.41
Driver: 411.70 Run time (sec) 2,619.05 CPU time (sec) 45.12
Driver: 416.16 Run time (sec) 2,356.02 CPU time (sec) 43.10
Driver: 416.81 Run time (sec) 2,507.31 CPU time (sec) 44.83
Driver: 416.69 Run time (sec) 2,609.08 CPU time (sec) 46.11
Driver: 399.24 Run time (sec) 860.20 CPU time (sec) 39.38
Driver: 391.35 Run time (sec) 851.45 CPU time (sec) 40.19
Driver: 390.65 Run time (sec) 852.15 CPU time (sec) 38.39
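For anyone who wants to summarize a listing like this, here is a small Python sketch that parses the lines exactly as posted and averages them per driver branch. The grouping into "3xx" and "4xx" by the first digit of the driver version is my own convention. On the eight runs above, the average run time comes out to about 2463 s on 4xx versus about 855 s on 3xx (around 2.9x), while average CPU time only moves from about 39.3 s to about 43.9 s (around 12%). That may suggest the extra wall time is not being spent in host-side computation, but it does not prove where it is going.

```python
import re
from collections import defaultdict
from statistics import mean

# The listing exactly as posted above.
POSTED = """\
Driver: 417.01 Run time (sec) 2,222.86 CPU time (sec) 40.41
Driver: 411.70 Run time (sec) 2,619.05 CPU time (sec) 45.12
Driver: 416.16 Run time (sec) 2,356.02 CPU time (sec) 43.10
Driver: 416.81 Run time (sec) 2,507.31 CPU time (sec) 44.83
Driver: 416.69 Run time (sec) 2,609.08 CPU time (sec) 46.11
Driver: 399.24 Run time (sec) 860.20 CPU time (sec) 39.38
Driver: 391.35 Run time (sec) 851.45 CPU time (sec) 40.19
Driver: 390.65 Run time (sec) 852.15 CPU time (sec) 38.39
"""

LINE_RE = re.compile(
    r"Driver:\s*(\d+\.\d+)\s+Run time \(sec\)\s+([\d,.]+)\s+CPU time \(sec\)\s+([\d,.]+)"
)


def parse_runs(text):
    """Return (driver, run_time_sec, cpu_time_sec) tuples found in the text."""
    runs = []
    for driver, run, cpu in LINE_RE.findall(text):
        # Strip the thousands separators used in the posted numbers.
        runs.append((driver, float(run.replace(",", "")), float(cpu.replace(",", ""))))
    return runs


def means_by_branch(runs):
    """Average (run time, CPU time) per driver branch, keyed '3xx' or '4xx'."""
    grouped = defaultdict(list)
    for driver, run, cpu in runs:
        grouped[driver[0] + "xx"].append((run, cpu))
    return {
        branch: (mean(r for r, _ in rows), mean(c for _, c in rows))
        for branch, rows in grouped.items()
    }


if __name__ == "__main__":
    for branch, (run, cpu) in sorted(means_by_branch(parse_runs(POSTED)).items()):
        print(f"{branch}: mean run time {run:.2f} s, mean CPU time {cpu:.2f} s")
```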

This link is Tasks that I am now running.
The Output can be seen here

Center data removed to keep it small

Stderr output

<![CDATA[
AP26 OpenCL 10-shift search version 1.3 by Bryan Little and Iain Bethune
Compiled Aug 17 2016 with GCC 4.9.0
Command line: projects/ 83756768 83756889 0
GPU Info:
Platform name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Device name: GeForce GTX 1080 Ti
GPU RAM: 3221225472
GPU max malloc: 2952790016
kernel profile (sec): 0, (nanoseconds): 36126422
calculated max kernel queue length: 2
kernel profile (sec): 0, (nanoseconds): 22907035
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 22441009
............................
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 26055107
calculated max kernel queue length: 3
kernel profile (sec): 0, (nanoseconds): 25667138
calculated max kernel queue length: 3
kernel profile (sec): 0, (nanoseconds): 23294673
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 21241051
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 22785320
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 26370972
calculated max kernel queue length: 3
kernel profile (sec): 0, (nanoseconds): 22562725
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 23121361
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 21761650
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 22901743
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 21447108
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 22844523
calculated max kernel queue length: 4
kernel profile (sec): 0, (nanoseconds): 22898435
calculated max kernel queue length: 4
16:42:42 (8652): called boinc_finish(0)
]]>
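The "kernel profile" lines in that output could be a useful comparison point if the same task's stderr were captured under a 3xx and a 4xx driver. Below is a hedged Python sketch that pulls those timings out of a stderr dump. It assumes the nanoseconds field is the sub-second remainder to be added to the seconds field, which I have not confirmed against the AP26 source, and the excerpt is just the first three entries copied from the output above. Which driver that output was captured under is not stated here.

```python
import re
from statistics import mean

# First three profile entries copied from the stderr output above.
STDERR_EXCERPT = (
    "kernel profile (sec): 0, (nanoseconds): 36126422 "
    "calculated max kernel queue length: 2 "
    "kernel profile (sec): 0, (nanoseconds): 22907035 "
    "calculated max kernel queue length: 4 "
    "kernel profile (sec): 0, (nanoseconds): 22441009"
)

PROFILE_RE = re.compile(
    r"kernel profile \(sec\):\s*(\d+),\s*\(nanoseconds\):\s*(\d+)"
)


def kernel_profile_ns(stderr_text):
    """Return every kernel profile sample in the text, in nanoseconds."""
    # Assumption: total time = seconds field * 1e9 + nanoseconds field.
    return [
        int(sec) * 1_000_000_000 + int(ns)
        for sec, ns in PROFILE_RE.findall(stderr_text)
    ]


def mean_profile_ms(stderr_text):
    """Mean kernel profile time in milliseconds."""
    return mean(kernel_profile_ns(stderr_text)) / 1e6


if __name__ == "__main__":
    print(f"samples: {len(kernel_profile_ns(STDERR_EXCERPT))}, "
          f"mean: {mean_profile_ms(STDERR_EXCERPT):.3f} ms")
```

Running this over the full stderr from one task per driver would show whether the per-kernel timings themselves grow on 4xx, or whether they stay near the same value while total run time triples.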

This can be confirmed by me and many other Primegrid members. My original thread about this driver issue can be found here -

It has been reproduced by several of us at Primegrid.

All additional relevant information should be added to the bug report. Posting it here won’t really help. Note that the NVIDIA bug database keeps reports confidential, so only the bug filer and relevant NVIDIA personnel are able to access bug information.

I have a hard time imagining what kind of driver change could cause an application-level slowdown by a factor of three (at least that is what the posted numbers appear to indicate).

The only thing I can think of is that the application makes use of JIT compilation, but even then a factor of three seems extreme for any sort of change to a (by now) mature tool chain.

In general, applications should not use JIT compilation unless absolutely necessary (e.g. dynamic code generation based on user input). Classic fat binaries are much preferred for stability.

This Forum is very unfriendly to IE 11

Well OK
Maybe give the Link to report this 2 month old issue and I will post what I have.

I find it a shame that a very easily reproducible issue seems to get shrugged off so quickly.

The entire point of posting it under the driver forum for nvidia was to bring attention to the issue. Once posted, there was a message that it needed to be pushed here, since it is a cuda application.

Once posted here, it was said to push it to a bug report, which was done immediately. Pushing it to a bug report also binds our hands, as no other member can contribute.

I hope nvidia isn’t so quick to shrug off their patrons.

Clearly there is a major issue with the 411.xx+ drivers, as shown by multiple users, and as shown above, this was discovered a while back.

Some may not feel it necessary to continue adding information here, but I will keep all five current lines of information open (EVGA forums, driver forum post, this forum post, bug report, and reddit thread).

The more areas that pertinent information is shared, the sooner it can be taken care of.

The last time an event like this occurred that was on a scale large enough to reproduce, it was during one of the EVGA yearly folding challenges. When Stanford and the folding team were contacted on their own forum, they attempted to push us to other sites and remove our posts. It didn’t work. Stanford and the folding team management tried to blame nvidia… come to find out, months later, there was a bad entry into the work units, and Stanford and the folding team that gives out work units had caused all the issues. They tried to shrug users off.

This time it’s nvidia, and the problem is corrected by back dating to older drivers, so clearly it is a cause stemming from the nvidia driver for some reason. Hopefully nvidia doesn’t shrug off a very easily reproduced error like this.

I look forward to NVIDIA's response, and hope they can take the time to find the cause.

Found the Bug Report, I think.
GTX Major CUDA issue with the 411.xx & 417.xx
This is NOT an issue on the RTX Graphic Card Platform.

As far as the interaction here is concerned, it is not a question of “shrugging off”. It’s a question of using the proper channels to report bugs.

Filing a bug report with NVIDIA is the necessary starting point for the resolution of any technical issue with NVIDIA products, such as this observed performance regression. Once a bug report has been filed, NVIDIA will attempt in-house repro. Once they have a successful repro, relevant engineering teams will work on the resolution of the issue. How long that will take will depend on the root cause.

These forums are designed as a platform for a user community (“users helping users”), they are not an official bug reporting venue.

As a fellow CUDA user with lengthy experience, I am merely trying to provide advice on how to use the existing process to get the issue resolved in as expedient a manner as possible. That includes the advice that all relevant information should be added to the bug report, where it is actionable.

As the filer of bug 2450242, bomb.squad.q is of course free to share here what responses (if any) they have received on their bug report. The same applies to the filers of other relevant bug reports.

Maybe then you can install BOINC Client on your Computer and Run some AP Tasks from PrimeGrid.
Being a CUDA Expert this would help the Gaming and Folding and BOINC Community.
This would be under the Windows OS
Help Would Really Be Great

But this Really should be under the
We are End Users of the NVIDIA Graphics Cards and Drivers
We are Not Developers

Even if I were to reproduce this issue on my machine, that wouldn’t really help anyone. Only NVIDIA can figure out what is going on in their code base. As I said, I don’t even have a sensible hypothesis of what may have happened here. The observation does not match any case of a performance regression I recall. It doesn’t make sense to me.

The one thing that is important from a practical perspective right now is for the bug filer(s) to supply enough information in their bug report(s) so that NVIDIA can easily reproduce the issue in house. They may find that they are iterating with the repro team for a while, depending on how difficult it is to set up and configure the application. If so, that’s just a necessary part of the process.
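As one way to make each data point in a bug report self-describing, here is a minimal Python sketch that records the GPU name and driver version next to a timing. It relies on `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`; the helper names and the record layout are hypothetical, and on some Windows installs `nvidia-smi` may not be on the PATH, in which case the full path to the executable would be needed.

```python
import subprocess


def parse_smi_line(line):
    """Split one 'name, driver_version' CSV line into its two fields."""
    name, driver = (field.strip() for field in line.rsplit(",", 1))
    return name, driver


def query_gpu():
    """Ask nvidia-smi for the first GPU's name and installed driver version."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_smi_line(out.splitlines()[0])


def format_record(name, driver, run_time_sec, cpu_time_sec):
    """One line per task, ready to paste into a bug report (layout is hypothetical)."""
    return (
        f"GPU: {name} | Driver: {driver} | "
        f"Run time (sec): {run_time_sec:.2f} | CPU time (sec): {cpu_time_sec:.2f}"
    )
```

For example, after a task finishes, `format_record(*query_gpu(), 851.45, 40.19)` would produce one line tying that run to the driver actually installed at the time, rather than relying on memory of which driver was loaded.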

I have worked on at least one bug in my career that took more than a work-week to achieve repro, a day of debugging, and five minutes to fix. This is just an example of how crucial efficient repro can be.

Other than that, this happened going from the 700 to the 900 and then to the 1000 cards, and yet once more it seems that the New Drivers for New Cards Break all the Older Cards.

Please define “break”, it is not clear what you are referring to. An old adage in the computer world is that “If it ain’t broke, don’t fix it”. As a corollary it is probably unwise to always install the latest drivers just because they are there.

I am on the fourth GPU in my six-year-old PC. I have used recent drivers with recent GPUs, recent drivers with older GPUs, etc. This is on Windows 7. I have yet to encounter “breakage”. I do not use beta drivers. I have been running F@H on NVIDIA GPUs for more than ten years without issues (currently ranked 6631 among all donors).

New Card Drivers Break the Performance of the Older Cards, as we are now seeing.
NVIDIA has released non-Beta Drivers that Killed Folding at Home, and this is on Top of the CUDA Performance on BOINC GPU Projects.
NVIDIA has even released non-Beta Drivers that Killed OpenCL. I can go on, but that is not the issue here.
bcavnaugh 1,603,588,652 may be low, but over the years I have had many computers and many NVIDIA Graphics Cards.

Fortunately, reproducing the error is extremely easy. Just install 411 or higher drivers with the 1080 Ti, and the CUDA performance in these tasks tanks. Install older drivers on the same 1080 Ti, and performance goes back to normal. It really is that simple. It is directly tied to the driver, so that is a big step toward reproducing the error.