GeForce Drivers 4xx.xx drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers

A request was already sent over 12 hours ago to the person who filed the bug report. So far there is no indication in the bug report that any response to that request has been received.

You are 100% correct. I missed the notification that the bug report had been updated. Thank you for letting me know I had indeed missed a response.

The response is below. GammerPC, think you can send the requested info from the bug report?

By Fancy Fan
Hi,
Thank you for reporting.
Could you ask BCav to provide us with a self-contained reproducer and the steps by which BCav ran into the deterioration?
You can send the attachment to CUDAIssues@nvidia.com for file exchange.

No clue how I would do this. It is a BOINC CUDA application, as posted.
Are you asking for the EXE file?
primegrid_ap27_2.02_windows_x86_64__OCL_cuda_AP27.exe
If you run this BOINC AP project from PrimeGrid, it will download that file.
primegrid_ap27_2.02_windows_x86_64__OCL_cuda_AP27.exe.txt (1.85 MB)

primegrid_ap27_2.02_windows_x86_64__OCL_cuda_AP27.exe
Supported platforms:
•Windows: Nvidia GPU [1] (OpenCL): 64 bit, AMD/ATI GPU [1] (OpenCL): 64 bit, CPU: 64 bit
•Linux: Nvidia GPU [1] (OpenCL): 64 bit, AMD/ATI GPU [1] (OpenCL): 64 bit, CPU: 64 bit
•Mac [2]: Nvidia GPU [1] (OpenCL): 64 bit, CPU: 64 bit
[1] GPU must have a minimum of 1.5 GB of VRAM.
[2] Due to an Apple driver bug, no ATI/AMD GPU application is available for Mac.

Added to the Bug Report as well
From AP 27 Search
The original AP26 program was written by Jaroslaw Wróblewski and adapted to BOINC by Geoff Reynolds.
The 2016 CPU and OpenCL versions of AP26 were updated by Bryan Little and Iain Bethune.

A good reason that PG should be working on this.

This is mostly a PG AP issue. I ran all the other BOINC GPU projects and apps today.
Starting here: https://forums.evga.com/FindPost/2889231

A widespread performance degradation should have been caught by our QA processes, so if this issue is mostly localized to a single app, that isn’t necessarily surprising.

Also, contrary to this thread title, the development team has categorized this as an issue with OpenCL, not CUDA. I assume this means, in spite of the title of the executable, that the underlying (GPU) code is written in OpenCL, not CUDA, however I have not attempted to confirm this myself. The distinction isn’t terribly important with respect to issue resolution. Merely a point of clarification for others who might read this thread and wonder if it applies to them.

The issue has been reproduced internally at NVIDIA, based on information provided so far via the bug report. There is a (fairly standard) plan in place to attempt to identify underlying root cause. I don’t have any further information to share at this time, and won’t be able to respond to requests for more information, most other questions, or any sort of inquiry about what the current status or state of the issue is, until there is sufficient forward progress on the analysis of the bug. At that time, I will do my best to be proactive and provide an update here. Until then, I’m unlikely to respond to requests for more information.

I don’t expect the issue to be sorted out rapidly. Working with a compiled binary (as opposed to having source code and active participation from the developer) generally results in slower progress on issue resolution using the “standard” plan I referred to. If that plan doesn’t yield useful info, progress can be even slower.

And of course, like all issues, resolution of this issue is subject to assessed priority as well as competing priorities in a resource-constrained environment.

Thank you,
I have posted above that this is an OpenCL issue and not a CUDA issue, and that it also seems to be a PG AP app issue.
If there is an area on the Forum for OpenCL please do move this thread as well as replace CUDA in the Topic Title with OpenCL.

“Robert_Crovella”
“I don’t expect the issue to be sorted out rapidly. Working with a compiled binary (as opposed to having source code and active participation from the developer) generally results in slower progress on issue resolution using the “standard” plan I referred to. If that plan doesn’t yield useful info, progress can be even slower.”

As for the source code, you can contact PG and work with them on it.
Thank you,

Robert Crovella,

Thank you a thousand times over! I apologize for labeling it a CUDA issue. I tried to update the title, but I can only edit the post itself. I would gladly have the title corrected if possible.

Thank you again for the team looking into and reproducing this! I appreciate your patience and help as well.

I’ve modified the thread title. Again, not terribly important, but possibly less confusion down the road.

Thank you :-)

Should I Test the New Drivers as they come out or Wait?

There is no point in testing newer drivers; I don’t expect any changes in this respect. Changes are required in the application if they want to restore performance with the newer drivers.

Current Scenario in ap26 app:

  1. The app queries CL_KERNEL_WORK_GROUP_SIZE to decide between a local work group size of 1024 (seems optimal) and 64 (sub-optimal). If the query returns a value below 1024, the app reduces the local work group size to 64, assuming the device doesn’t support 1024.

  2. The Nvidia OpenCL driver changed the return value for CL_KERNEL_WORK_GROUP_SIZE from 1024 to 256.

  3. The app does not use the CL_KERNEL_WORK_GROUP_SIZE value returned by the driver as is; it just chooses a non-optimal local work group size (64) based on this query.
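
The three steps above can be modelled with a small C sketch. This is not the ap26 source, which is not shown in this thread; the function name `legacy_local_size` is hypothetical, and the queried value is passed in as a plain parameter so that the logic can be read without an OpenCL runtime. In a real host program that value would come from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`.

```c
#include <stddef.h>

/* Hypothetical model of the selection logic described above;
 * not the actual ap26 code.
 * kernel_wg_size stands in for the value returned by
 * clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...). */
size_t legacy_local_size(size_t kernel_wg_size)
{
    /* Any value below 1024 is read as "the device cannot do 1024",
     * so the size drops straight to 64 instead of to the value
     * the driver actually reported. */
    return (kernel_wg_size < 1024) ? 64 : 1024;
}
```

With a driver that reports 1024, this returns 1024. With a driver that reports 256, it returns 64, which is smaller than the value the driver itself reported.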

What should developers do:

• Query CL_KERNEL_WORK_GROUP_SIZE only to get a hint about the work group size from the driver, and use it to launch the kernel with that specific value. It need not be optimal for all kernels.

• The app is free to choose any value in the range [1, CL_DEVICE_MAX_WORK_GROUP_SIZE] to get the best possible work group size for different kernels, irrespective of the CL_KERNEL_WORK_GROUP_SIZE returned by the driver.

Suggestions specific to ap26:

• The app can query CL_DEVICE_MAX_WORK_GROUP_SIZE and set the work group size accordingly, instead of using CL_KERNEL_WORK_GROUP_SIZE.

• The simplest solution for ap26 would be to use a work group size of 1024 directly if it falls within the range [1, CL_DEVICE_MAX_WORK_GROUP_SIZE].
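
A minimal sketch of the second suggestion, in C, might look like the following. The helper name `preferred_local_size` is hypothetical, and the fallback to the largest power of two below the device limit is an assumption for illustration; the thread only says to use 1024 when it is in range. The device limit is passed in as a parameter; a host program would read it with `clGetDeviceInfo` and `CL_DEVICE_MAX_WORK_GROUP_SIZE`.

```c
#include <stddef.h>

/* Hypothetical helper; not taken from the ap26 source.
 * device_max stands in for the value a host program would read with
 * clGetDeviceInfo(..., CL_DEVICE_MAX_WORK_GROUP_SIZE, ...). */
size_t preferred_local_size(size_t device_max)
{
    const size_t wanted = 1024; /* size reported in this thread as seeming optimal */

    /* Use 1024 directly when it falls within [1, device_max]. */
    if (device_max >= wanted)
        return wanted;

    /* Assumption: otherwise fall back to the largest power of two
     * that does not exceed the device limit. */
    size_t size = 1;
    while (size * 2 <= device_max)
        size *= 2;
    return size;
}
```

The result would then be passed as the local work size when enqueueing the kernel. A host program would still want to check the status returned by `clEnqueueNDRangeKernel`, since a given kernel may not launch at every size the device advertises.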

I don’t know how to best communicate the above information to the developers. If there is a good way to do that, please advise.

Thank you. I posted the above on PG in the thread “Geforce Drivers 4xx.xx Drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers”, so they have the same info now.

This is not considered an NVIDIA bug, and the issue will be closed on the NVIDIA side.