GeForce Drivers 4xx.xx drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers

Robert_Crovella · November 27, 2018, 10:34pm

A request was already sent over 12 hours ago to the person who filed the bug report. So far no indication in the bug report that any response to that request has been received.

bomb.squad.q · November 27, 2018, 10:44pm

You are 100% correct. I missed the notification that the bug report had been updated. Thank you for letting me know I had indeed missed a response.

The response is below. GammerPC, think you can send the requested info from the bug report?

GammerPC · November 27, 2018, 10:48pm

By Fancy Fan
Hi,
Thank you for reporting.
Could you ask BCav to provide us with a self-contained reproducer and the steps by which BCav ran into the deterioration?
You can send attachment to CUDAIssues@nvidia.com for file exchange.

No clue how I would do this. It is a BOINC CUDA application, as posted.
Are you asking for the EXE file?
primegrid_ap27_2.02_windows_x86_64__OCL_cuda_AP27.exe
If you run this BOINC AP project from PrimeGrid, it will download that file.
primegrid_ap27_2.02_windows_x86_64__OCL_cuda_AP27.exe.txt (1.85 MB)

GammerPC · November 28, 2018, 4:22am

primegrid_ap27_2.02_windows_x86_64__OCL_cuda_AP27.exe
Supported platforms:
•Windows: Nvidia GPU [1] (OpenCL): 64 bit, AMD/ATI GPU [1] (OpenCL): 64 bit, CPU: 64 bit
•Linux: Nvidia GPU [1] (OpenCL): 64 bit, AMD/ATI GPU [1] (OpenCL): 64 bit, CPU: 64 bit
•Mac [2]: Nvidia GPU [1] (OpenCL): 64 bit, CPU: 64 bit
[1] GPU must have a minimum of 1.5 GB of VRAM.
[2] Due to an Apple driver bug, no ATI/AMD GPU application is available for Mac.

Added to the Bug Report as well
From AP 27 Search
The original AP26 program was written by Jaroslaw Wróblewski and adapted to BOINC by Geoff Reynolds.
The 2016 CPU and OpenCL versions of AP26 were updated by Bryan Little and Iain Bethune.

A good reason that PG should be working on this.

GammerPC · November 29, 2018, 1:21am

This is mostly a PG AP issue; I ran all the other BOINC GPU projects and apps today.
Starting here: https://forums.evga.com/FindPost/2889231

Robert_Crovella · November 29, 2018, 1:55am

A widespread performance degradation should have been caught by our QA processes, so if this issue is mostly localized to a single app, that isn’t necessarily surprising.

Also, contrary to this thread title, the development team has categorized this as an issue with OpenCL, not CUDA. I assume this means, in spite of the title of the executable, that the underlying (GPU) code is written in OpenCL, not CUDA, however I have not attempted to confirm this myself. The distinction isn’t terribly important with respect to issue resolution. Merely a point of clarification for others who might read this thread and wonder if it applies to them.

The issue has been reproduced internally at NVIDIA, based on the information provided so far via the bug report. There is a (fairly standard) plan in place to attempt to identify the underlying root cause. I don’t have any further information to share at this time, and until there is sufficient forward progress on the analysis of the bug I won’t be able to respond to requests for more information, most other questions, or inquiries about the current status of the issue. At that time, I will do my best to be proactive and provide an update here.

I don’t expect the issue to be sorted out rapidly. Working with a compiled binary (as opposed to having source code and active participation from the developer) generally results in slower progress on issue resolution using the “standard” plan I referred to. If that plan doesn’t yield useful info, progress can be even slower.

And of course, like all issues, resolution of this issue is subject to assessed priority as well as competing priorities in a resource-constrained environment.

GammerPC · November 29, 2018, 2:01am

Thank you,
I have posted above that this is an OpenCL issue and not a CUDA issue, and that it also seems to be a PG AP app issue.
If there is an area on the Forum for OpenCL please do move this thread as well as replace CUDA in the Topic Title with OpenCL.

“Robert_Crovella”
“I don’t expect the issue to be sorted out rapidly. Working with a compiled binary (as opposed to having source code and active participation from the developer) generally results in a slower progress of issue resolution (using the “standard” plan I referred to. If that plan doesn’t yield useful info, progress can be even slower).”

As for the source code, you can contact PG and work with them on it.
Thank you,

bomb.squad.q · November 29, 2018, 2:18am

Robert Crovella,

Thank you a thousand times over! I apologize for labeling it a CUDA issue. I tried to update the title, but I can only edit the post itself. I would gladly have the title corrected if possible.

Thank you again for the team looking into and reproducing this! I appreciate your patience and help as well.

Robert_Crovella · November 29, 2018, 2:26am

I’ve modified the thread title. Again, not terribly important, but possibly less confusion down the road.

bomb.squad.q · November 29, 2018, 2:29am

Thank you :-)

GammerPC · December 12, 2018, 4:52pm

Should I Test the New Drivers as they come out or Wait?

Robert_Crovella · December 12, 2018, 5:00pm

There is no point in testing newer drivers; I don’t expect any changes in this respect. Changes are required in the application if they want to restore performance with the newer drivers.

Current Scenario in ap26 app:

• The app queries CL_KERNEL_WORK_GROUP_SIZE to decide between a local work-group size of 1024 (seems optimal) and 64 (sub-optimal). If the query returns a value below 1024, the app reduces the local work-group size to 64, assuming the device doesn’t support 1024.
• The NVIDIA OpenCL driver changed the value returned for CL_KERNEL_WORK_GROUP_SIZE from 1024 to 256.
• The app does not use the CL_KERNEL_WORK_GROUP_SIZE returned by the driver as is; it just picks a non-optimal local work-group size (64) based on this query.
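
The behavior described above can be modeled roughly as follows. This is only a sketch reconstructed from that description, not the actual ap26 source: the function name `ap26_style_local_size` is hypothetical, and `kernel_wg_size` stands in for the value the app would get from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`.

```c
#include <stddef.h>

/* Hypothetical model of the selection logic described in this post.
 * kernel_wg_size: the value reported for CL_KERNEL_WORK_GROUP_SIZE
 * (assumed to come from clGetKernelWorkGroupInfo in the real app). */
size_t ap26_style_local_size(size_t kernel_wg_size)
{
    /* Any reported value below 1024 is treated as "1024 is unsupported",
     * so the app falls back to 64 rather than using the reported value. */
    if (kernel_wg_size < 1024)
        return 64;
    return 1024;
}
```

Under this model, a driver reporting 1024 leads to a local size of 1024, while a driver reporting 256 leads to 64, even though 256 itself is never used.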

What developers should do:

• Treat CL_KERNEL_WORK_GROUP_SIZE only as a hint from the driver about work-group size; if you use it, launch the kernel with that specific value. It need not be optimal for all kernels.

• The app is free to choose any value in the range [1, CL_DEVICE_MAX_WORK_GROUP_SIZE] to get the best possible work-group size for different kernels, irrespective of the CL_KERNEL_WORK_GROUP_SIZE returned by the driver.

Suggestions specific to ap26:

• The app can query CL_DEVICE_MAX_WORK_GROUP_SIZE and set the work-group size accordingly, instead of using CL_KERNEL_WORK_GROUP_SIZE.

• The simplest solution for ap26 would be to use a work-group size of 1024 directly if it falls within the range [1, CL_DEVICE_MAX_WORK_GROUP_SIZE].
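
A minimal sketch of that last suggestion, under stated assumptions: the function name `choose_local_size` is hypothetical, `preferred` would be 1024 for ap26, and `device_max` is assumed to hold the value obtained from `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE`. Clamping to the device limit when the preferred size is out of range is an assumption of this sketch; the suggestion above only covers the in-range case.

```c
#include <stddef.h>

/* Hypothetical helper: pick a local work-group size from the device limit.
 * preferred:  size the application found fastest (1024 in this thread).
 * device_max: value of CL_DEVICE_MAX_WORK_GROUP_SIZE (assumed to have been
 *             queried with clGetDeviceInfo by the caller). */
size_t choose_local_size(size_t preferred, size_t device_max)
{
    if (device_max == 0)
        return 1;               /* defensive: never return an empty group */
    if (preferred == 0)
        return 1;               /* smallest value in [1, device_max] */
    if (preferred <= device_max)
        return preferred;       /* preferred size is within range: use it */
    return device_max;          /* assumption: clamp to the device limit */
}
```

The caller would pass the result as the local size when enqueuing the kernel, and it would still be prudent to check the status returned by `clEnqueueNDRangeKernel` and to keep the global size a multiple of the chosen local size.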

I don’t know how to best communicate the above information to the developers. If there is a good way to do that, please advise.

GammerPC · December 12, 2018, 5:13pm

Thank you. I posted the above on PG ("Geforce Drivers 4xx.xx Drop more than 2/3 in OpenCL Performance from the 3xx.xx Drivers"), so they have the same info now.

Robert_Crovella · December 12, 2018, 5:23pm

This is not considered an NVIDIA bug, and the issue will be closed on the NVIDIA side.
