Compute Shader Problem in 5xx Drivers, but not in 47x Drivers

karlb7hzt · July 30, 2022, 9:31pm

My client has a fairly complex compute shader that performs deduplication operations on very large datasets. The shader uses subgroups.

The shader works fine on 471, 472, and 473 series drivers (at least those we’ve tested). The 496.76 driver fails. “Fails” means that the returned results are incorrect. There are no crashes or validation errors. All the 5xx series drivers that we’ve tested fail as well. The difference in behavior doesn’t seem to depend on the driver being Game-Ready or Studio. I’ve also tested with NDA drivers and see the same results (works ok on 47x, but not 5xx).

The same shader code works fine on AMD GPUs. The incorrect results appear when running the shader on very large data and not for smaller data sets.

Anyway, the primary observation here is that “it works on 47x and not on 5xx”. Since it works on other hardware and seems to be broken in a specific driver series, I tend to think that the shader doesn’t have a problem. It is possible that it does, but we can’t seem to find it.

I did see a mention of a fix in the release notes for an NDA driver that is in the same area as the functionality that is in question. Since I don’t think I can discuss NDA driver information here, I’d be happy to point out the exact release note item to someone via email.

I understand that a reproduction test case would be ideal here, but it would take me a lot of effort to create one. Instead, I’d like to first explore the possibility that a fix was applied to the 4xx series that hasn’t been applied to 5xx yet.

It is also odd that it fails on 496.76. I don’t understand NVIDIA driver numbering schemes, so if someone can explain why 496.76 is what it is and not something like 47x, that would be useful. The same goes for explaining the differences between 4xx and 5xx.

Wen_Su · August 2, 2022, 12:41am

Vulkan driver team has started tracking this issue ID 3740049.

karlb7hzt · September 14, 2022, 7:42pm

The problem turned out to be in one of our shaders. The 5xx driver series introduces some optimizations that increase the parallelization, which exposed the need for additional barriers that we were missing. I’ve closed issue 3740049. Thanks to the team for helping us to narrow it down.

Topic		Replies	Views
SDSM tighting compute shader at driver 347.09 WHQL super slow and at old driver 344.75 super fast OpenGL	8	2572	February 11, 2015
Vulkan shader experiences a significant performance drop and issues with procedural noise with driver versions > 470 Vulkan	1	1480	September 10, 2022
Using for-loops in this compute shader crashes the Windows kernel driver Vulkan	5	3438	May 31, 2016
Compute Shader Performance Vulkan	11	8363	June 8, 2016
Linux-x86_64 drivers:346.46 GTX980 Compute shader bug persistent threads Linux	2	1330	April 2, 2015
Nvoglv64.dll crashes w/ "Access violation reading location" at vkCreateComputePipelines Vulkan	10	6289	September 15, 2022
Vulkan, shader bug, all shader that use fract (or mod) and floor broken on Nvidia Vulkan. (include tests) Linux	8	1311	February 21, 2021
Why is DirectCompute 2x faster than CUDA for my kernel? CUDA Programming and Performance	23	6624	November 11, 2010
RTX 5070 Ti nvgpucomp64.dll exception thrown when shader uses GL_NV_shader_sm_builtins Vulkan vulkan	0	22	August 27, 2025
Invalid VK_ERROR during compute pipeline creation on 4070 Vulkan vulkan	0	43	July 22, 2025

Compute Shader Problem in 5xx Drivers, but not in 47x Drivers

Related topics