I’ve experienced an NVIDIA driver crash on several Windows-10-powered PCs in an SDL2/Vulkan-based game that my company is developing and have been able to reproduce it almost every time under the following conditions:
Plug in an additional HDMI display device before starting your application, and set the display behaviour to Duplicate to have both of your display devices display the same application.
Start any Vulkan-based application in fullscreen-exclusive mode (i.e. one that creates its window with the WS_POPUP | WS_CLIPSIBLINGS | WS_CLIPCHILDREN style, and calls ChangeDisplaySettings or ChangeDisplaySettingsEx with the CDS_FULLSCREEN parameter – in our tests, setting the new resolution to something other than your desktop resolution maximizes the odds of reproducing the issue).
This issue cannot be reproduced if you start the application in windowed mode or borderless windowed mode.
In order to avoid writing a Vulkan application from scratch just for testing, you may consider using one of the Vulkan examples kindly provided by Sascha Willems at https://github.com/SaschaWillems/Vulkan: passing the --fullscreen command-line argument starts the application in fullscreen-exclusive mode, and the -width and the -height command-line arguments can be used to specify the width and height of your window. The gears example was a good one for us to reproduce the issue.
Unplug the external display device you plugged in earlier.
At this point, depending on your hardware and configuration, different things can happen:
On my ASUS mobile gaming PC with a GTX 980M GPU and a built-in 1920×1080 G-Sync monitor, the application becomes very choppy for about 2 seconds, then freezes and triggers a driver crash (not a BSOD, just a black screen followed by a heavily corrupted image on the remaining display device). Windows 10 then restarts the driver, although sometimes it simply reboots instead. This behavior did not depend on the NVIDIA driver version (the latest test was performed on version 441.87, if that is at all relevant).
When the driver is done restarting, most Vulkan functions called by the application return VK_ERROR_DEVICE_LOST, and attempts to recover properly are always unsuccessful: trying to tear down and then recreate the device causes the following to be displayed by the Vulkan validation layers:
terminator_CreateDevice: Failed in ICD C:\WINDOWS\System32\DriverStore\FileRepository\nvami.inf_amd64_039a3b72bf87b399\.\nvoglv64.dll vkCreateDevice call
vkCreateDevice: Failed to create device chain.
I've also tried to completely tear down the Vulkan context (including the VkInstance) and then create an OpenGL context when this occurred, but our application then crashed upon calling any glXXX function.
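For anyone trying to reproduce this, here is a minimal sketch (not our game's actual code) of how an application might decide what to rebuild based on the result returned by vkQueuePresentKHR. RecoveryAction and classify_present_result are hypothetical names, and the constants are mirrored from VkResult only so that the snippet compiles without the Vulkan SDK; a real application would include <vulkan/vulkan.h> and use the VkResult enumerators directly.

```cpp
#include <cstdint>

// Values mirrored from VkResult in vulkan_core.h (assumption: kept local so the
// sketch builds without the SDK; use the real enumerators in production code).
constexpr std::int32_t kSuccess          = 0;            // VK_SUCCESS
constexpr std::int32_t kSuboptimal       = 1000001003;   // VK_SUBOPTIMAL_KHR
constexpr std::int32_t kErrorDeviceLost  = -4;           // VK_ERROR_DEVICE_LOST
constexpr std::int32_t kErrorSurfaceLost = -1000000000;  // VK_ERROR_SURFACE_LOST_KHR
constexpr std::int32_t kErrorOutOfDate   = -1000001004;  // VK_ERROR_OUT_OF_DATE_KHR

// Hypothetical set of recovery steps, ordered from cheapest to most drastic.
enum class RecoveryAction {
    Continue,           // nothing to do, keep rendering
    RecreateSwapchain,  // surface changed (resize, mode switch, display removed)
    RecreateSurface,    // surface is gone: rebuild surface and swapchain
    RecreateDevice,     // logical device is lost: tear down and call vkCreateDevice again
    Abort               // anything else is treated as fatal in this sketch
};

// Maps a present result to the recovery step the application should attempt.
RecoveryAction classify_present_result(std::int32_t result) {
    switch (result) {
        case kSuccess:          return RecoveryAction::Continue;
        case kSuboptimal:       return RecoveryAction::RecreateSwapchain;
        case kErrorOutOfDate:   return RecoveryAction::RecreateSwapchain;
        case kErrorSurfaceLost: return RecoveryAction::RecreateSurface;
        case kErrorDeviceLost:  return RecoveryAction::RecreateDevice;
        default:                return RecoveryAction::Abort;
    }
}
```

In the scenario described in this post, the RecreateDevice branch is the one that gets taken, and it is the subsequent vkCreateDevice call that fails with the loader message quoted above.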
On my colleague's desktop PC with a GTX 650 Ti (driver version: 432.00) and a (rather old, non-G-Sync) 1360×768 screen, neither the application nor the driver crashes, but the remaining display starts to flicker, alternating between a completely black screen and the expected output of the application. If your application is able to toggle between fullscreen-exclusive mode and windowed mode with a shortcut such as Alt+Tab, then using this shortcut may put an end to the flickering and restore the application to a "normal" state.
Our internal logs showed that the crash/unexpected behavior occurs in the vkQueuePresentKHR function (time logs indeed showed that 2-10 seconds are spent inside this function when the issue occurs), but the latest Vulkan SDK’s validation layers (version 1.1.30) do not output any error message, apart from the vkCreateDevice error message above.
We tried to reproduce this issue with an OpenGL context instead of a Vulkan context, and all of our attempts have failed: unplugging the external display device does not cause this issue.
We also tried to reproduce it with a Surface Pro 4 tablet PC (which only has an Intel iGPU), both with Vulkan and OpenGL, but the issue didn’t show up in this case either.
All of the above has led me to believe this is an NVIDIA driver bug that is specific to Vulkan (maybe a presentation engine bug?).
Also, not sure how relevant this is, but our tests were performed using a Samsung 4K TV as the external display device, and the present mode we used was VK_PRESENT_MODE_FIFO_KHR.
Looking forward to your replies!