[550.67] NVIDIA Vulkan ICD wakes up dGPU on initialization and exit

The Vulkan ICD provided by NVIDIA’s driver always wakes up the dGPU on hybrid graphics systems, even if the GPU chosen by the program is not the NVIDIA dGPU.

When a Vulkan program is started, its launch is delayed by several seconds while the card resumes. Once the program has launched, the card goes back to sleep after a while.

Exiting the program also requires the card to wake up, resulting in another delay.

This affects both X11 and Wayland programs.

The only workaround is to disable the NVIDIA ICD.


(Note that the nouveau_icd was also disabled in other tests, so it’s not relevant.)

The symptoms are very similar to those in this egl-wayland issue, except that here they come from the Vulkan implementation instead of EGL.


Here are some additional system details:

  • Machine: Thinkpad P1 Gen 6
    • CPU: Intel Core i7-13800H
    • Graphics: Intel(R) Graphics (RPL-P), NVIDIA RTX 4000 Ada Generation Laptop GPU
  • OS: Fedora (Rawhide/41)
  • Nvidia Driver Version: 550.67 (open kernel module)
    • Note: NVreg_EnableS0ixPowerManagement is set to 1
  • Mesa Version: 24.0.3 (also tested on git main, commit fcb568a5d5a52db75fa2f6d04579bb404ca7f597)
  • vulkaninfo output: vulkaninfo.txt (193.4 KB)

I’ve recorded a few videos showing the symptoms in more detail and can share them if needed.

This issue is still present as of 550.78.

This currently isn’t a big deal, as most Vulkan programs want to run on the dGPU anyway.

However, it will become very important in the near future, as Vulkan is adopted by programs and libraries that value power efficiency over performance.

For example, GTK4 started using its Vulkan renderer by default in v4.15 (testing) to determine whether it’s ready for the next stable release, v4.16.

With this bug, every GTK4 program wakes up the NVIDIA dGPU during initialization. This delays the appearance of its window by several seconds and reduces battery life on laptops.

GTK’s developers are unlikely to hold off on migrating to the Vulkan renderer because of NVIDIA driver bugs, so fixing this issue should be a priority.


I’ve written a small program which reproduces the issue:

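#include <stdio.h>
#include <unistd.h>
#include <time.h>

#include <vulkan/vulkan.h>

int main (void)
{
  /* Minimal instance: no layers, no extensions. sType must be set. */
  const VkInstanceCreateInfo vkInstanceInfo = {
    .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
  };
  VkInstance instance;
  VkResult result;

  struct timespec start, end;
  double delta_t;

  printf("calling vkCreateInstance...\n");

  /* Time only vkCreateInstance, which is where the dGPU gets woken up. */
  clock_gettime(CLOCK_MONOTONIC_RAW, &start);
  result = vkCreateInstance(&vkInstanceInfo, NULL, &instance);
  clock_gettime(CLOCK_MONOTONIC_RAW, &end);

  if (result != VK_SUCCESS) {
    fprintf(stderr, "vkCreateInstance failed: %d\n", (int)result);
    return 1;
  }

  /* Elapsed seconds: nanoseconds are tv_nsec minus tv_nsec (not tv_sec). */
  delta_t = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
  printf("vkCreateInstance done in %.3fs\n", delta_t);

  /* Stay alive long enough for the dGPU to runtime-suspend again. */
  sleep(30);

  printf("Exiting...\n");
  vkDestroyInstance(instance, NULL);
  return 0;
}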

vkCreateInstance is the call that causes NVIDIA’s driver to wake up the GPU unnecessarily. vkEnumeratePhysicalDevices doesn’t appear to be affected.

Here’s what the output looks like:

Running the program with nvidia_icd disabled works as expected: vkCreateInstance returns almost instantly.

Running the program normally adds a delay of around 2 seconds to vkCreateInstance, which is consistent with the time it takes for the dGPU to transition to D0.

Debugging with gdb reveals that the GPU wake-up happens inside a chain of functions called by NVIDIA’s vk_icdNegotiateLoaderICDInterfaceVersion.

Some of the IOCTLs sent by NVIDIA’s userspace driver appear to wake up the dGPU: the first three IOCTLs do not, but the fourth (and some of those after it) do.

Still occurs as of 555.42.02.

Still occurs as of 555.52.04.