[550.67] Nvidia Vulkan ICD wakes up dgpu on initialization and exit

The Vulkan ICD provided by Nvidia’s driver always wakes up the dgpu on hybrid graphics systems, even if the gpu chosen by the program is not the Nvidia dgpu.

When running a Vulkan program, its launch will be delayed for several seconds, while the card is resumed. Once the program launches, the card goes back to sleep after a while:

Exiting the program also requires the card to wake up, resulting in another delay:

This affects both x11 and wayland programs:

The only workaround is to disable the nvidia ICD, as seen here:


(Note that the nouveau_icd was also disabled in other tests, so it’s not relevant.)
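
For anyone who wants to apply this kind of workaround from inside a program rather than from the shell, here is a minimal sketch. It assumes the Vulkan loader's `VK_DRIVER_FILES` environment variable (with the older, deprecated `VK_ICD_FILENAMES` set as well for older loaders) and an assumed manifest path for Mesa's Intel driver. The helper name `restrict_vulkan_icd` is hypothetical, and this is not necessarily the exact method used for the tests above.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <assert.h>
#include <stdlib.h>

/* Assumption: typical location of Mesa's Intel ICD manifest on x86_64
 * distributions. Check /usr/share/vulkan/icd.d/ on your own system. */
#define IGPU_ICD_MANIFEST "/usr/share/vulkan/icd.d/intel_icd.x86_64.json"

/* Restrict the Vulkan loader to a single ICD manifest, so that other
 * ICDs (such as NVIDIA's) are never loaded by this process.
 * Returns 0 on success, -1 if setenv() fails. */
static int restrict_vulkan_icd(const char *manifest_path)
{
  assert(manifest_path != NULL);

  /* Newer loaders read VK_DRIVER_FILES. */
  if (setenv("VK_DRIVER_FILES", manifest_path, 1) != 0)
    return -1;

  /* Older loaders only know the deprecated VK_ICD_FILENAMES. */
  if (setenv("VK_ICD_FILENAMES", manifest_path, 1) != 0)
    return -1;

  return 0;
}
```

The call has to happen before the first Vulkan call in the process, e.g. `restrict_vulkan_icd(IGPU_ICD_MANIFEST);` at the top of `main`. Note that the loader ignores these variables for processes running with elevated privileges.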

Symptoms are very similar to those found in this egl-wayland issue, but it’s with the Vulkan implementation instead of EGL:


Here’s some additional system details:

  • Machine: Thinkpad P1 Gen 6
    • CPU: Intel Core i7-13800H
    • Graphics: Intel(R) Graphics (RPL-P), NVIDIA RTX 4000 Ada Generation Laptop GPU
  • OS: Fedora (Rawhide/41)
  • Nvidia Driver Version: 550.67 (open kernel module)
    • Note: NVreg_EnableS0ixPowerManagement is set to 1
  • Mesa Version: 24.03 (Also tested on git main, commit fcb568a5d5a52db75fa2f6d04579bb404ca7f597)
  • vulkaninfo output: vulkaninfo.txt (193.4 KB)

I’ve recorded a few videos showing the symptoms in more detail - will share if needed.

This issue is still present as of 550.78.

It currently isn’t that big of a deal, as most Vulkan programs want to run on the dgpu anyway…

However, it will be extremely important in the near future as Vulkan starts being adopted by programs and libraries which value power efficiency over performance.

For example, GTK4 has started using its Vulkan renderer by default with v4.15 (testing), in order to determine if it’s ready for the next production version, v4.16.

With this bug, all GTK4 programs wake up the nvidia dgpu during their initialization. This causes several seconds of delay before a window appears and reduces battery life on laptops.

GTK’s developers will likely not hold off on migrating to the Vulkan renderer because of nvidia driver bugs, so fixing this issue should be considered a priority.


I’ve written a small program which reproduces the issue:

#include <stdio.h>
#include <unistd.h>
#include <time.h>

#include <vulkan/vulkan.h>

int main (void)
{
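  /* sType must be set for a valid VkInstanceCreateInfo; other fields stay zero. */
  const VkInstanceCreateInfo vkInstanceInfo = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };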
  VkInstance instance;

  struct timespec start, end;
  float delta_t;

  printf("calling vkCreateInstance...\n");

  clock_gettime(CLOCK_MONOTONIC_RAW, &start);
  vkCreateInstance(&vkInstanceInfo, NULL, &instance);
  clock_gettime(CLOCK_MONOTONIC_RAW, &end);

  /* Elapsed seconds: whole-second difference plus nanosecond difference. */
  delta_t = (end.tv_sec - start.tv_sec) + (float)(end.tv_nsec - start.tv_nsec) / (1000 * 1000000);
  printf("vkCreateInstance done in %.3fs\n", delta_t);

  sleep(30);

  printf("Exiting...\n");
  return 0;
}
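
The elapsed-time arithmetic in the program above can also be pulled out into a small helper. This is only a sketch: the name `timespec_elapsed_s` is hypothetical, and it uses `double` instead of `float` to keep more precision. The point is that seconds are subtracted from seconds and nanoseconds from nanoseconds.

```c
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two timespec samples.
 * The nanosecond difference may be negative when the seconds field
 * has rolled over; adding it to the whole-second difference still
 * gives the right result. */
static double timespec_elapsed_s(const struct timespec *start,
                                 const struct timespec *end)
{
  assert(start != NULL && end != NULL);

  return (double)(end->tv_sec - start->tv_sec)
       + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}
```

With `start` and `end` filled in by `clock_gettime`, the measurement becomes `timespec_elapsed_s(&start, &end)`.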

vkCreateInstance is the call which causes NVIDIA’s driver to wake up the gpu unnecessarily. vkEnumeratePhysicalDevices doesn’t appear to be affected.

Here’s an example of what the output looks like:

Running the program with nvidia_icd disabled works as expected. vkCreateInstance returns near-instantly.

Running the program normally adds a delay of around 2 seconds to vkCreateInstance. This is consistent with the time it takes for the dgpu to go into D0.

Debugging with gdb reveals that the gpu wake-up happens inside a chain of functions called by nvidia’s vk_icdNegotiateLoaderICDInterfaceVersion.

Some of the IOCTLs sent by nvidia’s userspace appear to wake up the dgpu. The first three IOCTLs do not wake up the gpu; however, the fourth (and some after it) do.

Still occurs as of 555.42.02.

Still occurs as of 555.52.04.

Issue persists as of 560.28.03.

Hi @jrelvas
Apologies for the late response. I have filed bug 4770124 internally for tracking purposes.

Hi @jrelvas
I ran tests on a couple of notebooks, but I haven’t been able to reproduce the exact issue locally.
Could you please share repro videos and an nvidia bug report for my reference?

Of course!

Here’s a screen recording of the issue in question:


Left terminal shows the repro program’s output, while the right terminal shows the power state of the nvidia gpu. You’ll notice that the dgpu wakes up as soon as a Vulkan instance is created and that there’s an associated delay.

Make sure to perform this test when the GPU’s already at d3cold, otherwise you won’t be able to see it in effect.
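
To check that precondition from code, something like the sketch below could work. It assumes the kernel exposes the PCI `power_state` sysfs attribute (present on reasonably recent kernels) and that the dgpu sits at PCI address `0000:01:00.0`. Both are assumptions, as are the helper names; adjust the path to match your own system (see `lspci`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: PCI address of the dgpu. Find yours with `lspci`. */
#define DGPU_POWER_STATE_PATH "/sys/bus/pci/devices/0000:01:00.0/power_state"

/* Read the first line of a sysfs attribute into buf, stripping the
 * trailing newline. Returns 0 on success, -1 on failure. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
  assert(path != NULL && buf != NULL && len > 0);

  FILE *f = fopen(path, "r");
  if (f == NULL)
    return -1;

  if (fgets(buf, (int)len, f) == NULL) {
    fclose(f);
    return -1;
  }
  fclose(f);

  buf[strcspn(buf, "\n")] = '\0';
  return 0;
}

/* Returns 1 if the attribute reads "D3cold", 0 for any other state,
 * and -1 if the attribute could not be read. */
static int dgpu_is_d3cold(const char *power_state_path)
{
  char state[32];

  if (read_sysfs_line(power_state_path, state, sizeof state) != 0)
    return -1;

  return strcmp(state, "D3cold") == 0;
}
```

A test harness could poll `dgpu_is_d3cold(DGPU_POWER_STATE_PATH)` until it returns 1 before launching the repro program.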

Here’s a tarball containing the source of the repro program:
nvidia-wakeup-repro.c.tar.gz (1.0 KB)

And here’s the nvidia bug report log you’ve asked for:
nvidia-bug-report.log.gz (424.2 KB)

Thanks @jrelvas for sharing the new source code. I was able to reproduce the reported issue internally.
Engineering team will further review it now.


Most Vulkan applications begin by enumerating all devices in the system and selecting one or more based on their capabilities. Currently, the NVIDIA driver must power on GPUs to discover their capabilities during this enumeration phase. The engineering team is investigating methods to perform these operations without powering on the GPU, but we cannot commit to an ETA for such a solution at this time.
