vkDeviceWaitIdle returns VK_ERROR_DEVICE_LOST after writing to shader binding table with DEVICE_LOCAL memory allocated with CAPTURE_REPLAY flags

This issue was first observed with a capture/replay tool. The tool normally adds the capture/replay flags to the relevant buffers and memory allocations and records their handles for use at replay time.
The attached repro case, described below, was created by modifying an application to add the capture/replay flags artificially.

Reproduction steps:

  • Create ray tracing pipeline with 3 shader groups.
  • Retrieve shader group handles for this pipeline using vkGetRayTracingShaderGroupHandlesKHR:
    std::vector<uint8_t> unalignedSbt(raytracingPipelineProperties.shaderGroupHandleSize * 3); // 96 bytes
    vkGetRayTracingShaderGroupHandlesKHR(device, pipeline, 0, 3, raytracingPipelineProperties.shaderGroupHandleSize * 3, unalignedSbt.data());
  • Create a shader binding table buffer large enough to store 3 handles at a stride of raytracingPipelineProperties.shaderGroupBaseAlignment (i.e. shaderGroupBaseAlignment * 3) - this ends up being 192 bytes on my platform.
  • Retrieve memory requirements for such a buffer - this ends up being 256 bytes on my platform.
  • Allocate memory large enough for the shader binding table with VkMemoryAllocateFlagsInfo::flags including VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT | VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_CAPTURE_REPLAY_BIT. The allocation is made using a memory type which supports VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT | VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT.
  • Map memory with VK_WHOLE_SIZE (256 bytes).
  • Copy shader group handles to mapped memory:
    void* mapped = nullptr;
    vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &mapped); // memory: placeholder name for the allocation from the step above
    std::vector<uint8_t> sbtData(256);
    for (uint32_t i = 0; i < 3; i++) {
      memcpy(sbtData.data() + i * raytracingPipelineProperties.shaderGroupBaseAlignment,
             unalignedSbt.data() + i * raytracingPipelineProperties.shaderGroupHandleSize,
             raytracingPipelineProperties.shaderGroupHandleSize);
    }
    memcpy(mapped, sbtData.data(), 256);
  • Unmap the memory.
  • Later, submit a command buffer which issues a vkCmdTraceRaysKHR command.
  • Call vkDeviceWaitIdle.
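
The layout produced by the copy step can be sketched without any Vulkan calls. This is a minimal sketch, assuming a handle size of 32 bytes and a base alignment of 64 bytes (values consistent with the 96-byte and 192-byte figures above); packShaderGroupHandles is a hypothetical name and is not taken from the attached reproducer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: places `groupCount` tightly packed handles
// (`handleSize` bytes each) at `stride`-byte intervals in a zero-filled buffer.
// Assumes stride >= handleSize and packed.size() >= groupCount * handleSize.
std::vector<std::uint8_t> packShaderGroupHandles(const std::vector<std::uint8_t>& packed,
                                                 std::uint32_t groupCount,
                                                 std::uint32_t handleSize,
                                                 std::uint32_t stride)
{
    std::vector<std::uint8_t> sbt(static_cast<std::size_t>(groupCount) * stride, 0);
    for (std::uint32_t i = 0; i < groupCount; ++i) {
        // Destination advances by the aligned stride, source by the raw handle size.
        std::memcpy(sbt.data() + static_cast<std::size_t>(i) * stride,
                    packed.data() + static_cast<std::size_t>(i) * handleSize,
                    handleSize);
    }
    return sbt;
}
```

Under those assumed values, 3 groups give a 192-byte result with handles at offsets 0, 64 and 128. The remaining 64 bytes of the 256-byte allocation lie past the end of the 192-byte buffer.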

The attached reproducer can be built with CMake:
cd VulkanHelloRayTracing
mkdir build
cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Debug …
cmake --build .

Run the binary:

Expected result:
vkDeviceWaitIdle returns VK_SUCCESS and application exits cleanly.

Actual result:
vkDeviceWaitIdle returns VK_ERROR_DEVICE_LOST causing an assert to be triggered.

This VK_ERROR_DEVICE_LOST issue can be worked around by:
1 - Removing capture replay flags from buffer + memory allocation.
2 - Allocating memory from a memory type which doesn’t support VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT.
3 - Only writing to 192 bytes of the mapped SBT memory.
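
The third workaround amounts to clamping the final copy to the buffer's own size rather than the size of the whole mapped allocation. The following is a minimal sketch of that idea only, with a plain byte vector standing in for the mapped memory; writeSbtClamped is a hypothetical name, and the sketch does not explain why the driver loses the device.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies at most `bufferSize` bytes of `sbtData` into the
// mapped allocation, leaving the tail of the allocation (the part beyond the
// buffer's own size) untouched. Returns the number of bytes written.
std::size_t writeSbtClamped(void* mapped,
                            std::size_t bufferSize,
                            const std::vector<std::uint8_t>& sbtData)
{
    const std::size_t bytes = std::min(bufferSize, sbtData.size());
    std::memcpy(mapped, sbtData.data(), bytes);
    return bytes;
}
```

With a 256-byte allocation and a 192-byte buffer, only the first 192 bytes are written and the last 64 bytes keep their previous contents.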

Each of these “workarounds” is highlighted in the repro sample VulkanHelloRayTracing/src/threads/0/thread0_frame0.cpp (search for // 1, // 2, // 3).

Although the reproduction case depends on a specific combination of factors, the same failure could plausibly occur with other capture/replay tools. The combination was also fairly tricky to isolate.

Operating system and platform details:
Ubuntu 22.04
NVIDIA GeForce RTX 2060

Tested driver versions:

Driver versions which suffer from this VK_ERROR_DEVICE_LOST issue:

  • 515.49.14 (Vulkan beta)
  • 515.65.01 - Production/tested

Driver versions which DO NOT suffer from this VK_ERROR_DEVICE_LOST issue: