Vulkan in Nsight: 0xDCDCDCDC pops up everywhere

Hi!

I have a hard problem to solve with my program: I randomly get a VK_ERROR_DEVICE_LOST after a few seconds or minutes, and it seems to happen only on RTX cards, not GTX. (I use only raster and compute shaders with standard Vulkan 1.2 extensions, nothing too modern like ray tracing.)

I am using Nsight to help me find a way to fix this, but it’s worse!

In Nsight, which usually works very well, I have new problems:

Part 1: before capturing a frame:

  • some triangles are bugged, drawn toward a point that looks like it is at infinity
  • objects disappear randomly
  • sometimes, all odd or even frames render fully gray
    (Of course, none of this happens if I do not attach Nsight, I would never accept such a thing!)

Part 2: after capturing a frame:

  • data seems to be randomly corrupted
  • Nsight sometimes freezes (and crashes?)

For example, the GUI data is created once and never updated. Nsight tells me that my vertices can have huge values; displayed in hexadecimal mode, these values show 0xdcdcdcdc. But the whole buffer is not like that! Only a few parts…
Sometimes small index values (for a single quad: {0,1,2, 0,2,3}) become huge, and I suppose the GPU then makes out-of-bounds memory accesses…
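
A check like the one below can be run on the CPU-side copy of the vertex and index data right before it is handed to Vulkan, to confirm that the corruption is not already present on the application side. It is only a minimal sketch: the helper names are hypothetical, 32-bit indices are assumed, and nothing in this thread establishes which component writes the 0xDCDCDCDC pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Fill pattern reported in the post (assumption: it marks memory that was
// never written with real data; its origin is not established here).
constexpr std::uint32_t kFillPattern = 0xDCDCDCDCu;

// Returns the byte offset of every 4-byte-aligned word equal to kFillPattern.
std::vector<std::size_t> findFillPattern(const void* data, std::size_t sizeBytes) {
    assert(data != nullptr || sizeBytes == 0);
    std::vector<std::size_t> hits;
    const auto* bytes = static_cast<const unsigned char*>(data);
    for (std::size_t off = 0; off + sizeof(std::uint32_t) <= sizeBytes;
         off += sizeof(std::uint32_t)) {
        std::uint32_t word;
        std::memcpy(&word, bytes + off, sizeof(word)); // avoids aliasing issues
        if (word == kFillPattern) hits.push_back(off);
    }
    return hits;
}

// Returns the position of the first index that is >= vertexCount,
// or indices.size() when every index is in range.
std::size_t firstOutOfRangeIndex(const std::vector<std::uint32_t>& indices,
                                 std::uint32_t vertexCount) {
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (indices[i] >= vertexCount) return i;
    }
    return indices.size();
}
```

For the quad above, `firstOutOfRangeIndex({0,1,2, 0,2,3}, 4)` reports no bad index; if both helpers stay clean on the CPU copy while the debugger still shows 0xdcdcdcdc, the data is being lost somewhere between the mapped write and the GPU read.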

I am sure my data is correct: it can’t be like that!
Any idea what causes this problem?


Win10 64 bits (up to date)
RTX2080 driver 471.41 (up to date)
Vulkan 1.2.182.0 (latest official version)
Nsight 2021.3.1 (latest version)

I use VMA (Vulkan Memory Allocator 2.3), and each time I send a Vulkan command, I run vmaCheckCorruption() to check for corruption at the boundaries of all my allocations. It never returns an error, so I suppose my code is fine in that respect and that I do not write out of bounds in my buffers.

Tests of the day:
On 2 different RTX cards, my program crashes randomly and behaves weirdly in an Nsight session.
On 2 different GTX cards, my program does not seem to crash, but it crashes instantly when I click on “Launch Frame Debugger”.

All drivers were updated to 471.41, with the latest Nsight 2021.3.1.

I still have not found the issue.
I have disabled all Vulkan code except one thing: a copy of data (just a few triangles for drawing letters, to display the current FPS).

This is the very last “write” command I send; only 2 other commands remain:

  1. draw the letters
  2. swap buffers

I tried with RenderDoc, and my data is broken there too. I don’t understand anything! Why does it work without debuggers?

Hello,

Thank you for your feedback on Nsight Graphics and sorry you ran into these issues. I’ll discuss with the engineering team and get back to you.

Regards,

Hello!

I have just found the issue!

With VMA (Vulkan Memory Allocator): when I create the staging buffer, I had to create it with VMA_MEMORY_USAGE_CPU_ONLY and not VMA_MEMORY_USAGE_CPU_TO_GPU. The documentation says that it guarantees a HOST_COHERENT and DEVICE_COHERENT memory type. I didn’t know that such a memory location can’t be both when it is not purely GPU-resident… I don’t understand very well how that works…
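
One possible reading of why the usage flag matters, which this thread does not confirm: the VMA documentation states that VMA_MEMORY_USAGE_CPU_ONLY guarantees memory that is HOST_VISIBLE and HOST_COHERENT, while VMA_MEMORY_USAGE_CPU_TO_GPU guarantees only HOST_VISIBLE. When mapped memory is not HOST_COHERENT, CPU writes must be flushed explicitly (vkFlushMappedMemoryRanges, or vmaFlushAllocation in VMA) before the GPU reads them. The sketch below shows that rule on plain flag values; the constants copy the values of VkMemoryPropertyFlagBits so that it compiles without the Vulkan SDK, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same values as VK_MEMORY_PROPERTY_*_BIT in the Vulkan headers,
// redefined here only so the sketch is self-contained.
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;

// True when writes made through a mapped pointer need an explicit flush
// before the GPU is guaranteed to see them.
bool needsExplicitFlush(std::uint32_t propertyFlags) {
    assert((propertyFlags & kHostVisible) != 0); // only mappable memory applies
    return (propertyFlags & kHostCoherent) == 0;
}

// Hypothetical helper: index of the first memory type that is both
// HOST_VISIBLE and HOST_COHERENT, or -1 when there is none.
int findCoherentStagingType(const std::vector<std::uint32_t>& memoryTypeFlags) {
    const std::uint32_t required = kHostVisible | kHostCoherent;
    for (std::size_t i = 0; i < memoryTypeFlags.size(); ++i) {
        if ((memoryTypeFlags[i] & required) == required) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

In a real application the flags would come from vkGetPhysicalDeviceMemoryProperties, and the memory type actually chosen for an allocation can be inspected to see whether a flush is required after each write to the staging buffer.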

But… it works, my data is uploaded correctly.
The strangest thing? I have been writing this program for at least 3 years, so how could such a problem appear only now!? Mystery!