Crash with bindless textures inside the nvoglv64.dll driver

We have an application that uses bindless textures in conjunction with indirect rendering.

We aggregate multiple mesh models into a single buffer (for indirect rendering). Bindless texture addresses (handles) are stored in an array (SSBO), and the shaders index into that array to look up the texture(s) for each instance of a model.

We have a large number of model instances at a time, and consequently the texture memory footprint is quite high (but not always near the limits of the VRAM).

We are consistently getting crashes with this. That is, until we disable the code that makes the textures resident, all other things being the same. (In that case we store 0s in the addresses used for the shader lookup, so the shader does not access invalid memory locations.)
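The zero-address fallback described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: `TextureSlot` and `buildHandleTable` are hypothetical names, and the sketch assumes the shader tests an entry for 0 before sampling through it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-texture record (an assumption for illustration only).
struct TextureSlot {
    std::uint64_t handle;   // value returned by glGetTextureHandleARB
    bool          resident; // true once glMakeTextureHandleResidentARB has been called
};

// Builds the CPU-side array that would be uploaded to the SSBO.
// Slots that were not made resident are written as 0, so the shader
// can detect them and skip the texture fetch.
std::vector<std::uint64_t> buildHandleTable(const std::vector<TextureSlot>& slots) {
    std::vector<std::uint64_t> table;
    table.reserve(slots.size());
    for (const TextureSlot& s : slots) {
        // A resident slot must carry a real handle; glGetTextureHandleARB
        // returns 0 only on error.
        assert(!s.resident || s.handle != 0);
        table.push_back(s.resident ? s.handle : 0u);
    }
    return table;
}
```

The table keeps one entry per slot even when nothing is resident, so instance indices stay valid in both configurations.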

Any ideas/hints on how to debug/fix/handle this, how to be warned that a crash is coming, or any other tips for debugging what might be going on? We tried using debug callbacks, etc., but didn’t find anything untoward.

  • Where is it crashing? (Suggest posting the driver stack trace and/or the exception details from the Event Log.)
  • Are you also using bindless buffers?
      1. Updated with map/unmap? Try SubData instead. We have hit a driver crash there, and NVIDIA provided a fix.
      2. Using CUDA? Avoid bindless buffer use with it.
  • Have you checked NVX_gpu_memory_info (or a tool that reports it) to verify that you’re not evicting?
  • Any correlation with the amount of GPU memory? I have seen driver crashes on 8GB GPUs (GTX 1080 non-Ti) with lots of bindless memory consumption: “error code: 8” / TDR. It seems the map/unmap fix resolved this.
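
For the eviction check, one option is to sample the NVX_gpu_memory_info counters periodically and compare consecutive samples. The sketch below shows only the comparison step; `GpuMemSnapshot`, `EvictionReport` and `compareSnapshots` are hypothetical names, the token values are the ones listed in the extension spec, and it assumes the eviction counters only ever grow.

```cpp
#include <cassert>

// Query tokens from GL_NVX_gpu_memory_info. Each is read with
// glGetIntegerv(token, &value); sizes are reported in KB.
// Named without the GL_ prefix so they do not clash with real GL headers.
constexpr unsigned kDedicatedVidmemNvx        = 0x9047;
constexpr unsigned kCurrentAvailableVidmemNvx = 0x9049;
constexpr unsigned kEvictionCountNvx          = 0x904A;
constexpr unsigned kEvictedMemoryNvx          = 0x904B;

// One sample of the counters (hypothetical struct).
struct GpuMemSnapshot {
    int dedicatedKb;        // from kDedicatedVidmemNvx
    int currentAvailableKb; // from kCurrentAvailableVidmemNvx
    int evictionCount;      // from kEvictionCountNvx
    int evictedKb;          // from kEvictedMemoryNvx
};

struct EvictionReport {
    bool   evicting;     // true if evictions happened between the two samples
    int    newEvictions; // evictions since the previous sample
    int    newEvictedKb; // KB evicted since the previous sample
    double freeFraction; // currently available / dedicated video memory
};

// Compares two samples taken at different times (e.g. once per frame).
EvictionReport compareSnapshots(const GpuMemSnapshot& prev, const GpuMemSnapshot& cur) {
    assert(cur.dedicatedKb >= 0 && cur.currentAvailableKb >= 0);
    EvictionReport r{};
    r.newEvictions = cur.evictionCount - prev.evictionCount;
    r.newEvictedKb = cur.evictedKb - prev.evictedKb;
    r.evicting     = r.newEvictions > 0;
    r.freeFraction = cur.dedicatedKb > 0
        ? static_cast<double>(cur.currentAvailableKb) / cur.dedicatedKb
        : 0.0;
    return r;
}
```

Logging the report whenever `evicting` is true, or when `freeFraction` drops below some threshold you pick, gives a rough early warning before memory pressure gets severe.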

- It usually crashes in BufferSubData (the buffer is legitimate, the bounds are legitimate).
- We do use bindless buffers, but we still get crashes after taking them out.
- There is a correlation between memory usage and the crash: large textures make it crash more easily, and loads of smaller textures can also make it happen.
- It’s unclear what you mean in points 2 & 4. Do you mean that with bindless buffers we should use BufferSubData, or that NVIDIA provided a fix for map/unmap? Where was the fix provided? In an updated driver?
- Basically, our main issue is that if memory usage is high, there should be some error or information indicating that it’s going to crash (GL errors, warnings, GL debug output). We have checks all over the place, at every turn/operation, and there is no indication of anything untoward.

Sorry I didn’t see your reply sooner.

The latter. It had something to do with our use of buffer orphaning with Map/UnmapBuffer. The crashes were usually in the UnmapBuffer. NVIDIA got back to us and said that what we were doing was reasonable. It was just a driver bug.

The workaround was to use SubData. That didn’t crash. Also, as a short-term workaround, NVIDIA provided a driver setting tweak so that our use of Map/UnmapBuffer didn’t crash, but said that the fix would be rolled into the driver at some point. I don’t know which version. This was at least a year ago, so I suspect it’s in the mainline driver now.

However, since your crash is in SubData and not Map/UnmapBuffer, then I don’t think this is relevant to your problem.

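Re “we have [GL error checking] all over the place”, you might try plugging in a GL debug callback (glDebugMessageCallback, from KHR_debug / OpenGL 4.3), if what you tried earlier was something different.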

Besides obviating the need for sprinkling glGetError() + checks + logging logic everywhere, it additionally provides performance warnings, usage tips, and general information that can be useful in diagnosing problems… Before some driver crashes, I’ve seen useful messages provided to the application using this logging mechanism (including GL_OUT_OF_MEMORY errors).
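
A minimal sketch of such a callback follows. The parameter list mirrors the KHR_debug callback signature, but uses plain C++ types and locally declared enum values so that it compiles without GL headers; with real headers you would use `GLenum`/`GLuint`/`GLsizei`/`GLchar` and mark the function `APIENTRY`. `DebugStats`, `severityName` and `debugCallback` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Enum values from KHR_debug / OpenGL 4.3, redeclared without the GL_ prefix.
constexpr unsigned kDebugTypeError            = 0x824C;
constexpr unsigned kDebugTypePerformance      = 0x8250;
constexpr unsigned kDebugSeverityHigh         = 0x9146;
constexpr unsigned kDebugSeverityMedium       = 0x9147;
constexpr unsigned kDebugSeverityLow          = 0x9148;
constexpr unsigned kDebugSeverityNotification = 0x826B;

// Hypothetical bookkeeping, passed to the driver as the userParam pointer.
struct DebugStats {
    int errors = 0;
    int highSeverity = 0;
    std::string lastMessage;
};

const char* severityName(unsigned severity) {
    switch (severity) {
        case kDebugSeverityHigh:         return "high";
        case kDebugSeverityMedium:       return "medium";
        case kDebugSeverityLow:          return "low";
        case kDebugSeverityNotification: return "notification";
        default:                         return "unknown";
    }
}

// Logs every non-notification message and records it in DebugStats.
void debugCallback(unsigned /*source*/, unsigned type, unsigned id,
                   unsigned severity, int length, const char* message,
                   const void* userParam) {
    assert(message != nullptr);
    // Notifications are dropped here only to reduce noise.
    if (severity == kDebugSeverityNotification) return;

    std::string text = length > 0
        ? std::string(message, static_cast<std::size_t>(length))
        : std::string(message);
    std::fprintf(stderr, "[GL %s] id=%u type=0x%X: %s\n",
                 severityName(severity), id, type, text.c_str());

    auto* stats = static_cast<DebugStats*>(const_cast<void*>(userParam));
    if (!stats) return;
    if (type == kDebugTypeError) ++stats->errors;
    if (severity == kDebugSeverityHigh) ++stats->highSeverity;
    stats->lastMessage = text;
}

// Registration, once a debug context is current (needs real GL headers):
//   glEnable(GL_DEBUG_OUTPUT);
//   glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS); // callback runs inside the offending call
//   glDebugMessageCallback(debugCallback, &stats);
```

With synchronous output enabled, a breakpoint inside the callback shows the GL call that triggered the message in the stack trace, which helps when the last message before a crash is the only clue.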