System
| GPU | NVIDIA GeForce RTX 5070 Ti (GB203, Blackwell architecture) |
| OS | Windows 10 Pro 22H2 (build 19045) |
| Drivers tested | GRD 591.74 (WDDM 30.0.15.1174), GRD 596.36 (WDDM 32.0.15.9636) |
| API | Direct3D 12, Feature Level 12_2 |
| D3D12 SDK | Windows SDK 10.0.22621 |
Overview
We are developing a custom game engine editor built on top of D3D12. After upgrading our development machines to RTX 50-series hardware we hit two separate, reproducible D3D12 bugs that do not occur on any other hardware we tested (RTX 30xx, RTX 40xx, AMD RX 7900 XTX, Intel Arc A770). Both bugs are present on the two latest Game Ready Drivers.
Bug 1 — CreateGraphicsPipelineState returns DXGI_ERROR_INVALID_CALL (0x887A0001) during a ~2–3 second window after swapchain creation
Immediately after calling IDXGISwapChain::Present for the first time on a freshly created DXGI_SWAP_EFFECT_FLIP_DISCARD swapchain, any call to ID3D12Device::CreateGraphicsPipelineState or ID3D12Device::CreateRootSignature returns 0x887A0001 (DXGI_ERROR_INVALID_CALL) for approximately 2–3 seconds.
Critically:
- ID3D12Device::GetDeviceRemovedReason() returns S_OK during this entire window, so the device is not removed.
- After the window passes, identical PSO creation calls succeed without changes.
- This is not affected by validation layer settings.
- Reproduced with D3D12_PIPELINE_STATE_STREAM_DESC and the legacy D3D12_GRAPHICS_PIPELINE_STATE_DESC path.
Impact: Any resource that must be created during or shortly after swapchain initialization — ImGui’s root signature, VS/PS pipeline state — fails silently. GetDeviceRemovedReason() == S_OK makes this look like a recoverable soft-failure but the PSO handle is never written, leaving nullptr in the pipeline slot. All subsequent draw calls using that PSO are silently discarded.
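A minimal sketch of how an application could tell this case apart from a real device loss instead of leaving nullptr in the pipeline slot. The `Hr` alias, the constants, and `ClassifyPsoFailure` are hypothetical names introduced here so the snippet compiles without the Windows headers; in a real build they correspond to `HRESULT`, `S_OK`, and `DXGI_ERROR_INVALID_CALL`.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-ins for the Windows HRESULT values (assumption: real code
// would use HRESULT, S_OK and DXGI_ERROR_INVALID_CALL from the SDK headers).
using Hr = std::int32_t;
constexpr Hr kOk = 0;                                      // S_OK
constexpr Hr kInvalidCall = static_cast<Hr>(0x887A0001u);  // DXGI_ERROR_INVALID_CALL

enum class PsoFailure { None, TransientSuspect, DeviceLost, Other };

// createHr:        result of CreateGraphicsPipelineState / CreateRootSignature.
// removedReasonHr: result of GetDeviceRemovedReason() queried right afterwards.
constexpr PsoFailure ClassifyPsoFailure(Hr createHr, Hr removedReasonHr) {
    if (createHr >= 0) return PsoFailure::None;             // same test as SUCCEEDED()
    if (removedReasonHr < 0) return PsoFailure::DeviceLost; // device really is gone
    if (createHr == kInvalidCall) return PsoFailure::TransientSuspect;
    return PsoFailure::Other;                               // genuine creation error
}
```

Only the `TransientSuspect` case matches the behavior described above; anything else should be treated as an ordinary error rather than retried.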
Workaround we implemented: Defer all PSO creation until 3+ frames have been presented (staggered bootstrap Present loop), then retry creation. This works but should not be necessary.
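The retry half of this workaround could look roughly like the sketch below. It is not our engine code: `RetryWhileInvalidCall`, `Hr`, and `kInvalidCall` are illustrative names, and the creation call is passed in as a callable so the snippet stays independent of the D3D12 headers. It bounds the retries by wall-clock time rather than by frame count, which is an assumption, since we do not know whether elapsed time or presented frames actually ends the failure window.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

using Hr = std::int32_t;                                   // stand-in for HRESULT
constexpr Hr kInvalidCall = static_cast<Hr>(0x887A0001u);  // DXGI_ERROR_INVALID_CALL

// Calls `create` repeatedly while it returns kInvalidCall, pausing between
// attempts, until `budget` has elapsed. Any other result (success or a
// different failure) is returned immediately. Returns the last result.
inline Hr RetryWhileInvalidCall(const std::function<Hr()>& create,
                                std::chrono::milliseconds budget,
                                std::chrono::milliseconds pause) {
    assert(create && "create must be a valid callable");
    const auto deadline = std::chrono::steady_clock::now() + budget;
    Hr hr = create();
    while (hr == kInvalidCall && std::chrono::steady_clock::now() < deadline) {
        // In a real renderer this pause could be replaced by presenting a frame.
        std::this_thread::sleep_for(pause);
        hr = create();
    }
    return hr;
}
```

A caller would wrap the actual creation call in a lambda, for example `RetryWhileInvalidCall([&] { return device->CreateGraphicsPipelineState(&desc, IID_PPV_ARGS(&pso)); }, 5s, 50ms)`, and still check the returned value before using the PSO.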
Expected behavior: DXGI_ERROR_INVALID_CALL is documented as indicating an invalid API call (wrong parameter, wrong state). It must not be returned when the device is alive and the call parameters are valid. If the driver is in a transient internal state where PSO creation is not yet possible, a different mechanism (e.g. DXGI_ERROR_WAS_STILL_DRAWING or a retry-able error) should be used, or creation should block until ready.
Bug 2 — CreateComputePipelineState triggers a GPU TDR (DXGI_ERROR_DEVICE_HUNG) for non-trivial compute shaders
Calling ID3D12Device::CreateComputePipelineState with compute shaders that contain non-trivial loop bodies (e.g. an atmospheric scattering LUT with ~200 ALU instructions per thread and 64 thread groups) causes a GPU TDR. The Windows TDR watchdog resets the GPU after the default 2-second timeout, and the device then reports DXGI_ERROR_DEVICE_HUNG.
Observations:
- This happens regardless of FXC optimization level: both /O1 and /O3 trigger the TDR.
- FXC compilation itself completes immediately and produces a valid bytecode blob (FXC emits DXBC, not DXIL). The hang occurs when the driver's on-GPU ISA compiler processes that blob, not in the CPU-side FXC step.
- The TDR fires consistently and reproducibly, not intermittently.
- Confirmed on GRD 591.74 and GRD 596.36. The ISA compiler for Blackwell (GB203) appears to require more than 2 seconds for complex compute kernels.
- Simpler compute shaders (trivial copy, reduction) compile without issue.
Impact: Any non-trivial compute pass (post-process, physically-based sky, temporal effects) cannot be initialized at startup. The device is fully removed and the entire application must restart.
Workaround we implemented: Skip compute PSO creation entirely on Blackwell + D3D12 for known-heavy shaders, with a mid-fence Present heartbeat every 1.5 seconds to prevent the OS TDR watchdog from firing during the blocking ISA compile. This is fragile and defeats the purpose of GPU-side compilation.
Expected behavior: The driver's ISA compiler should complete within the Windows TDR budget (2 seconds) for all valid shader bytecode inputs, or the driver should implement background/async PSO compilation so the calling thread is not blocked for longer than the TDR timeout. ID3D12Device2::CreatePipelineState with an async compile flag (which the API does not currently define) would be the clean solution here.
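Until something like that exists, the closest application-side approximation is to run the blocking creation call on a worker thread and poll for completion. The sketch below shows only that pattern; `AsyncPsoJob` and `StartPsoJob` are hypothetical names, and the creation call is again passed in as a callable. ID3D12Device methods are free-threaded, so calling them from a worker is allowed, but this only unblocks the calling thread and would not by itself prevent the TDR described above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <future>
#include <utility>

using Hr = std::int32_t;  // stand-in for HRESULT

// Handle for a PSO creation call running on a worker thread.
struct AsyncPsoJob {
    std::future<Hr> result;

    // Non-blocking poll: true once the creation call has returned.
    bool Ready() const {
        return result.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    }
};

// Starts `create` on its own thread and returns immediately.
// Note: a future obtained from std::async blocks in its destructor until the
// task finishes, so the job must outlive the compile or be waited on.
inline AsyncPsoJob StartPsoJob(std::function<Hr()> create) {
    assert(create && "create must be a valid callable");
    return AsyncPsoJob{std::async(std::launch::async, std::move(create))};
}
```

The render loop would call `Ready()` once per frame and bind the PSO only after `result.get()` reports success.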
Current status and ask
We have switched our editor to the Vulkan backend as a workaround, where neither issue appears. However, D3D12 is required for NVIDIA Streamline / DLSS / DLSS-G integration, so staying on Vulkan is a long-term limitation for Blackwell users.
We are asking:
- Is there a known-fixed driver version for either of these issues?
- Is there a recommended workaround from NVIDIA for Bug 1 (PSO transient) that does not require a fixed number of warm-up frames?
- Is async PSO compilation (non-blocking CreatePipelineState) planned for a near-term driver update on Blackwell?
Happy to provide a minimal repro project for either issue.