RTX PRO 4000 & PRO 2000 Blackwell - Periodic 102ms Inference Latency Spike (Windows) - Critical for Real-Time Applications

Hamdouch · January 13, 2026, 2:22pm

We are experiencing a consistent and reproducible 102-103ms inference latency spike that occurs periodically (in cycles, not at a fixed time interval or fixed number of images) on NVIDIA RTX PRO 4000 Blackwell and RTX PRO 2000 Blackwell GPUs running Windows with WDDM driver mode.

CRITICAL: The exact same code, same PC, same Windows configuration runs perfectly without any spikes on RTX Ada PRO 4000 and RTX Ada PRO 2000. This confirms the issue is specific to the Blackwell architecture/driver.

This issue makes Blackwell Pro GPUs unsuitable for the real-time computer vision applications in our project.

System Configuration

Hardware

GPU Tested (WITH ISSUE):
- NVIDIA RTX PRO 4000 Blackwell ❌ Issue present
- NVIDIA RTX PRO 2000 Blackwell (16GB VRAM) ❌ Issue present
GPU Tested (NO ISSUE):
- NVIDIA RTX Ada PRO 4000 ✅ Works perfectly
- NVIDIA RTX Ada PRO 2000 ✅ Works perfectly
Driver Model: WDDM
Display: Connected (Disp.A = On)
ECC: Off

Software

Driver Version: 581.80
CUDA Version: 12.8
TensorRT Version: 10.8.0.43
Operating System: Windows
Inference Framework: YOLO object detection model (TensorRT optimized)

nvidia-smi Output (RTX PRO 2000 Blackwell)

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 2000 Blac...  WDDM  |   00000000:01:00.0  On |                  Off |
| 30%   45C    P0             18W /   70W |     943MiB /  16311MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Problem Description

Normal Behavior

Inference time: 1-2ms per image (99.5% of all inferences)
GPU frequency locked at 2000-2200 MHz
Performance state: P0

Abnormal Behavior

Periodically (in cycles, NOT at a fixed time interval and NOT after a fixed number of images), a single inference suddenly takes 102-103ms (roughly 50-100x slower than the normal 1-2ms)
Immediately returns to normal 1-2ms after the spike
Pattern is reproducible on both RTX PRO 4000 and PRO 2000 Blackwell
Does NOT occur on RTX Ada PRO 4000 or Ada PRO 2000 on the same system
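
To back up the claim that the spikes come in cycles without a fixed interval, the gaps between consecutive spikes can be measured both in images and in seconds. The following is a minimal sketch, not the code used for the measurements above; the function names, the 50 ms threshold, and the input lists are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def spike_gaps(latencies_ms, timestamps_s, threshold_ms=50.0):
    """Locate spikes and measure the distance between consecutive ones.

    latencies_ms: per-inference latency in milliseconds.
    timestamps_s: wall-clock time (seconds) at which each inference started.
    threshold_ms: hypothetical cut-off separating spikes from normal samples.
    """
    idx = [i for i, v in enumerate(latencies_ms) if v > threshold_ms]
    pairs = list(zip(idx, idx[1:]))
    image_gaps = [b - a for a, b in pairs]
    time_gaps = [timestamps_s[b] - timestamps_s[a] for a, b in pairs]
    return idx, image_gaps, time_gaps


def describe(gaps):
    """Summarize a list of gaps; a large stdev relative to the mean
    suggests the spikes are not strictly periodic."""
    if not gaps:
        return None
    return {
        "min": min(gaps),
        "max": max(gaps),
        "mean": mean(gaps),
        "stdev": pstdev(gaps),
    }
```

If the time gaps cluster tightly while the image gaps vary, the trigger is more likely time-based (for example a driver timer); if neither clusters, the stall is probably tied to something outside the inference loop.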

Statistical Analysis (10,274 samples on RTX PRO 2000 Blackwell)

Metric	Value
Total Samples	10,274
Min	1 ms
Max	118 ms
Mean	2.04 ms
Median	2.0 ms
Std Dev	6.91 ms

Distribution of Inference Times

Time Range	Count	Percentage
0-5 ms	10,222	99.49%
5-10 ms	0	0%
10-15 ms	2	0.02%
15-20 ms	2	0.02%
35-40 ms	1	0.01%
100-105 ms	44	0.43%
110-120 ms	3	0.03%

Breakdown of Spikes > 100ms (47 total samples)

Inference Time	Count
102 ms	23
103 ms	20
104 ms	1
114 ms	1
116 ms	1
118 ms	1
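
As a sanity check, the two tables above can be cross-checked against each other. This short sketch only uses the counts reported in this post; the variable and function names are illustrative.

```python
TOTAL_SAMPLES = 10_274

# Counts copied from the "Distribution of Inference Times" table.
bins = {
    "0-5 ms": 10_222,
    "5-10 ms": 0,
    "10-15 ms": 2,
    "15-20 ms": 2,
    "35-40 ms": 1,
    "100-105 ms": 44,
    "110-120 ms": 3,
}

# Counts copied from the "Breakdown of Spikes > 100ms" table.
spikes = {102: 23, 103: 20, 104: 1, 114: 1, 116: 1, 118: 1}


def percentages(counts, total):
    """Share of each bin in percent, rounded to two decimals."""
    return {k: round(100.0 * v / total, 2) for k, v in counts.items()}


# Every sample falls into exactly one bin.
assert sum(bins.values()) == TOTAL_SAMPLES
# The spike breakdown matches the two highest bins (44 + 3 = 47).
assert sum(spikes.values()) == bins["100-105 ms"] + bins["110-120 ms"]

# Total time spent in spikes, and its contribution to the overall mean.
spike_time_ms = sum(ms * n for ms, n in spikes.items())
spike_share_of_mean_ms = spike_time_ms / TOTAL_SAMPLES
```

The spikes add roughly 0.47 ms to the 2.04 ms mean, which is why the mean and median look healthy even though about 1 in 220 inferences stalls for over 100 ms.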

Critical Comparison: Blackwell vs Ada (Same PC, Same Code, Same Configuration)

GPU	Architecture	Periodic 102ms Spike Issue
RTX PRO 4000 Blackwell	Blackwell	❌ YES - Issue Present
RTX PRO 2000 Blackwell	Blackwell	❌ YES - Issue Present
RTX Ada PRO 4000	Ada Lovelace	✅ NO - Works Perfectly
RTX Ada PRO 2000	Ada Lovelace	✅ NO - Works Perfectly

This is definitive proof that the issue is specific to the Blackwell driver/architecture, not our code, Windows configuration, or inference pipeline.

Impact on Real-Time Applications

Why This Is Critical

This periodic 102ms spike makes Blackwell Pro GPUs unsuitable for real-time applications, including:

Industrial Machine Vision: Production line inspection systems cannot tolerate 100ms frame drops
Robotics: Real-time object detection for robot guidance requires consistent latency
Autonomous Systems: Vehicle/drone perception systems need guaranteed frame timing
Medical Imaging: Real-time surgical guidance and diagnostic systems
Security & Surveillance: Real-time threat detection systems
Quality Control: High-speed manufacturing inspection

Real-World Consequences

At 30 FPS, a 102ms spike causes 3+ frames to be dropped
Unpredictable timing makes synchronization with other systems impossible
Cannot guarantee SLA/response time requirements
Forces customers to stay on Ada architecture instead of upgrading to Blackwell
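
The frame-drop figure follows directly from the frame period. A small sketch of the arithmetic (the function name is illustrative):

```python
def frames_spanned(stall_ms, fps):
    """Number of frame periods covered by a stall of stall_ms at a given frame rate."""
    frame_period_ms = 1000.0 / fps
    return stall_ms / frame_period_ms


# At 30 FPS the frame period is about 33.3 ms, so a 102 ms stall
# covers slightly more than three frame periods.
stall_30fps = frames_spanned(102, 30)
whole_frames_30fps = int(stall_30fps)
```

At higher frame rates the same stall spans proportionally more frames, so the impact grows with camera speed.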

What We Have Already Tried

Power Management

Set Windows Power Plan to “High Performance”
NVIDIA Control Panel → Power Management Mode → “Prefer Maximum Performance”
Disabled PCI Express Link State Power Management
Enabled GPU Persistence Mode (nvidia-smi -pm 1)

GPU Configuration

Locked GPU frequency at 2000-2200 MHz (nvidia-smi -lgc 2000,2200)
Verified GPU stays in P0 performance state
Checked ECC is disabled
Tested with display connected and disconnected
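
One way to double-check the P0 and clock-lock observations while the inference loop runs is to log `nvidia-smi` query output in a second process. This is a hedged sketch rather than the method used above: the field list and helper names are assumptions, and a single `nvidia-smi` call is itself slow, so this can show sustained clock or P-state changes but will not resolve an individual 100 ms event.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu=...`; chosen for illustration.
QUERY_FIELDS = ["timestamp", "pstate", "clocks.sm", "power.draw"]


def parse_line(line):
    """Turn one CSV line of nvidia-smi output into a dict keyed by field name."""
    parts = [p.strip() for p in line.split(",")]
    return dict(zip(QUERY_FIELDS, parts))


def sample_gpu_state():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_line(l) for l in result.stdout.splitlines() if l.strip()]
```

Calling `sample_gpu_state()` in a loop and writing the rows to a file gives a timeline that can be lined up against the spike timestamps from the inference log.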

System Optimization

Disabled Hardware-Accelerated GPU Scheduling
Stopped NVIDIA Telemetry services
Closed all unnecessary applications
Tested with minimal background processes
Disabled Windows Defender real-time monitoring

Driver

Using latest driver 581.80
Clean driver installation (DDU)

None of these solutions resolved the issue.

Hypothesis

The consistent ~102ms spike pattern suggests one or more of the following causes:

WDDM driver internal housekeeping/command buffer flush - Windows Display Driver Model may be forcing periodic GPU synchronization for display composition, with different behavior on Blackwell vs Ada
Blackwell-specific driver scheduler - The Blackwell driver may have a different internal scheduling or memory management mechanism that causes periodic stalls
TCC mode unavailable - The RTX PRO 4000/2000 Blackwell do not support TCC driver mode, which would bypass WDDM entirely
New Blackwell power management - Blackwell architecture may have different P-state transition behavior causing these spikes

Request to NVIDIA

Can NVIDIA confirm if this is a known issue with Blackwell Pro series drivers in WDDM mode on Windows?
Is there a driver update planned to address this periodic latency spike?
Are there any additional workarounds or registry settings specific to Blackwell that might help?
Can TCC mode support be added to RTX PRO 4000/2000 Blackwell in future drivers?
What is the root cause of this ~102ms spike that does not occur on Ada architecture?

Environment Details for Reproduction

GPUs with issue:
  - NVIDIA RTX PRO 4000 Blackwell
  - NVIDIA RTX PRO 2000 Blackwell

GPUs without issue (same PC):
  - NVIDIA RTX Ada PRO 4000
  - NVIDIA RTX Ada PRO 2000

Driver: 581.80
CUDA: 12.8
TensorRT: 10.8.0.43
OS: Windows (WDDM mode)
Workload: Continuous YOLO inference loop (TensorRT optimized)

Test Duration: Extended continuous operation
Expected: Consistent 1-2ms inference time
Actual: Periodic 102-103ms spike (in cycles, not fixed interval)
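
For anyone trying to reproduce this, a minimal timing harness along the following lines should be enough. It is a sketch, not our production code: `infer` is a hypothetical callable that runs one inference and blocks until the GPU result is available (for example by synchronizing the CUDA stream before returning); without that synchronization the loop only measures launch overhead.

```python
import time


def measure_latencies(infer, n_iterations):
    """Time n_iterations calls of infer().

    Returns (start_times_s, latencies_ms), where start_times_s are
    seconds since the first call and latencies_ms are per-call durations.
    """
    start_times_s = []
    latencies_ms = []
    t0 = time.perf_counter_ns()
    for _ in range(n_iterations):
        start = time.perf_counter_ns()
        infer()  # must block until the GPU work has finished
        end = time.perf_counter_ns()
        start_times_s.append((start - t0) / 1e9)
        latencies_ms.append((end - start) / 1e6)
    return start_times_s, latencies_ms
```

Logging the start time alongside each latency makes it possible to check afterwards whether the spikes follow a time pattern or an image-count pattern.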

rs277 · January 14, 2026, 1:14am

I don’t expect it to make a difference in the behaviour you’re seeing, but I wonder if you’ve tried compiling for sm_120 on a suitable version of CUDA?

Your summary shows you’re using CUDA 11.8. Blackwell support did not start until CUDA 12.8.

Hamdouch · January 14, 2026, 12:09pm

Hi, thank you for taking part in this topic. Yes, indeed, I made a mistake about the CUDA version: I am actually using CUDA 12.8. In any case, I think CUDA 11.8 does not support the Blackwell GPU architecture.

rs277 · January 14, 2026, 8:36pm

I note you have “Disabled Hardware-Accelerated GPU Scheduling”.

In a similar post to the Nsight Systems forum, this is recommended to be enabled for optimal use of WDDM.
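
To confirm which state Hardware-Accelerated GPU Scheduling is actually in after toggling it, the setting can be read back from the registry. This is a sketch under assumptions: it relies on the commonly reported `HwSchMode` value under the `GraphicsDrivers` key (1 = disabled, 2 = enabled), which is not confirmed anywhere in this thread, and the helper names are illustrative.

```python
# Assumed registry location of the HAGS setting (under HKEY_LOCAL_MACHINE).
HAGS_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
HAGS_VALUE = "HwSchMode"


def describe_hags(value):
    """Map the raw registry value to a readable state (assumed mapping)."""
    return {1: "disabled", 2: "enabled"}.get(value, "unknown")


def read_hags_mode():
    """Read HwSchMode from the registry; returns None if the value is absent.

    winreg is imported lazily because it only exists on Windows.
    """
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, HAGS_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, HAGS_VALUE)
    except FileNotFoundError:
        return None
    return value
```

On the Windows test machine, `describe_hags(read_hags_mode())` reports the current state; a reboot is normally needed for a change to take effect.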

It’s odd that TCC mode is not available on this class of card, although I’ve only seen it mentioned regarding the Server version of the PRO 6000.
