We are experiencing a consistent, reproducible inference latency spike of 102-103 ms that recurs periodically (in cycles, not at a fixed time interval or after a fixed number of images) on NVIDIA RTX PRO 4000 Blackwell and RTX PRO 2000 Blackwell GPUs running Windows in WDDM driver mode.
CRITICAL: The exact same code, on the same PC with the same Windows configuration, runs without any spikes on the RTX Ada PRO 4000 and RTX Ada PRO 2000. Because only the GPU changes between these runs, this points to an issue specific to the Blackwell architecture or its driver.
This issue makes Blackwell Pro GPUs unsuitable for the real-time computer vision applications in our project.
System Configuration
Hardware
- GPUs tested (WITH ISSUE):
  - NVIDIA RTX PRO 4000 Blackwell ❌ Issue present
  - NVIDIA RTX PRO 2000 Blackwell (16GB VRAM) ❌ Issue present
- GPUs tested (NO ISSUE):
  - NVIDIA RTX Ada PRO 4000 ✅ Works perfectly
  - NVIDIA RTX Ada PRO 2000 ✅ Works perfectly
- Driver Model: WDDM
- Display: Connected (Disp.A = On)
- ECC: Off
Software
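- Driver Version: 581.80
- CUDA Version: 12.8 (toolkit used by the application; the "CUDA Version: 13.0" shown in the nvidia-smi output is the highest CUDA version the driver supports)
- TensorRT Version: 10.8.0.43
- Operating System: Windows
- Inference Framework: TensorRT (YOLO object detection model built as a TensorRT engine)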
nvidia-smi Output (RTX PRO 2000 Blackwell)
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 2000 Blac...  WDDM  |   00000000:01:00.0  On |                  Off |
| 30%   45C    P0             18W /   70W |     943MiB /  16311MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
Problem Description
Normal Behavior
- Inference time: 1-2 ms per image (99.5% of all inferences)
- GPU graphics clock locked at 2000-2200 MHz
- Performance state: P0
Abnormal Behavior
- Periodically (in cycles, NOT at a fixed time interval and NOT after a fixed number of images), a single inference suddenly takes 102-103 ms, roughly 50-100x slower than the normal 1-2 ms (43 of the 47 spikes above 100 ms were 102 or 103 ms)
- Inference time returns to the normal 1-2 ms immediately after the spike
- The pattern is reproducible on both the RTX PRO 4000 and the RTX PRO 2000 Blackwell
- It does NOT occur on the RTX Ada PRO 4000 or RTX Ada PRO 2000 in the same system
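To reproduce the measurement with sub-millisecond resolution and keep a timestamp for every sample (needed to see whether spikes line up with wall-clock events), a logging loop along these lines could be used. This is a minimal sketch: `run_inference` is a hypothetical placeholder for the TensorRT call, and for GPU work it must not return until the GPU has finished (for example after synchronizing the CUDA stream); otherwise only the launch overhead is timed. The 50 ms spike threshold is an arbitrary choice that separates the 1-2 ms baseline from the ~100 ms stalls.

```python
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (start time in s, latency in ms)


def record_latencies(run_inference: Callable[[], None], n: int) -> List[Sample]:
    """Call run_inference n times and time each call with perf_counter.

    run_inference is a hypothetical placeholder: for GPU work it must block
    until the GPU has finished, or only the launch cost is measured.
    """
    samples: List[Sample] = []
    for _ in range(n):
        t0 = time.perf_counter()
        run_inference()
        t1 = time.perf_counter()
        samples.append((t0, (t1 - t0) * 1000.0))
    return samples


def find_spikes(samples: List[Sample], threshold_ms: float = 50.0):
    """Return (sample index, start time in s, latency in ms) for each spike."""
    return [(i, t, ms) for i, (t, ms) in enumerate(samples) if ms > threshold_ms]
```

Keeping the raw `(start time, latency)` pairs, rather than only aggregates, preserves both the inference count and the time of every spike for later analysis.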
Statistical Analysis (10,274 samples on RTX PRO 2000 Blackwell)
| Metric | Value |
|---|---|
| Total Samples | 10,274 |
| Min | 1 ms |
| Max | 118 ms |
| Mean | 2.04 ms |
| Median | 2.0 ms |
| Std Dev | 6.91 ms |
Distribution of Inference Times (ranges not listed contain no samples)
| Time Range | Count | Percentage |
|---|---|---|
| 0-5 ms | 10,222 | 99.49% |
| 5-10 ms | 0 | 0% |
| 10-15 ms | 2 | 0.02% |
| 15-20 ms | 2 | 0.02% |
| 35-40 ms | 1 | 0.01% |
| 100-105 ms | 44 | 0.43% |
| 110-120 ms | 3 | 0.03% |
Breakdown of Spikes > 100ms (47 total samples)
| Inference Time | Count |
|---|---|
| 102 ms | 23 |
| 103 ms | 20 |
| 104 ms | 1 |
| 114 ms | 1 |
| 116 ms | 1 |
| 118 ms | 1 |
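Tables like the ones above can be regenerated from the raw list of per-inference times with the Python standard library. A minimal sketch, assuming the latencies (in ms) are already loaded into a list; the population standard deviation and nearest-rank percentiles are our assumptions, since the tool that produced the tables is not stated, and fixed 5 ms bins would split the 110-120 ms row above into two bins. Tail percentiles such as p99.9 are worth reporting next to the mean and median: the median does not reflect the 0.46% spike rate at all, and the spikes only lift the mean to about 2 ms.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Tuple


def percentile(latencies_ms: List[float], p: float) -> float:
    """Nearest-rank percentile, p in 0..100."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(
    latencies_ms: List[float], bin_ms: float = 5
) -> Tuple[Dict[str, float], Dict[float, int]]:
    """Summary statistics plus a histogram keyed by each bin's lower edge (ms)."""
    summary = {
        "count": len(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "stdev": statistics.pstdev(latencies_ms),  # population std dev
        "p99": percentile(latencies_ms, 99),
        "p99.9": percentile(latencies_ms, 99.9),
    }
    histogram = Counter(math.floor(x / bin_ms) * bin_ms for x in latencies_ms)
    return summary, dict(sorted(histogram.items()))
```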
Critical Comparison: Blackwell vs Ada (Same PC, Same Code, Same Configuration)
| GPU | Architecture | Periodic 102ms Spike Issue |
|---|---|---|
| RTX PRO 4000 Blackwell | Blackwell | ❌ YES - Issue Present |
| RTX PRO 2000 Blackwell | Blackwell | ❌ YES - Issue Present |
| RTX Ada PRO 4000 | Ada Lovelace | ✅ NO - Works Perfectly |
| RTX Ada PRO 2000 | Ada Lovelace | ✅ NO - Works Perfectly |
Because only the GPU changes between these runs, this is strong evidence that the issue is specific to the Blackwell driver/architecture, not to our code, Windows configuration, or inference pipeline.
Impact on Real-Time Applications
Why This Is Critical
This periodic 102ms spike makes Blackwell Pro GPUs unsuitable for real-time applications, including:
- Industrial Machine Vision: Production line inspection systems cannot tolerate 100 ms frame drops
- Robotics: Real-time object detection for robot guidance requires consistent latency
- Autonomous Systems: Vehicle/drone perception systems need guaranteed frame timing
- Medical Imaging: Real-time surgical guidance and diagnostic systems
- Security & Surveillance: Real-time threat detection systems
- Quality Control: High-speed manufacturing inspection
Real-World Consequences
- At 30 FPS (33.3 ms per frame), a single 102 ms spike spans just over three frame periods (102 / 33.3 ≈ 3.06), so about three frames are dropped
- Unpredictable timing makes reliable synchronization with other systems impossible
- We cannot guarantee SLA/response-time requirements
- It forces our customers to stay on Ada instead of upgrading to Blackwell
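The frame-drop figure follows from simple arithmetic. A minimal sketch, assuming frames arrive at a fixed rate, the stall begins as a frame arrives, and every frame arriving while the pipeline is stalled is dropped (no queue); with a capture queue, those frames would be delayed rather than lost.

```python
import math


def frames_dropped(stall_ms: float, fps: float) -> int:
    """Frames that arrive while a single stall of stall_ms is in progress.

    Assumes fixed-rate capture, a stall starting on a frame arrival, and no
    queueing, so every frame arriving mid-stall is lost.
    """
    # Equivalent to stall_ms / (1000 / fps), written as a product to limit
    # floating-point rounding at exact multiples of the frame period.
    return math.floor(stall_ms * fps / 1000.0)
```

For the spikes observed here, `frames_dropped(102, 30)` and `frames_dropped(118, 30)` both give 3; at 60 FPS the same 102 ms stall would cost 6 frames.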
What We Have Already Tried
Power Management
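- Set the Windows power plan to “High Performance”
- NVIDIA Control Panel → Power Management Mode → “Prefer Maximum Performance”
- Disabled PCI Express Link State Power Management
- Enabled GPU persistence mode (`nvidia-smi -pm 1`); note that the nvidia-smi documentation lists persistence mode as available on Linux only, so this setting likely has no effect under Windows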
GPU Configuration
- Locked GPU clocks at 2000-2200 MHz (`nvidia-smi -lgc 2000,2200`)
- Verified that the GPU stays in the P0 performance state
- Checked that ECC is disabled
- Tested with the display connected and disconnected
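To check whether the P0 state and the clock lock actually hold at the moment of a spike, nvidia-smi can be polled alongside the inference loop and its samples matched against the spike times. A minimal sketch, assuming `nvidia-smi` is on PATH and GPU 0 is the card under test; the query fields are standard `--query-gpu` fields, but the 100 ms sampling interval is coarse compared with a ~100 ms stall, so this can reveal sustained clock or P-state drops but not short transients.

```python
import subprocess
from typing import Dict, Iterator, Optional

QUERY = "timestamp,pstate,clocks.gr,clocks.mem,power.draw"


def _number(field: str) -> Optional[float]:
    """Convert a numeric field; unsupported values (e.g. N/A) become None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_sample(line: str) -> Dict[str, object]:
    """Parse one line of `--format=csv,noheader,nounits` output for QUERY."""
    timestamp, pstate, gr, mem, power = (f.strip() for f in line.split(","))
    return {
        "timestamp": timestamp,
        "pstate": pstate,
        "graphics_mhz": _number(gr),
        "memory_mhz": _number(mem),
        "power_w": _number(power),
    }


def stream_samples(gpu: int = 0, interval_ms: int = 100) -> Iterator[Dict[str, object]]:
    """Yield parsed samples from a looping nvidia-smi process until it exits."""
    cmd = [
        "nvidia-smi", "-i", str(gpu),
        f"--query-gpu={QUERY}", "--format=csv,noheader,nounits",
        "-lms", str(interval_ms),
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            if line.strip():
                yield parse_sample(line)
```

The nvidia-smi `timestamp` is wall-clock time, so the inference log would also need wall-clock timestamps (e.g. `time.time()`) for the two logs to be aligned.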
System Optimization
- Disabled Hardware-Accelerated GPU Scheduling
- Stopped NVIDIA Telemetry services
- Closed all unnecessary applications
- Tested with minimal background processes
- Disabled Windows Defender real-time monitoring
Driver
- Using the latest driver, 581.80
- Clean driver installation using DDU (Display Driver Uninstaller)
None of these solutions resolved the issue.
Hypothesis
The consistent ~102 ms spike pattern suggests one of the following causes:
- WDDM driver internal housekeeping/command buffer flush: the Windows Display Driver Model may be forcing periodic GPU synchronization for display composition, with different behavior on Blackwell than on Ada
- Blackwell-specific driver scheduler: the Blackwell driver may use a different internal scheduling or memory management mechanism that causes periodic stalls
- TCC mode unavailable: the RTX PRO 4000/2000 Blackwell do not support the TCC driver mode, which would bypass WDDM entirely, so the WDDM path cannot be avoided
- New Blackwell power management: the Blackwell architecture may have different P-state transition behavior that causes these spikes
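These hypotheses differ in what would drive the stall (a timer, the amount of submitted work, or display composition), so it may help to quantify how regular the spikes are, both in wall-clock time and in number of inferences. A minimal sketch, assuming each spike is recorded as a hypothetical `(sample index, start time in s, latency in ms)` tuple; a coefficient of variation (stdev / mean of the gaps) near 0 would indicate a fixed period, while larger values indicate irregular gaps, consistent with the "in cycles, not at a fixed interval" observation.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Spike = Tuple[int, float, float]  # (sample index, start time in s, latency in ms)


def gap_stats(positions: Sequence[float]) -> Tuple[float, float]:
    """Mean gap and coefficient of variation of the gaps between positions."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three spikes")
    mean = statistics.fmean(gaps)
    return mean, statistics.pstdev(gaps) / mean


def spike_regularity(spikes: List[Spike]) -> Dict[str, Tuple[float, float]]:
    """Gap statistics measured in inferences and in seconds."""
    indices = [float(i) for i, _, _ in spikes]
    times = [t for _, t, _ in spikes]
    return {"by_count": gap_stats(indices), "by_time_s": gap_stats(times)}
```

Comparing the two coefficients shows whether the spikes are more regular in time (suggesting a timer-driven mechanism) or in inference count (suggesting a work-driven one).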
Request to NVIDIA
- Can NVIDIA confirm whether this is a known issue with Blackwell Pro series drivers in WDDM mode on Windows?
- Is a driver update planned to address this periodic latency spike?
- Are there any additional workarounds or registry settings specific to Blackwell that might help?
- Can TCC mode support be added to the RTX PRO 4000/2000 Blackwell in future drivers?
- What is the root cause of this ~102 ms spike, which does not occur on the Ada architecture?
Environment Details for Reproduction
GPUs with issue:
- NVIDIA RTX PRO 4000 Blackwell
- NVIDIA RTX PRO 2000 Blackwell
GPUs without issue (same PC):
- NVIDIA RTX Ada PRO 4000
- NVIDIA RTX Ada PRO 2000
Driver: 581.80
CUDA: 12.8
TensorRT: 10.8.0.43
OS: Windows (WDDM mode)
Workload: Continuous YOLO inference loop (TensorRT optimized)
Test Duration: Extended continuous operation
Expected: Consistent 1-2ms inference time
Actual: Periodic 102-103ms spike (in cycles, not fixed interval)