RTX PRO 4000 & PRO 2000 Blackwell - Periodic 102ms Inference Latency Spike (Windows) - Critical for Real-Time Applications

Hamdouch · January 13, 2026, 2:07pm

Hi

We are experiencing a consistent and reproducible 102-103ms inference latency spike that occurs periodically (in cycles, not at a fixed time interval or fixed number of images) on NVIDIA RTX PRO 4000 Blackwell and RTX PRO 2000 Blackwell GPUs running Windows with WDDM driver mode.

CRITICAL: The exact same code, on the same PC with the same Windows configuration, runs without any spikes on RTX Ada PRO 4000 and RTX Ada PRO 2000. This strongly indicates that the issue is specific to the Blackwell architecture or its driver path.

This issue makes Blackwell Pro GPUs unsuitable for real-time computer vision applications in our project, where consistent frame timing is essential.

System Configuration

Hardware

GPU Tested (WITH ISSUE):
- NVIDIA RTX PRO 4000 Blackwell ❌ Issue present
- NVIDIA RTX PRO 2000 Blackwell (16GB VRAM) ❌ Issue present
GPU Tested (NO ISSUE):
- NVIDIA RTX Ada PRO 4000 ✅ Works perfectly
- NVIDIA RTX Ada PRO 2000 ✅ Works perfectly
Driver Model: WDDM
Display: Connected (Disp.A = On)
ECC: Off

Software

Driver Version: 581.80
CUDA Version: 11.8 (toolkit; nvidia-smi below reports 13.0, the highest CUDA version the driver supports)
TensorRT Version: 10.8.0.43
Operating System: Windows
Inference Framework: YOLO object detection model (TensorRT optimized)

nvidia-smi Output (RTX PRO 2000 Blackwell)

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 2000 Blac...  WDDM  |   00000000:01:00.0  On |                  Off |
| 30%   45C    P0             18W /   70W |     943MiB /  16311MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Problem Description

Normal Behavior

Inference time: 1-2ms per image (99.5% of all inferences)
GPU frequency locked at 2000-2200 MHz
Performance state: P0

Abnormal Behavior

Periodically (in cycles - NOT at a fixed time interval, NOT at a fixed number of images)
A single inference suddenly takes 102-103ms (roughly 50-100x slower than the normal 1-2ms)
Immediately returns to normal 1-2ms after the spike
Pattern is reproducible on both RTX PRO 4000 and PRO 2000 Blackwell
Does NOT occur on RTX Ada PRO 4000 or Ada PRO 2000 on the same system
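
One way to pin down the "in cycles" pattern is to log a wall-clock timestamp with every latency sample and then look at the spacing between spikes, both in seconds and in number of images. The following is a minimal sketch of that analysis, not the code behind the numbers in this report; the function name `spike_gaps` and the 50 ms threshold are assumptions chosen for illustration.

```python
from typing import List, Tuple


def spike_gaps(
    samples: List[Tuple[float, float]],
    threshold_ms: float = 50.0,  # assumed cutoff; normal inferences take 1-2 ms
) -> Tuple[List[float], List[int]]:
    """Return (seconds between spikes, images between spikes).

    `samples` holds one (timestamp_s, latency_ms) pair per inference,
    in the order the inferences ran.
    """
    # Indices of all inferences that count as a spike.
    spike_idx = [i for i, (_, ms) in enumerate(samples) if ms >= threshold_ms]
    # Pair each spike with the next one and measure the distance between them.
    pairs = list(zip(spike_idx, spike_idx[1:]))
    time_gaps = [samples[b][0] - samples[a][0] for a, b in pairs]
    count_gaps = [b - a for a, b in pairs]
    return time_gaps, count_gaps
```

If the time gaps cluster around one value while the image counts vary, the trigger is more likely timer-driven; if the image counts cluster instead, it is more likely tied to the number of submissions. In our runs neither was constant.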

Statistical Analysis (10,274 samples on RTX PRO 2000 Blackwell)

Metric	Value
Total Samples	10,274
Min	1 ms
Max	118 ms
Mean	2.04 ms
Median	2.0 ms
Std Dev	6.91 ms

Distribution of Inference Times

Time Range	Count	Percentage
0-5 ms	10,222	99.49%
5-10 ms	0	0%
10-15 ms	2	0.02%
15-20 ms	2	0.02%
35-40 ms	1	0.01%
100-105 ms	44	0.43%
110-120 ms	3	0.03%

Breakdown of Spikes > 100ms (47 total samples)

Inference Time	Count
102 ms	23
103 ms	20
104 ms	1
114 ms	1
116 ms	1
118 ms	1
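
As a cross-check, the percentages and the spike total in the two tables above can be recomputed from the bin counts alone. The short sketch below uses only the counts listed in this post; the helper name `share` is introduced here for illustration.

```python
# Bin counts copied from the distribution table (RTX PRO 2000 Blackwell).
bins = {
    "0-5 ms": 10222,
    "5-10 ms": 0,
    "10-15 ms": 2,
    "15-20 ms": 2,
    "35-40 ms": 1,
    "100-105 ms": 44,
    "110-120 ms": 3,
}

total = sum(bins.values())                        # all samples
spikes = bins["100-105 ms"] + bins["110-120 ms"]  # samples above 100 ms


def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to 2 decimals."""
    return round(100.0 * count / total, 2)


for label, count in bins.items():
    print(f"{label:>11}: {count:6d} ({share(count, total):.2f}%)")
print(f"spikes >100 ms: {spikes} of {total} (about 1 in {total // spikes})")
```

The counts sum to 10,274 and the two top bins add up to the 47 spikes broken down above, which works out to about one spike per 218 inferences.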

Critical Comparison: Blackwell vs Ada (Same PC, Same Code, Same Configuration)

GPU	Architecture	Periodic 102ms Spike Issue
RTX PRO 4000 Blackwell	Blackwell	❌ YES - Issue Present
RTX PRO 2000 Blackwell	Blackwell	❌ YES - Issue Present
RTX Ada PRO 4000	Ada Lovelace	✅ NO - Works Perfectly
RTX Ada PRO 2000	Ada Lovelace	✅ NO - Works Perfectly

Because only the GPU changed between runs, this is strong evidence that the issue is specific to the Blackwell driver/architecture, not our code, Windows configuration, or inference pipeline.

Impact on Real-Time Applications

Why This Is Critical

This periodic 102ms spike makes Blackwell Pro GPUs unsuitable for real-time applications, including:

Industrial Machine Vision: Production line inspection systems cannot tolerate 100ms frame drops
Robotics: Real-time object detection for robot guidance requires consistent latency
Autonomous Systems: Vehicle/drone perception systems need guaranteed frame timing
Medical Imaging: Real-time surgical guidance and diagnostic systems
Security & Surveillance: Real-time threat detection systems
Quality Control: High-speed manufacturing inspection

Real-World Consequences

At 30 FPS (33.3ms per frame), a 102ms spike spans about three frame periods, so 3+ frames are dropped
Unpredictable timing makes synchronization with other systems impossible
Cannot guarantee SLA/response time requirements
Forces customers to stay on Ada architecture instead of upgrading to Blackwell
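
The frame-drop figure above follows from simple arithmetic: divide the stall duration by the frame period. A small sketch of that calculation is shown below; the 60 and 120 FPS rows are illustrative additions, not rates from our system.

```python
def frame_periods_spanned(stall_ms: float, fps: float) -> float:
    """How many frame periods a stall of `stall_ms` covers at `fps`."""
    frame_period_ms = 1000.0 / fps
    return stall_ms / frame_period_ms


# 30 FPS is the rate discussed above; 60 and 120 FPS are illustrative only.
for fps in (30, 60, 120):
    print(f"{fps} FPS: {frame_periods_spanned(102.0, fps):.2f} frame periods")
```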

What We Have Already Tried

Power Management

Set Windows Power Plan to “High Performance”
NVIDIA Control Panel → Power Management Mode → “Prefer Maximum Performance”
Disabled PCI Express Link State Power Management
Enabled GPU Persistence Mode (nvidia-smi -pm 1)

GPU Configuration

Locked GPU frequency at 2000-2200 MHz (nvidia-smi -lgc 2000,2200)
Verified GPU stays in P0 performance state
Checked ECC is disabled
Tested with display connected and disconnected
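
To check the P-state and clock claims above over a long run rather than by eye, the values can be polled with `nvidia-smi --query-gpu` and logged next to the latency samples. The sketch below is one assumed way to do that, not the tooling used for this report; the helper names `parse_state_line` and `read_gpu_state` are hypothetical, and the example line in the docstring is illustrative rather than captured output.

```python
import subprocess
from typing import Tuple

# pstate and clocks.sm are standard --query-gpu fields; nounits drops "MHz".
QUERY = ["nvidia-smi", "--query-gpu=pstate,clocks.sm", "--format=csv,noheader,nounits"]


def parse_state_line(line: str) -> Tuple[str, int]:
    """Parse one CSV line such as 'P0, 2100' into (pstate, sm_clock_mhz)."""
    pstate, sm_clock = [field.strip() for field in line.split(",")]
    return pstate, int(sm_clock)


def read_gpu_state(gpu_index: int = 0) -> Tuple[str, int]:
    """Query the current P-state and SM clock of one GPU via nvidia-smi."""
    out = subprocess.run(
        QUERY + ["-i", str(gpu_index)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_state_line(out.strip().splitlines()[0])
```

Each `nvidia-smi` call is itself slow compared with a 1-2ms inference, so polling like this is better done at a low rate from a separate process or thread than from inside the inference loop.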

System Optimization

Disabled Hardware-Accelerated GPU Scheduling
Stopped NVIDIA Telemetry services
Closed all unnecessary applications
Tested with minimal background processes
Disabled Windows Defender real-time monitoring

Driver

Using latest driver 581.80
Clean driver installation (DDU)

None of these steps resolved the issue.

Hypothesis

The consistent ~102ms spike pattern suggests one or more of the following:

WDDM driver internal housekeeping/command buffer flush - Windows Display Driver Model may be forcing periodic GPU synchronization for display composition, with different behavior on Blackwell vs Ada
Blackwell-specific driver scheduler - The Blackwell driver may have a different internal scheduling or memory management mechanism that causes periodic stalls
TCC mode unavailable - The RTX PRO 4000/2000 Blackwell do not support TCC driver mode, which would bypass WDDM entirely, so we cannot rule WDDM in or out by switching modes
New Blackwell power management - Blackwell architecture may have different P-state transition behavior causing these spikes

Request to NVIDIA

Can NVIDIA confirm if this is a known issue with Blackwell Pro series drivers in WDDM mode on Windows?
Is there a driver update planned to address this periodic latency spike?
Are there any additional workarounds or registry settings specific to Blackwell that might help?
Can TCC mode support be added to RTX PRO 4000/2000 Blackwell in future drivers?
What is the root cause of this ~102ms spike that does not occur on Ada architecture?

Environment Details for Reproduction

GPUs with issue:
  - NVIDIA RTX PRO 4000 Blackwell
  - NVIDIA RTX PRO 2000 Blackwell

GPUs without issue (same PC):
  - NVIDIA RTX Ada PRO 4000
  - NVIDIA RTX Ada PRO 2000

Driver: 581.80
CUDA: 11.8
TensorRT: 10.8.0.43
OS: Windows (WDDM mode)
Workload: Continuous YOLO inference loop (TensorRT optimized)

Test Duration: Extended continuous operation
Expected: Consistent 1-2ms inference time
Actual: Periodic 102-103ms spike (in cycles, not fixed interval)
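
For anyone trying to reproduce this, a timing loop of the following shape is enough to see the spikes. It is a minimal sketch rather than the exact harness behind the numbers above: `infer` is a hypothetical callable standing in for one TensorRT inference, and it is assumed to block until the GPU result is available (an asynchronous enqueue without synchronization would only measure launch overhead).

```python
import time
from typing import Callable, List, Tuple


def time_inference_loop(
    infer: Callable[[], None],  # hypothetical: runs one inference and blocks until done
    iterations: int,
) -> List[Tuple[float, float]]:
    """Run `infer` repeatedly; return one (start_time_s, latency_ms) per call."""
    samples: List[Tuple[float, float]] = []
    for _ in range(iterations):
        start = time.perf_counter()
        infer()
        end = time.perf_counter()
        samples.append((start, (end - start) * 1000.0))
    return samples
```

Keeping the start timestamp with each sample makes it possible to examine the spacing between spikes afterwards.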

Thanks
