RTX PRO 4000 & PRO 2000 Blackwell - Periodic 102ms Inference Latency Spike (Windows) - Critical for Real-Time Applications

Hi,

We are experiencing a consistent and reproducible 102-103ms inference latency spike that recurs periodically (in cycles, but not at a fixed time interval or after a fixed number of images) on NVIDIA RTX PRO 4000 Blackwell and RTX PRO 2000 Blackwell GPUs running Windows in WDDM driver mode.

CRITICAL: The exact same code, on the same PC with the same Windows configuration, runs without any spikes on the RTX Ada PRO 4000 and RTX Ada PRO 2000. This indicates that the issue is specific to the Blackwell architecture or its driver.

This issue makes Blackwell Pro GPUs unsuitable for the real-time computer vision applications in our project, where consistent frame timing is essential.


System Configuration

Hardware

  • GPU Tested (WITH ISSUE):

    • NVIDIA RTX PRO 4000 Blackwell ❌ Issue present

    • NVIDIA RTX PRO 2000 Blackwell (16GB VRAM) ❌ Issue present

  • GPU Tested (NO ISSUE):

    • NVIDIA RTX Ada PRO 4000 ✅ Works perfectly

    • NVIDIA RTX Ada PRO 2000 ✅ Works perfectly

  • Driver Model: WDDM

  • Display: Connected (Disp.A = On)

  • ECC: Off

Software

  • Driver Version: 581.80

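  • CUDA Toolkit Version: 11.8 (the "CUDA Version: 13.0" shown by nvidia-smi below is the highest CUDA version the driver supports, not the installed toolkit)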

  • TensorRT Version: 10.8.0.43

  • Operating System: Windows

  • Model: YOLO object detection model (TensorRT-optimized engine)

nvidia-smi Output (RTX PRO 2000 Blackwell)

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 2000 Blac...  WDDM  |   00000000:01:00.0  On |                  Off |
| 30%   45C    P0             18W /   70W |     943MiB /  16311MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Problem Description

Normal Behavior

  • Inference time: 1-2ms per image (99.5% of all inferences)

  • GPU frequency locked at 2000-2200 MHz

  • Performance state: P0

Abnormal Behavior

  • Spikes occur periodically, in cycles (NOT at a fixed time interval and NOT after a fixed number of images)

  • A single inference suddenly takes 102-103ms, roughly 50-100x slower than the normal 1-2ms

  • Immediately returns to normal 1-2ms after the spike

  • Pattern is reproducible on both RTX PRO 4000 and PRO 2000 Blackwell

  • Does NOT occur on RTX Ada PRO 4000 or Ada PRO 2000 on the same system
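To put numbers on the "in cycles, not a fixed interval" observation, the per-inference log can be reduced to the gaps between consecutive spikes, measured both in images and in seconds. The sketch below assumes each inference is logged as a `(timestamp_s, latency_ms)` pair and uses a hypothetical 50 ms spike threshold; the function names are illustrative, not from our actual pipeline. A coefficient of variation near 0 would mean a fixed period; a larger value means the spacing is irregular.

```python
from statistics import mean, pstdev

SPIKE_THRESHOLD_MS = 50.0  # assumption: anything slower than this counts as a spike


def spike_gaps(samples, threshold_ms=SPIKE_THRESHOLD_MS):
    """samples: list of (timestamp_s, latency_ms), one entry per inference, in order.

    Returns (image_gaps, time_gaps_s): the distance between consecutive spikes,
    counted in images and in seconds.
    """
    spikes = [(i, ts) for i, (ts, lat) in enumerate(samples) if lat > threshold_ms]
    pairs = list(zip(spikes, spikes[1:]))
    image_gaps = [b[0] - a[0] for a, b in pairs]
    time_gaps_s = [b[1] - a[1] for a, b in pairs]
    return image_gaps, time_gaps_s


def coefficient_of_variation(values):
    """Population std dev divided by mean; NaN when there is nothing to measure."""
    if not values or mean(values) == 0:
        return float("nan")
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    log = [(0.0, 1.5), (0.1, 102.0), (0.2, 1.4), (0.3, 1.6), (0.4, 103.0), (0.6, 102.0)]
    images, seconds = spike_gaps(log)
    print(images, [round(s, 3) for s in seconds])
    print(coefficient_of_variation(images))
```

Running both measures side by side matters here: if the image gaps were constant the cause would more likely be tied to our workload, and if the time gaps were constant it would point to a timer-driven mechanism. Neither is constant in our observations.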

Statistical Analysis (10,274 samples on RTX PRO 2000 Blackwell)

Metric          Value
Total samples   10,274
Min             1 ms
Max             118 ms
Mean            2.04 ms
Median          2.0 ms
Std dev         6.91 ms
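For anyone reproducing the table, a minimal sketch of how these summary values can be computed from a list of per-inference latencies in milliseconds. Whether the original figures used population or sample standard deviation is not stated; the sketch assumes population (`pstdev`), which for ~10,000 samples is practically identical to `statistics.stdev`.

```python
from statistics import mean, median, pstdev


def summarize(latencies_ms):
    """Build the summary-table values from per-inference latencies (ms)."""
    return {
        "total_samples": len(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "mean_ms": round(mean(latencies_ms), 2),
        "median_ms": median(latencies_ms),
        # assumption: population standard deviation
        "std_dev_ms": round(pstdev(latencies_ms), 2),
    }


if __name__ == "__main__":
    print(summarize([1, 2, 2, 2, 102]))
```

The example input also shows why the mean and median stay near 2 ms while the standard deviation is several times larger: a handful of ~100 ms outliers barely moves the centre but dominates the spread.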

Distribution of Inference Times

Time range    Count     Percentage
0-5 ms        10,222    99.49%
5-10 ms       0         0%
10-15 ms      2         0.02%
15-20 ms      2         0.02%
35-40 ms      1         0.01%
100-105 ms    44        0.43%
110-120 ms    3         0.03%
(All ranges not listed: 0 samples)
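The distribution above can be rebuilt by bucketing latencies into fixed-width bins. This sketch assumes half-open 5 ms bins (`[0, 5)`, `[5, 10)`, ...) and omits empty bins; the table's exact boundary convention is not stated, and its last row (110-120 ms) corresponds to two merged 5 ms bins.

```python
from collections import Counter


def bucket_counts(latencies_ms, width_ms=5):
    """Return (label, count, percent) for every non-empty [k*w, (k+1)*w) ms bucket."""
    counts = Counter(int(t // width_ms) for t in latencies_ms)
    total = len(latencies_ms)
    rows = []
    for k in sorted(counts):
        lo, hi = k * width_ms, (k + 1) * width_ms
        rows.append((f"{lo}-{hi} ms", counts[k], round(100.0 * counts[k] / total, 2)))
    return rows


if __name__ == "__main__":
    for row in bucket_counts([1, 2, 3, 102, 103, 104, 116]):
        print(row)
```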

Breakdown of Spikes > 100ms (47 total samples)

Inference time   Count
102 ms           23
103 ms           20
104 ms           1
114 ms           1
116 ms           1
118 ms           1
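The striking part of this breakdown is how tightly the spikes cluster: 44 of the 47 (about 94%) fall between 102 and 104 ms. A small sketch to produce the same breakdown and the share inside a narrow band; the 100 ms threshold and the 100-105 ms band are illustrative choices, and latencies are rounded to whole milliseconds to match the table.

```python
from collections import Counter


def spike_breakdown(latencies_ms, threshold_ms=100, band_ms=(100, 105)):
    """Count spikes above threshold_ms by rounded value and report the share
    that falls inside the half-open band [band_ms[0], band_ms[1])."""
    spikes = [round(t) for t in latencies_ms if t > threshold_ms]
    lo, hi = band_ms
    in_band = sum(1 for s in spikes if lo <= s < hi)
    return {
        "total": len(spikes),
        "by_value": sorted(Counter(spikes).items()),
        "share_in_band": in_band / len(spikes) if spikes else 0.0,
    }


if __name__ == "__main__":
    print(spike_breakdown([1, 102, 102.4, 103, 118, 2]))
```

A fixed stall duration like this, independent of when it occurs, is what motivates the hypotheses further below about a driver-side mechanism rather than variable extra work in our pipeline.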

Critical Comparison: Blackwell vs Ada (Same PC, Same Code, Same Configuration)

GPU                      Architecture    Periodic 102ms spike
RTX PRO 4000 Blackwell   Blackwell       YES - Issue present
RTX PRO 2000 Blackwell   Blackwell       YES - Issue present
RTX Ada PRO 4000         Ada Lovelace    NO - Works perfectly
RTX Ada PRO 2000         Ada Lovelace    NO - Works perfectly

This strongly indicates that the issue lies in the Blackwell driver or architecture rather than in our code, Windows configuration, or inference pipeline.


Impact on Real-Time Applications

Why This Is Critical

This periodic 102ms spike makes Blackwell Pro GPUs unsuitable for real-time applications, including:

  • Industrial Machine Vision: Production line inspection systems cannot tolerate 100ms frame drops

  • Robotics: Real-time object detection for robot guidance requires consistent latency

  • Autonomous Systems: Vehicle/drone perception systems need guaranteed frame timing

  • Medical Imaging: Real-time surgical guidance and diagnostic systems

  • Security & Surveillance: Real-time threat detection systems

  • Quality Control: High-speed manufacturing inspection

Real-World Consequences

  • At 30 FPS (33.3 ms per frame), a single 102ms spike causes 3 or more frames to be dropped

  • Unpredictable timing makes synchronization with other systems impossible

  • Cannot guarantee SLA/response time requirements

  • Forces customers to stay on Ada architecture instead of upgrading to Blackwell
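The dropped-frame figure follows from the frame budget: at 30 FPS each frame gets 1000 / 30 ≈ 33.3 ms, and a 102 ms stall spans just over three of those budgets. A minimal sketch of that arithmetic, assuming a fixed-rate camera and that frames arriving while inference is blocked are dropped rather than queued (real pipelines may buffer instead):

```python
def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


def frames_missed(stall_ms, fps):
    """Frame deadlines that pass during a single stall.

    Computed as stall_ms * fps // 1000 instead of stall_ms / budget so that
    exact multiples (e.g. 100 ms at 30 FPS) are not lost to float rounding.
    """
    return int(stall_ms * fps // 1000)


if __name__ == "__main__":
    for fps in (30, 60):
        print(fps, round(frame_budget_ms(fps), 1), frames_missed(102, fps), frames_missed(2, fps))
```

At 60 FPS the same spike costs about six frames, while a normal 2 ms inference costs none.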


What We Have Already Tried

Power Management

  • Set Windows Power Plan to “High Performance”

  • NVIDIA Control Panel → Power Management Mode → “Prefer Maximum Performance”

  • Disabled PCI Express Link State Power Management

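  • Enabled GPU Persistence Mode (nvidia-smi -pm 1); note that nvidia-smi documents persistence mode as available on Linux only, so this setting likely has no effect on Windows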

GPU Configuration

  • Locked GPU frequency at 2000-2200 MHz (nvidia-smi -lgc 2000,2200)

  • Verified GPU stays in P0 performance state

  • Confirmed that ECC is disabled

  • Tested with display connected and disconnected
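To confirm that the GPU really stays in P0 at the locked clocks during the test (rather than only at the moment it was checked), the P-state and clocks can be logged alongside the inference loop and lined up with spike timestamps afterwards. A sketch, assuming `nvidia-smi` is on the PATH; `poll_clocks` and `parse_smi_line` are hypothetical helper names. `timestamp`, `pstate`, `clocks.gr` and `clocks.mem` are standard `--query-gpu` fields, but a 100 ms polling interval may miss very brief transitions.

```python
import subprocess

# Standard nvidia-smi --query-gpu fields (list them with: nvidia-smi --help-query-gpu)
QUERY_FIELDS = "timestamp,pstate,clocks.gr,clocks.mem"


def _to_int(value):
    """nvidia-smi prints '[N/A]' for unsupported fields; map that to None."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_smi_line(line):
    """Parse one line produced with --format=csv,noheader,nounits."""
    ts, pstate, gr, mem = (field.strip() for field in line.split(","))
    return {"timestamp": ts, "pstate": pstate,
            "clock_gr_mhz": _to_int(gr), "clock_mem_mhz": _to_int(mem)}


def poll_clocks(interval_ms=100, gpu_index=0):
    """Yield one parsed sample per interval until the caller stops iterating."""
    cmd = ["nvidia-smi", "-i", str(gpu_index), f"--query-gpu={QUERY_FIELDS}",
           "--format=csv,noheader,nounits", "-lms", str(interval_ms)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        try:
            for line in proc.stdout:
                if line.strip():
                    yield parse_smi_line(line)
        finally:
            proc.terminate()  # stop nvidia-smi when the generator is closed
```

Usage would be a separate thread or process iterating `poll_clocks()` and writing every sample whose `pstate` is not `P0`, or whose graphics clock leaves the 2000-2200 MHz window, to a log.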

System Optimization

  • Disabled Hardware-Accelerated GPU Scheduling

  • Stopped NVIDIA Telemetry services

  • Closed all unnecessary applications

  • Tested with minimal background processes

  • Disabled Windows Defender real-time monitoring

Driver

  • Using latest driver 581.80

  • Clean driver installation (DDU)

None of these solutions resolved the issue.


Hypothesis

We suspect the consistent ~102ms spike duration points to one of the following:

  1. WDDM driver internal housekeeping/command buffer flush - The Windows Display Driver Model (WDDM) may force periodic GPU synchronization for display composition, and this may behave differently on Blackwell than on Ada

  2. Blackwell-specific driver scheduler - The Blackwell driver may have a different internal scheduling or memory management mechanism that causes periodic stalls

  3. TCC mode unavailable - The RTX PRO 4000/2000 Blackwell do not support the TCC driver model, so we cannot bypass WDDM to rule it out as the cause

  4. New Blackwell power management - Blackwell architecture may have different P-state transition behavior causing these spikes
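One way to separate hypotheses 1-2 (a stall in submission or scheduling) from a genuine slowdown of GPU execution is to time each inference twice: on the host with `time.perf_counter()` around enqueue plus stream synchronize, and on the GPU with CUDA events recorded on the inference stream before and after the enqueue (`cudaEventRecord` / `cudaEventElapsedTime`). The classification below is a rough heuristic sketch with our own labels and a hypothetical 50 ms threshold. A stall that lands between the two events on the GPU timeline (for example, the context being preempted mid-inference) would be labelled "gpu-timeline" even if WDDM caused it.

```python
SPIKE_MS = 50.0  # assumption: threshold separating normal inferences from spikes


def classify(host_ms, gpu_ms, spike_ms=SPIKE_MS):
    """Classify one inference from host wall-clock time and GPU event time.

    host_ms: perf_counter delta around enqueue + stream synchronize
    gpu_ms:  elapsed time between CUDA events recorded before/after the enqueue
    """
    if host_ms <= spike_ms:
        return "normal"
    if gpu_ms > spike_ms:
        # the lost time sits between the two events on the GPU timeline
        return "gpu-timeline"
    # the GPU work itself was fast: time was lost before/around submission or in sync
    return "host-or-submission"


def tally(pairs, spike_ms=SPIKE_MS):
    """Count classifications over a list of (host_ms, gpu_ms) pairs."""
    counts = {}
    for host_ms, gpu_ms in pairs:
        label = classify(host_ms, gpu_ms, spike_ms)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

If nearly all spikes came out as "host-or-submission", that would support the WDDM/scheduler hypotheses; a "gpu-timeline" majority would fit a power-management or in-GPU scheduling explanation better.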


Request to NVIDIA

  1. Can NVIDIA confirm if this is a known issue with Blackwell Pro series drivers in WDDM mode on Windows?

  2. Is there a driver update planned to address this periodic latency spike?

  3. Are there any additional workarounds or registry settings specific to Blackwell that might help?

  4. Can TCC mode support be added to RTX PRO 4000/2000 Blackwell in future drivers?

  5. What is the root cause of this ~102ms spike that does not occur on Ada architecture?


Environment Details for Reproduction

GPUs with issue:
  - NVIDIA RTX PRO 4000 Blackwell
  - NVIDIA RTX PRO 2000 Blackwell

GPUs without issue (same PC):
  - NVIDIA RTX Ada PRO 4000
  - NVIDIA RTX Ada PRO 2000

Driver: 581.80
CUDA: 11.8
TensorRT: 10.8.0.43
OS: Windows (WDDM mode)
Workload: Continuous YOLO inference loop (TensorRT optimized)

Test Duration: Extended continuous operation
Expected: Consistent 1-2ms inference time
Actual: Periodic 102-103ms spike (in cycles, not fixed interval)
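For reproduction, the measurement loop can be as simple as the sketch below. `run_inference` is a hypothetical placeholder: in our setup it stands for the TensorRT enqueue followed by a CUDA stream synchronize, and it must block until results are ready, otherwise only the launch is timed. On Windows, `time.perf_counter()` is a high-resolution monotonic clock, which is sufficient for 1 ms-level latencies. The warm-up count is an arbitrary assumption.

```python
import time


def measure(run_inference, inputs, warmup=50):
    """Time each call and return a list of (timestamp_s, latency_ms) pairs.

    The first `warmup` inputs are run but not recorded, so one-off
    initialization cost does not show up as a spike.
    """
    for x in inputs[:warmup]:
        run_inference(x)
    samples = []
    for x in inputs[warmup:]:
        t0 = time.perf_counter()
        run_inference(x)  # placeholder: must block until GPU results are ready
        t1 = time.perf_counter()
        samples.append((t0, (t1 - t0) * 1000.0))
    return samples


if __name__ == "__main__":
    fake = lambda _: time.sleep(0.001)  # stand-in for a ~1 ms inference
    log = measure(fake, list(range(200)), warmup=20)
    print(len(log), max(lat for _, lat in log))
```

Recording the start timestamp together with the latency keeps both the image index and the wall-clock position of every spike, which is what is needed to check whether spikes follow a fixed interval.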

Thanks