GB10 Hardware Baseline — First Direct Measurements and Findings

Your results confirm the CUDA version boundary across all
three toolkit versions you tested.

Driver: 580.142 (all runs)

CUDA 13.0  — %clock64 correct, all probes valid     ✓
CUDA 13.1  — GPU timing broken, overflow results    ✗
CUDA 13.2  — %clock64 returns 0, uma_bw overflows   ✗

CPU read/write numbers are correct on all three versions
because CPU timing uses CLOCK_MONOTONIC (the Linux monotonic
elapsed-time clock, read on the host), not %clock64. The
failure is specific to PTX %clock64 compilation for SM 12.1
on CUDA 13.1 and 13.2.
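
For illustration, here is a minimal host-side sketch of the
kind of CLOCK_MONOTONIC timing described above. It is not the
project's actual probe code: the helper names now_ns and
cpu_read_gbps are hypothetical, and the buffer layout is an
assumption. It shows why this path is independent of the
toolkit version: the timestamps come from the POSIX
clock_gettime call on the CPU, so no PTX is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <time.h>
#include <vector>

// Monotonic host timestamp in nanoseconds (POSIX clock_gettime).
static inline uint64_t now_ns() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull +
           static_cast<uint64_t>(ts.tv_nsec);
}

// Hypothetical helper: time one sequential read pass over buf and
// return the rate in GB/s (bytes per nanosecond equals 1e9 bytes/s).
// The sum is written to *sink so the loop is not optimized away.
double cpu_read_gbps(const std::vector<uint64_t>& buf, uint64_t* sink) {
    assert(sink != nullptr);
    const uint64_t t0 = now_ns();
    uint64_t acc = 0;
    for (uint64_t v : buf) acc += v;
    const uint64_t t1 = now_ns();
    *sink = acc;
    const uint64_t dt = t1 - t0;
    if (dt == 0) return 0.0;  // interval too short to resolve
    const double bytes = static_cast<double>(buf.size() * sizeof(uint64_t));
    return bytes / static_cast<double>(dt);
}
```

A real probe would likely repeat the pass several times and
use a buffer much larger than the last-level cache; this
sketch only shows where the timestamps come from.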

Build requirement: CUDA 13.0 only.

/usr/local/cuda-13.0/bin/nvcc -O2 -std=c++17 \
  probe_launcher.cu -o uma_probe -lcudart -lcuda -lpthread

/usr/local/cuda-13.0/bin/nvcc -O2 -std=c++17 \
  uma_atomic_test.cu -o uma_atomic -lcudart -lcuda -lpthread

/usr/local/cuda-13.0/bin/nvcc -O2 -std=c++17 \
  uma_bandwidth_test.cu -o uma_bw -lcudart -lcuda -lpthread
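
One way to enforce the CUDA 13.0 requirement at compile time
is a guard near the top of each .cu file. This is a sketch
and not part of the existing sources. It assumes the
predefined nvcc macros __CUDACC_VER_MAJOR__ and
__CUDACC_VER_MINOR__; the function name is_supported_toolkit
is hypothetical.

```cpp
// Hypothetical guard: accept only the toolkit version that
// produced valid %clock64 results in the runs above (CUDA 13.0).
constexpr bool is_supported_toolkit(int major, int minor) {
    return major == 13 && minor == 0;
}

// nvcc predefines these macros; a plain host compiler does not,
// so the check is skipped when the file is not built with nvcc.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
static_assert(
    is_supported_toolkit(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__),
    "Build with CUDA 13.0: GPU timing was invalid on 13.1 and 13.2");
#endif
```

With this in place, a build that picks up nvcc from 13.1 or
13.2 by mistake stops with a clear message instead of
producing a binary that reports invalid GPU timings.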

Thank you for running all three versions — this is exactly
the systematic data the project needed to confirm the
CUDA version boundary on GB10.