cusolverDnXsyevd status 6 + XID 31 MMU fault at n=50000, FP64 real, CUDA 13.2

spectre10 · April 26, 2026, 7:06pm

Hello,

I’m hitting a hard kernel failure in cusolverDnXsyevd on a 50,000 × 50,000 real FP64 symmetric matrix. This looks like a continuation of the n≈27k Xsyevd thread, which was reported fixed in CUDA Toolkit 13.0. The failure mode here is different (kernel execution, not bufferSize) but appears to be in the same family of internal int-sizing issues at large n.

Environment

CUDA Toolkit: 13.2
libcublas: version=130400
Driver: 580.95.05
GPU: NVIDIA H200
Call site: direct C++ (no CuPy/PyTorch in the path)

Call configuration

n = 50000, lda = 50000, symmetric real FP64
dataTypeA = CUDA_R_64F, dataTypeW = CUDA_R_64F, computeType = CUDA_R_64F
jobz = CUSOLVER_EIG_MODE_VECTOR, uplo = CUBLAS_FILL_MODE_LOWER
Default cusolverDnParams_t (created via cusolverDnCreateParams, no advanced options set)

Sequence

cusolverDnXsyevd_bufferSize → returns success. Reported workspace: device = [X] bytes, host = [Y] bytes.
Device + host workspace allocated successfully (verified cudaMalloc return).
cusolverDnXsyevd → returns status 6 (I'm not sure whether this is CUSOLVER_STATUS_EXECUTION_FAILED or CUSOLVER_STATUS_INTERNAL_ERROR).

System log (concurrent with the syevd call)

XID 31: NVRM: Xid (PCI:0000:59:00): 31, pid=996597, name=exe, channel 0x0000000c
MMU Fault: ENGINE GRAPHICS GPC1 GPCCLIENT_T1_4 faulted @ 0x2aa1_cc016000
Fault type: FAULT_PDE  ACCESS_TYPE_VIRT_READ

A FAULT_PDE virtual read at a high address from a cuSOLVER GPC kernel strongly suggests an out-of-bounds index inside the syevd pipeline — consistent with an internal element-count or stride being computed/stored in 32 bits somewhere along the call chain (n² = 2.5 × 10⁹ exceeds INT32_MAX). The 13.0 release notes describe the X-API dimension limit as removed; this case suggests the fix may not extend through every internal kernel at this scale.

Reproduction

The smallest test case I have is a random symmetric FP64 matrix at n = 50,000 with the call sequence above. I haven’t bisected the n threshold yet, but I’m happy to do so and report back if useful. I will also try cusolverDnXsyevdx with a top-k range to see whether the same internal path is involved.

Questions

Is Xsyevd validated at n ≥ 50,000 FP64 in 13.2? The earlier thread topped out at ~27k.
Is there a recommended workaround within cuSOLVER (e.g. Xsyevdx, Xsygvd, or routing through cuSolverMp) that avoids the affected code path while staying single-GPU?
Should I file this as a bug, or is there an existing internal tracker?

Thanks.

spectre10 · April 26, 2026, 7:56pm

I should add that it works reliably for n ≤ 46,000.
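
That boundary lines up with a 32-bit element count: n² fits in a signed 32-bit integer up to n = 46,340 and overflows from n = 46,341. A minimal sketch of the arithmetic, assuming (unconfirmed) that some internal index or element count of size n² is held in a signed 32-bit integer. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Element count of an n x n matrix, computed in 64 bits so it cannot overflow here.
constexpr std::int64_t element_count(std::int64_t n) { return n * n; }

// True if n*n is still representable as a signed 32-bit integer.
constexpr bool fits_int32(std::int64_t n) {
    return element_count(n) <= std::numeric_limits<std::int32_t>::max();
}

// Size in bytes of the FP64 matrix itself (8 bytes per element).
constexpr std::int64_t matrix_bytes(std::int64_t n) { return element_count(n) * 8; }

static_assert(fits_int32(46000), "46000^2 = 2,116,000,000 fits");
static_assert(fits_int32(46340), "46340^2 = 2,147,395,600 fits");
static_assert(!fits_int32(46341), "46341^2 = 2,147,488,281 overflows");
static_assert(!fits_int32(50000), "50000^2 = 2,500,000,000 overflows");
static_assert(matrix_bytes(50000) == 20000000000LL, "20 GB for A alone");
```

If this hypothesis holds, bisecting between n = 46,000 and n = 46,341 should show the failure starting exactly at 46,341. A different threshold would point to some other cause.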

christophk · April 28, 2026, 2:39am

Hi spectre,

Thanks for the detailed report. We tried to reproduce on our side and could not, so we’d like to ask for a few more pieces of information.

Our setup, intended to match yours as closely as possible:

GPU: NVIDIA H200
CUDA toolkit: 13.2 Update 1
libcublas: 130400 (same major.minor as your report)
libcusolver: 12200
Driver: 590.48.01

Two runs at exactly your call signature (jobz=CUSOLVER_EIG_MODE_VECTOR, uplo=lower, n=50000, lda=50000, computeType=CUDA_R_64F, default cusolverDnParams_t):

Both run cleanly, residuals at the expected level. So with the same cusolverDnXsyevd call, the same cuBLAS major version, on the same GPU class, we don’t see the failure — and matrix data does not appear to be the trigger.

To localize the difference, could you share a self-contained reproducer? A small .cu file that allocates a random symmetric FP64 matrix with a fixed seed (e.g. std::mt19937 rng(42)), calls cusolverDnXsyevd_bufferSize → cudaMalloc → cusolverDnXsyevd, and prints the status would be ideal.
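
A minimal host-side sketch of the matrix setup for such a reproducer, assuming column-major storage with lda = n and values drawn uniformly from [-1, 1]. The function name make_symmetric and the choice of distribution are illustrative only. The device copy and the cusolverDnXsyevd_bufferSize / cusolverDnXsyevd calls are not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build an n x n column-major symmetric matrix from a fixed seed.
// All index arithmetic is done in 64 bits so i + j * lda cannot overflow
// at n = 50000 (where n * n exceeds INT32_MAX).
std::vector<double> make_symmetric(std::int64_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    const std::int64_t lda = n;
    std::vector<double> a(static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    for (std::int64_t j = 0; j < n; ++j) {
        for (std::int64_t i = j; i < n; ++i) {  // lower triangle, including the diagonal
            const double v = dist(rng);
            a[static_cast<std::size_t>(i + j * lda)] = v;  // A(i, j)
            a[static_cast<std::size_t>(j + i * lda)] = v;  // A(j, i), mirrored
        }
    }
    return a;
}
```

At n = 50000 the returned buffer is about 20 GB of host memory. std::mt19937 produces the same sequence everywhere, but std::uniform_real_distribution is not guaranteed to be identical across standard library implementations, so building with the same compiler and standard library on both sides helps keep the input bit-identical.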

If you can spare the wall time (~1 h at n=50000), please run

compute-sanitizer ./your_binary

and attach the output. This typically pinpoints the failing kernel and the offending access pattern, and gives more diagnostic detail than the status code alone.
