Hello,
I’m hitting a hard kernel failure in cusolverDnXsyevd on a 50,000 × 50,000 real FP64 symmetric matrix. This looks like a continuation of the n≈27k Xsyevd thread, which was reported fixed in CUDA Toolkit 13.0. The failure mode here is different (kernel execution, not bufferSize) but appears to be in the same family of internal int-sizing issues at large n.
Environment
- CUDA Toolkit: 13.2
- libcublas: version=130400
- Driver: 580.95.05
- GPU: NVIDIA H200
- Call site: direct C++ (no CuPy/PyTorch in the path)
Call configuration
- n = 50000, lda = 50000, symmetric real FP64
- dataTypeA = CUDA_R_64F, dataTypeW = CUDA_R_64F, computeType = CUDA_R_64F
- jobz = CUSOLVER_EIG_MODE_VECTOR, uplo = CUBLAS_FILL_MODE_LOWER
- Default cusolverDnParams_t (created via cusolverDnCreateParams, no advanced options set)
Sequence
- cusolverDnXsyevd_bufferSize → returns success. Reported workspace: device = [X] bytes, host = [Y] bytes.
- Device + host workspace allocated successfully (verified cudaMalloc return).
- cusolverDnXsyevd → returns status 6 (I’m not sure whether that is CUSOLVER_STATUS_EXECUTION_FAILED or CUSOLVER_STATUS_INTERNAL_ERROR).
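For decoding the numeric status, here is a minimal sketch of a lookup helper. The function name `cusolver_status_name` is hypothetical, and the value-to-name table is my reading of the `cusolverStatus_t` enum in `cusolver_common.h`, so it should be checked against the installed header before relying on it. Under that reading, status 6 would be CUSOLVER_STATUS_EXECUTION_FAILED.

```cpp
#include <string>

// Hypothetical helper (not part of cuSOLVER): maps a raw status integer to
// the enum name. The table is assumed from cusolver_common.h; verify it
// against the header shipped with your toolkit.
inline std::string cusolver_status_name(int status) {
    switch (status) {
        case 0: return "CUSOLVER_STATUS_SUCCESS";
        case 1: return "CUSOLVER_STATUS_NOT_INITIALIZED";
        case 2: return "CUSOLVER_STATUS_ALLOC_FAILED";
        case 3: return "CUSOLVER_STATUS_INVALID_VALUE";
        case 4: return "CUSOLVER_STATUS_ARCH_MISMATCH";
        case 5: return "CUSOLVER_STATUS_MAPPING_ERROR";
        case 6: return "CUSOLVER_STATUS_EXECUTION_FAILED";
        case 7: return "CUSOLVER_STATUS_INTERNAL_ERROR";
        // Higher values exist in the header but are left out of this sketch.
        default: return "unlisted status " + std::to_string(status);
    }
}
```

In the real call site the same thing can be done by comparing the returned `cusolverStatus_t` against the enum constants directly, which avoids depending on the numeric values at all.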
System log (concurrent with the syevd call)
XID 31: NVRM: Xid (PCI:0000:59:00): 31, pid=996597, name=exe, channel 0x0000000c
MMU Fault: ENGINE GRAPHICS GPC1 GPCCLIENT_T1_4 faulted @ 0x2aa1_cc016000
Fault type: FAULT_PDE ACCESS_TYPE_VIRT_READ
A FAULT_PDE virtual read at a high address from a cuSOLVER GPC kernel strongly suggests an out-of-bounds index inside the syevd pipeline — consistent with an internal element-count or stride being computed/stored in 32 bits somewhere along the call chain (n² = 2.5 × 10⁹ exceeds INT32_MAX). The 13.0 release notes describe the X-API dimension limit as removed; this case suggests the fix may not extend through every internal kernel at this scale.
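The arithmetic behind that suspicion can be checked with a few lines of plain C++. This is only a sketch of the 32-bit hypothesis, not a statement about what cuSOLVER does internally; all function names here are made up for illustration.

```cpp
#include <cstdint>
#include <limits>

// Element count of an n-by-n matrix, computed safely in 64 bits.
constexpr std::int64_t element_count(std::int64_t n) { return n * n; }

// True if the count no longer fits in a signed 32-bit integer.
constexpr bool exceeds_int32(std::int64_t count) {
    return count > std::numeric_limits<std::int32_t>::max();
}

// Value a signed 32-bit variable would hold if the count were truncated.
// Goes through uint32_t so the truncation itself is well defined; the final
// conversion is modular on mainstream compilers (guaranteed from C++20).
inline std::int32_t wrapped_to_int32(std::int64_t count) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(count));
}

// Largest n whose n*n element count still fits in int32.
inline std::int64_t largest_n_fitting_int32() {
    std::int64_t n = 0;
    while (!exceeds_int32(element_count(n + 1))) ++n;
    return n;
}
```

For n = 50000 the element count is 2,500,000,000, which wraps to a negative 32-bit value (-1,794,967,296), and the largest n whose element count still fits is 46340. A negative or wrapped offset of that kind would be one plausible way to end up reading an unmapped address, but that remains a guess until the failing kernel is identified.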
Reproduction
The smallest test case I have is a random symmetric FP64 matrix at n = 50,000 with the call sequence above. I haven’t bisected the n threshold yet; happy to do so and report back if useful. I will also try cusolverDnXsyevdx with a top-k range to see whether the same internal path is involved.
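If bisecting the threshold is useful, it could look roughly like the sketch below. `first_failing_n` and the `runs_ok` callback are hypothetical names; `runs_ok(n)` stands in for "run the bufferSize + syevd sequence at size n and report whether it succeeded". The sketch assumes failures are monotonic in n, which has not been verified for this bug.

```cpp
#include <cstdint>
#include <functional>

// Returns the smallest n in (lo, hi] for which runs_ok(n) is false.
// Preconditions (assumed, not checked): runs_ok(lo) is true, runs_ok(hi) is
// false, and once a size fails every larger size fails too.
inline std::int64_t first_failing_n(
    std::int64_t lo, std::int64_t hi,
    const std::function<bool(std::int64_t)>& runs_ok) {
    while (hi - lo > 1) {
        const std::int64_t mid = lo + (hi - lo) / 2;
        if (runs_ok(mid)) {
            lo = mid;  // mid still works, threshold is above it
        } else {
            hi = mid;  // mid fails, threshold is at or below it
        }
    }
    return hi;
}
```

Between 27,000 and 50,000 this needs about 15 probes. Each probe is probably best run in a fresh process, since an MMU fault like the one logged above typically leaves the CUDA context unusable for later calls.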
Questions
- Is Xsyevd validated at n ≥ 50,000 FP64 in 13.2? The earlier thread topped out at ~27k.
- Is there a recommended workaround within cuSOLVER (e.g. Xsyevdx, Xsygvd, or routing through cuSolverMp) that avoids the affected code path while staying single-GPU?
- Should I file this as a bug, or is there an existing internal tracker?
Thanks.