cuSolver memory limit? svd solver cannot handle >128 matrices

Hi, I’m currently running cuSolver on 28x28 matrices, using gesvdj and syevj for comparison. For some reason, I start to get incorrect values from gesvdj when I use a batch size of >128. It also happens with smaller matrix sizes.

Is there any reason why this might happen?

I would think this is well below the allowed memory limit. The code is linked (based on an example I found here). I am running on sm_87 on the Jetson Orin.
example.zip (11.9 KB)

I suggest filing a bug.

I’ve run your test case (512 matrices) on both CUDA 11.4 and CUDA 12.0. I see slightly different behavior in terms of data output, but some of the output data for the gesvdj case is zeros. Furthermore, running your code under compute-sanitizer shows invalid writes in one of the cuSOLVER kernels.

Reporting back the bug ticket status here: this is not a bug.

  1. devInfo is supposed to be an integer array of dimension batchSize;
    see the details in the cuSOLVER documentation.
  2. If numMatrices is odd, then in your case the last matrix is not initialized, and that’s why we saw zero singular values for it.
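
As a rough host-side illustration of the two points above (this is not the contents of the attached fix): the sketch below fills the batch one matrix per loop iteration, so an odd count leaves nothing uninitialized, and it scans an info array that has one entry per matrix. The helper names makeBatch and failedMatrices are hypothetical, the diagonal fill is only placeholder test data, and the device-side allocation is indicated in a comment rather than performed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: build batchSize column-major m x n matrices.
// The loop visits every matrix index, so an odd batchSize is fully covered.
std::vector<float> makeBatch(int m, int n, int batchSize) {
    assert(m > 0 && n > 0 && batchSize > 0);
    std::vector<float> A(static_cast<std::size_t>(m) * n * batchSize, 0.0f);
    for (int b = 0; b < batchSize; ++b) {
        float* Ab = A.data() + static_cast<std::size_t>(b) * m * n;
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < m; ++i) {
                // Placeholder data: diagonal matrix with value b + 1.
                Ab[i + j * m] = (i == j) ? static_cast<float>(b + 1) : 0.0f;
            }
        }
    }
    return A;
}

// Hypothetical helper: return the indices of matrices whose info entry
// is non-zero. The info vector must hold one int per matrix in the batch.
// On the device, the matching buffer would be sized for batchSize ints,
// e.g. cudaMalloc(&d_info, sizeof(int) * batchSize), not a single int.
std::vector<int> failedMatrices(const std::vector<int>& info) {
    std::vector<int> bad;
    for (std::size_t b = 0; b < info.size(); ++b) {
        if (info[b] != 0) {
            bad.push_back(static_cast<int>(b));
        }
    }
    return bad;
}
```

After the batched call, the info buffer would be copied back into a host vector of length batchSize and passed to failedMatrices. A single-int devInfo would be too small for a batched solver that writes one status per matrix, which would be consistent with the invalid writes reported by compute-sanitizer.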

See the attached fixed version.
top_ex_fixed.cu (10.9 KB)