I noticed that the performance of cuSOLVER's syevj degrades significantly compared to syevd for matrices larger than about n = 2000. This is what I got on my GeForce Titan XP in double precision with CUDA version 11.0. The syevj time is for 1 sweep, and the reported number is the rate in GFLOPS, computed as 2*n^3/time:
| n | syevd | syevj (1 sweep) |
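For anyone who wants to reproduce the metric, here is a minimal sketch of how the 2*n^3/time figure can be turned into GFLOPS. The function name `effective_gflops` is made up for illustration, and the elapsed time is assumed to be measured in seconds (for example with CUDA events around the solver call). Note that 2*n^3 is just the normalization used in this post, not the exact operation count of either routine.

```cpp
#include <cassert>

// Effective rate in GFLOPS, using the post's convention of 2*n^3 flops
// for an n x n eigenproblem. `seconds` is the measured wall time of the
// solver call (hypothetical helper, not part of cuSOLVER).
double effective_gflops(int n, double seconds) {
    assert(n > 0 && seconds > 0.0);
    const double flops = 2.0 * n * n * n;  // evaluated in double, no int overflow
    return flops / seconds / 1e9;          // 1 GFLOPS = 1e9 flops per second
}
```

Because the same 2*n^3 count is applied to both syevd and syevj, the ratio of the two numbers is simply the inverse ratio of the run times.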
The problem I'm interested in is one where I have a large matrix that is almost diagonal, so about 1 sweep of syevj is enough to get the eigenvalues.
The same phenomenon happens with single precision instead of double precision.
Concerning syevj with single precision: I noticed that the returned eigenvector matrix can be quite far from orthogonal. The squared 2-norms of its columns can deviate from 1 by up to about 1e-4 for matrices of size ~1000 or larger. A simple rescaling of the columns can fix the issue.
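The rescaling I have in mind can be sketched as follows. This is a host-side illustration only: it assumes the eigenvector matrix has already been copied back from the device into a column-major `std::vector<float>`, and the names `normalize_columns` and `max_norm_deviation` are made up for this example. The same loop could be done in a small kernel on the GPU instead. Keep in mind that this only restores unit column norms; it does not correct any loss of orthogonality between different columns.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescale every column of a column-major n x n matrix to unit 2-norm.
// Hypothetical helper; V is assumed to hold the eigenvectors from syevj.
void normalize_columns(std::vector<float>& V, std::size_t n) {
    assert(V.size() == n * n);
    for (std::size_t j = 0; j < n; ++j) {
        float* col = V.data() + j * n;
        double sq = 0.0;  // accumulate in double to limit rounding error
        for (std::size_t i = 0; i < n; ++i) sq += double(col[i]) * col[i];
        if (sq == 0.0) continue;  // leave an all-zero column untouched
        const double inv = 1.0 / std::sqrt(sq);
        for (std::size_t i = 0; i < n; ++i) col[i] = float(col[i] * inv);
    }
}

// Largest deviation of a squared column norm from 1, as a quick check.
double max_norm_deviation(const std::vector<float>& V, std::size_t n) {
    assert(V.size() == n * n);
    double worst = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        double sq = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double x = V[j * n + i];
            sq += x * x;
        }
        worst = std::fmax(worst, std::fabs(sq - 1.0));
    }
    return worst;
}
```

Calling `max_norm_deviation` before and after `normalize_columns` gives a quick way to confirm that the deviation drops from the 1e-4 range to roughly single-precision rounding level.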