I have a matrix of approximately 24500 by 24500 doubles that I need to diagonalize on a V100 (running CUDA 10.0.130). However, the actual call to cusolverDnDsyevd fails with code 2 (CUSOLVER_STATUS_ALLOC_FAILED), which led me to check the lwork size. It turned out to be 1839698262 doubles, or approximately 14.7 GB. This seems far in excess of what should be needed for the roughly 4.8 GB matrix itself. I am also unclear why dsyevd fails with code 2, since only about 20 of the 32 GB on the card is being allocated. Is this a bug, or if not, is there a way to reduce the workspace to more reasonable levels?
Can you use the syevj (Jacobi) method?
According to my testing, it cuts the workspace size roughly in half.
Otherwise, you could consider filing a bug using the instructions linked at the top of the CUDA programming forum. You could mark the bug as an “enhancement request” to reduce the buffer size needed.
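For completeness, here is a minimal sketch of the syevj path (error checking omitted; assumes a dense n-by-n symmetric matrix d_A and output array d_W already allocated on the device). The commented-out tuning calls are optional:

```cpp
#include <cusolverDn.h>
#include <cuda_runtime.h>

// Sketch: diagonalize d_A (n x n, symmetric, device memory) with syevj,
// writing eigenvalues to d_W and eigenvectors back into d_A.
void eig_syevj(double *d_A, double *d_W, int n) {
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    syevjInfo_t params;
    cusolverDnCreateSyevjInfo(&params);
    // Optional: loosen tolerance / cap sweeps to trade accuracy for speed.
    // cusolverDnXsyevjSetTolerance(params, 1e-7);
    // cusolverDnXsyevjSetMaxSweeps(params, 15);

    int lwork = 0;
    cusolverDnDsyevj_bufferSize(handle, CUSOLVER_EIG_MODE_VECTOR,
                                CUBLAS_FILL_MODE_LOWER, n, d_A, n, d_W,
                                &lwork, params);

    double *d_work = nullptr;
    cudaMalloc(&d_work, sizeof(double) * lwork);
    int *d_info = nullptr;
    cudaMalloc(&d_info, sizeof(int));

    cusolverDnDsyevj(handle, CUSOLVER_EIG_MODE_VECTOR,
                     CUBLAS_FILL_MODE_LOWER, n, d_A, n, d_W,
                     d_work, lwork, d_info, params);

    cudaFree(d_work);
    cudaFree(d_info);
    cusolverDnDestroySyevjInfo(params);
    cusolverDnDestroy(handle);
}
```

The bufferSize query will tell you the syevj workspace requirement for your n before you commit to the allocation.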
Yes, the Jacobi variant has much more reasonable memory usage. However, it is also slower by about a factor of 10, which makes it uncompetitive with conventional CPU-based eigensolvers.
Thank you, I’ll file a bug. In addition to the excessive buffer size, there is the issue of dsyevd failing when it is unable to allocate additional memory.
If you file a bug, you’ll likely be asked for a complete test case (including the A matrix definition).