cuSOLVER: big deviations from LAPACK

Hello, I am trying to use cuSOLVER, specifically cusolverDnSgesvd (really, where can I find any documentation???), and I noticed that its results differ a lot from those of LAPACKE_sgesvd.

(Also, I am not sure about the “work”, “work size” and “rwork” parameters.)

For example:

cuSOLVER (cusolverDnSgesvd):

S[ 0 ] = 1.43155e+09
S[ 1 ] = 1.06301e+08
S[ 2 ] = 6.08459e+06
S[ 3 ] = 320.892
S[ 4 ] = 255.5
S[ 5 ] = 253.708
S[ 6 ] = 241.768
S[ 7 ] = 240.104
S[ 8 ] = 230.025
S[ 9 ] = 228.298
S[ 10 ] = 225.957

LAPACK (LAPACKE_sgesvd):

S[0] = 1.43155e+09
S[1] = 1.06301e+08
S[2] = 6.08458e+06
S[3] = 1499.65
S[4] = 1473.44
S[5] = 1190.09
S[6] = 1040.91
S[7] = 824.54
S[8] = 819.075
S[9] = 775.057
S[10] = 769.074

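Code:

```cpp
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>
#include <cusolverDn.h>

using namespace std;

int main()
{
    const int M = 1000;
    const int N = 1000;                 // the cuSOLVER docs note that gesvd supports only M >= N
    const int minMN = min( M, N );

    // Column-major M x N test matrix, A(i,j) = (i+j)^2
    float * A = (float *)malloc( M * N * sizeof(*A) );
    for( int i = 0; i < M; i++ )
        for( int j = 0; j < N; j++ )
            A[ j * M + i ] = (float)( ( i + j ) * ( i + j ) );

    float * S = (float *)malloc( minMN * sizeof(*S) );

    float * devA;  cudaMalloc( &devA,  M * N * sizeof(*devA) );
    float * devS;  cudaMalloc( &devS,  minMN * sizeof(*devS) );
    float * devU;  cudaMalloc( &devU,  M * M * sizeof(*devU) );   // jobu  = 'A': full U,   ldu  = M
    float * devVT; cudaMalloc( &devVT, N * N * sizeof(*devVT) );  // jobvt = 'A': full V^T, ldvt = N
    int * devInfo; cudaMalloc( &devInfo, sizeof(*devInfo) );

    cusolverDnHandle_t cuSolverHandle;
    cusolverDnCreate( &cuSolverHandle );

    // "work" / "lwork": scratch space; ask cuSOLVER how many floats it needs
    int WorkSize = 0;
    cusolverStatus_t cuSolverStatus = cusolverDnSgesvd_bufferSize( cuSolverHandle, M, N, &WorkSize );
    if( cuSolverStatus != CUSOLVER_STATUS_SUCCESS ) { cerr << "bufferSize failed" << endl; return 1; }
    float * Work;  cudaMalloc( &Work, WorkSize * sizeof(*Work) );

    // "rwork": if devInfo > 0 it receives the unconverged superdiagonal elements
    // of the intermediate bidiagonal matrix (min(M,N)-1 elements are needed)
    float * rwork; cudaMalloc( &rwork, minMN * sizeof(*rwork) );

    cudaMemcpy( devA, A, M * N * sizeof(*A), cudaMemcpyHostToDevice );

    // devA is overwritten by the call
    cuSolverStatus = cusolverDnSgesvd( cuSolverHandle, 'A', 'A', M, N, devA, M, devS,
                                       devU, M, devVT, N, Work, WorkSize, rwork, devInfo );
    cudaDeviceSynchronize();

    // Check both the API status and devInfo (< 0: bad argument, > 0: no convergence)
    int info = 0;
    cudaMemcpy( &info, devInfo, sizeof(info), cudaMemcpyDeviceToHost );
    if( cuSolverStatus != CUSOLVER_STATUS_SUCCESS || info != 0 )
    {
        cerr << "gesvd failed: status = " << cuSolverStatus << ", devInfo = " << info << endl;
        return 1;
    }

    cudaMemcpy( S, devS, minMN * sizeof(*devS), cudaMemcpyDeviceToHost );
    for( int i = 0; i < minMN; i++ )
        cout << "S[ " << i << " ] = " << S[ i ] << endl;

    cudaFree( devA ); cudaFree( devS ); cudaFree( devU ); cudaFree( devVT );
    cudaFree( devInfo ); cudaFree( Work ); cudaFree( rwork );
    free( A ); free( S );
    cusolverDnDestroy( cuSolverHandle );
    cudaDeviceReset();

    return 0;
}
```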
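The test matrix filled in by the loop above has a structure that explains why only S[0] to S[2] agree between the two libraries. Expanding each entry (0-based indices i, j as in the code; the vectors a, b and 1 are notation introduced here) shows that A is a sum of three rank-one matrices:

```latex
A_{ij} = (i+j)^2 = i^2 \cdot 1 + 2\, i\, j + 1 \cdot j^2
\quad\Longrightarrow\quad
A = a\,\mathbf{1}^{\mathsf T} + 2\, b\, b^{\mathsf T} + \mathbf{1}\, a^{\mathsf T},
\qquad a_i = i^2,\quad b_i = i,\quad \mathbf{1}_i = 1 .
```

Because a, b and 1 are linearly independent, rank(A) = 3, so in exact arithmetic every singular value from S[3] on is zero. Whatever either library prints in those positions is rounding noise, which is why the two lists stop agreeing after the third entry.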

Your best bet for the documentation is the HTML version; the PDF version truncates the examples horribly.
This is a single-precision example, so you can expect a relative accuracy of about 1e-6. If you compare the largest singular values (which are accurate) with the smallest ones, the smallest are less than 1e-6 of the largest, i.e. below what single precision can resolve. For more accurate values you would have to use the double-precision version.
We have validated our results against MATLAB internally; the remaining singular values for this problem are all numerically zero (1e-6 or smaller in double precision).
Joe Eaton
Manager, cuSOLVER library

Ok, thanks.

Can you point me to the HTML docs, please?

On a standard CUDA 7 RC Linux install, it is in:

/usr/local/cuda/doc/html/cusolver/index.html

Ok, thank you!

To add to this discussion, I’ve noticed that the relative accuracy of cusolverDnDsygvd compared with the equivalent LAPACK routine is not consistent across the range of eigenvalues, even in double precision. Here are the results for a 72x72 matrix:

                    cpu (LAPACK)                      gpu (cuSOLVER)             diff (cpu - gpu)    rel error |(cpu - gpu)/cpu|
              -6557086.151330208                  -6557086.151330523                    0.315E-06                0.480E-13
              -4685518.273470301                  -4685518.273470619                    0.318E-06                0.678E-13
              -3440628.225057856                  -3440628.225057963                    0.107E-06                0.311E-13
              -2546818.741489634                  -2546818.741489429                   -0.205E-06                0.804E-13
              -1888280.376246337                  -1888280.376246130                   -0.207E-06                0.110E-12
              -1400646.869021040                  -1400646.869021131                    0.908E-07                0.648E-13
              -1039837.984309165                  -1039837.984309433                    0.267E-06                0.257E-12
              -773106.0526620904                  -773106.0526622742                    0.184E-06                0.238E-12
              -576039.7666829620                  -576039.7666830107                    0.488E-07                0.847E-13
              -430500.5161689512                  -430500.5161689740                    0.228E-07                0.529E-13
              -323065.1859034715                  -323065.1859035635                    0.919E-07                0.284E-12
              -243814.2513386299                  -243814.2513387445                    0.115E-06                0.470E-12
              -185426.7951479861                  -185426.7951480409                    0.548E-07                0.296E-12
              -142505.8158209964                  -142505.8158209742                   -0.221E-07                0.155E-12
              -111079.0589208710                  -111079.0589208103                   -0.607E-07                0.546E-12
              -88227.25370146416                  -88227.25370140512                   -0.590E-07                0.669E-12
              -71803.56058049876                  -71803.56058046089                   -0.379E-07                0.527E-12
              -60215.01412908621                  -60215.01412907071                   -0.155E-07                0.257E-12
              -52248.39944140727                  -52248.39944141232                    0.505E-08                0.966E-13
              -46941.40528337077                  -46941.40528338988                    0.191E-07                0.407E-12
              -43512.96023924157                  -43512.96023926557                    0.240E-07                0.552E-12
              -41347.28760697029                  -41347.28760698707                    0.168E-07                0.406E-12
              -39993.05005058450                  -39993.05005059529                    0.108E-07                0.270E-12
              -39144.94395507131                  -39144.94395508912                    0.178E-07                0.455E-12
              -38608.63533905784                  -38608.63533907639                    0.185E-07                0.480E-12
              -38264.61613571877                  -38264.61613572028                    0.151E-08                0.394E-13
              -38040.32649315479                  -38040.32649313326                   -0.215E-07                0.566E-12
              -37891.64669089990                  -37891.64669086706                   -0.328E-07                0.867E-12
              -37791.49035304645                  -37791.49035301258                   -0.339E-07                0.896E-12
              -37722.99600354004                  -37722.99600350652                   -0.335E-07                0.889E-12
              -37675.49851111921                  -37675.49851109224                   -0.270E-07                0.716E-12
              -37642.13621302336                  -37642.13621301615                   -0.721E-08                0.192E-12
              -37618.41415477773                  -37618.41415479657                    0.188E-07                0.501E-12
              -37601.32649838573                  -37601.32649841224                    0.265E-07                0.705E-12
              -37588.79943992328                  -37588.79943993517                    0.119E-07                0.316E-12
              -37579.26104008492                  -37579.26104008730                    0.239E-08                0.635E-13
              -8274.321509164043                  -8274.321509166019                    0.198E-08                0.239E-12
              -2200.559094414749                  -2200.559094414718                   -0.309E-10                0.141E-13
              -916.6711795586061                  -916.6711795599832                    0.138E-08                0.150E-11
              -491.5369540784572                  -491.5369540789886                    0.531E-09                0.108E-11
              -304.3623873042177                  -304.3623873039944                   -0.223E-09                0.734E-12
              -206.4197977614530                  -206.4197977619685                    0.516E-09                0.250E-11
              -148.9799828855368                  -148.9799828859675                    0.431E-09                0.289E-11
              -112.4984769342237                  -112.4984769342373                    0.136E-10                0.121E-12
              -87.94814970701286                  -87.94814970619940                   -0.813E-09                0.925E-11
              -69.81651592095926                  -69.81651592170054                    0.741E-09                0.106E-10
              -54.54281460068066                  -54.54281460088433                    0.204E-09                0.373E-11
              -41.52680562952268                  -41.52680562945806                   -0.646E-10                0.156E-11
              -30.86156860980344                  -30.86156860980589                    0.245E-11                0.795E-13
              -22.13204556575384                  -22.13204556582504                    0.712E-10                0.322E-11
               25.02682398550178                   25.02682398519025                    0.312E-09                0.124E-10
               473.9272908758672                   473.9272908800876                   -0.422E-08                0.891E-11
               1568.809884175413                   1568.809884181109                   -0.570E-08                0.363E-11
               3737.588672611608                   3737.588672613908                   -0.230E-08                0.615E-12
               7512.146411444023                   7512.146411441712                    0.231E-08                0.308E-12
               13503.12111184110                   13503.12111184500                   -0.391E-08                0.289E-12
               22432.52615903112                   22432.52615904070                   -0.958E-08                0.427E-12
               35225.94614765471                   35225.94614764160                    0.131E-07                0.372E-12
               53138.14783089799                   53138.14783085838                    0.396E-07                0.745E-12
               77905.14759033514                   77905.14759027305                    0.621E-07                0.797E-12
               111938.3996064112                   111938.3996063399                    0.713E-07                0.637E-12
               158590.5625594642                   158590.5625594025                    0.617E-07                0.389E-12
               222529.0258907696                   222529.0258906945                    0.751E-07                0.337E-12
               310274.1131858221                   310274.1131856868                    0.135E-06                0.436E-12
               430971.9414122708                   430971.9414121570                    0.114E-06                0.264E-12
               597535.1794897385                   597535.1794898062                   -0.676E-07                0.113E-12
               828301.1904094998                   828301.1904097856                   -0.286E-06                0.345E-12
               1149613.030370542                   1149613.030370763                   -0.221E-06                0.192E-12
               1599774.940302968                   1599774.940302923                    0.449E-07                0.281E-13
               2237328.904566578                   2237328.904566573                    0.466E-08                0.208E-14
               3158630.909026634                   3158630.909026912                   -0.278E-06                0.879E-13
               4563385.413813084                   4563385.413813453                   -0.369E-06                0.808E-13

For the smaller eigenvalues I’m losing about 4 to 5 digits of precision. Is there anything I can do to improve this? My code can be found at https://github.com/d-henness/testing_LA_libs. Thanks!