Hi!
Thanks for the question. I’ve checked, cuDSS just did what it was supposed to do with the given settings.
To understand the cuDSS behavior you should understand how pivoting works.
With default configuration settings, cuDSS does partial pivoting with a default pivoting epsilon (see in cuDSS Data Types — NVIDIA cuDSS) which is for double precision and GENERAL matrix type 1e-13. When the diagonal blocks of the matrix are factorized, cuDSS (due to the default value of CUDSS_PIVOT_COL) searches for the pivot element within the partial column, and then, after the swap, checks whether the pivot element is less than PIVOT_EPSILON (so essentially, pivoting epsilon tells cuDSS how small a value should be so that it is treated as a numerical zero). If the value is too small, it gets replaced with the PIVOT_EPSILON. Such pivots are called perturbed pivots. This is done to limit numerical instability.
You can check via cudssDataGet() for CUDSS_DATA_NPIVOTS after the factorization phase, that your matrix has 5 perturbed pivots (which makes sense, all values are too small compared to the default pivoting epsilon).
Usually, when perturbed pivots are present, iterative refinement may be used in order to restore the accuracy (if it works). By default, however, cuDSS does no iterative refinement steps (can be changed via CUDSS_CONFIG_IR_N_STEPS).
In your case, though, since the pivot epsilon is so large compared to the matrix elements, iterative refinement will not help.
To change the behavior, you have the following options:
a) [recommended right now] Set a small value (to practically disable the checks, smth like 1e-300 would do) for CUDSS_CONFIG_PIVIT_EPSILON via cudssConfigSet
b) [this will not work for your specific case as the scales are changing too wildly] Set a non-default pivot epsilon scaling algorithm via cudssConfigSet with CUDSS_ALG_1 for CUDSS_CONFIG_PIVOT_EPSILON_ALG + enable iterative refinement via CUDSS_CONFIG_IR_N_STEPS with, say, 2 iterative refinement steps.
c) [currently will not work as described and thus does not help, as it only disables swaps but not the epsilon checks] Disable pivoting altogether by setting CUDSS_PIVOT_NONE as CUDSS_CONFIG_PIVOT_TYPE
d) starting with one of the next releases, you can enable matching + scaling (and scaling is here the part which is the most relevant). When scaling is done, default pivoting epsilon should work well if the matrix is not too ill-conditioned.
Also note that with the examples of the nature you have, catastrophic cancellation might happen and the result might depend on the reordering permutation and order of operations during the parallel execution (this of course will only be relevant for larger test cases). So I’d be cautious to say what is the “expected” result in similar cases [with the usual numerical analysis estimates to be respected].
Let us know if you have further questions!
Thanks,
Kirill