cuDSS generates slightly different solutions across runs

Hi,

I am new to GPU computing and would like to learn cuDSS and use it as a direct solver for an FEM code.

To better understand the tool, I modified simple.cpp in CUDALibrarySamples so that it reads a matrix in .mtx format and solves the system with cuDSS. I ran a test with ex5.mtx as the operator and a vector of ones as the right-hand side. However, I found that the solutions from different runs are slightly different.

Run 1:

6.523435281436166
12.15624546543049
15.48436799649963
18.15624170982419
18.48436467570765
18.15624130668297
15.48436675564838
12.15624345799502
6.523434354351519
6.582028659060476
11.92186998618574
15.60155529938564
17.92186572753139
18.60155216231058
17.92186509258376
15.60155399556828
11.92186846241165
6.582027611061516
6.523435281438486
12.15624546542148
15.4843679965074
18.15624170980835
18.48436467572148
18.15624130666917
15.48436675565014
12.15624345798938
6.523434354355045

Run 2:

6.523435281526809
12.15624546561178
15.48436799677158
18.15624171018677
18.48436467614943
18.15624130727269
15.48436675615095
12.15624345854199
6.523434354593602
6.582028659151121
11.92186998636703
15.60155529965758
17.92186572789398
18.60155216276954
17.92186509310472
15.60155399610374
11.92186846289581
6.5820276113193
6.523435281529133
12.15624546560278
15.48436799677934
18.15624171017094
18.48436467616324
18.1562413072589
15.4843667561527
12.15624345853635
6.523434354597127

For example, the relative difference in the first entry is about 1.3895e-11. Since both the operator and the vectors are created with CUDA_R_64F, I expected the two results to be identical. Could a missing setting in simple.cpp be causing the slight difference between solves?
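For anyone who wants to compare two runs the same way, here is a minimal sketch of the check. The function names `max_relative_difference` and `solutions_agree` are made up for illustration and are not part of cuDSS or the sample; the vectors are assumed to have been copied back to the host already.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest componentwise relative difference between two solution vectors.
// Entries where both values are exactly zero are skipped.
// (Hypothetical helper, not part of cuDSS.)
double max_relative_difference(const std::vector<double>& a,
                               const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        if (scale > 0.0) {
            worst = std::max(worst, std::fabs(a[i] - b[i]) / scale);
        }
    }
    return worst;
}

// True when the two runs agree to within the relative tolerance `tol`.
bool solutions_agree(const std::vector<double>& a,
                     const std::vector<double>& b, double tol) {
    return max_relative_difference(a, b) <= tol;
}
```

Comparing against a tolerance (something like 1e-9 would pass for the numbers above) is usually more robust than testing for exact equality; the right tolerance depends on the conditioning of the matrix.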

cuDSS version: 0.3.0.9
nvcc version: 12.5.82
GPU: RTX4070

Regards,
Di

What GPU are you using?

The GPU I am using is RTX4070.

Hello!

cuDSS currently relies on atomics and thus does not produce bitwise reproducible results.
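As background on why atomics have this effect: atomic additions from different threads can land in a different order on each run, and floating-point addition is not associative, so the rounded sum can change with the order. The sketch below is plain host C++ and is not cuDSS code; `sum_in_order` is a made-up helper that just adds the same values in a caller-chosen order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds the entries of `values` in the order given by `order`.
// Mimics, on the host, an accumulation whose order is not fixed.
// (Hypothetical helper for illustration only.)
double sum_in_order(const std::vector<double>& values,
                    const std::vector<std::size_t>& order) {
    double total = 0.0;
    for (std::size_t idx : order) {
        assert(idx < values.size());
        total += values[idx];  // each addition rounds to the nearest double
    }
    return total;
}
```

With `values = {1e17, 1.0, -1e17, 1.0}`, the order `{0, 1, 2, 3}` gives 1 while the order `{0, 2, 1, 3}` gives 2 in IEEE 754 double arithmetic, because the first 1.0 is lost when it is added to 1e17. In a real solve the effect is far smaller, on the order of the 1e-11 relative differences reported above.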

If bitwise reproducibility is needed, we would consider such a feature request.
If accuracy is the concern, there are ways to increase it (e.g., through iterative refinement, or through the pivoting parameters if numerically small pivots occur). Also, features like matching/scaling would be useful in these situations, but cuDSS currently does not have them.
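To illustrate the idea of iterative refinement in general terms (this is a textbook sketch, not how cuDSS implements it): solve once, compute the residual r = b - A x in double precision, solve again for a correction, and add it to x. In the sketch below the "inexact" solver is a small dense Gaussian elimination done in single precision, chosen only so that the improvement is visible; all names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Deliberately inexact solver: Gaussian elimination with partial pivoting
// carried out in float. Stands in for any solver with limited accuracy.
Vector solve_low_precision(const Matrix& A, const Vector& b) {
    const std::size_t n = b.size();
    assert(A.size() == n);
    std::vector<std::vector<float>> M(n, std::vector<float>(n + 1));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) M[i][j] = static_cast<float>(A[i][j]);
        M[i][n] = static_cast<float>(b[i]);
    }
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;  // row with the largest pivot candidate
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(M[i][k]) > std::fabs(M[p][k])) p = i;
        std::swap(M[k], M[p]);
        for (std::size_t i = k + 1; i < n; ++i) {
            const float f = M[i][k] / M[k][k];
            for (std::size_t j = k; j <= n; ++j) M[i][j] -= f * M[k][j];
        }
    }
    std::vector<float> y(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        float s = M[i][n];
        for (std::size_t j = i + 1; j < n; ++j) s -= M[i][j] * y[j];
        y[i] = s / M[i][i];
    }
    return Vector(y.begin(), y.end());
}

// Residual r = b - A x, computed in double precision.
Vector residual(const Matrix& A, const Vector& x, const Vector& b) {
    Vector r(b);
    for (std::size_t i = 0; i < b.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) r[i] -= A[i][j] * x[j];
    return r;
}

// Largest absolute entry of a vector.
double max_abs(const Vector& v) {
    double m = 0.0;
    for (double e : v) m = std::fmax(m, std::fabs(e));
    return m;
}

// Initial solve followed by `steps` refinement corrections.
Vector refine(const Matrix& A, const Vector& b, int steps) {
    Vector x = solve_low_precision(A, b);
    for (int s = 0; s < steps; ++s) {
        const Vector d = solve_low_precision(A, residual(A, x, b));
        for (std::size_t i = 0; i < x.size(); ++i) x[i] += d[i];
    }
    return x;
}
```

Calling `refine(A, b, 0)` and `refine(A, b, 3)` on a small well-conditioned system and comparing `max_abs(residual(A, x, b))` shows the residual shrinking with the extra steps. Note that refinement improves accuracy; it does not by itself make results bitwise identical between runs.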

I hope this answers your question.

Best,
Kirill
