cuDSS generates different solutions

miaodi1987 · July 15, 2024, 8:20am

Hi,

I am new to GPU computing and would like to learn cuDSS and use it as a direct solver for a FEM code.

To better understand the tool, I modified simple.cpp in CUDALibrarySamples so that it reads a matrix in Matrix Market (.mtx) format and solves the system with cuDSS. I ran a test with ex5.mtx as the operator and a vector of ones as the right-hand side. However, I found that the solutions from different runs are slightly different.

Run 1:

6.523435281436166
12.15624546543049
15.48436799649963
18.15624170982419
18.48436467570765
18.15624130668297
15.48436675564838
12.15624345799502
6.523434354351519
6.582028659060476
11.92186998618574
15.60155529938564
17.92186572753139
18.60155216231058
17.92186509258376
15.60155399556828
11.92186846241165
6.582027611061516
6.523435281438486
12.15624546542148
15.4843679965074
18.15624170980835
18.48436467572148
18.15624130666917
15.48436675565014
12.15624345798938
6.523434354355045

Run 2:

6.523435281526809
12.15624546561178
15.48436799677158
18.15624171018677
18.48436467614943
18.15624130727269
15.48436675615095
12.15624345854199
6.523434354593602
6.582028659151121
11.92186998636703
15.60155529965758
17.92186572789398
18.60155216276954
17.92186509310472
15.60155399610374
11.92186846289581
6.5820276113193
6.523435281529133
12.15624546560278
15.48436799677934
18.15624171017094
18.48436467616324
18.1562413072589
15.4843667561527
12.15624345853635
6.523434354597127

For example, the relative difference in the first entry is about 1.3895e-11. Since both the operator and the vectors are constructed with CUDA_R_64F, I expected the two results to be identical. Could a missing setting in simple.cpp be causing the slight difference between solves?

cuDSS version: 0.3.0.9
nvcc version: 12.5.82
GPU: RTX4070
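
The relative difference quoted above can be checked with a few lines of host-side C++. This is a minimal sketch, not part of simple.cpp; the function names `relative_difference` and `max_relative_difference` are hypothetical helpers introduced here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative difference |a - b| / |a| of two scalars (a must be nonzero).
double relative_difference(double a, double b) {
    assert(a != 0.0);
    return std::fabs(a - b) / std::fabs(a);
}

// Largest componentwise relative difference between two solution vectors.
double max_relative_difference(const std::vector<double>& x,
                               const std::vector<double>& y) {
    assert(x.size() == y.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::max(worst, relative_difference(x[i], y[i]));
    }
    return worst;
}
```

Calling `relative_difference(6.523435281436166, 6.523435281526809)` on the first entries of Run 1 and Run 2 gives a value of roughly 1.39e-11, which matches the figure in the post. Comparing whole vectors with `max_relative_difference` against a tolerance is usually a more useful check than expecting bitwise equality.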

Regards,
Di

gerd4 · July 15, 2024, 4:14pm

What GPU are you using?

miaodi1987 · July 15, 2024, 4:59pm

The GPU I am using is RTX4070.

kvoronin · July 17, 2024, 7:43pm

Hello!

cuDSS currently relies on atomics, so the order in which floating-point contributions are accumulated can change from run to run, and thus it does not have bitwise reproducible results.
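
The reason atomics affect the last digits is that floating-point addition is not associative, so the same terms added in a different order can round differently. The sketch below shows this on the host with plain doubles; it does not use cuDSS or CUDA atomics, and `sum_in_order` is a hypothetical helper that stands in for "whichever order the atomic updates happened to arrive in".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum `values` in the order given by `order` (a permutation of the indices).
// Different permutations model different arrival orders of atomic additions.
double sum_in_order(const std::vector<double>& values,
                    const std::vector<std::size_t>& order) {
    assert(values.size() == order.size());
    double total = 0.0;
    for (std::size_t idx : order) {
        assert(idx < values.size());
        total += values[idx];
    }
    return total;
}
```

With `values = {0.1, 0.2, 0.3}`, the orders `{0, 1, 2}` and `{2, 1, 0}` produce sums that differ in the last bit. In a sparse factorization many such accumulations feed into later steps, so small run-to-run differences like the ones reported above are expected even when everything is stored as CUDA_R_64F.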

If bitwise reproducibility is needed, we would consider such a feature request.
If accuracy is the concern, there are ways to increase it (e.g., through iterative refinement, or by adjusting the pivoting parameters if small numerical pivots occur). Also, features like matching/scaling would be useful in these situations, but cuDSS currently does not have them.
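
To make the iterative refinement idea concrete, here is a minimal host-side sketch of the general technique. It assumes a small dense matrix and a user-supplied approximate solver passed in as `approx_solve`; these names, and the `Vec`/`Mat` types, are hypothetical and are not the cuDSS API. In practice `approx_solve` would wrap the solve phase of whatever direct solver is in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // small dense matrix, for illustration only

// Residual r = b - A * x, accumulated in double precision.
Vec residual(const Mat& A, const Vec& x, const Vec& b) {
    assert(A.size() == b.size());
    Vec r(b);
    for (std::size_t i = 0; i < A.size(); ++i) {
        assert(A[i].size() == x.size());
        for (std::size_t j = 0; j < x.size(); ++j) {
            r[i] -= A[i][j] * x[j];
        }
    }
    return r;
}

// Largest absolute entry of v, useful as a stopping criterion.
double max_abs(const Vec& v) {
    double m = 0.0;
    for (double e : v) m = std::max(m, std::fabs(e));
    return m;
}

// Iterative refinement: start from an approximate solution, then repeatedly
// solve A * d = r for the current residual r and apply the correction x += d.
Vec refine(const Mat& A, const Vec& b,
           const std::function<Vec(const Vec&)>& approx_solve, int steps) {
    Vec x = approx_solve(b);
    for (int k = 0; k < steps; ++k) {
        Vec r = residual(A, x, b);
        Vec d = approx_solve(r);
        assert(d.size() == x.size());
        for (std::size_t i = 0; i < x.size(); ++i) x[i] += d[i];
    }
    return x;
}
```

Each step reuses the existing factorization, so it costs only a matrix-vector product and one extra solve. Refinement reduces the residual of each run; it does not by itself make two runs bitwise identical.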

I hope this answers your question.

Best,
Kirill
