I have implemented an iterative solver using the BiCGSTAB and PBiCGSTAB methods in CUDA. Given a dataset, this code will sometimes converge and sometimes it won't: different behaviour between runs, same dataset. And when it does converge, the time and number of iterations the solver takes vary drastically (±50% on both time and iterations).
This holds true when running on a P1000 and a 3070 Ti (Mobile), but NOT on a T1000 (8 GB), which always takes the EXACT same number of iterations to find a solution. When the other two GPUs do converge, the solutions match across all three GPUs. The T1000 and the P1000 even run on the same driver. The T1000 we are using does not have ECC (I believe some variants support that feature).
There are no random values in the code, and I see perfect reproducibility across all platforms when running my CG numerical methods, which tells me it has something to do with the numerical instabilities inherent to the BiCGSTAB method. What I don't understand is why this affects some GPUs and not others.
Can someone explain to me why this is happening? I understand that BiCGSTAB is not guaranteed to converge, but why would it converge on one run and not the next? I also understand that CUDA does not guarantee run-to-run reproducibility, but why then is it fully reproducible (even across two different PCs) on the T1000? Does the T1000's architecture offer features that reduce numerical errors? I have been unable to find anything like that in any publicly available datasheets.