cublasSdot_v2() gives different results when running on different GPU types

I’m running the same binary on two computers with different GPU types, using the same input parameters.
The code only calls cublasSdot_v2(). The input values are 32-bit floats, and the results on the two machines differ by 0.0000152587890625 or 0.000030517578125.

Is this a known issue?
Is there a way to get bit-identical results on two different GPU types?


Generally speaking, GPU architecture specific kernels in CUBLAS are a thing. Whether SDOT is one of the BLAS functions affected, I do not know.

Architecture-specific kernels usually do use a different order of floating-point operations, and since algebraically identical computations are usually not identical in finite-precision floating-point arithmetic, bit-wise identical BLAS results are not guaranteed across GPU architectures or CUDA versions.
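To illustrate the ordering effect, here is a minimal host-side sketch in plain C. It does not call cuBLAS, and the function names `dot_sequential` and `dot_two_partials` are made up for this example; the two-way split is only a stand-in for whatever reduction order a given GPU kernel might use. As a side note, the reported differences are exactly 2^-16 and 2^-15, which is the spacing between adjacent single-precision values in [128, 256) and [256, 512) respectively, so they would be consistent with last-bit differences if the results are in that range (the thread does not say).

```c
#include <stddef.h>

/* Left-to-right accumulation, as a simple serial loop would do it. */
float dot_sequential(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Two partial sums combined at the end, loosely mimicking a parallel
   reduction split across two workers. Illustrative only: this is an
   assumption, not a description of how cuBLAS implements SDOT. */
float dot_two_partials(const float *x, const float *y, size_t n)
{
    size_t half = n / 2;
    float lo = dot_sequential(x, y, half);
    float hi = dot_sequential(x + half, y + half, n - half);
    return lo + hi;
}
```

With x = {1e8f, 1.0f, -1e8f, 1.0f} and y = {1, 1, 1, 1}, the exact dot product is 2. Assuming IEEE-754 single-precision arithmetic and no fast-math style compiler options, `dot_sequential` returns 1 and `dot_two_partials` returns 0, even though both compute the same algebraic sum.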

SDOT is a function that is subject to a numerical phenomenon called subtractive cancellation, and when that occurs, relative error can get almost arbitrarily large. Whether this explains the observation reported we cannot tell, because the question does not include a minimal self-contained example code that reproduces the observation.
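One rough way to check whether cancellation is in play for a given pair of input vectors is to compare the sum of the absolute values of the products with the absolute value of the dot product itself, both evaluated in double precision on the host. The sketch below assumes plain C on the host; `dot_f32` and `cancellation_ratio` are hypothetical helper names, not cuBLAS functions.

```c
#include <stddef.h>

/* Single-precision dot product, accumulated left to right. */
float dot_f32(const float *x, const float *y, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Ratio sum(|x_i * y_i|) / |sum(x_i * y_i)|, evaluated in double precision.
   A value near 1 means the terms mostly share a sign; a large value means
   heavy cancellation. Returns -1.0 when the dot product is exactly zero. */
double cancellation_ratio(const float *x, const float *y, size_t n)
{
    double sum = 0.0, sum_abs = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double p = (double)x[i] * (double)y[i];
        sum += p;
        sum_abs += (p < 0.0) ? -p : p;
    }
    if (sum == 0.0)
        return -1.0;
    return sum_abs / ((sum < 0.0) ? -sum : sum);
}
```

For x = {16777216.0f, 1.0f, -16777216.0f} and y = {1, 1, 1}, the exact result is 1, but `dot_f32` returns 0 because 2^24 + 1 is not representable in single precision. The ratio for this input is 33554433, whereas a benign input such as {1, 2, 3} · {4, 5, 6} gives a ratio of 1.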

You may also want to double-check the assertion that the input data to the SDOT call is bit-wise identical on both platforms.
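A simple way to do that check is to hash the raw bytes of each input array on the host just before the SDOT call and compare the digests from the two machines. The sketch below uses the 64-bit FNV-1a hash; the helper name `fnv1a64` is chosen for this example, and the comparison assumes both hosts use the same byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a raw byte buffer. Hashing the bytes rather than
   comparing float values catches differences that == would miss, such as
   0.0f versus -0.0f. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *bytes = (const unsigned char *)data;
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    for (size_t i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}
```

Calling `fnv1a64(x, n * sizeof(float))` for each input vector on both machines and printing the result gives a value that can be compared by eye. If the digests differ, the inputs are not bit-wise identical and the discrepancy originates before the cuBLAS call.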
