I have a question regarding the accuracy of the CUBLAS single-precision multiplication, when we are working with large matrices. I’ve tried to compare the results generated by Sgemm and Dgemm in Cublas (v 2.3), with the CPU results obtained from MATLAB. (The error metric used is the sum of absolute difference of the two matrices).
I find that the error between the GPU-single and GPU-double versions seems to be increasing at a much faster rate compared to the corresponding CPU versions(please see attached file: precision_results.png). For example, in the case of random 3000x3000 matrices, the error values are the following:
err(CPU-single, CPU-double) = 456.64
err(GPU-single, GPU-double) = 4443.8 // an order of magnitude higher than CPU-difference
err(GPU-single, CPU-single) = 4486.3
err(GPU-double,CPU-double) = 8.36e-6.
The fact that both cases having large errors involve GPU-single seems to point to a loss of accuracy in this version for matrices of this size. Also, in the single precision case, some results are off (relative to the cpu) by the third decimal at this matrix size. (the magnitude of the numbers is about a few hundred).
I tried out the same experiment with the all matrix elements initialized to 1, and the errors came out to be a perfect 0. Is this a case of the single-precision algorithm accumulating error when all bits of a floating point number are used up (which is more likely when matrices are randomly initialized)? Could you please let us know your opinion on this?
Thanks and Regards,
p.s. Please find the code used to generate these results attached. The *.cu files are mex-wrappers around cublas calls, and the file trial.txt is the matlab script that runs it across various matrix sizes (Please rename this to trial.m to run it in MATLAB)
trial.txt (1.17 KB)
dgemm.cu (1.93 KB)
sgemm.cu (1.92 KB)