I am converting my code from CPU to GPU. The original CPU code has been tested and gives correct results. However, I found that, because of the single-precision (SP) limitation of the GPU, my code fails in the GPU environment. The algorithm I am working on consists of basic BLAS 1 and BLAS 2 operations such as matrix-vector and vector-vector products, saxpy, norm_2, etc. The library I am using is cuBLAS, and the GPU card I am running on is a Tesla C1060. Could someone here give me some pointers on how to compensate for the loss of accuracy on the GPU with SP precision?
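To make the problem concrete, here is a minimal host-side sketch in plain C (not the actual cuBLAS code; the function names `dot_naive`, `dot_kahan`, and `dot_ref` are made up for illustration). It compares a dot product accumulated in a single-precision variable against the same sum with Kahan (compensated) summation, which is one commonly suggested way to recover accuracy without switching to double precision. It assumes the compiler does not reorder floating-point arithmetic (for example, no `-ffast-math`), since that would cancel out the compensation term.

```c
#include <stddef.h>

/* Plain single-precision dot product: every partial sum is rounded to float,
   so rounding errors accumulate as n grows. */
float dot_naive(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Kahan (compensated) summation, still using only float variables.
   c tracks the low-order bits lost when a small term is added to a large sum. */
float dot_kahan(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    float c = 0.0f;                        /* running compensation */
    for (size_t i = 0; i < n; ++i) {
        float term = x[i] * y[i] - c;      /* apply the previous correction */
        float t = sum + term;              /* low-order bits of term may be lost here */
        c = (t - sum) - term;              /* recover what was lost */
        sum = t;
    }
    return sum;
}

/* Reference value accumulated in double precision, for comparison only. */
double dot_ref(const float *x, const float *y, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += (double)x[i] * (double)y[i];
    }
    return sum;
}
```

Calling all three on long vectors (say, a million entries of `0.1f` against `1.0f`) and comparing against `dot_ref` should show the naive float sum drifting noticeably while the compensated sum stays close to the reference. The same idea could in principle be applied inside a custom GPU reduction kernel, but whether it fits a cuBLAS-based code is an open question here.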
I’d appreciate your help!