I tried to accelerate my code by reducing numerical precision from float (32-bit) to __half (16-bit).
I'm using cuSPARSE for matrix/vector multiplication, via the cusparseSpMV generic API.
I expected roughly a 2x speedup, but to my surprise I got about a 2x slowdown!
In my example the sparse matrix has COLS = 768800, ROWS = 80000, NNZ = 61504000 (80 non-zero elements per column).
Typical run times I'm getting on a GTX 1080 Ti for single and half precision are:
FLOAT Ax run time: 3.841 msec.
FLOAT A^Ty run time: 2.09 msec.
HALF Ax run time: 9.885 msec.
HALF A^Ty run time: 4.666 msec.
I'm using CUDA 11.0 on Fedora 31. The cuSPARSE version is 184.108.40.206, driver version 450.66.
Do I have to configure cuSPARSE in some special way to get a speedup in half precision?
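In case it matters, here is a minimal sketch of how I set up the half-precision SpMV call (error checking omitted; the pointer and variable names are placeholders for my actual device buffers, and I'm showing the non-transpose Ax case with FP32 accumulation):

```cpp
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cusparse.h>

// Placeholder device pointers: dRowOffsets/dColInd/dVals hold the CSR matrix,
// dX and dY hold the input and output vectors, all already populated.
void spmv_half(cusparseHandle_t handle,
               int rows, int cols, int64_t nnz,
               int* dRowOffsets, int* dColInd, __half* dVals,
               __half* dX, __half* dY)
{
    cusparseSpMatDescr_t matA;
    cusparseDnVecDescr_t vecX, vecY;
    // computeType below is CUDA_R_32F, so alpha/beta are passed as float
    float alpha = 1.0f, beta = 0.0f;

    // CSR matrix with FP16 values, 32-bit indices, zero-based
    cusparseCreateCsr(&matA, rows, cols, nnz,
                      dRowOffsets, dColInd, dVals,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_16F);
    cusparseCreateDnVec(&vecX, cols, dX, CUDA_R_16F);
    cusparseCreateDnVec(&vecY, rows, dY, CUDA_R_16F);

    // Query and allocate the external workspace
    size_t bufSize = 0;
    void*  dBuffer = nullptr;
    cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                            &alpha, matA, vecX, &beta, vecY,
                            CUDA_R_32F,                 // accumulate in FP32
                            CUSPARSE_MV_ALG_DEFAULT, &bufSize);
    cudaMalloc(&dBuffer, bufSize);

    // y = alpha * A * x + beta * y
    cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, matA, vecX, &beta, vecY,
                 CUDA_R_32F, CUSPARSE_MV_ALG_DEFAULT, dBuffer);

    cudaFree(dBuffer);
    cusparseDestroySpMat(matA);
    cusparseDestroyDnVec(vecX);
    cusparseDestroyDnVec(vecY);
}
```

The A^Ty timing uses the same call with CUSPARSE_OPERATION_TRANSPOSE. The float-precision version is identical except that the descriptors and computeType use CUDA_R_32F throughout.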
Thanks in advance,