cublasHgemm is not faster than cublasSgemm on 2080Ti

haoliuhust · July 31, 2020, 3:59am

I am testing cublasHgemm on a 2080Ti. According to the product docs, the 2080Ti has a fast FP16 mode that should be 2x faster than FP32, but when I run my benchmark on the 2080Ti, cublasHgemm is not faster than cublasSgemm. The benchmark app was compiled on a 1080Ti machine with CUDA 10.2 and then run on the 2080Ti. I added the nvcc flag -arch=sm_75.

haoliuhust · August 4, 2020, 10:10am

According to my tests, the matrices need to be very large before FP16 becomes faster than FP32.

mnicely · September 14, 2020, 2:35pm

Can you provide a reproducer and results?
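
A reproducer for the size dependence described above could look roughly like the sketch below. It is only an illustration, not the original benchmark: the file name, the list of sizes, the iteration count, and the helper names `tflops` and `time_ms` are all made up here. It also assumes that on CUDA 10.x Tensor Core use in cublasHgemm is opt-in through cublasSetMathMode, so it sets that mode only around the FP16 call.

```cuda
// Hypothetical file: gemm_bench.cu
// Assumed build line: nvcc -arch=sm_75 gemm_bench.cu -lcublas -o gemm_bench
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cublas_v2.h>

// Throughput of one n x n x n GEMM (2*n^3 FLOPs) in TFLOP/s.
static double tflops(int n, float ms) {
    return 2.0 * n * n * n / (ms * 1e-3) / 1e12;
}

// Times `iters` back-to-back calls with CUDA events; returns ms per call.
template <typename F>
static float time_ms(F gemm, int iters) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    gemm();  // warm-up call, excluded from the measurement
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) gemm();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / iters;
}

int main() {
    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    const int sizes[] = {256, 512, 1024, 2048, 4096, 8192};  // arbitrary sweep
    const int iters = 20;                                    // arbitrary
    const float alpha_s = 1.0f, beta_s = 0.0f;
    const __half alpha_h = __float2half(1.0f), beta_h = __float2half(0.0f);

    std::printf("%6s %12s %12s %14s %14s\n",
                "n", "sgemm ms", "hgemm ms", "sgemm TFLOP/s", "hgemm TFLOP/s");
    for (int n : sizes) {
        const size_t elems = static_cast<size_t>(n) * n;
        float *As, *Bs, *Cs;
        __half *Ah, *Bh, *Ch;
        cudaMalloc(&As, elems * sizeof(float));
        cudaMalloc(&Bs, elems * sizeof(float));
        cudaMalloc(&Cs, elems * sizeof(float));
        cudaMalloc(&Ah, elems * sizeof(__half));
        cudaMalloc(&Bh, elems * sizeof(__half));
        cudaMalloc(&Ch, elems * sizeof(__half));
        // Zero-filled inputs: only timing is measured here, not numerical results.
        cudaMemset(As, 0, elems * sizeof(float));
        cudaMemset(Bs, 0, elems * sizeof(float));
        cudaMemset(Ah, 0, elems * sizeof(__half));
        cudaMemset(Bh, 0, elems * sizeof(__half));

        cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH);
        float ms_s = time_ms([&] {
            cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                        &alpha_s, As, n, Bs, n, &beta_s, Cs, n);
        }, iters);

        // Assumption: Tensor Cores are opt-in on CUDA 10.x via this math mode
        // (the enum is deprecated from CUDA 11 on).
        cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);
        float ms_h = time_ms([&] {
            cublasHgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                        &alpha_h, Ah, n, Bh, n, &beta_h, Ch, n);
        }, iters);

        std::printf("%6d %12.4f %12.4f %14.2f %14.2f\n",
                    n, ms_s, ms_h, tflops(n, ms_s), tflops(n, ms_h));

        cudaFree(As); cudaFree(Bs); cudaFree(Cs);
        cudaFree(Ah); cudaFree(Bh); cudaFree(Ch);
    }
    cublasDestroy(handle);
    return 0;
}
```

Posting the printed table together with the exact build line, driver version, and cuBLAS version would make the results easier to compare.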