I’m using both CUBLAS and MKL for my research. On my computer, “dgemm” in CUBLAS is much faster than in MKL, but “daxpy” in CUBLAS is slightly slower than in MKL on the same machine. Why is that? Can anyone give me a hint?
Thank you in advance!