I’ve read the “programming guide”, as I guess all of you have, and I learnt a lot from the section “Example of Matrix Multiplication”.
Actually, I’ve slightly modified this example because I do something very similar (a sum of squared differences instead of a sum of products). It works perfectly and it’s faster than the naive approach.
However, I’ve noticed an interesting phrase in the documentation:
I take this to mean that there is an improved (faster) version. Does anybody know how to speed up this multiplication?
Actually, I don’t want to do a matrix multiplication.
For example, if you multiply A by B, the result matrix C is built as follows:
C[r][c] = sum i=0 to d-1 ( A[r][i] * B[i][c] )
What I’d like to compute instead is
C[r][c] = sum i=0 to d-1 ( ( A[r][i] - B[i][c] ) * ( A[r][i] - B[i][c] ) )
This is not exactly the same thing but it’s very similar.
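To make the difference concrete, here is a minimal CPU reference sketch of both formulas in plain C. The function names `matmul_ref` and `sqdiff_ref`, the row-major layout, and the sizes (A is n x d, B is d x m, C is n x m) are my own assumptions for illustration; this is not the code from the programming guide. The only difference between the two is the body of the inner loop.

```c
#include <stddef.h>

/* Hypothetical reference versions of the two formulas (not from the guide).
   Assumed layout: row-major, A is n x d, B is d x m, C is n x m. */

/* C[r][c] = sum over i of A[r][i] * B[i][c] */
void matmul_ref(const float *A, const float *B, float *C,
                size_t n, size_t d, size_t m)
{
    for (size_t r = 0; r < n; ++r) {
        for (size_t c = 0; c < m; ++c) {
            float acc = 0.0f;
            for (size_t i = 0; i < d; ++i)
                acc += A[r * d + i] * B[i * m + c];
            C[r * m + c] = acc;
        }
    }
}

/* C[r][c] = sum over i of (A[r][i] - B[i][c])^2 */
void sqdiff_ref(const float *A, const float *B, float *C,
                size_t n, size_t d, size_t m)
{
    for (size_t r = 0; r < n; ++r) {
        for (size_t c = 0; c < m; ++c) {
            float acc = 0.0f;
            for (size_t i = 0; i < d; ++i) {
                float diff = A[r * d + i] - B[i * m + c];
                acc += diff * diff;   /* the only line that changes */
            }
            C[r * m + c] = acc;
        }
    }
}
```

A CPU version like this is also handy for checking the GPU result on small matrices, since the memory access pattern is the same as for the multiplication and only the accumulation step differs.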
So, as you can see, I cannot use CUBLAS directly.
Is it possible to get the source for functions such as SGEMM? I could then slightly modify the function to do exactly what I want.