Matrix multiplication

Ladies and gentlemen,

I’ve read the “programming guide”, as I guess all of you have, and I’ve learnt many things from the section “Example of Matrix Multiplication”.

Actually, I’ve slightly modified this example because I do a very similar thing (a sum of subtractions instead of a sum of multiplications). It works perfectly and it’s faster than the naive approach.

However, I’ve noticed an interesting phrase in the documentation:

I think that there is an improved (faster) version. Does anybody know how to speed up this multiplication?

Thanks,

Vince

Sure, take SGEMM from the CUBLAS library, for instance.

Hand-tuning BLOCKSIZE, and thus using more shared memory, would be another option.
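As a rough worked example, assuming the kernel keeps two BLOCKSIZE × BLOCKSIZE tiles of 4-byte floats in shared memory (one for A, one for B) and uses one thread per element of a C tile, both the shared memory and the thread count per block grow quadratically with BLOCKSIZE:

```latex
\begin{aligned}
\text{shared memory per block} &= 2 \cdot \mathrm{BLOCKSIZE}^2 \cdot 4\ \text{bytes} \\
\mathrm{BLOCKSIZE} = 16 &: \quad 2 \cdot 256 \cdot 4 = 2048\ \text{bytes}, \quad 16^2 = 256\ \text{threads} \\
\mathrm{BLOCKSIZE} = 32 &: \quad 2 \cdot 1024 \cdot 4 = 8192\ \text{bytes}, \quad 32^2 = 1024\ \text{threads}
\end{aligned}
```

On compute capability 1.x hardware a block is limited to 512 threads and a multiprocessor has 16 KB of shared memory, so 32 × 32 square tiles exceed the thread limit there; one way to use more shared memory without more threads is to have each thread compute several elements of C.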

Actually, I don’t want to do a matrix multiplication.
For example, if you multiply A by B, the result matrix C is built as follows:

C[r][c] = sum i=0 to d-1 ( A[r][i] * B[i][c] )
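For reference, a direct CUDA translation of this formula (roughly the "naive approach" mentioned earlier) assigns one thread per element of C and reads every operand straight from global memory. This is a minimal sketch, assuming row-major float matrices with A of size n × d, B of size d × m and C of size n × m; the names matMulNaive and matMulOnDevice are hypothetical, and error checking is reduced to a single assert.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Naive kernel: one thread computes C[r][c] = sum_i A[r][i] * B[i][c].
// Row-major storage (assumed): A is n x d, B is d x m, C is n x m.
__global__ void matMulNaive(const float* A, const float* B, float* C,
                            int n, int d, int m)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int c = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    if (r >= n || c >= m) return;                   // threads outside C do nothing

    float sum = 0.0f;
    for (int i = 0; i < d; ++i)
        sum += A[r * d + i] * B[i * m + c];         // every read hits global memory
    C[r * m + c] = sum;
}

// Hypothetical host helper: copy A and B to the device, launch, copy C back.
std::vector<float> matMulOnDevice(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int n, int d, int m)
{
    const int TILE = 16;  // threads per block in each dimension
    float *dA, *dB, *dC;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, size_t(n) * m * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((m + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matMulNaive<<<grid, block>>>(dA, dB, dC, n, d, m);
    assert(cudaGetLastError() == cudaSuccess);  // launch check (disabled under NDEBUG)

    std::vector<float> C(size_t(n) * m);
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```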

What I’d like to compute instead is:

C[r][c] = sum i=0 to d-1 ( ( A[r][i] - B[i][c] ) * ( A[r][i] - B[i][c] ) )
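A minimal sketch of how the programming guide's tiling idea carries over to this formula: each block stages BLOCK_SIZE × BLOCK_SIZE tiles of A and B in shared memory, and the inner loop accumulates squared differences instead of products. It assumes row-major float matrices (A is n × d, B is d × m) and pads partial tiles with zeros, which is safe here because (0 − 0)² = 0; sqDiffTiled, sqDiffOnDevice and the template parameter are illustrative names, not code taken from the guide or from CUBLAS.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Tiled kernel: C[r][c] = sum_i (A[r][i] - B[i][c])^2.
// Row-major storage (assumed): A is n x d, B is d x m, C is n x m.
template <int BLOCK_SIZE>
__global__ void sqDiffTiled(const float* A, const float* B, float* C,
                            int n, int d, int m)
{
    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];  // tile of A rows
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];  // tile of B columns

    int tx = threadIdx.x, ty = threadIdx.y;
    int r = blockIdx.y * BLOCK_SIZE + ty;  // output row of this thread
    int c = blockIdx.x * BLOCK_SIZE + tx;  // output column of this thread

    float sum = 0.0f;
    for (int t = 0; t < d; t += BLOCK_SIZE) {
        // Each thread loads one element of each tile. Entries past the matrix
        // edge are 0 in both tiles, so they add (0 - 0)^2 = 0 to the sum.
        int aCol = t + tx;
        int bRow = t + ty;
        As[ty][tx] = (r < n && aCol < d) ? A[r * d + aCol] : 0.0f;
        Bs[ty][tx] = (bRow < d && c < m) ? B[bRow * m + c] : 0.0f;
        __syncthreads();  // both tiles are complete before anyone reads them

        for (int k = 0; k < BLOCK_SIZE; ++k) {
            float diff = As[ty][k] - Bs[k][tx];  // the only change vs. a product
            sum += diff * diff;
        }
        __syncthreads();  // done reading before the next iteration overwrites
    }
    if (r < n && c < m) C[r * m + c] = sum;
}

// Hypothetical host helper: copy inputs, launch one block per C tile, copy back.
template <int BLOCK_SIZE>
std::vector<float> sqDiffOnDevice(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  int n, int d, int m)
{
    float *dA, *dB, *dC;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, size_t(n) * m * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid((m + BLOCK_SIZE - 1) / BLOCK_SIZE, (n + BLOCK_SIZE - 1) / BLOCK_SIZE);
    sqDiffTiled<BLOCK_SIZE><<<grid, block>>>(dA, dB, dC, n, d, m);
    assert(cudaGetLastError() == cudaSuccess);  // launch check (disabled under NDEBUG)

    std::vector<float> C(size_t(n) * m);
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Because the tile size is a template parameter, different values can be compared by instantiating, for example, sqDiffOnDevice<8> and sqDiffOnDevice<16>, as long as BLOCK_SIZE × BLOCK_SIZE stays within the device's threads-per-block limit.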

This is not exactly the same thing, but it’s very similar.
So, as you can see, I cannot use CUBLAS.
Is it possible to get the source of a function such as SGEMM? I could then modify it slightly to do exactly what I want.

Thanks
Vince

Check the announcements forum. http://forums.nvidia.com/index.php?showtopic=59101