As part of learning CUDA and how to use it with Fortran, I have written a Fortran program that generates two random square matrices in single-precision complex form. I then multiply them in five different ways:

-naive, brute-force row-by-column multiplication and addition;

-the Fortran intrinsic MATMUL;

-MKL gemm;

-my own CUDA code;

-CUBLAS Cgemm.

After computing the product with each of the above methods, I calculate the norm of the resulting matrix. The first three methods, which are native to Fortran, give me identical answers. The last two methods also give answers identical to each other, but these differ significantly from those of the first group.

When I print out the resulting matrix for each method, the matrix from the second group is markedly different from that of the first group. I have twiddled the 'N' and 'T' options on the CUBLAS Cgemm call, but it doesn't make any difference.

The NVIDIA card is a C1060, and it runs the MATMUL example correctly. Unfortunately, that example is only in single precision.

I would be interested in all comments. If you have used the CUBLAS Cgemm successfully, I would appreciate an opportunity to communicate with you.

Thanks

Malcolm