Wide matrix multiply for covariance: SGEMM returns cudaErrorUnknown for large N


I want to calculate the covariance matrix of a large data set (MxN), say N = 400,000 samples, each with M = 256 dimensions. This essentially boils down to computing AA^T. However, when I use SGEMM from the CUBLAS library (toolkit 4.1) to compute it with a large N, I get a cudaErrorUnknown error:

cublasGetError() – returns OK
cudaDeviceSynchronize() – returns cudaErrorUnknown

My code works correctly with a smaller N, I can run other operations on this same data, and the Tesla C2070 has enough memory to hold it.

Could I be running into a kernel execution time limit, or is there some other limitation on the SGEMM function?
Is there another library which can do this for me?


Can you post the code that produces this error?

Which host OS are you using?

How do you call cublasSgemm? What values of m, n and k do you pass?


I built a new, simple project to demonstrate the problem for you guys, but the code worked there. So I copied that project's build settings and its re-implementation of that part back into my original project, and it works now.