Hi,

I want to calculate the covariance matrix of a large M×N data set, say N = 400,000 samples with M = 256 dimensions each. This essentially boils down to computing AA[sup]T[/sup]. However, using SGEMM from the CUBLAS library (CUDA toolkit 4.1) for this gives a cudaErrorUnknown error when N is large.

cublasSgemm(…)

cublasGetError() – returns OK

cudaDeviceSynchronize() – returns cudaErrorUnknown

My code works correctly with a smaller N, I can do other operations on this data, and the Tesla C2070 has enough RAM.

Could I be running into a kernel time limit, or is there some other limitation on the SGEMM function?
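One workaround I am considering, assuming the problem really is tied to the size of a single call, is to split the N samples into chunks and accumulate the partial products, since AA[sup]T[/sup] is just the sum of the chunk-wise products. Below is a minimal CPU sketch of that arithmetic in plain C++, not my actual GPU code. The names gram_chunked and accumulate_chunk and the chunk size are made up for illustration, and A is assumed to be stored column-major. On the GPU, each chunk would presumably map to one SGEMM call with beta = 1 so that the result is added to the running total.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: C += A_chunk * A_chunk^T for columns [begin, end) of A.
// A is M x N and C is M x M, both column-major (the layout CUBLAS uses).
void accumulate_chunk(const std::vector<float>& A, std::vector<float>& C,
                      std::size_t M, std::size_t begin, std::size_t end) {
    for (std::size_t k = begin; k < end; ++k) {
        const float* col = &A[k * M];  // one sample (one column of A)
        for (std::size_t j = 0; j < M; ++j)
            for (std::size_t i = 0; i < M; ++i)
                C[j * M + i] += col[i] * col[j];  // rank-1 update with this sample
    }
}

// Hypothetical driver: computes A * A^T by walking over N in chunks.
// chunk must be greater than zero.
std::vector<float> gram_chunked(const std::vector<float>& A,
                                std::size_t M, std::size_t N,
                                std::size_t chunk) {
    std::vector<float> C(M * M, 0.0f);  // start from zero, then accumulate
    for (std::size_t begin = 0; begin < N; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, N);
        accumulate_chunk(A, C, M, begin, end);
    }
    return C;
}
```

If the chunked version runs cleanly on the GPU where the single large call fails, that would at least point to a per-call limit rather than a problem with the data itself.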

Is there another library which can do this for me?

Thanks,