Hi,
I want to calculate the covariance matrix of a large data set stored as an M×N matrix A, say N = 400,000 samples, each with M = 256 dimensions. This essentially boils down to computing AA[sup]T[/sup]. However, when I compute it with SGEMM from the CUBLAS library (toolkit 4.1), I get a cudaErrorUnknown error for large N:
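For reference, this is the standard sample covariance written in terms of AA[sup]T[/sup], assuming each column of A is one sample and 1_N is the all-ones vector of length N. The product is taken on the mean-centred data and scaled by 1/(N-1). The result is M×M (256×256 here) no matter how large N is.

```latex
\mu = \frac{1}{N} A \mathbf{1}_N, \qquad
\tilde{A} = A - \mu \mathbf{1}_N^{\mathsf{T}}, \qquad
\Sigma = \frac{1}{N-1} \tilde{A} \tilde{A}^{\mathsf{T}} \in \mathbb{R}^{M \times M}
```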
cublasSgemm(…)
cublasGetError() – returns OK
cudaDeviceSynchronize() – returns cudaErrorUnknown
My code works correctly with a smaller N, I can run other operations on this data, and the Tesla C2070 has enough memory for it (the 256 × 400,000 single-precision matrix is about 410 MB).
Could I be running into a kernel execution time limit, or is there some other limitation on the SGEMM function?
Is there another library which can do this for me?
Thanks,