Calculating correlation matrices on GPU

Hello wizards,

Has anyone already worked on accelerating the calculation of correlation matrices for moderately large datasets (> 20 factors, > 50,000 realizations)?

Best to all, and long live CUDA.

Can't this operation be done with a matrix multiply, i.e. B*A'? cuBLAS SGEMM (or CGEMM if complex) should do the trick.
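To make the matrix-multiply idea concrete, here is a minimal CPU-side sketch in C++, not a tuned GPU implementation. It assumes the data is stored row-major with one realization per row and one factor per column, and that no factor is constant. The names `Matrix`, `standardize` and `correlation` are made up for illustration. Each column is centered and scaled to unit norm, after which the correlation matrix is just the product Z'*Z.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type: dense row-major matrix, data[i * cols + j].
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Center every column (factor) and scale it to unit Euclidean norm.
// Assumption: at least two realizations and no constant column.
Matrix standardize(const Matrix& x) {
    assert(x.rows > 1);
    Matrix z(x.rows, x.cols);
    for (std::size_t j = 0; j < x.cols; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < x.rows; ++i) mean += x.at(i, j);
        mean /= static_cast<double>(x.rows);

        double sumSq = 0.0;
        for (std::size_t i = 0; i < x.rows; ++i) {
            const double d = x.at(i, j) - mean;
            sumSq += d * d;
        }
        assert(sumSq > 0.0);  // a constant column has no defined correlation

        const double invNorm = 1.0 / std::sqrt(sumSq);
        for (std::size_t i = 0; i < x.rows; ++i)
            z.at(i, j) = static_cast<float>((x.at(i, j) - mean) * invNorm);
    }
    return z;
}

// Pearson correlation matrix as C = Z^T * Z, with Z the standardized data.
// The triple loop below is the plain matrix product that a GEMM routine
// would replace on the GPU.
Matrix correlation(const Matrix& x) {
    const Matrix z = standardize(x);
    Matrix c(x.cols, x.cols);
    for (std::size_t a = 0; a < x.cols; ++a) {
        for (std::size_t b = 0; b < x.cols; ++b) {
            double acc = 0.0;
            for (std::size_t i = 0; i < x.rows; ++i)
                acc += static_cast<double>(z.at(i, a)) * z.at(i, b);
            c.at(a, b) = static_cast<float>(acc);
        }
    }
    return c;
}
```

With 20 factors and 50,000 realizations, Z is 50000 x 20 and the result is only 20 x 20, so the standardization pass touches as much data as the multiply does. On the GPU the product is the part a call such as `cublasSgemm` would take over. The centering and scaling would still need their own kernel or a pass on the host.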