Hey!

I was wondering if anyone can answer a small question about CUBLAS.

I am using a GTX 275 board and multiplying two complex matrices: the first is 1024 x 1024 and the second is 1024 x 1000.

This complex multiplication takes about 47 milliseconds on average using the cublasCgemm() function, and I was wondering if there is any way I could speed it up. I am hoping to build a real-time system, so I need to get this operation below 20 ms if possible.
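
To put rough numbers on that target, here is a back-of-the-envelope sketch in plain C. It assumes the usual convention of 8 real floating-point operations per complex multiply-add, and the helper names (cgemm_gflops, print_cgemm_rates) are my own for illustration, not part of CUBLAS.

```c
#include <stdio.h>

/* Hypothetical helper (not a CUBLAS function): sustained GFLOP/s of an
   (m x k) times (k x n) complex matrix product that takes `ms` milliseconds.
   Assumption: 8 real flops per complex multiply-add. */
double cgemm_gflops(double m, double n, double k, double ms)
{
    double flops = 8.0 * m * n * k;      /* total real floating-point operations */
    return flops / (ms * 1e-3) / 1e9;    /* flops per second, scaled to GFLOP/s */
}

/* Prints the measured and the target rate for the sizes in this post. */
void print_cgemm_rates(void)
{
    printf("current: %.1f GFLOP/s\n", cgemm_gflops(1024, 1000, 1024, 47.0));
    printf("target:  %.1f GFLOP/s\n", cgemm_gflops(1024, 1000, 1024, 20.0));
}
```

Under that convention the product is about 8.4 GFLOP, so 47 ms works out to roughly 178 GFLOP/s and the 20 ms target to roughly 419 GFLOP/s, which is a 2.35x speedup.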

I am new to CUDA and have just read about constant memory. The first matrix (1024 x 1024) will not change, so the CPU can set it once at start-up. If I used texture or constant memory for it, would I see an improvement?
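
As a quick size check on the constant memory idea, here is a small sketch in plain C. It assumes single-precision complex elements of 8 bytes each (the layout cublasCgemm() works with) and a constant memory limit of 64 KB per device; the type and function names are made up for illustration.

```c
#include <stddef.h>

/* Stand-in for a single-precision complex element (two floats, 8 bytes).
   Defined locally so the sketch needs no CUDA headers. */
typedef struct { float re, im; } complex_f32;

/* Assumption: 64 KB of constant memory is available on the device. */
#define CONSTANT_MEM_BYTES ((size_t)64 * 1024)

/* Storage needed for a rows x cols single-precision complex matrix. */
size_t matrix_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(complex_f32);
}

/* Returns 1 if the whole matrix would fit in constant memory, 0 otherwise. */
int fits_in_constant_memory(size_t rows, size_t cols)
{
    return matrix_bytes(rows, cols) <= CONSTANT_MEM_BYTES;
}
```

By this estimate the fixed 1024 x 1024 matrix needs 8 MiB, far more than 64 KB, so it would not fit in constant memory as a whole.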

Are the CUBLAS functions open source? Is there a link to the source somewhere?

Thanks a lot for your help!