Eigendecomposition of a matrix with CULA?

Hello,

I need to compute all eigenvalues and eigenvectors of a complex Hermitian double-precision matrix. I found a function in CULA that does this (only for real matrices in the current version 1.2, but it should work on complex matrices in version 2.0). As I understand it, a high GPU/CPU speedup is only reached when the matrix is very large.
The problem is that I need to find all eigenvalues and eigenvectors of 180 of these matrices every second, which represents about 20 to 30 double-precision Gflops.

This calculation is currently done on a very expensive three-year-old Mercury platform in 1.25 seconds using 20 SPARC CPUs.
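
To make the budget concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes that the 20 to 30 Gflops estimate and the 1.25 seconds both refer to one batch of 180 matrices. The function and variable names are just for illustration.

```python
# Back-of-the-envelope budget from the figures quoted in this post.
# Assumption: the Gflops estimate and the 1.25 s both cover one batch of 180 matrices.

MATRICES_PER_SECOND = 180              # required throughput
GFLOPS_LOW, GFLOPS_HIGH = 20.0, 30.0   # estimated double-precision load
CURRENT_BATCH_SECONDS = 1.25           # time the current platform needs
TARGET_BATCH_SECONDS = 1.0             # one batch must finish every second


def budget():
    """Return the per-matrix time and work budget, plus the speedup needed."""
    return {
        # Average wall-clock time available per matrix, in milliseconds.
        "time_per_matrix_ms": 1000.0 * TARGET_BATCH_SECONDS / MATRICES_PER_SECOND,
        # Work per matrix in Mflop (1 Gflop = 1000 Mflop).
        "mflop_per_matrix_low": GFLOPS_LOW * 1000.0 / MATRICES_PER_SECOND,
        "mflop_per_matrix_high": GFLOPS_HIGH * 1000.0 / MATRICES_PER_SECOND,
        # Speedup over the current platform needed to keep up in real time.
        "required_speedup": CURRENT_BATCH_SECONDS / TARGET_BATCH_SECONDS,
    }


if __name__ == "__main__":
    for name, value in budget().items():
        print(f"{name}: {value:.2f}")
```

So each matrix gets a little over 5.5 ms on average, and any replacement has to be at least 1.25 times faster than the current machine just to keep up.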

The goal is to replace this machine with a single PC.

Should I go for CULA or write my own CUDA program? Would it really be faster than a CPU program (using Intel's MKL, for example)?

Tests will be run on a dual Xeon E5420/Tesla C1060 computer, but we plan to get a Nehalem/Fermi workstation as soon as it is available.

Thanks