Problem with eigenvalue decomposition of a Hermitian matrix with cusolverDnDsyevd

Hello,

I want to compute the eigenvectors and eigenvalues of a positive semi-definite Hermitian matrix with cusolverDnDsyevd. I need the result in double precision.
So far I have been able to decompose any real symmetric matrix in double precision using the example provided in the CUDA 8.0 Toolkit documentation, "D. Examples of Dense Eigenvalue Solver": http://docs.nvidia.com/cuda/cusolver/index.html#syevd-example1
The matrices that I need to decompose are complex but Hermitian. cusolverDnDsyevd accepts cuDoubleComplex data and returns without an error, but sadly my results are completely wrong.
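To show why a real-symmetric routine cannot be expected to handle complex Hermitian input, here is a small NumPy sketch (NumPy stands in for cuSOLVER here purely to illustrate the linear algebra, not the API): a Hermitian matrix has real eigenvalues, but a solver that only sees real double data effectively works on the real part and produces a different spectrum.

```python
import numpy as np

# A complex Hermitian matrix: H == H.conj().T, but H != H.T in general.
rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (B + B.conj().T) / 2

# Hermitian matrices have real eigenvalues ...
w = np.linalg.eigvalsh(H)
assert np.isrealobj(w)

# ... but a real-symmetric solver only sees the real part of the data;
# its spectrum generally differs from the true one, which is why feeding
# complex data through a D (real double) routine gives wrong results.
w_real_part = np.linalg.eigvalsh(H.real)
assert not np.allclose(np.sort(w), np.sort(w_real_part))
```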

I could help myself by using the singular value decomposition provided by cusolverDnDgesvd, see "E. Examples of Singular Value Decomposition" in the Toolkit documentation:
http://docs.nvidia.com/cuda/cusolver/index.html#svd-example1
Since the matrix is positive semi-definite and Hermitian, the singular value decomposition returns its eigenvalues and eigenvectors. I get correct results even with cuDoubleComplex data.
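This workaround rests on a standard fact: for a positive semi-definite Hermitian matrix, the eigendecomposition H = QΛQ^H is itself a valid SVD (all eigenvalues are non-negative), so the singular values equal the eigenvalues and the left singular vectors are eigenvectors. A small NumPy sketch to verify this claim:

```python
import numpy as np

# Build a random positive semi-definite Hermitian matrix H = B B^H.
rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = B @ B.conj().T  # Hermitian and PSD by construction

# SVD: H = U @ diag(s) @ Vh, with s sorted in descending order.
U, s, Vh = np.linalg.svd(H)

# The singular values coincide with the eigenvalues (eigvalsh sorts
# ascending, so compare sorted sequences).
w = np.linalg.eigvalsh(H)
assert np.allclose(np.sort(s), w)

# Each column of U is an eigenvector: H u_k = s_k u_k.
for k in range(n):
    assert np.allclose(H @ U[:, k], s[k] * U[:, k])
```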
My problem now is that cusolverDnDgesvd is very slow: an Intel i7 with MKL computes the decomposition faster than the Tesla K40 that I use. I changed the matrix size to 3000x3000 in all the examples.

My questions are:
Is it possible to decompose complex Hermitian matrices with the CUDA 8.0 Toolkit and cusolverDnDsyevd?
The documentation is a little unclear on this.

Is anything known about the performance of the singular value decomposition provided by the toolkit?
Does it still need a lot of optimization, or do I need special nvcc compiler flags to increase the speed?

As CUDA 9.0 is on the horizon, are optimizations being made to the eigenvalue decomposition and the singular value decomposition?

Thank you very much.

Cheers,
Blue

There are some ideas in this article: https://devtalk.nvidia.com/cmd/default/download-comment-attachment/43356/