GTX TITAN BLACK slow cuDoubleComplex performance!

My code runs equally fast with DP mode on or off.
I am mainly running cuDoubleComplex calculations, which are in essence double-precision calculations.

Do I need to adjust my code so it can work with a GTX Titan BLACK in DP mode?

If there is no real runtime difference between running with and without DP mode enabled, the first things I would check are:

Are the double precision calculations actually the limiting factor?
Is your kernel memory bound, not compute bound?
If your kernels are really small, is your app host bound? (This is especially likely on Windows.)

Someone might be able to point you to profiling pages or documents, which is where I would start.
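One rough way to approach the memory-bound versus compute-bound question is to sweep the amount of arithmetic done per element while keeping memory traffic fixed. The sketch below is only an illustration, not the OP's kernel: `intensity_probe`, `flops_per_byte`, the array size, and the sweep values are all made up for this example. It uses `cuCfma` from `cuComplex.h`, which does four real multiply-adds per call.

```cuda
#include <cstdio>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Arithmetic intensity of the probe: each complex FMA is 8 real flops,
// and each element moves 32 bytes (one 16-byte load, one 16-byte store).
double flops_per_byte(int fmas) { return 8.0 * fmas / 32.0; }

// Hypothetical probe kernel: loads one element, applies `fmas` complex
// FMAs to it, and stores it back. Only the arithmetic grows with `fmas`.
__global__ void intensity_probe(cuDoubleComplex *data, int n, int fmas)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    const cuDoubleComplex a = make_cuDoubleComplex(0.999, 0.001);  // |a| < 1 keeps values bounded
    const cuDoubleComplex b = make_cuDoubleComplex(0.5, 0.25);
    cuDoubleComplex x = data[i];
    for (int k = 0; k < fmas; ++k)
        x = cuCfma(a, x, b);  // x = a * x + b
    data[i] = x;
}

int main()
{
    const int n = 1 << 24;  // 16M elements, 256 MiB (assumed to fit on the card)
    const int threads = 256, blocks = (n + threads - 1) / threads;
    cuDoubleComplex *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(cuDoubleComplex)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(cuDoubleComplex));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    intensity_probe<<<blocks, threads>>>(d, n, 1);  // warm-up launch, not timed

    const int sweep[] = {1, 4, 16, 64, 256};
    for (int fmas : sweep) {
        cudaEventRecord(t0);
        intensity_probe<<<blocks, threads>>>(d, n, fmas);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("fmas=%3d  flops/byte=%6.2f  time=%8.3f ms\n",
               fmas, flops_per_byte(fmas), ms);
    }

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

If the time stays nearly flat across the low `fmas` values, the kernel is memory bound at that intensity and toggling DP mode should make little difference there. Where the time starts to scale with `fmas`, the kernel is limited by DP arithmetic, and that is the region where DP mode should show up.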

They are the limiting factor: GPU load is 100% in the Nsight timeline.

I have also run my code on a GTX 970, which is only around half as fast, and that confirms my suspicion.
The GTX 970 can do 135 DP GFLOPS; the Titan Black in non-DP mode can achieve about 240 GFLOPS. That matches the speed difference I see with my code.

My kernels are not small.

Perhaps you don’t have full DP mode properly enabled on your Titan Black.

Also, 100% load doesn’t really tell you anything about DP usage in your GPU code.

OP, do you have CUDA-Z on your PC?

Take a screenshot of its output before and after you toggle the DP setting for the Titan Black.

That screenshot gives you a ‘general’ idea of your GPU’s capabilities. For example, here is the output for a GTX 780 Ti:


Your 64-bit DP numbers should be better with a Titan Black than with the GTX 780 Ti.
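If CUDA-Z is not at hand, a small hand-rolled test can give a similar before/after number. The sketch below is not how CUDA-Z measures anything; the kernel name `zfma_throughput`, the helper `gflops`, and the launch sizes are assumptions chosen for illustration. Each thread keeps two chains of `cuCfma` operations in registers, so the run should be limited by DP arithmetic and not by memory.

```cuda
#include <cstdio>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Converts a flop count and an elapsed time in milliseconds to GFLOPS.
double gflops(double flops, float ms) { return flops / (ms * 1.0e6); }

// Hypothetical throughput kernel: two independent register-resident chains
// of complex FMAs per thread; one store at the end keeps the work alive.
__global__ void zfma_throughput(cuDoubleComplex *out, int iters)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const cuDoubleComplex a = make_cuDoubleComplex(0.999, 0.001);  // |a| < 1 keeps values bounded
    const cuDoubleComplex b = make_cuDoubleComplex(0.5, 0.25);
    cuDoubleComplex x = make_cuDoubleComplex(1.0e-6 * tid, 0.0);
    cuDoubleComplex y = make_cuDoubleComplex(0.0, 1.0e-6 * tid);
    for (int i = 0; i < iters; ++i) {
        x = cuCfma(a, x, b);  // 4 real FMAs = 8 flops
        y = cuCfma(a, y, b);  // 4 real FMAs = 8 flops
    }
    out[tid] = cuCadd(x, y);
}

int main()
{
    const int threads = 256, blocks = 1024, iters = 1 << 14;
    cuDoubleComplex *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(cuDoubleComplex) * threads * blocks) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    zfma_throughput<<<blocks, threads>>>(d_out, iters);  // warm-up launch, not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    zfma_throughput<<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // 16 flops per loop iteration per thread (two complex FMAs).
    const double flops = 16.0 * iters * threads * blocks;
    printf("time = %.3f ms, approx. %.1f DP GFLOPS\n", ms, gflops(flops, ms));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Run it once with the DP setting off and once with it on. If the reported figure does not change between the two runs, the setting is probably not taking effect for the process; if it does change while the real application does not speed up, the application is likely limited by something other than DP arithmetic.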