I am porting some of my lab's simulation code (I am a university student in EE) from Intel MKL to CUDA. I am trying out the code on a GeForce GTX 560 (compute capability 2.1) and writing it in Visual Studio 2010. Once it is working, it will run on Quadro 6000s on our server.
I read that adding -arch=sm_20 enables double precision (I tried it and it did nothing, but I may not have applied it correctly). I also read about the nvcc option -prec-div=true (a function that divides array elements is the culprit for the double → float demotion), but the programming guide says my setup should use double precision by default. My code does millions of double-precision cuFFTs that work fine, yet this type demotion in the division is destroying the results.
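In case it helps, here is a minimal sketch of the kind of division I mean (the variable names are placeholders, not my actual code, which is part of a larger simulation):

```
// Illustrative only: elementwise division of two double arrays.
// This is the pattern where I am seeing results that look like
// single-precision arithmetic instead of double.
__global__ void divideElements(const double *num, const double *den,
                               double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = num[i] / den[i];
}
```

And this is roughly how I attempted the compile (in VS 2010 I set the options through the CUDA project properties, so I may have put them in the wrong field; mykernel.cu is a placeholder name):

```
nvcc -arch=sm_20 -prec-div=true mykernel.cu -o mykernel
```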
I am sure this must have been asked before, so sorry for repeating, but I would appreciate it if someone could provide a link to a relevant post or otherwise help me out.
Thanks in advance