I’m running code on either a K20x, a K40x, or a Titan V. The cluster picks one if I don’t specify, or I can select one of them. The source code (a .cu file) was compiled with the CUDA 9.1 nvcc compiler. Everything that isn’t an integer is declared as double-precision floating point (double). I don’t seem to be getting the same answers that I was getting from my Windows CPU version (C++/CLI); of course, the Windows version took a lot longer to run. I’m pretty sure that I properly parallelized and ported the code to CUDA C. I’ve been reading forum posts saying that unless one uses the -arch sm_13 nvcc option, nvcc will demote all doubles to floats, which might explain the discrepancy I’m seeing between the CPU results and the GPU results.
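For reference, this is roughly how I've been compiling (the file name is a placeholder, not my actual project). As far as I can tell, the K20x/K40 are compute capability 3.5 and the Titan V is 7.0, so if an arch flag is needed at all, I'd expect it to look more like this than like sm_13:

```shell
# Target the actual GPUs in the cluster (mysim.cu is a placeholder name).
# K20x / K40x are compute capability 3.5; Titan V is 7.0.
nvcc -arch=sm_35 -o mysim mysim.cu    # for the K20x / K40x
nvcc -arch=sm_70 -o mysim mysim.cu    # for the Titan V

# Or build one fat binary that covers both generations:
nvcc -gencode arch=compute_35,code=sm_35 \
     -gencode arch=compute_70,code=sm_70 \
     -o mysim mysim.cu
```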
So just to confirm, is this -arch sm_13 business still required even in the newer CUDA compilers if one wants to use doubles?