-arch sm_13 business


I’m running code on either a K20x, K40x, or Titan V. The cluster picks one if I don’t specify, or I can select one of them. The source code (a .cu file) was compiled with the CUDA 9.1 nvcc compiler. Everything that isn’t an integer is declared as double-precision floating point (double). I don’t seem to be getting the same answer that I was getting from my Windows CPU version (C++/CLI). Of course, the Windows version took a lot longer to run. I’m pretty sure that I properly parallelized and ported the code to CUDA C. However, I’m reading forum posts which say that unless one uses the -arch sm_13 nvcc option, nvcc will convert all doubles to floats, which might explain the discrepancy I’m seeing between the CPU results and the GPU results.

So just to confirm, is this -arch sm_13 business still required even in the newer CUDA compilers if one wants to use doubles?


You must be reading some really old forum contributions, from 2007 or so. sm_13 was the first GPU architecture with hardware support for double precision. At the time it was introduced, the CUDA compiler defaulted to an sm_10 target, so specifying -arch=sm_13 was necessary to get proper double-precision support.

As you should have found out by now, the compiler from CUDA 9.x won’t let you compile for any architecture prior to sm_30, which is also the default target architecture if you don’t specify one.

It is a best practice to always compile for the specific architecture(s) one intends to run the executable on. In your case the two relevant architectures are sm_35 for the K20x and K40x, and sm_70 for the Titan V.
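For example, a fat binary covering both targets could be built along these lines (the file names here are placeholders; adjust to your project):

```shell
# Build one fat binary containing code for both Kepler (K20x/K40x)
# and Volta (Titan V). mykernel.cu / myprog are placeholder names.
nvcc -gencode arch=compute_35,code=sm_35 \
     -gencode arch=compute_70,code=sm_70 \
     -o myprog mykernel.cu
```

The loader then picks the matching machine code at run time, whichever of the two GPU types the cluster assigns.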

Thank you, njuffa, for the info. It is succinct, to the point, and very helpful. I suspected the forum posts were dated, but often forum entries don’t have a time stamp.