Question on virtual architectures and the two stages of nvcc compilation

  1. What is the difference between compute_xy and sm_xy (e.g., compute_50 vs. sm_50)?

  2. Why does nvcc compile in two stages? I have read the section on virtual architectures (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architectures), but it is still not clear to me.

  3. The guide (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architectures) says I should choose the lowest possible virtual architecture and the highest possible GPU architecture. According to the guide, keeping the set of required features as small as possible lets the code run on a wider range of GPU architectures. I don't understand this notion of choosing from a pool of options in the second stage. Isn't the GPU architecture fixed? For example, if I'm using a GTX 1080, am I not fixed to sm_61 and compute_61?

  4. The guide also implies that these choices affect performance. How is that possible?
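For reference, here is how I currently read the compatibility rules behind point 3, written as a rough Python sketch. It assumes the two rules stated in the CUDA documentation: PTX built for compute_XY can be JIT-compiled by the driver for any GPU of compute capability X.Y or higher, while a cubin built for sm_XY runs only on GPUs with the same major version and an equal or higher minor version. The helper names are hypothetical (they are not part of nvcc or any NVIDIA tool), and special cases such as architecture-specific targets are ignored.

```python
# Hypothetical model of the PTX (virtual) vs. cubin (real) compatibility rules.
# Names are illustrative only; special cases are not modeled.

def parse_cc(name):
    """Turn 'sm_61' or 'compute_61' into the compute capability tuple (6, 1)."""
    digits = name.split("_")[1]
    return int(digits[:-1]), int(digits[-1])

def ptx_runs_on(virtual_arch, device):
    # PTX for compute_XY can be JIT-compiled for any device with capability >= X.Y.
    return parse_cc(device) >= parse_cc(virtual_arch)

def sass_runs_on(real_arch, device):
    # A cubin for sm_XY runs only on devices with the same major version and minor >= Y.
    real_major, real_minor = parse_cc(real_arch)
    dev_major, dev_minor = parse_cc(device)
    return dev_major == real_major and dev_minor >= real_minor

if __name__ == "__main__":
    gtx1080 = "sm_61"
    print(ptx_runs_on("compute_50", gtx1080))  # PTX from a lower virtual arch can be JIT-compiled
    print(sass_runs_on("sm_50", gtx1080))      # cubin from a different major version cannot run
    print(sass_runs_on("sm_60", gtx1080))      # cubin from the same major, lower minor can run
```

If this model is right, then with something like `nvcc -gencode arch=compute_50,code=sm_61` the GPU target is indeed fixed for the embedded cubin, and the "pool of options" would only come from embedding the PTX as well, but I'd appreciate confirmation.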