What is the difference between
Why does nvcc use two stages of compilation? I have read about it here (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architectures), but it is still not clear to me.
The guide (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architectures) says I should choose the lowest possible virtual architecture and the highest possible GPU architecture. The reason, according to the guide, is that keeping the required feature set as small as possible gives the compiler a wider range of GPU architectures to choose from. I don’t understand the notion of choosing from a pool of options in the second stage. Isn’t the GPU architecture fixed (e.g., if I’m using a GTX 1080, am I not fixed to sm_61 and compute_61)?
The guide also implies that different choices have a different impact on performance. How is this possible?