I wrote a very simple CUDA program called vecAdd.cu. The GPU in my machine is an L4. When I compile it as follows:
nvcc --verbose vecAdd.cu -o vecAdd
I see this line in the output: ptxas -arch=sm_52 -m64 "/tmp/tmpxft_00002390_00000000-6_vecAdd.ptx" -o "/tmp/tmpxft_00002390_00000000-10_vecAdd.sm_52.cubin"
But the compute capability of the L4 is actually 8.9 (sm_89). Why does nvcc use sm_52 here? Is that wrong?
sm_52 is the default target nvcc uses when you do not specify an architecture; see the docs here.
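To pick a target that matches the installed GPU instead of relying on the default, it helps to confirm what the runtime reports for the device. The following is a minimal sketch, not an official tool: the file name query_cc.cu and the helper formatArchFlag are made up for illustration, and only cudaGetDeviceCount, cudaGetDeviceProperties and cudaGetErrorString come from the CUDA runtime API.

```cuda
// query_cc.cu (hypothetical file name)
// Build: nvcc query_cc.cu -o query_cc
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): format the nvcc architecture name
// for a compute capability, e.g. (8, 9) -> "sm_89".
static void formatArchFlag(char* buf, size_t size, int major, int minor) {
    std::snprintf(buf, size, "sm_%d%d", major, minor);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        char arch[16];
        formatArchFlag(arch, sizeof(arch), prop.major, prop.minor);
        // Print the device name, its compute capability, and a matching -arch value.
        std::printf("device %d: %s, compute capability %d.%d, matching flag: -arch=%s\n",
                    dev, prop.name, prop.major, prop.minor, arch);
    }
    return 0;
}
```

On an L4 this should report compute capability 8.9, in which case the original program could be built with: nvcc -arch=sm_89 vecAdd.cu -o vecAdd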
Thanks, I see it says:
nvcc x.cu
is equivalent to
nvcc x.cu --gpu-architecture=compute_52 --gpu-code=sm_52,compute_52
My follow-up question: since my machine has an L4 (sm_89), will the binary built as above take the JIT path, where the driver loads the embedded PTX and compiles it for the L4 at run time?
Yes. The sm_52 machine code in the binary cannot run on sm_89, so your driver will JIT compile the embedded compute_52 PTX for the L4. Bear in mind, though, that the compiler optimised based on compute_52, so the result may not take full advantage of features present in sm_89.
If you haven’t already, you may want to check through Section 3.1 of the Programming Guide.
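One way to see what is described above on your own machine is to ask the runtime which PTX version and which binary version a kernel ended up with. The sketch below is only an illustration under a few assumptions: the file name jit_check.cu, the kernel reportArch, the CHECK macro and the versionMajor/versionMinor helpers are all made up for this example, while cudaFuncGetAttributes and its ptxVersion and binaryVersion fields are part of the CUDA runtime API.

```cuda
// jit_check.cu (hypothetical file name)
// Default build:   nvcc jit_check.cu -o jit_check
// Explicit target: nvcc -arch=sm_89 jit_check.cu -o jit_check
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking macro: abort with a message if a runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t e_ = (call);                                           \
        if (e_ != cudaSuccess) {                                           \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(e_));                          \
            std::exit(1);                                                  \
        }                                                                  \
    } while (0)

// Split a two-digit version such as 52 or 89 into major and minor parts.
static int versionMajor(int v) { return v / 10; }
static int versionMinor(int v) { return v % 10; }

// Store the architecture the device code was compiled for.
// __CUDA_ARCH__ is fixed when the PTX is generated, not when it is JIT compiled.
__global__ void reportArch(int* out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

int main() {
    // Ask the runtime which PTX and binary versions were selected for this kernel.
    cudaFuncAttributes attr;
    CHECK(cudaFuncGetAttributes(&attr, reportArch));
    std::printf("PTX version:    %d.%d\n",
                versionMajor(attr.ptxVersion), versionMinor(attr.ptxVersion));
    std::printf("binary version: %d.%d\n",
                versionMajor(attr.binaryVersion), versionMinor(attr.binaryVersion));

    // Run the kernel once and read back the value of __CUDA_ARCH__ it saw.
    int* dArch = nullptr;
    int hArch = 0;
    CHECK(cudaMalloc(&dArch, sizeof(int)));
    reportArch<<<1, 1>>>(dArch);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(&hArch, dArch, sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaFree(dArch));
    std::printf("__CUDA_ARCH__ seen by device code: %d\n", hArch);
    return 0;
}
```

With the default build on an L4, I would expect the PTX version to stay at 5.2 while the binary version reflects the GPU the PTX was JIT compiled for. With -arch=sm_89 both should match the GPU. Treat these as expectations to verify on your own setup rather than guaranteed output.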