Don't see the SASS code via objdump

May I know why I don't see the SASS code of a binary that was built using the following nvcc command?

$ nvcc -arch=sm_75 -use_fast_math -Xptxas -O3,-v mm.cu -lcublas -lcurand -o mm.2080ti
ptxas info    : 0 bytes gmem
$ cuobjdump -sass mm.2080ti

Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_75

Fatbin elf code:
================
arch = sm_75
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_75

Fatbin ptx code:
================
arch = sm_75
code version = [6,4]
producer = <unknown>
host = linux
compile_size = 64bit
compressed
ptxasOptions = -O3 -v

The only SASS code you'll see is the SASS for kernels you actually have in your own code. If you are only making calls to CUBLAS, then you won't see any SASS.

If you want to see the CUBLAS SASS, then use cuobjdump on the cublas library.
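For example, something along these lines should work; the library path is an assumption for a default CUDA 10.1 Linux install and may differ on your system:

```shell
# List the embedded ELF images first, to see which GPU
# architectures the library actually ships code for:
cuobjdump -lelf /usr/local/cuda-10.1/lib64/libcublas.so

# Then dump the SASS itself. The output is very large,
# so redirect it to a file:
cuobjdump -sass /usr/local/cuda-10.1/lib64/libcublas.so > cublas_sass.txt
```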

You are right. Thank you.
I wanted to check that because I specified sm_75 in

nvcc -arch=sm_75 -use_fast_math -Xptxas -O3,-v mm.cu -lcublas -lcurand -o mm.2080ti

but when I profile with Nsight, I see a kernel name that starts with volta_*.
I expected to see turing_*.

Please see

$ ~/sdk/deviceQuery/deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    7.5
...
$ nvcc -arch=sm_75 -use_fast_math -Xptxas -O3,-v mm.cu -lcublas -lcurand -o mm.2080ti
ptxas info    : 0 bytes gmem
$ nv-nsight-cu-cli ./mm.2080ti 100
==PROF== Connected to process 44682
==PROF== Profiling "generate_seed_pseudo" - 1: 0%....50%....100% - 33 passes
==PROF== Profiling "gen_sequenced" - 2: 0%....50%....100% - 32 passes
==PROF== Profiling "generate_seed_pseudo" - 3: 0%....50%....100% - 33 passes
==PROF== Profiling "gen_sequenced" - 4: 0%....50%....100% - 32 passes
==PROF== Profiling "volta_sgemm_32x32_sliced1x4_nn" - 5: 0%....50%....100% - 32 passes
==PROF== Disconnected from process 44682

Is that OK?

cublas is a compiled library.

It does not matter what compilation settings you make when you call into that library.

The library decides for itself what GPU it is running on, and what kernels it will call. You have essentially no control over that, and sm_75 compilation for your code doesn’t change the library behavior at all.

Yes, it's OK. If the cublas team feels that an already-designed kernel called volta_sgemm_… is perfectly suited for use on Turing, they may very well reuse that kernel, even though you are running on a Turing GPU. There is not a separate set of kernels for every possible architecture.
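One way to see this for yourself is to search the library's symbol table for the kernel name the profiler reported. The path below is an assumption for a default CUDA 10.1 install, and the exact set of volta_sgemm_* variants varies by CUDA version:

```shell
# Dump the symbol table of the cuBLAS library and look for
# the volta_sgemm kernels that Nsight reported:
cuobjdump -symbols /usr/local/cuda-10.1/lib64/libcublas.so | grep volta_sgemm
```

If the kernel appears there but not in your own binary, that confirms it comes from the precompiled library, not from your sm_75 compilation.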