How to embed source code information with the Nsight Compute CLI

I have a few independent questions:

I have read https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#profiler-report-source-page but there is no information on how to embed the source code information with the Nsight Compute CLI.

I generated my profiler output with the CLI:

nv-nsight-cu-cli --profile-from-start off --export run_test run_test

and my compilation flags are:

NVCC = nvcc
DEBUG = -g -G
ARCH=sm_61
NVCC_FLAGS = -arch=$(ARCH) -std=c++11
NVCC_FLAGS += --use_fast_math
VERBOSE = --ptxas-options=-v

default: run_test

run_test: ref_kernel.cuh run_test.cu
	$(NVCC) $(NVCC_FLAGS) -rdc=true -lcudadevrt run_test.cu -o run_test

When I move the exported profile file to my local machine and open it with Nsight Compute, I can only see SASS and not the source code.

What compilation flags or CLI options must I use to see the CUDA C source code alongside the SASS?

  1. Is there an equivalent of “gld_transactions_per_request” and “gst_transactions_per_request” in the Nsight Compute UI?

  2. According to the documentation here: https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#statistical-sampler, “Barrier - Warp was stalled waiting for sibling warps at a CTA barrier.”. Since blocks are abstractions over CTAs, does this cover only block-wise synchronization stalls? What about stalls caused by PTX barrier instructions such as:

barrier.sync
barrier.arrive

Are the above barrier stalls also included in the “Barrier” metric? Lastly, are warp-wise and grid-wise barrier stalls also included?

Thanks as always.

Sincerely,
Isaac Lee

Hi Isaac,

CUDA-C source is not embedded or stored with Nsight Compute profile reports, only the cubin is. To allow the tools to correlate SASS instructions with CUDA-C source line information, you need to pass either -lineinfo or -G (which includes -lineinfo) during compilation. I would recommend using only -lineinfo, since you likely want to profile an optimized executable, while -G will get you a debug build.
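As a rough sketch, this is how that change might look in the Makefile from the question, assuming the rest of the build stays as posted. Note that in the portion of the Makefile shown, DEBUG is defined but never added to NVCC_FLAGS, so the build as posted appears to pass neither -G nor -lineinfo. The placement of -lineinfo on the NVCC_FLAGS line is only one option; any position on the nvcc command line should work.

```makefile
NVCC = nvcc
ARCH = sm_61
# -lineinfo adds source line information while keeping the build optimized
# (unlike -G, which produces a debug build)
NVCC_FLAGS = -arch=$(ARCH) -std=c++11 -lineinfo
NVCC_FLAGS += --use_fast_math

default: run_test

run_test: ref_kernel.cuh run_test.cu
	$(NVCC) $(NVCC_FLAGS) -rdc=true -lcudadevrt run_test.cu -o run_test
```

After rebuilding, the profiling command from the question can be rerun unchanged to produce a report whose SASS can be correlated with source lines.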

If you inspect the profile report on a machine different from the one where your application was built, you have to make the source locally available. You can either copy it into the same directory on the local machine, and Nsight Compute will find it automatically, or you can use the “Resolve” button on the Source page to point Nsight Compute to the new location. The Resolve button is visible when you select an unresolved source file in the “CUDA-C” view.

Hi Felix,

Thanks a lot!