I have a few independent questions:
I have read https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#profiler-report-source-page but it has no information on how to embed source code information in a report generated with the Nsight Compute CLI.
I generated my profiler output with the CLI:
nv-nsight-cu-cli --profile-from-start off --export run_test run_test
and my compilation flags are:
NVCC = nvcc
DEBUG = -g -G
ARCH=sm_61
NVCC_FLAGS = -arch=$(ARCH) -std=c++11
NVCC_FLAGS += --use_fast_math
VERBOSE = --ptxas-options=-v
default: run_test
run_test: ref_kernel.cuh run_test.cu
	$(NVCC) $(NVCC_FLAGS) -rdc=true -lcudadevrt run_test.cu -o run_test
When I copy the exported profile file to my local machine and open it in Nsight Compute, I can see only the SASS and not the source code. Note that the DEBUG variable above is defined but not passed to nvcc.
What compilation flags or CLI options must I use to see the CUDA C source code alongside the SASS?
-
Is there an equivalent of the nvprof metrics “gld_transactions_per_request” and “gst_transactions_per_request” in the Nsight Compute UI?
-
According to the documentation here: https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#statistical-sampler, “Barrier - Warp was stalled waiting for sibling warps at a CTA barrier.” Since a CTA corresponds to a thread block, does this stall reason cover only block-wide synchronization stalls? What about stalls caused by the PTX barrier instructions:
barrier.sync
barrier.arrive
Are stalls from the instructions above also counted under the “Barrier” stall reason? Lastly, are warp-wide and grid-wide barrier stalls included as well?
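To make the scopes concrete, here is a minimal sketch of the kinds of barriers I am asking about. The kernel `barrier_scopes` and the helper `run_barrier_scopes` are made up for illustration and are not from my project. I am assuming CUDA 9 or newer (for `__syncwarp` and cooperative groups), a device that supports cooperative launch, and a build with -rdc=true as in my Makefile. I left out barrier.arrive to keep the sketch short.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical kernel: issues one barrier at each scope from the question.
__global__ void barrier_scopes(int *out)
{
    cg::grid_group grid = cg::this_grid();

    __syncwarp();                                    // warp-wide barrier
    __syncthreads();                                 // block-wide (CTA) barrier
    asm volatile("barrier.sync 0;" ::: "memory");    // explicit PTX CTA barrier
    grid.sync();                                     // grid-wide barrier (cooperative launch only)

    if (grid.thread_rank() == 0) {
        *out = 1;                                    // marks that the kernel ran to completion
    }
}

// Hypothetical host helper: launches the kernel cooperatively and returns
// the value written by thread 0, or -1 if any CUDA call fails.
int run_barrier_scopes()
{
    int *d_out = nullptr;
    int h_out = 0;

    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    cudaError_t err = cudaMemset(d_out, 0, sizeof(int));

    void *args[] = { &d_out };
    if (err == cudaSuccess) {
        // grid.sync() is only valid when the kernel is launched this way.
        err = cudaLaunchCooperativeKernel((void *)barrier_scopes,
                                          dim3(2), dim3(64), args, 0, 0);
    }
    if (err == cudaSuccess) {
        // Blocking copy on the default stream; also surfaces kernel errors.
        err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    return (err == cudaSuccess) ? h_out : -1;
}
```

In a profile of a kernel like this, I would like to know which of these four lines can show up under the “Barrier” stall reason and which are reported under a different one.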
Thanks as always.
Sincerely,
Isaac Lee