The Nsight Systems GUI fails to display the Python backtrace
when profiling with the following command:
export CUDA_VISIBLE_DEVICES="7"
nsys profile \
-t cuda,nvtx,osrt,cudnn,cublas \
--capture-range=cudaProfilerApi \
--capture-range-end=stop \
--cudabacktrace=all \
-o test \
-f true \
--python-backtrace=cuda \
python test.py
test.py:
# test.py
import torch
from torch.cuda import nvtx
def main():
    torch.cuda.profiler.start()
    with nvtx.range("init"):
        stream1 = torch.cuda.Stream()
        stream2 = torch.cuda.Stream()
        a = torch.zeros(233, device='cuda')
    with torch.cuda.stream(stream1):
        with nvtx.range("first add"):
            a += 1
    with torch.cuda.stream(stream2):
        with nvtx.range("second add"):
            a += 1
    with nvtx.range("print"):
        torch.cuda.synchronize()
        average = a.mean().item()
        print(f'Average of the elements in tensor a: {average}')
    torch.cuda.profiler.stop()
if __name__ == '__main__':
    main()
I also tried adding --python-sampling=true --python-sampling-frequency=2000,
but the Python backtrace still does not show up in the GUI.
Version
NVIDIA Nsight Systems version 2024.6.1.90-246134905481v0
PyTorch 2.1.2+cu121
Python 3.8
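
For anyone trying to reproduce this, the version details above can be gathered with a short script. This is only a minimal sketch: the helper name collect_versions is hypothetical, and it assumes nsys is on PATH and that `nsys --version` prints the version string. If torch or nsys is missing, it records that instead of failing.

```python
import importlib
import platform
import shutil
import subprocess


def collect_versions():
    """Return a dict of version strings for Python, PyTorch and nsys.

    collect_versions is a hypothetical helper written for this report.
    """
    versions = {"python": platform.python_version()}

    # torch may not be installed everywhere; record that instead of raising.
    try:
        torch = importlib.import_module("torch")
        versions["pytorch"] = torch.__version__
    except ImportError:
        versions["pytorch"] = "not installed"

    # Only invoke nsys when it can be found on PATH (assumption: it usually is
    # after a standard Nsight Systems install).
    nsys = shutil.which("nsys")
    if nsys is None:
        versions["nsys"] = "not found on PATH"
    else:
        result = subprocess.run(
            [nsys, "--version"], capture_output=True, text=True, check=False
        )
        versions["nsys"] = result.stdout.strip() or result.stderr.strip()
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running it in the same environment used for profiling should show whether the nsys on PATH matches the version of the GUI that opens the report.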