Hi,
I am discovering nsys (Nsight Systems) for profiling deep learning models, so I tried a simple test with this code:
import torch
import torch.nn as nn


class MyModule(torch.nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.qkv = nn.Linear(128, 384)

    def forward(self, xQuery):
        torch.cuda.nvtx.range_push("linear")
        qkv = self.qkv(xQuery)  # Linear(128, 3*128)
        torch.cuda.nvtx.range_pop()
        torch.cuda.nvtx.range_push("chunk")
        qkv = qkv.chunk(3, dim=-1)
        torch.cuda.nvtx.range_pop()
        torch.cuda.nvtx.range_push("clone")
        q = qkv[0].clone()
        torch.cuda.nvtx.range_pop()
        torch.cuda.nvtx.range_push("permute")
        q = q.permute(0, 3, 1, 2)  # B, C, H, W
        torch.cuda.nvtx.range_pop()
        return q


torch.cuda.cudart().cudaProfilerStart()
torch.cuda.nvtx.range_push("initModel")
model = MyModule()
torch.cuda.nvtx.range_pop()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.cuda.nvtx.range_push("ModelAndDataToGPU")
model.to(device)
xQuery = torch.randn(1, 259, 259, 128).to(device)
xFocalMaps = [
    torch.randn(1, 259, 259, 128).to(device),
    torch.randn(1, 130, 130, 128).to(device),
    torch.randn(1, 65, 65, 128).to(device),
]
torch.cuda.nvtx.range_pop()
for i in range(10):
    torch.cuda.nvtx.range_push(f"iteration{i}")
    result = model(xQuery)
    torch.cuda.nvtx.range_pop()
torch.cuda.cudart().cudaProfilerStop()
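The push/pop pairs in the script can also be wrapped in a small context manager so that each range is closed even if the body raises. The following is only a sketch: `nvtx_range` is a hypothetical helper name, and the `push` and `pop` callables are passed in explicitly (with recording stand-ins here) so the snippet runs without a GPU or PyTorch installed.

```python
from contextlib import contextmanager


@contextmanager
def nvtx_range(name, push, pop):
    """Open a named range with push(name) and always close it with pop()."""
    push(name)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so ranges stay balanced.
        pop()


# Stand-in callables that only record the calls (assumption for illustration).
events = []


def fake_push(name):
    events.append(("push", name))


def fake_pop():
    events.append(("pop",))


with nvtx_range("linear", fake_push, fake_pop):
    events.append(("body",))
```

In the script itself, the stand-ins would be replaced by `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop`, for example `with nvtx_range("linear", torch.cuda.nvtx.range_push, torch.cuda.nvtx.range_pop): ...`. PyTorch also documents a built-in `torch.cuda.nvtx.range` context manager that may serve the same purpose.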
To profile it, I found this command line, which seems outdated:
nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu --capture-range=cudaProfilerApi --stop-on-range-end=true --cudabacktrace=true -x true -o my_profile python main.py
So I adapted it to my code and removed the parameters that did not work:
nsys profile -w true -t cuda,nvtx,cublas -s cpu --capture-range=cudaProfilerApi -x true -o icarusComparisonProfiler .\venv\scripts\python.exe icarusTest.py --python-sampling-frequency=1000 --python-sampling=true --python-backtrace=cuda --cudabacktrace=true
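One detail that may matter here: `nsys profile` expects its own options before the target executable, and anything placed after the script name is forwarded to the script as its arguments. The sketch below only assembles an argument list with every option from the command above moved in front of the Python executable. `build_nsys_command` is a hypothetical helper, nothing is executed, and whether each flag is accepted by a given nsys version is not verified here.

```python
def build_nsys_command(nsys_options, target, target_args=()):
    """Return an argv list: nsys profile <options> <target> <target args>."""
    return ["nsys", "profile", *nsys_options, target, *target_args]


# Options copied from the command in this post, in one place.
nsys_options = [
    "-w", "true",
    "-t", "cuda,nvtx,cublas",
    "-s", "cpu",
    "--capture-range=cudaProfilerApi",
    "-x", "true",
    "-o", "icarusComparisonProfiler",
    "--python-sampling-frequency=1000",
    "--python-sampling=true",
    "--python-backtrace=cuda",
    "--cudabacktrace=true",
]

# The interpreter is the target; the script name is its only argument.
cmd = build_nsys_command(
    nsys_options,
    r".\venv\scripts\python.exe",
    ["icarusTest.py"],
)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd)` or simply printed and pasted into a shell.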
According to the User Guide :: Nsight Systems Documentation, I should be able to see a Python backtrace in the report, yet none appears.
Also, I see that cudnn is not a valid argument for the --trace option. Is this normal? It seems that it used to work in the past.
I am using PyTorch; do I need to install the nvtx package in my Python distribution for the annotations?
I am using NSYS 2023.3.1
Python 3.11.6
Thank you!