Profiling a simple deep learning code: no Python backtrace + cannot use cudnn trace

Monkey.py · November 3, 2023, 3:01pm

Hi,

I am discovering Nsys for profiling deep learning models, so I tried it on this simple code:

import torch
import torch.nn as nn

class MyModule(torch.nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.qkv = nn.Linear(128, 384)

    def forward(self, xQuery):
        torch.cuda.nvtx.range_push("linear")
        qkv=self.qkv(xQuery) # Linear(128, 3*128)
        torch.cuda.nvtx.range_pop()
        torch.cuda.nvtx.range_push("chunk")
        qkv=qkv.chunk(3,dim=-1)
        torch.cuda.nvtx.range_pop()
        torch.cuda.nvtx.range_push("clone")
        q=qkv[0].clone()
        torch.cuda.nvtx.range_pop()
        torch.cuda.nvtx.range_push("permute")
        q = q.permute(0, 3, 1, 2)  # B, C, H, W
        torch.cuda.nvtx.range_pop()
        return q

torch.cuda.cudart().cudaProfilerStart()

torch.cuda.nvtx.range_push("initModel")
model = MyModule()
torch.cuda.nvtx.range_pop()
device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.cuda.nvtx.range_push("ModelAndDataToGPU")
model.cuda()
xQuery = torch.randn(1,259, 259,128).to(device)
xFocalMaps=[torch.randn(1, 259, 259, 128).to(device), torch.randn(1, 130, 130, 128).to(device), torch.randn(1,65, 65, 128).to(device)]
torch.cuda.nvtx.range_pop()

for i in range(10):
    torch.cuda.nvtx.range_push(f"iteration{i}")
    result = model.forward(xQuery)
    torch.cuda.nvtx.range_pop()

torch.cuda.cudart().cudaProfilerStop()

To do this I found this command line, which seems outdated:

nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s cpu  --capture-range=cudaProfilerApi --stop-on-range-end=true --cudabacktrace=true -x true -o my_profile python main.py

So I modified it to fit my code and removed the parameters that did not work:

nsys profile -w true -t cuda,nvtx,cublas -s cpu  --capture-range=cudaProfilerApi -x true -o icarusComparisonProfiler .\venv\scripts\python.exe icarusTest.py --python-sampling-frequency=1000 --python-sampling=true --python-backtrace=cuda --cudabacktrace=true

According to the User Guide (Nsight Systems Documentation), I should be able to see a Python backtrace in the report, yet there is none.

Also, I see that cudnn is not a valid argument for the --trace option. Is this normal? It seems that it was working in the past.

I am using PyTorch. Do I need to install the nvtx package in my Python distribution for annotations?

I am using NSYS 2023.3.1
Python 3.11.6

Thank you !

hwilper · November 3, 2023, 3:35pm

Your screen shot seems to show that you were running this on a Windows target.

You have placed the Python backtrace options after your application name. The CLI assumes that any option after the application name is an argument to the application, rather than an argument to Nsys, so those options must come before the application.

I am also confused as to why you are not seeing cudnn as a valid trace option, it certainly should be. That first command line should work.
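To illustrate the point about argument order, here is a minimal Python sketch that assembles the command so every Nsys option comes before the application. `build_nsys_command` is a hypothetical helper written for this example, not part of Nsight Systems; the options themselves are copied from the commands in this thread, and the application path `python` is a placeholder.

```python
import shlex


def build_nsys_command(nsys_options, app, app_args=()):
    """Assemble an `nsys profile` command line (hypothetical helper).

    Anything placed after the application is handed to the application,
    so all nsys options are emitted before it.
    """
    return ["nsys", "profile", *nsys_options, app, *app_args]


# Options copied from the commands posted in this thread.
nsys_options = [
    "-t", "cuda,nvtx,cublas",
    "--capture-range=cudaProfilerApi",
    "--python-sampling=true",
    "--python-sampling-frequency=1000",
    "--python-backtrace=cuda",
    "--cudabacktrace=true",
    "-o", "icarusComparisonProfiler",
]

# "python" is a placeholder for the interpreter path (assumption).
cmd = build_nsys_command(nsys_options, "python", ["icarusTest.py"])
print(shlex.join(cmd))
# To launch it from Python: subprocess.run(cmd, check=True)
```

Keeping the options and the application arguments in separate lists makes it hard to append a profiler flag after the script name by accident.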

Monkey.py · November 3, 2023, 3:53pm

Oh damn, I feel so dumb, that was obvious. The Python backtrace is fully working now, but the cudnn trace still is not:

nsys profile -w true -t cuda,nvtx,cublas,cudnn -s cpu  --capture-range=cudaProfilerApi -x true -o icarusComparisonProfiler --python-nvtx-annotations .\annotations.json --python-sampling-frequency=1000 --python-sampling=true .\venv\scripts\python.exe icarusTest.py
Illegal --trace argument 'cudnn'
Possible --trace values are one or more of 'cuda', 'nvtx', 'cublas', 'cublas-verbose', 'cusparse', 'cusparse-verbose', 'nvvideo', 'opengl', 'opengl-annotations', 'vulkan', 'vulkan-annotations', 'dx11', 'dx11-annotations', 'dx12', 'dx12-annotations', 'wddm' or 'none'

usage: nsys profile [<args>] [application] [<application args>]
Try 'nsys profile --help' for more information.

Can you help me with the cudnn thing ?

Thank you !

hwilper · November 3, 2023, 4:05pm

@skottapalli can you run a quick test on this and make sure that cudnn is working as expected? I would ask one of the Windows team, but they have today off.

Monkey.py · November 6, 2023, 6:25pm

Hi @hwilper, did you have time to run the test?
Thank you !

hwilper · November 7, 2023, 8:10pm

Let me ping Sneha again.

skottapalli · November 7, 2023, 8:23pm

@Monkey.py - we don’t support cudnn tracing on Windows targets. It is available on all the Linux targets.

Monkey.py · November 7, 2023, 10:34pm

Thank you for your answer. I guess I will have to use a Linux distribution. I tried WSL, but it seems that CUDA tracing is not available there, is it?

skottapalli · November 7, 2023, 11:07pm

CUDA tracing on WSL is not supported yet due to inherent limitations. We are trying to overcome those in a future release of Nsight Systems.

skottapalli · November 7, 2023, 11:08pm

Unfortunately, this means that you will need to use a Linux distribution. I apologize for the inconvenience.
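The platform limits mentioned in the replies above can be summarised in a short Python sketch. It encodes only what this thread states for Nsight Systems 2023.3.1 (no cudnn tracing on Windows targets, no CUDA tracing on WSL) and may not hold for other releases. The helper names are hypothetical, and detecting WSL through the "microsoft" substring in the kernel release string is a common heuristic, not an official check.

```python
import platform


def unsupported_traces(system, release):
    """Trace names this thread reports as unavailable (hypothetical helper)."""
    if system == "Windows":
        return {"cudnn"}
    if system == "Linux" and "microsoft" in release.lower():
        # Heuristic: WSL kernels usually carry "microsoft" in the release string.
        return {"cuda"}
    return set()


def pick_traces(requested, system=None, release=None):
    """Drop the trace names reported as unsupported on the given platform."""
    system = system or platform.system()
    release = release or platform.release()
    blocked = unsupported_traces(system, release)
    return [name for name in requested if name not in blocked]


# Value to pass to `nsys profile -t ...` on the current machine.
print(",".join(pick_traces(["cuda", "nvtx", "cublas", "cudnn"])))
```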

Monkey.py · November 14, 2023, 1:39pm

@skottapalli

Hi, I set up an Ubuntu Linux dual boot to get access to the cudnn trace. I tested with the following code:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.linear = nn.Linear(128, 64)
    def forward(self,x):
        return self.linear(x)

torch.cuda.cudart().cudaProfilerStart()

torch.cuda.nvtx.range_push("initModel")
model= MyModel()
torch.cuda.nvtx.range_pop()

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

torch.cuda.nvtx.range_push("ModelAndDataToGPU")
model.cuda()
xQuery = torch.randn(1,259, 259,128).to(device)
torch.cuda.nvtx.range_pop()

for i in range(10):
    torch.cuda.nvtx.range_push(f"iteration{i}")
    result = model.forward(xQuery)
    torch.cuda.nvtx.range_pop()
torch.cuda.cudart().cudaProfilerStop()

And I use the following command to profile:

sudo nsys profile -w true -t cuda,nvtx,cublas,cudnn -s cpu  --capture-range=cudaProfilerApi -x true -o icarusComparisonProfiler --python-sampling-frequency=1000 --python-sampling=true ./venv/bin/python ./Icarus/icarus/profiling/profCudnn.py

As expected, the command runs without error, yet when I look at the profiling report there is still no cudnn trace.

Am I missing something ? The trace should be there right ?

skottapalli · November 14, 2023, 11:20pm

If you can, could you share the nsys-rep file to help me debug?

skottapalli · November 14, 2023, 11:22pm

Do you get cudnn traces if you remove the --capture-range=cudaProfilerApi? Do you see any warning or errors in the diagnostics page of the report in the GUI? It would be helpful to get the nsys-rep file from you so that I can check a few more things.

Monkey.py · November 17, 2023, 3:47pm

@skottapalli

You will find attached a .zip containing the python script, the report generated with and without --capture-range=cudaProfilerApi (see the readMe.txt which provides the command used to generate the files).

I tried to remove the --capture-range option but still no cudnn trace.
Any idea of what could cause the issue ?

Monkey.py · November 17, 2023, 3:49pm

In the report I have the error “Analysis 4589 00:02.643 No cuDNN events collected. Does the process use cuDNN?”. I suppose that a Linear layer should be using the cudnn library right ?

skottapalli · November 17, 2023, 4:02pm

I am not very familiar with the torch model and its use of cudnn. Could you try profiling a sample that we know uses cudnn? This will help us isolate whether there is a problem with cudnn tracing in nsys or whether your original app is not using cudnn.

Monkey.py · November 22, 2023, 10:29am

I will try to do that, I am not very familiar with directly using cuda and cudnn, I just installed the toolkit to code a sample. If you have a sample to provide I will be glad to use it.

Monkey.py · November 23, 2023, 11:13am

Okay, after many experiments (building torch from source, using nsys on the cuda & cudnn libraries in C++, …) I figured out that torch does not use the cudnn library for a linear or sigmoid layer, but it does use it for conv2d, for example, and with a conv2d layer the trace appears. So it turns out that I have been searching for a cudnn trace for functions that do not call cudnn …
Anyway thank you for your help and your time ! You’ve helped me a lot
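For anyone landing here with the same question, a minimal sketch of a test script built around a conv2d layer, which is the case reported above to produce a cudnn trace. The layer sizes and input shape are arbitrary choices for illustration, and the NVTX and profiler calls are skipped when no GPU is present so the script still runs on a CPU-only machine. Whether cuDNN is actually called may depend on the PyTorch build and backend settings.

```python
import torch
import torch.nn as nn


class ConvModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Sizes are arbitrary; conv2d is the layer reported above to show up in the cudnn trace.
        self.conv = nn.Conv2d(128, 64, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")


def nvtx_push(name):
    # NVTX ranges are only emitted when a GPU is present.
    if use_cuda:
        torch.cuda.nvtx.range_push(name)


def nvtx_pop():
    if use_cuda:
        torch.cuda.nvtx.range_pop()


print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN enabled:", torch.backends.cudnn.enabled)

model = ConvModel().to(device)
x = torch.randn(1, 128, 64, 64, device=device)  # B, C, H, W

if use_cuda:
    torch.cuda.cudart().cudaProfilerStart()
with torch.no_grad():
    for i in range(10):
        nvtx_push(f"iteration{i}")
        result = model(x)
        nvtx_pop()
if use_cuda:
    torch.cuda.synchronize()  # wait for queued kernels before stopping the capture
    torch.cuda.cudart().cudaProfilerStop()

print(tuple(result.shape))
```

Profiling this script with the same Linux command as above (with cudnn in the -t list) is a quick way to check whether cudnn tracing works at all before investigating a larger model.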

skottapalli · November 29, 2023, 7:55pm

Thank you for trying the experiments.
