I am trying to profile a deep learning script that uses CuPy to compute on an NVIDIA GPU, using the Nsight Compute CLI (ncu):
iwia050h@a0705:~/numpy-transformer-master/transformer$ ncu python3.9 transformer.py
Background: running ncu without any parameters on the complete script works fine. My problem is with profiling specific parts of the code.
Profiling the complete script is expected to take 651 hours, so I want to profile only a specific part.
The aim is to profile "self_attention.py", i.e. the "attention" part of the code. The outcome should be a report file (.ncu-rep) that can be opened with Nsight Compute.
The script self_attention.py returns the attention result.
When I run transformer.py, it calls a subroutine in self_attention.py, and I want to profile only the part where this function is called.
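For illustration, one way to mark the attention call so that a profiler can single it out is an NVTX range. The sketch below assumes that CuPy's `cupy.cuda.nvtx.RangePush`/`RangePop` are available in the installed build; `nvtx_range` and `attention_forward` are hypothetical names, and the latter only stands in for the real call into self_attention.py.

```python
from contextlib import contextmanager

try:
    # cupy.cuda.nvtx provides RangePush/RangePop when CuPy is installed.
    from cupy.cuda import nvtx
except Exception:
    nvtx = None  # fall back to a no-op so the sketch also runs without CuPy


@contextmanager
def nvtx_range(name):
    """Wrap a code region in an NVTX push/pop range (no-op without CuPy)."""
    if nvtx is not None:
        nvtx.RangePush(name)
    try:
        yield
    finally:
        if nvtx is not None:
            nvtx.RangePop()


def attention_forward(q, k, v):
    # Hypothetical stand-in for the real attention call in self_attention.py.
    return [qi * ki * vi for qi, ki, vi in zip(q, k, v)]


# Only the work launched inside this block belongs to the "attention" range.
with nvtx_range("attention"):
    out = attention_forward([1, 2], [3, 4], [5, 6])
```

With such a range in place, a command along the lines of `ncu --nvtx --nvtx-include "attention/" -o attention_report python3.9 transformer.py` might restrict profiling to that region. The exact filter expression is an assumption here and should be checked against the Nsight Compute CLI documentation.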
First I load the modules and allocate a node on Alex.
The working directory on the node is iwia050h@a0705:~/numpy-transformer-master/transformer$
The CLI commands I have tried are:
iwia050h@a0705:~/numpy-transformer-master/transformer$ ncu -k regex:attention python3.9 transformer.py
iwia050h@a0705:~/numpy-transformer-master/transformer$ ncu -k attention python3.9 transformer.py
iwia050h@a0705:~/numpy-transformer-master/transformer$ ncu --kernel-name attention python3.9 transformer.py
iwia050h@a0705:~/numpy-transformer-master/transformer$ ncu --kernel-name self_attention.py python3.9 transformer.py
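Note that `-k` / `--kernel-name` filters on the names of the CUDA kernels that get launched, not on Python file or function names. The sketch below approximates that filtering with Python's `re.search`. The kernel names are hypothetical and chosen only for illustration, and the exact matching semantics of ncu are an assumption.

```python
import re

# Hypothetical kernel names, for illustration only; the real names depend on
# the CuPy/cuBLAS kernels that the script actually launches.
launched_kernels = [
    "cupy_multiply__float32_float32_float32",
    "cupy_exp__float32_float32",
    "volta_sgemm_128x64_nn",
]


def matching_kernels(pattern, names):
    """Return the kernel names that contain a match for the regex pattern."""
    return [name for name in names if re.search(pattern, name)]


# A Python-level name such as "attention" matches none of these kernels.
no_match = matching_kernels("attention", launched_kernels)

# A pattern built from actual kernel names selects a subset.
some_match = matching_kernels("sgemm|cupy_exp", launched_kernels)
```

Under these assumptions, a filter like `attention` selects nothing unless a launched kernel actually carries that string in its name, which would be consistent with a "No kernels were profiled" result.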
Profiling starts with:
==PROF== Connected to process 356412 (/apps/python/3.9-anaconda/bin/python3.9)
After the script has run, the output I receive is:
==PROF== Disconnected from process 356412
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
It does not profile self_attention or attention. Could you please guide me?
What am I doing wrong?