I want to generate stall breakdown information for a GAN application. Which nvprof option can I use?

shounakrockz47 · July 23, 2018, 7:09pm

Hi, I want to capture per-layer information from my GAN application and use it to visualize the total time taken and the stall breakdown for each layer.

Here is sample code from my GAN application:

import torch.nn as nn

# G(z): generator network
class generator(nn.Module):
    # Initializer; d scales the channel count of each layer
    def __init__(self, d=128):
        super(generator, self).__init__()
        self.deconv1 = nn.ConvTranspose2d(100, d*8, 4, 1, 0)
        self.deconv1_bn = nn.BatchNorm2d(d*8)
        self.deconv2 = nn.ConvTranspose2d(d*8, d*4, 4, 2, 1)
        self.deconv2_bn = nn.BatchNorm2d(d*4)
        self.deconv3 = nn.ConvTranspose2d(d*4, d*2, 4, 2, 1)
        self.deconv3_bn = nn.BatchNorm2d(d*2)
        self.deconv4 = nn.ConvTranspose2d(d*2, d, 4, 2, 1)
        self.deconv4_bn = nn.BatchNorm2d(d)
        self.deconv5 = nn.ConvTranspose2d(d, 1, 4, 2, 1)

I have a couple of questions:

Let’s say I want to monitor the deconv1 layer. Should I pass it to the --kernels argument?

What should my nvprof arguments look like?

How can I capture the invocation order, kernel ID, and kernel name through nvprof?

Edit

I found the following in the nvprof help text:

--kernels <kernel path syntax>
                        This option changes the scope of subsequent "--events", "--metrics"
                        options. The syntax is as following:
                        	<kernel name>
                        or
                        	<context id/name>:<stream id/name>:<kernel name>:<invocation>
                        The context/stream IDs, names, kernel name and invocation
                        can be regular expressions. Empty string matches any number
                        or characters. If <context id/name> or <stream id/name>
                        is a positive number, it's strictly matched against the
                        CUDA context/stream ID. Otherwise it's treated as a regular
                        expression and matched against the context/stream name

https://helpmanual.io/help/nvprof/

Can anyone please tell me what context ID, stream ID, and invocation mean here?

veraj · July 24, 2018, 7:10am

Hi,

Thanks for using the profiling tools.
I have already posted your question in our internal system and will get back to you as soon as there is a response.

Best Regards
Vera J

shounakrockz47 · July 31, 2018, 6:30pm

Hi @Vera J,

Any updates on this?

veraj · August 6, 2018, 2:51am

Hi shounakrockz47,

Sorry for the late response.

It looks like you have two questions.

1. Visualize the total time taken (layer-wise)
The best way to do this is with the NVTX range APIs.

2. Stall breakdown information (also layer-wise)
I assume you mean the warp stall reason metrics.
Once you have identified the kernels in a layer from the timeline, you can profile those specific kernels with the --kernels argument.
You can leave the context and stream fields empty and specify only the kernel name and invocation, using --kernels "::<kernel name>:<invocation>".
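
As a rough sketch of what such a command line could look like, the snippet below assembles an nvprof invocation that scopes a few stall-reason metrics to a single kernel invocation. The kernel pattern gemm, the invocation number 1, and the script name train_gan.py are placeholders, not values from your application. Take the real kernel names from a timeline run, and check nvprof --query-metrics for the stall metrics your GPU actually exposes.

```python
import shlex

# Warp stall-reason metrics documented for nvprof. Availability depends on
# the GPU, so confirm the list with `nvprof --query-metrics` first.
STALL_METRICS = [
    "stall_inst_fetch",
    "stall_exec_dependency",
    "stall_memory_dependency",
    "stall_texture",
    "stall_sync",
    "stall_other",
]


def build_nvprof_command(kernel_regex, invocation, script, metrics=STALL_METRICS):
    """Return an argv list that scopes --metrics to one kernel invocation."""
    # Empty context and stream fields match anything:
    #   <context>:<stream>:<kernel name>:<invocation>  ->  ::<kernel name>:<invocation>
    kernel_filter = f"::{kernel_regex}:{invocation}"
    return [
        "nvprof",
        # --kernels must come before --metrics, because it changes the
        # scope of the options that follow it.
        "--kernels", kernel_filter,
        "--metrics", ",".join(metrics),
        "python", script,
    ]


# "gemm", 1 and "train_gan.py" are hypothetical placeholders.
cmd = build_nvprof_command("gemm", 1, "train_gan.py")
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list makes it easy to loop over several kernel names or invocation numbers and hand each one to subprocess.run.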

How can I capture the invocation order, kernel ID, and kernel name through nvprof?
There is no kernel ID. The kernel name can be obtained from a timeline run.
The invocation order can be captured only through visualization, and NVTX ranges will help.
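
For PyTorch code like yours, one way to add NVTX ranges is torch.cuda.nvtx.range_push and range_pop around each layer call. The sketch below wraps them in a small context manager. The helper name layer_range and the recorded list are made up for illustration, and the fallback branch exists only so the snippet also runs on a machine without PyTorch or CUDA.

```python
from contextlib import contextmanager

try:
    import torch
    _USE_NVTX = torch.cuda.is_available()
except ImportError:  # lets the sketch run on a machine without PyTorch
    torch = None
    _USE_NVTX = False

# Range names in the order they were opened (illustrative bookkeeping only).
recorded = []


@contextmanager
def layer_range(name):
    """Wrap a block of work in an NVTX range named after the layer."""
    recorded.append(name)
    if _USE_NVTX:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _USE_NVTX:
            torch.cuda.nvtx.range_pop()


# Inside generator.forward, each layer call would be wrapped, for example:
#     with layer_range("deconv1"):
#         x = self.deconv1_bn(self.deconv1(x))
with layer_range("deconv1"):
    pass  # placeholder for the layer's computation
with layer_range("deconv2"):
    pass  # placeholder for the layer's computation

print(recorded)  # → ['deconv1', 'deconv2']
```

Keep in mind that kernel launches are asynchronous, so a range brackets the host-side launch of the layer's kernels. The timeline then shows which kernels were issued inside each named range, which gives you the kernel names and their order per layer.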
