I want to generate stall breakdown information for a GAN application. Which nvprof option can I use?

Hi, I want to capture layer-wise information from my GAN application. From that information, I want to visualize the total time taken per layer and the stall breakdown, also per layer.

Here is sample code from my GAN application:

import torch.nn as nn

# G(z)
class generator(nn.Module):
    # initializers
    def __init__(self, d=128):
        super(generator, self).__init__()
        self.deconv1 = nn.ConvTranspose2d(100, d*8, 4, 1, 0)
        self.deconv1_bn = nn.BatchNorm2d(d*8)
        self.deconv2 = nn.ConvTranspose2d(d*8, d*4, 4, 2, 1)
        self.deconv2_bn = nn.BatchNorm2d(d*4)
        self.deconv3 = nn.ConvTranspose2d(d*4, d*2, 4, 2, 1)
        self.deconv3_bn = nn.BatchNorm2d(d*2)
        self.deconv4 = nn.ConvTranspose2d(d*2, d, 4, 2, 1)
        self.deconv4_bn = nn.BatchNorm2d(d)
        self.deconv5 = nn.ConvTranspose2d(d, 1, 4, 2, 1)

I have a couple of questions:

  1. Let’s say I want to monitor the deconv1 layer. Should I put it in the --kernels argument? What should my nvprof arguments look like?

  2. How can I capture the invocation order, kernel ID, and kernel name through nvprof?


I found this in the nvprof help:

--kernels <kernel path syntax>
                        This option changes the scope of subsequent "--events", "--metrics"
                        options. The syntax is as following:
                        	<kernel name>
                        	<context id/name>:<stream id/name>:<kernel name>:<invocation>
                        The context/stream IDs, names, kernel name and invocation
                        can be regular expressions. Empty string matches any number
                        or characters. If <context id/name> or <stream id/name>
                        is a positive number, it's strictly matched against the
                        CUDA context/stream ID. Otherwise it's treated as a regular
                        expression and matched against the context/stream name


Can anyone please tell me what the context ID, stream ID, and invocation are here?


Thanks for using the profiling tools.
I have posted your question in our internal system and will get back to you as soon as there is a response.

Best Regards
Vera J

Hi @Vera J,

Are there any updates on this?


Sorry for the late response.

It looks like you have two questions:

  1. Visualize total time taken (layer-wise)
    The best way to do this is with the NVTX range APIs.

  2. Stall breakdown information (also layer-wise)
    I assume you mean the warp stall reason metrics.
    Once you have identified the kernels in each layer on the timeline, you can profile specific kernels with the --kernels argument.
    You can leave the context and stream fields empty and specify only the kernel name and invocation, using --kernels "::<kernel name>:<invocation>".
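
As a rough illustration of the --kernels filter syntax, the Python helper below assembles an nvprof command line with the context and stream fields left empty. Several things here are assumptions rather than details of your application: the kernel-name regex "gemm", the script name gan.py, and the helper names are placeholders, and the stall_* metric names should be checked against `nvprof --query-metrics` for your GPU.

```python
import shlex

# Warp stall metrics as named in the nvprof metric reference; availability
# depends on the GPU, so confirm with `nvprof --query-metrics`.
STALL_METRICS = [
    "stall_inst_fetch",
    "stall_exec_dependency",
    "stall_memory_dependency",
    "stall_sync",
    "stall_other",
]


def kernels_filter(kernel="", invocation="", context="", stream=""):
    """Build <context>:<stream>:<kernel name>:<invocation>; empty fields match anything."""
    return ":".join(str(part) for part in (context, stream, kernel, invocation))


def nvprof_command(kernel, invocation, app=("python", "gan.py")):
    """Return the argv list for profiling one kernel invocation (gan.py is a placeholder)."""
    return [
        "nvprof",
        # --kernels must come first: it scopes the --metrics option that follows it.
        "--kernels", kernels_filter(kernel, invocation),
        "--metrics", ",".join(STALL_METRICS),
        *app,
    ]


# "gemm" is a hypothetical kernel-name regex; take the real name from a timeline run.
cmd = nvprof_command("gemm", 1)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell. Replace "gemm" with a kernel name that shows up inside the NVTX range of the layer you care about.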

How can I capture the invocation order, kernel ID, and kernel name through nvprof?
There is no kernel ID. The kernel name can be obtained from a timeline run.
The invocation order can be captured only through visualization, and NVTX ranges will help.
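
A minimal sketch of the NVTX approach in PyTorch, assuming a CUDA build of PyTorch so that torch.cuda.nvtx.range_push and range_pop are usable. The helper names layer_range and run_layers and the stand-in layers are hypothetical; in your generator the callables would be self.deconv1, self.deconv1_bn, and so on inside the forward pass. Without CUDA the sketch only records the range names.

```python
from contextlib import contextmanager

try:
    import torch
    # torch.cuda.nvtx needs a CUDA build and a visible GPU.
    HAVE_NVTX = torch.cuda.is_available()
except ImportError:
    HAVE_NVTX = False

range_log = []  # names in the order the ranges were opened


@contextmanager
def layer_range(name):
    """Open an NVTX range named after a layer; only logs the name without CUDA."""
    range_log.append(name)
    if HAVE_NVTX:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if HAVE_NVTX:
            torch.cuda.nvtx.range_pop()


def run_layers(layers, x):
    """Apply (name, callable) pairs in order, one NVTX range per layer."""
    for name, layer in layers:
        with layer_range(name):
            x = layer(x)
    return x


# Stand-in callables so the sketch runs anywhere (hypothetical, not real layers).
demo_layers = [("deconv1", lambda v: v + 1), ("deconv1_bn", lambda v: v * 2)]
result = run_layers(demo_layers, 1)
```

When the application is run under the profiler, each named range should appear on the timeline, which makes it easier to see which kernels belong to which layer and in what order they were launched. Because kernel launches are asynchronous, a range closed on the CPU may end before the GPU work finishes; calling torch.cuda.synchronize() before leaving the range is one way to make it bracket the GPU execution, at the cost of perturbing the timing.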