Importing large report file

mahmood.nt · September 11, 2020, 6:38am

Hi,
It seems that importing a roughly 3.5 GB Nsight Compute report file (listed below)

$ ls -lh 2080ti.rep.nsight-cuprof-report
-rw-r--r-- 1 mahmood mahmood 3.4G Sep 11 08:25 2080ti.rep.nsight-cuprof-report

takes about 70 GB of RAM when I open it with “nv-nsight-cu-cli -i 2080ti.rep.nsight-cuprof-report”:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  3062 mahmood   20   0   70.4g  70.4g  12988 R  99.7  74.6  27:41.40 nv-nsight-cu-cl

Memory usage is still growing, and I suspect the report will never finish loading.
Is that normal?

felix_dt · September 11, 2020, 9:57am

I would say this is not expected. Please share the exact version of the tool you use, as well as the command line used to create the report.

nv-nsight-cu-cli --version

mahmood.nt · September 11, 2020, 10:03am

Please see below

$ nv-nsight-cu-cli --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2012-2019 NVIDIA Corporation
Version 2019.5.0 (Build 27346997)

I ran Nsight Compute on a GROMACS command:

nv-nsight-cu-cli --quiet -f -o 2080ti.rep ../single75/bin/gmx mdrun -nb gpu -v -deffnm nvt_5k

The compressed report is about 250 MB. May I upload it and send you the download link?

felix_dt · September 11, 2020, 1:54pm

Yes, you can send me the link as a direct message.

felix_dt · September 14, 2020, 7:10am

We can reproduce the issue and will look into it.

mahmood.nt · September 14, 2020, 7:34am

Thank you very much. I look forward to a workaround, since I am stuck with several large report files and cannot see the data inside them.

felix_dt · September 14, 2020, 8:12am

The issue will be fixed in a future version of the tool.

Independent of this, it appears you collected metrics for tens of thousands of kernel launches in your application. This is not a recommended use case for Nsight Compute. Instead, you should identify the kernels relevant for optimization using Nsight Systems, and then do targeted profiling with Nsight Compute on those high-value kernels. Profiling all kernels in an application like GROMACS, where each kernel is executed many times across the application's iterations, is unlikely to yield interesting results.
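The selection step described above can be sketched in a few lines of Python. This is a hypothetical helper, not part of Nsight Systems or Nsight Compute: it assumes you already have the total GPU time per kernel (for example, read off an Nsight Systems kernel summary) and picks the smallest set of kernels that together cover a chosen share of GPU time. The kernel names and timings are made-up placeholders.

```python
from typing import Dict, List


def select_hot_kernels(total_ns: Dict[str, int], coverage: float = 0.8) -> List[str]:
    """Return the fewest kernel names whose combined GPU time reaches `coverage`.

    `total_ns` maps a kernel name to its total GPU time in nanoseconds,
    summed over all launches (hypothetical input format). Kernels are taken
    in descending order of total time until the running share reaches `coverage`.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    grand_total = sum(total_ns.values())
    if grand_total == 0:
        return []
    selected: List[str] = []
    running = 0
    for name, ns in sorted(total_ns.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += ns
        if running / grand_total >= coverage:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical per-kernel totals; these are NOT measured GROMACS numbers.
    totals = {"kernel_a": 600, "kernel_b": 250, "kernel_c": 100, "kernel_d": 50}
    print(select_hot_kernels(totals, coverage=0.8))  # ['kernel_a', 'kernel_b']
```

Only the kernels returned here would then be profiled with Nsight Compute, using its kernel-filtering options, instead of every launch in the run.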

mahmood.nt · September 14, 2020, 1:06pm

What I did is similar to running “nvprof ./app”, which lists the total time and the number of invocations for each kernel, along with each kernel's percentage of GPU time. From that, I was able to pick a kernel for further investigation.
What I actually want is the average kernel time and the number of invocations for each kernel. Isn’t Nsight Compute suitable for that? I haven’t tried Nsight Systems yet.
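The nvprof-style summary described above (invocations, average time, and share of total GPU time per kernel) is a simple aggregation over a kernel trace. A minimal Python sketch, assuming the trace is available as hypothetical (kernel name, duration in ns) records; this record format is an illustration, not an Nsight Systems export format, and the sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class KernelSummary:
    name: str
    calls: int        # number of launches of this kernel
    total_ns: int     # summed duration over all launches
    avg_ns: float     # total_ns / calls
    time_pct: float   # share of total GPU kernel time, in percent


def summarize(records: Iterable[Tuple[str, int]]) -> List[KernelSummary]:
    """Aggregate (kernel_name, duration_ns) launch records per kernel."""
    calls: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for name, duration_ns in records:
        calls[name] += 1
        totals[name] += duration_ns
    grand_total = sum(totals.values())
    rows = [
        KernelSummary(
            name=name,
            calls=calls[name],
            total_ns=totals[name],
            avg_ns=totals[name] / calls[name],
            time_pct=100.0 * totals[name] / grand_total if grand_total else 0.0,
        )
        for name in totals
    ]
    # Largest total time first, so the most expensive kernels appear at the top.
    rows.sort(key=lambda r: r.total_ns, reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical launch records, not measured data.
    trace = [("kernel_a", 300), ("kernel_b", 100), ("kernel_a", 500), ("kernel_b", 100)]
    for row in summarize(trace):
        print(f"{row.time_pct:5.1f}%  {row.calls:4d} calls  avg {row.avg_ns:7.1f} ns  {row.name}")
```

Because this only needs per-launch durations, a low-overhead trace is sufficient; no per-kernel metric collection is involved.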

felix_dt · September 14, 2020, 1:10pm

Nsight Systems is the right tool for this: it captures a low-overhead trace of all (selected) GPU and CPU activities, very similar to nvprof in trace mode (its default). Nsight Compute collects detailed performance metrics for individual kernels, with significantly higher overhead and perturbation of the application, so it works more like nvprof in metric/event profiling mode.