Hi
It seems that a 3.5GB Nsight report file, shown below,
$ ls -lh 2080ti.rep.nsight-cuprof-report
-rw-r--r-- 1 mahmood mahmood 3.4G Sep 11 08:25 2080ti.rep.nsight-cuprof-report
takes about 70GB of RAM when I try to import it via “nv-nsight-cu-cli -i 2080ti.rep.nsight-cuprof-report”
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3062 mahmood 20 0 70.4g 70.4g 12988 R 99.7 74.6 27:41.40 nv-nsight-cu-cl
Memory use is still growing, and I doubt the file will ever finish opening.
Is that normal?
I would say this is not expected. Please share the exact version of the tool you are using (the output of the command below), as well as the command line used to create the report.
nv-nsight-cu-cli --version
Please see below
$ nv-nsight-cu-cli --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2012-2019 NVIDIA Corporation
Version 2019.5.0 (Build 27346997)
I ran Nsight Compute on a GROMACS command:
nv-nsight-cu-cli --quiet -f -o 2080ti.rep ../single75/bin/gmx mdrun -nb gpu -v -deffnm nvt_5k
The compressed file is about 250MB. May I upload that and give you the download link?
Yes, you can send me the link as a direct message.
We can replicate the issue, and will be looking into it.
Thank you very much. I look forward to a workaround, since I am stuck with some large files and do not know what data is inside them.
The issue will be fixed in a future version of the tool.
Independent of this, it appears you collected metrics for tens of thousands of kernels in your application. This is not a recommended use case for Nsight Compute. Instead, you should identify the kernels relevant for optimization using Nsight Systems, and then do targeted profiling of those high-value kernels with Nsight Compute. Profiling all kernels in an application like GROMACS, where each kernel is executed multiple times across the application iterations, is unlikely to yield interesting results.
What I did is similar to “nvprof ./app”, which shows the kernel time and number of invocations for each kernel, along with the percentage of GPU time each kernel accounts for. With that output I was able to pick a kernel for further investigation.
So what I actually want is the average kernel time and the number of invocations. Isn’t Nsight Compute suitable for that? I haven’t tried Nsight Systems yet.
Nsight Systems would be the right tool for this, as it captures a low-overhead trace of all (selected) GPU and CPU activities, very similar to nvprof when used in trace mode (the default). Nsight Compute collects detailed performance metrics for individual kernels with significantly higher overhead and application perturbation, and thus works more like nvprof in metric/event profiling mode.
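To make concrete what such a trace-based summary contains, here is a minimal Python sketch that turns a list of kernel launches into the per-kernel invocation count, average time, and share of GPU time discussed above. It does not use any Nsight or nvprof API: the `summarize` function, the record layout, and the kernel names are all hypothetical, and the durations are made-up numbers for illustration only.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate (kernel_name, duration_ns) pairs into per-kernel statistics.

    The input layout is an assumption for this sketch, not a real trace format.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [calls, total_ns]
    for name, duration_ns in records:
        totals[name][0] += 1
        totals[name][1] += duration_ns

    grand_total = sum(total_ns for _, total_ns in totals.values())

    rows = []
    for name, (calls, total_ns) in totals.items():
        rows.append({
            "kernel": name,
            "calls": calls,
            "avg_ns": total_ns / calls,
            "time_pct": 100.0 * total_ns / grand_total if grand_total else 0.0,
        })
    # Kernels with the largest share of GPU time come first.
    rows.sort(key=lambda row: row["time_pct"], reverse=True)
    return rows


if __name__ == "__main__":
    # Hypothetical launches: (kernel name, duration in nanoseconds).
    trace = [
        ("kernel_a", 400),
        ("kernel_b", 100),
        ("kernel_a", 600),
        ("kernel_b", 100),
    ]
    for row in summarize(trace):
        print(f'{row["kernel"]}: calls={row["calls"]} '
              f'avg={row["avg_ns"]:.1f} ns time={row["time_pct"]:.1f}%')
```

A table like this is enough to pick the few kernels that dominate GPU time; only those would then need the much more expensive per-kernel metric collection.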