Nsys profiler fills up /tmp

We are experiencing problems with the nsys CLI profiler when profiling an application on a Linux compute node with an AMD EPYC CPU: during profiling, the /tmp directory of the compute node fills up completely (5 GB) in less than 30 seconds and, when the application ends, nsys hangs and never finishes writing the profiling files.

$> df /tmp/
Filesystem                 1K-blocks    Used Available Use% Mounted on
/dev/mapper/systemvg-tmplv   5230592 5229744       848 100% /tmp
$> du -sh /tmp/
du: cannot read directory '/tmp/systemd-private-6afe0a80e5124eb484b91dd0ae3c54b0-chronyd.service-Yax7jq': Permission denied
39M     /tmp/

As soon as we interrupt the hung nsys profile, the /tmp directory becomes empty again and the program exits.
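The gap between df (100% used) and du (39M visible), together with the space being released on interrupt, would be consistent with files that have been deleted but are still held open by a running process. This is only a guess. A minimal sketch to check it while nsys is hung, assuming lsof is installed on the node; the helper names fs_used_kb and dir_used_kb are made up for this example:

```shell
#!/bin/sh
# Space used on the filesystem that holds the given directory, in 1K blocks.
fs_used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# Space reachable through directory entries, in 1K blocks
# (unreadable subdirectories are skipped silently).
dir_used_kb() {
    du -sk "$1" 2>/dev/null | awk '{ print $1 }'
}

dir=${1:-/tmp}
echo "filesystem used: $(fs_used_kb "$dir") KB"
echo "visible files:   $(dir_used_kb "$dir") KB"

# List open files whose link count is 0 (deleted but still open).
# lsof may need root to see files owned by other users.
if command -v lsof >/dev/null 2>&1; then
    lsof +L1 "$dir" 2>/dev/null || true
fi
```

If the lsof output shows large entries owned by nsys or the profiled application, that would point to where the 5 GB goes.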

I also tried setting TMDIR=/huge/work/area, but the /tmp directory still fills up within a few minutes.

Any idea how to control the directory where the nsys profiler writes its temporary files, so that we can work around our /tmp limit?
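For reference, the standard variable name on Linux is TMPDIR. Below is a minimal sketch of running a command with TMPDIR pointing at a private directory on a larger filesystem. It assumes nsys honors TMPDIR for its temporaries, which is not confirmed here. The function names run_with_tmpdir and make_tmpdir, the output name, and the application name are placeholders:

```shell
#!/bin/sh
# Create a private temporary directory under a base path and print its name.
make_tmpdir() {
    mktemp -d "$1/nsys-tmp.XXXXXX"
}

# Run a command with TMPDIR set to a fresh directory under the given base,
# then remove that directory and return the command's exit status.
run_with_tmpdir() {
    base=$1
    shift
    dir=$(make_tmpdir "$base") || return 1
    TMPDIR=$dir "$@"
    status=$?
    rm -rf "$dir"
    return $status
}

# Usage (placeholders, not run here):
#   run_with_tmpdir /huge/work/area nsys profile -o profiling ./my_app
```

Setting the variable only for the one command keeps the rest of the job environment unchanged.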

The good old nvprof did not have these limitations. The same simulation test, run on an Intel CPU compute node with a similar software stack (see notes below), is profiled correctly with the nsys profiler, without any problem and without filling up the /tmp directory.

We also noticed that on the AMD compute node we get the following warning:

Warning: LBR backtrace method is not supported on this platform. DWARF backtrace method will be used.

while on the Intel-based compute node we instead get information on the profiling collection settings:

**** collection configuration ****
output_filename = /profiling.0.nsys
force-overwrite = true
stop-on-exit = true
export_sqlite = false
stats = false

Thank you for any hint or support.

Luca Ferraro

Some notes on our compute node environment and OS:

AMD CPU based compute node:

NVIDIA A100 GPU
Cuda compilation tools, release 11.1, V11.1.105
$> nsys --version
NVIDIA Nsight Systems version 2020.3.4.32-52657a0
$> uname -a
Linux c4n0002 4.18.0-147.el8.x86_64 #1 SMP Wed Dec 4 21:51:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$> cat /etc/redhat-release
CentOS Linux release 8.1.1911 (Core)

Intel CPU based compute node:

NVIDIA V100 GPU
Cuda compilation tools, release 10.2, V10.2.89
$> nsys --version
NVIDIA Nsight Systems version 2019.5.2.16-b54ef97
$> uname -a
Linux c5n0223 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$> cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)