Nsys unrecognised option '--trace-fork-before-exec=true'

Hardware: Nvidia Jetson Xavier NX
Command (as root): nsys profile --trace-fork-before-exec=true --process-scope=process-tree python3 test.py
Nsight Systems version: NVIDIA Nsight Systems version 2021.2.3.8-78c8c79
My goal: See the GPU usage of a Python script, including all the child processes it forks.

Hi, I get this unrecognised option '--trace-fork-before-exec=true' message when I try to profile the GPU usage of a Python script that forks some child processes that also use the GPU. I'm confused because this option is listed in the Nsight Systems user guide.

As I understand it, this option is needed to profile my script, since the Python child processes don't exec after they are forked. However, other workarounds are also appreciated.
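One workaround that avoids the flag entirely is sketched below in Python, under stated assumptions: have `multiprocessing` start the children with the "spawn" method, which launches each child as a fresh interpreter (fork followed by exec on Linux) rather than a bare fork of the parent. Going by the later reply in this thread that only forked-but-not-exec'd children are skipped, such children should be traced by default; this is an assumption, not something confirmed on a Jetson here. Spawn is also the start method commonly recommended when child processes use CUDA. `run_inference`, `main`, and their parameters are hypothetical placeholders; the real TensorRT engine setup is not shown and would go inside the child.

```python
import multiprocessing as mp
import os
from typing import List, Tuple


def run_inference(worker_id: int, n_runs: int) -> Tuple[int, int, int]:
    """Hypothetical stand-in for the real TensorRT work.

    A real worker would build its engine/context inside the child,
    after it has started, rather than inheriting one from the parent.
    """
    completed = 0
    for _ in range(n_runs):
        completed += 1  # replace with one inference call
    return worker_id, os.getpid(), completed


def main(n_workers: int = 2, n_runs: int = 5) -> List[Tuple[int, int, int]]:
    # "spawn" starts each child as a fresh Python interpreter (fork + exec
    # on Linux) instead of a bare fork() of the parent process.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_workers) as pool:
        return pool.starmap(run_inference, [(i, n_runs) for i in range(n_workers)])


if __name__ == "__main__":
    # This guard is required with "spawn": each child re-imports this module.
    for worker_id, pid, completed in main():
        print("worker {} (pid {}) ran {} inferences".format(worker_id, pid, completed))
```

The script would then be profiled with the same nsys command as before, just without `--trace-fork-before-exec=true`. The code avoids f-strings in type hints and uses `typing` generics so it also runs on the older Python 3 shipped with JetPack 4.6.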

Full output:

root# nsys profile --trace-fork-before-exec=true --process-scope=process-tree python3 test.py
unrecognised option '--trace-fork-before-exec=true'

usage: nsys profile [<args>] [application] [<application args>]
Try 'nsys profile --help' for more information.

Sorry for the confusion. That option is not available for the Tegra target. We are in the process of changing --help to be a little more user-friendly about what options are available on a platform.

And no, you do not need this for your Python script. We use this for C++ forks, where trace behavior is undefined if you try to trace between the fork and exec calls. Because Nsys might crash in that situation, we require an explicit request.

Thanks for the clarification, but my problem still persists.

The above is what I'm getting when running my script:
pid 23021 forks two children, 23053 and 23054, which each run 5 TensorRT inferences.
I'm not getting any GPU information. Compare this with other people's profiling output:


There is this “CUDA (GM20B)” row that shows GPU memory and kernel info, which doesn’t appear in my report. Why is this?

Another observation is that when profiling my script I see the following in dmesg:

[101844.526582] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101844.526943] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x1a408201 (subid 0x0000001a priv level 0), CODE 0xbadf1301
[101844.527337] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101844.588876] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101844.589257] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x18408201 (subid 0x00000018 priv level 0), CODE 0xbadf1301
[101844.589669] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101844.597330] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101844.597670] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x00406004 WRDAT 0x0001fffe INFO 0x1d408210 (subid 0x0000001d priv level 0), CODE 0xbadf1301
[101844.598103] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101844.601675] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101844.602023] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x1b408201 (subid 0x0000001b priv level 0), CODE 0xbadf1301
[101844.603968] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101844.617546] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101844.625288] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x19408201 (subid 0x00000019 priv level 0), CODE 0xbadf1301
[101844.642512] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101846.116039] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101846.116392] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x1f408201 (subid 0x0000001f priv level 0), CODE 0xbadf1301
[101846.116871] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101846.323674] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101846.324063] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x18408201 (subid 0x00000018 priv level 0), CODE 0xbadf1301
[101846.324674] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout
[101847.862725] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:121  [ERR]  ringmaster intr status0: 0x00000100,status1: 0x00000000
[101847.863087] nvgpu: 17000000.gv11b               gp10b_priv_ring_isr:149  [ERR]  SYS write error. ADR 0x004041f4 WRDAT 0x07fffffe INFO 0x1e408201 (subid 0x0000001e priv level 0), CODE 0xbadf1301
[101847.863492] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79   [ERR]  client timeout

Thanks for helping!

@liuyis, can you help? If not, please refer this to someone on Andrey's team.

Hi @user107170, what was the operating system that you were using? Was it QNX?

If 23053 and 23054 were forked but not exec-ed, we won't trace them unless --trace-fork-before-exec=true is specified. However, this feature is not currently supported on QNX; it's only supported on Linux platforms (x64, PPC, L4T, SBSA). That's why you do not see CUDA APIs for these processes. The “CUDA (GM20B)” row is missing for the same reason: CUDA is not traced after fork but before exec.
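To make the distinction concrete, here is a minimal Python sketch (Linux only, since it uses `os.fork`) of the two child patterns: one child that keeps running the parent's code after `fork()`, which is what this reply describes for 23053 and 23054, and one that immediately calls `exec` to become a new Python process. The function names are illustrative, and the comments about what nsys traces restate this reply rather than anything the code itself checks.

```python
import os
import sys


def fork_only() -> int:
    """Child keeps executing this interpreter's code: forked, never exec-ed."""
    pid = os.fork()
    if pid == 0:
        # Any GPU work placed here runs between fork and exec, which (per the
        # reply above) nsys skips unless --trace-fork-before-exec=true is set.
        os._exit(0)  # leave without flushing/duplicating the parent's buffers
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def fork_then_exec() -> int:
    """Child replaces itself with a brand-new Python process via exec."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(sys.executable,
                     [sys.executable, "-c", "print('hello from the exec-ed child')"])
        finally:
            os._exit(127)  # only reached if execv itself failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("fork only      -> exit code", fork_only())
    print("fork then exec -> exit code", fork_then_exec())
```

`multiprocessing`'s default start method on Linux is "fork", which follows the first pattern; the "spawn" method follows the second.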

The uname -a output is
Linux username-xavier 4.9.253-tegra #1 SMP PREEMPT Mon Jul 26 12:19:28 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux

The OS was installed by downloading jetson-nx-jp46-sd-card-image.zip from NVIDIA's website on December 4th, 2021, and flashing it to the SD card. However, the Nsight Systems CLI was not included, so last week I downloaded nsight-systems-cli-2021.2.3_2021.2.3.8-1_arm64.deb using NVIDIA SDK Manager, copied it to the Xavier NX, and installed it.

Interesting, looks like it’s L4T and should have this switch available. I’ll double-check it and give an update.

Hi @user107170, it seems this switch is only available on the L4T platform from version 2021.5 onward. Unfortunately, the current public release for L4T (bundled with JetPack 4.6) is 2021.2, so you'll have to wait for a newer version to become available.
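Since availability depends on the installed release, a script can check the version before adding the flag. A hedged Python sketch follows. It assumes `nsys --version` prints a line in the same format as the one quoted at the top of this thread (`... version 2021.2.3.8-78c8c79`), and it hard-codes the 2021.5 L4T threshold stated in this reply; other platforms may have a different cutoff. It checks the version rather than grepping `nsys profile --help`, because an earlier reply notes that --help is not yet clear about which options each platform supports. `MIN_L4T_VERSION` and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple

# Per this thread: on L4T, --trace-fork-before-exec needs Nsight Systems >= 2021.5.
MIN_L4T_VERSION = (2021, 5)


def parse_nsys_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract (2021, 2, 3, 8) from '... version 2021.2.3.8-78c8c79'."""
    match = re.search(r"version\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_nsys_version() -> Optional[Tuple[int, ...]]:
    """Run `nsys --version`; returns None if nsys is not on PATH."""
    exe = shutil.which("nsys")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_nsys_version(result.stdout)


def build_profile_command(version: Tuple[int, ...], script: str) -> List[str]:
    """Build the nsys command line, adding the flag only when supported."""
    cmd = ["nsys", "profile", "--process-scope=process-tree"]
    if version[:2] >= MIN_L4T_VERSION:
        cmd.append("--trace-fork-before-exec=true")
    cmd += ["python3", script]
    return cmd


if __name__ == "__main__":
    version = installed_nsys_version()
    if version is None:
        print("nsys not found (or its version string was not recognised)")
    else:
        print(" ".join(build_profile_command(version, "test.py")))
```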

Are you an NDA customer with NVIDIA? If so, I can share an internal build of the latest version with you.

Thanks for the information. I'm not an NDA customer with NVIDIA, so I guess I'll have to wait for the new version.

Can you share a rough estimate on the public release date of version 2021.5?

@user107170 My personal expectation is that there will be a new JetPack release in about a month containing Nsight Systems 2021.5, but since the JetPack release schedule is decided by a different team, I can't be sure.

You may be able to find out more about when the next JetPack update will be released on the Jetson forum.

@user107170 JetPack 4.6.1 is now released and includes Nsys 2021.5. See JetPack SDK | NVIDIA Developer.