Unable to profile process on TX2

Hi all,

I’m trying to profile a CUDA kernel that runs as part of a GStreamer pipeline containing nvivafilter. I have a TX2 with the latest JetPack version installed and a host running Ubuntu 18.04 with nv-nsight-cu-cli.

The pipeline is running on my target TX2, but when I execute on my host:

sudo ./nv-nsight-cu-cli --mode=attach --hostname <tx2_ip>

I’m getting the errors:

==PROF== Finding attachable processes on host: <tx2_ip>
==ERROR== No processes found on device.
==ERROR== No attachable process found.
==ERROR== An error occurred while trying to profile
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option

The pipeline running on the TX2 is:

gst-launch-1.0 -e v4l2src device=/dev/video0 num-buffers=-1 ! 'video/x-raw, format=(string)UYVY, width=(int)2880, height=(int)2160' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1440, height=(int)1080' ! queue ! comp. v4l2src device=/dev/video1 num-buffers=-1 ! 'video/x-raw, format=(string)UYVY, width=(int)2880, height=(int)2160' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1440, height=(int)1080' ! queue ! comp. v4l2src device=/dev/video2 num-buffers=-1 ! 'video/x-raw, format=(string)UYVY, width=(int)2880, height=(int)2160' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1440, height=(int)1080' ! queue ! comp. nvcompositor name=comp latency=500000000 sink_2::xpos=0 sink_1::xpos=1440 sink_0::xpos=2880 ! nvvidconv ! queue ! nvivafilter cuda-process=true customer-lib-name=<path_to_my_so_file> ! 'video/x-raw(memory:NVMM), format=(string)RGBA' ! nvvidconv ! queue ! omxh264enc bitrate=20000000 ! matroskamux name=mux ! queue ! filesink location=output.mkv -e audiotestsrc num-buffers=-1 ! queue ! audioconvert ! queue ! audioresample ! queue ! audio/x-raw,rate=44100,channels=2 ! queue ! avenc_aac bitrate=128000 ! queue ! audio/mpeg ! aacparse ! 'audio/mpeg, mpegversion=4' ! mux.

L4T info for target device:

R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019

nv-nsight-cu-cli version on host

NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2012-2019 NVIDIA Corporation
Version 2019.5.0 (Build 27346997)

I also tried to use the GUI, but as far as I understand, that is not possible when the target hardware is a TX2. Am I correct?

Any help would be appreciated.

You cannot attach to or profile an arbitrary application that is already running on your device; the application to be profiled must be launched through Nsight Compute. This means that you either use the Nsight Compute CLI on your TX2 to launch the application (--mode launch) and then attach from the host (--mode attach), or you profile directly on the TX2 using the local CLI. You can inspect the results directly, or copy the saved report to your host for further analysis, e.g. in the UI.
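As a rough sketch of the launch-then-attach route, using only the options already mentioned in this thread (--mode launch, --mode attach, --hostname): the function names, the run_or_print helper and the default IP address below are hypothetical placeholders, and the helper exists only so that the snippet prints the command instead of failing on a machine where the CLI is not on the PATH.

```shell
#!/bin/sh
# Sketch only: helper names and the default address are placeholders.
NCU="${NCU:-nv-nsight-cu-cli}"    # CLI name as used in this thread
TX2_IP="${TX2_IP:-192.0.2.10}"    # replace with your TX2's address

# Run the command if the tool is installed, otherwise just print it.
run_or_print() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 1, on the TX2: start the application under the CLI in launch mode.
# Pass the application and its arguments, e.g. the gst-launch-1.0 pipeline.
launch_on_target() {
    run_or_print "$NCU" --mode launch "$@"
}

# Step 2, on the host: attach to the process started in step 1.
attach_from_host() {
    run_or_print "$NCU" --mode attach --hostname "$TX2_IP"
}
```

On the TX2 you would call launch_on_target followed by your gst-launch-1.0 command line; on the host you would then call attach_from_host. Both sides should use the same Nsight Compute version.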

The third and recommended way, however, is to use the host’s Nsight Compute UI to remotely launch and profile the application on your remote TX2 target. For this, choose Linux (aarch64)/Embedded Linux as the target platform in the connection dialog, and create a new remote connection to your TX2 device with IP address and username/password.

Note that only the Nsight Compute version shipped as part of your JetPack release is supported for profiling on TX2 devices. It is installed with the CUDA Toolkit component from JetPack.
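To compare the versions on the host and on the TX2, something along these lines may help. It is a sketch that assumes the toolkits live under /usr/local (as with the cuda-10.x folders mentioned in this thread); the function names are made up for illustration. A host installation may also contain target binaries for other architectures, which cannot be executed there, so those are reported rather than treated as fatal.

```shell
#!/bin/sh
# Sketch only: searches a root directory (default /usr/local) for CLI binaries.
find_ncu_clis() {
    find "${1:-/usr/local}" -type f -name 'nv-nsight-cu-cli' 2>/dev/null
}

# Print the path and the reported version of each CLI that was found.
print_ncu_versions() {
    find_ncu_clis "$@" | while IFS= read -r cli; do
        echo "== $cli"
        # A binary built for another architecture will fail to run here.
        "$cli" --version </dev/null 2>/dev/null || echo "(could not run $cli)"
    done
}
```

Running print_ncu_versions on both machines shows which build each side would use.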

Thank you for your reply.

The last part of your answer gave me the clue I needed.

I initially ran Nsight Compute from the folder /usr/local/cuda-10.2, which gave me this connection window:

Screenshot from 2020-04-03 10-31-00

But after checking the version on my TX2, I realized I needed to run Nsight Compute from /usr/local/cuda-10.0, which gave me this window:

Screenshot from 2020-04-03 10-31-10

I had some trouble getting meaningful output from the “Connect to process” window: it closes after encountering errors without letting you read the output, so I had to take a screenshot at the “right” moment.

So now it appears that the process doesn’t stop (even though it should after a second or two) and if I kill the process manually there is a problem with retrieving the reports:

More helpful guidance would be appreciated.

Yes, that’s a known issue that will be fixed in a future version of Nsight Compute.

From the output, I see that the Nsight Compute host on your desktop system failed to copy the report it expects the CLI on the TX2 target to have created. The reason very likely is that no such report was created on the target side, because of the “terminate called without an active exception…” error. I can’t say for sure why that error occurs, but it appears that the target process that nv-nsight-cu-cli launches crashes immediately.

To simplify debugging this issue, can you try running the commands directly on your target device, rather than via the host? Since the Nsight Compute CLI was already deployed, you should be able to run it as
/tmp/var/target/linux-v4l_l4t-glx-t210-a64/nv-nsight-cu-cli <app> <app args>
from the target’s shell.
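Since the connection window closes before the error can be read, it may be worth keeping a log when running on the target. The following is a minimal sketch: the profile_on_target function and the log file name are hypothetical, and the CLI path is the deployed location quoted above.

```shell
#!/bin/sh
# Sketch only: function name and log file name are placeholders.
NCU_TARGET="${NCU_TARGET:-/tmp/var/target/linux-v4l_l4t-glx-t210-a64/nv-nsight-cu-cli}"
LOG="${LOG:-ncu-target.log}"

# Usage: profile_on_target <app> <app args>
profile_on_target() {
    if [ ! -x "$NCU_TARGET" ]; then
        echo "CLI not found at $NCU_TARGET" >&2
        return 1
    fi
    # Show all output from the CLI and the application, and also save it,
    # so a crash message can be read afterwards.
    # Note: the pipeline's exit status is that of tee, not of the CLI.
    "$NCU_TARGET" "$@" 2>&1 | tee "$LOG"
}
```

After a failed run, the saved log should contain whatever the launched process printed before it crashed.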

As an additional note, the version of Nsight Compute shipped with CUDA 10.0 is fairly old, and a much newer build will be shipped with the next JetPack release. Until then, if you are blocked by this issue right now, you can also still use the nvprof tool to collect kernel metrics for apps running on TX2.
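For the nvprof fallback, a sketch like the following could be a starting point. The function name and the default metric (achieved_occupancy) are only illustrative choices; pick the metrics you actually need, and note that collecting metrics on the TX2 may require root privileges. As above, the command is printed rather than executed if nvprof is not on the PATH.

```shell
#!/bin/sh
# Sketch only: function name and default metric are placeholders.
NVPROF="${NVPROF:-nvprof}"
METRICS="${METRICS:-achieved_occupancy}"   # comma-separated nvprof metric names

# Usage: nvprof_metrics <app> <app args>
nvprof_metrics() {
    if command -v "$NVPROF" >/dev/null 2>&1; then
        # %p in the log file name is replaced by the profiled process ID.
        "$NVPROF" --metrics "$METRICS" --log-file nvprof-%p.log "$@"
    else
        echo "would run: $NVPROF --metrics $METRICS --log-file nvprof-%p.log $*"
    fi
}
```

On the TX2 this would be called as nvprof_metrics followed by the gst-launch-1.0 command line.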