Profile nvivafilter on Jetson Nano

Hi,

I'm building a GStreamer-based application on a Jetson Nano where we use nvivafilter to overlay real-time data on the video feed. I'm a newbie in CUDA and GPU programming, so I guess there are a lot of opportunities to make the current solution more performant. So my question is: how can I profile my code?

This is our pipeline:

gst-launch-1.0 -e \
  nvarguscamerasrc sensor-id="$SENSOR_ID" sensor-mode=0 gainrange="1 16" ispdigitalgainrange="1 1" name="${APP_NAME}pipeline_overrides${SENSOR_ID}" \
  ! "video/x-raw(memory:NVMM), width=(int)${capture_width}, height=(int)${capture_height}, format=(string)NV12, framerate=(fraction)${capture_framerate}/1" \
  ! nvvidconv ! nvivafilter cuda-process=true customer-lib-name="$customer_lib" ! 'video/x-raw(memory:NVMM), format=(string)NV12' \
  ! nvvidconv ! nvv4l2vp8enc bitrate="${capture_bitrate}" control-rate=1 ! rtpvp8pay mtu=1400 \
  ! udpsink auto-multicast=true clients="${udp_clients}"

I would like to get insights into the 'nvivafilter' element in particular.

As a naive solution I checked the output of top and jtop for CPU, GPU and memory usage, but that is not very accurate.

I also tried remote debugging (starting the connection from macOS) with the Visual Profiler, Nsight Systems and Nsight Compute, but I was not able to connect to the Jetson.

The Visual Profiler gives me the following error on startup:

!ENTRY org.eclipse.osgi 4 0 2021-05-24 16:33:53.658
!MESSAGE Application error
!STACK 1
java.lang.RuntimeException: Application "com.nvidia.viper.application.application" could not be found in the registry. The applications available are: org.eclipse.ant.core.antRunner, org.eclipse.birt.report.engine.ReportExecutor, org.eclipse.e4.ui.workbench.swt.E4Application, org.eclipse.e4.ui.workbench.swt.GenTopic, org.eclipse.equinox.app.error, org.eclipse.equinox.p2.director, org.eclipse.equinox.p2.garbagecollector.application, org.eclipse.equinox.p2.publisher.InstallPublisher, org.eclipse.equinox.p2.publisher.EclipseGenerator, org.eclipse.equinox.p2.publisher.ProductPublisher, org.eclipse.equinox.p2.publisher.FeaturesAndBundlesPublisher, org.eclipse.equinox.p2.reconciler.application, org.eclipse.equinox.p2.repository.repo2runnable, org.eclipse.equinox.p2.repository.metadataverifier, org.eclipse.equinox.p2.artifact.repository.mirrorApplication, org.eclipse.equinox.p2.metadata.repository.mirrorApplication, org.eclipse.equinox.p2.updatesite.UpdateSitePublisher, org.eclipse.equinox.p2.publisher.UpdateSitePublisher, org.eclipse.equinox.p2.publisher.CategoryPublisher, org.eclipse.help.base.infocenterApplication, org.eclipse.help.base.helpApplication, org.eclipse.help.base.indexTool, org.eclipse.ui.ide.workbench.
at org.eclipse.equinox.internal.app.EclipseAppContainer.startDefaultApp(EclipseAppContainer.java:248)
at org.eclipse.equinox.internal.app.MainApplicationLauncher.run(MainApplicationLauncher.java:29)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:134)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:380)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:235)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:648)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:603)
at org.eclipse.equinox.launcher.Main.run(Main.java:1465)

Nsight Systems reports:

DirectoryNotFoundError (150) {
OriginalExceptionClass: N5boost16exception_detail10clone_implIN11QuadDCommon26DirectoryNotFoundExceptionEEE
OriginalFile: /Users/devtools/buildAgent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/PosixDeviceValidator.cpp
OriginalLine: 26
OriginalFunction: bool QuadDAnalysis::PosixDeviceValidator::CheckHostSupport(const QuadDAnalysis::DevicePtr &)
Filename: /Applications/NVIDIA Nsight Systems.app/Contents/target-linux-armv8
ErrorText: Deploy directory does not exist
}

Missing directory with target binaries:
/Applications/NVIDIA Nsight Systems.app/Contents/target-linux-armv8

NVIDIA Nsight Systems
2021.2.1.58-642947b OSX

  • falcon@falcondev.devices:

[Error] Target is not supported.
This version of Nsight Systems does not support profiling on the selected target.

Missing directory with target binaries:
/Applications/NVIDIA Nsight Systems.app/Contents/target-linux-armv8

And finally, Nsight Compute is able to connect to the remote target, but it gets stuck with this message:
Trying to connect to process…

Searching for attachable processes on falcondev.devices:49152-49215…

But maybe I'm on the wrong track. I'm not even sure any of these tools would work for me, since my CUDA code is compiled into a shared library that is called by nvivafilter from a GStreamer pipeline.

Do you have any recommendations on the best way to profile the application?

Thanks!

Bests,
Peter

Hi,
A direct way is to run sudo tegrastats, which reports the clocks and load of the hardware components. To check the frame rate, you can use the fpsdisplaysink plugin:

gst-launch-1.0 -v nvarguscamerasrc sensor-id="$SENSOR_ID" sensor-mode=0 gainrange="1 16" ispdigitalgainrange="1 1" name="${APP_NAME}pipeline_overrides${SENSOR_ID}" \
  ! "video/x-raw(memory:NVMM), width=(int)${capture_width}, height=(int)${capture_height}, format=(string)NV12, framerate=(fraction)${capture_framerate}/1" \
  ! nvvidconv ! nvivafilter cuda-process=true customer-lib-name="$customer_lib" ! 'video/x-raw(memory:NVMM), format=(string)NV12' \
  ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0
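To compare two builds of the filter library with these tools rather than by eye, the readings can be reduced to single numbers. A minimal sketch in shell, assuming the fpsdisplaysink message and the GR3D_FREQ field of tegrastats look like the sample lines in the comments (both formats may vary between L4T releases). last_avg_fps and gpu_load_summary are hypothetical helper names, not part of any NVIDIA or GStreamer tool:

```sh
#!/bin/sh
# Sketch: reduce fpsdisplaysink and tegrastats output to single numbers so two
# builds of the nvivafilter library can be compared run against run.
# ASSUMED line formats (check them against your own logs):
#   fpsdisplaysink: "... last-message = rendered: 150, dropped: 0, current: 30.01, average: 29.98"
#   tegrastats:     "... GR3D_FREQ 42%@921 ..."

# Print the last "average" fps value reported by fpsdisplaysink (read from stdin).
last_avg_fps() {
    sed -n 's/.*average: \([0-9.]*\).*/\1/p' | tail -n 1
}

# Print mean and peak GPU (GR3D) load over all tegrastats lines read from stdin.
# Note: this is the load of the whole GPU, not of one process.
gpu_load_summary() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "GR3D_FREQ") {
                load = $(i + 1)
                sub(/%.*/, "", load)          # "42%@921" -> "42"
                sum += load; n++
                if (load + 0 > max) max = load + 0
            }
    } END {
        if (n > 0) printf "mean=%.1f%% max=%d%% samples=%d\n", sum / n, max, n
    }'
}
```

For example, save the output of the gst-launch-1.0 -v run above to run.log and feed it to last_avg_fps < run.log; at the same time run sudo tegrastats --interval 500 > tegra.log (sampling interval in ms) and summarize it afterwards with gpu_load_summary < tegra.log.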

I see, so there is only this indirect way :(

Too bad… I was mesmerised by all the nice visual features that these Nsight tools have :)

I mean, with this I can only assume that my improvements are really doing something, because I will only see the overall system performance, not drilled down to a specific process (OK, there won't be any resource-hungry processes running during the measurements, but still). Not to mention the statistics I would be able to get out of a real profiling tool, with performance metrics at the code/function level.
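For the CPU side at least, the time spent by the single gst-launch-1.0 process can be read from /proc/<pid>/stat (utime and stime, fields 14 and 15, in clock ticks) instead of from system-wide tools. A minimal sketch; cpu_ticks and cpu_percent are hypothetical helper names, and this says nothing about GPU time spent in the CUDA kernels, which is not attributed per process here:

```sh
#!/bin/sh
# Sketch: CPU time consumed by one process (e.g. the gst-launch-1.0 running the
# nvivafilter pipeline) over a sampling window, read from /proc/<pid>/stat.
# Limitation: CPU time only; GPU work issued by the CUDA library is not counted.

# Print utime+stime (in clock ticks) for the given PID.
cpu_ticks() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Drop "pid (comm)" first: comm may contain spaces, so cut after the last ")".
    rest=${stat##*')'}
    # shellcheck disable=SC2086
    set -- $rest                # $1 is now field 3 (state): utime=${12}, stime=${13}
    echo $(( ${12} + ${13} ))
}

# Print the average CPU usage of PID over SECONDS seconds, in % of one core.
cpu_percent() {
    pid=$1
    secs=${2:-5}
    hz=$(getconf CLK_TCK)       # clock ticks per second
    t0=$(cpu_ticks "$pid") || return 1
    sleep "$secs"
    t1=$(cpu_ticks "$pid") || return 1
    echo $(( (t1 - t0) * 100 / (hz * secs) ))
}
```

For example, cpu_percent "$(pgrep -n gst-launch-1.0)" 10 run while the pipeline is streaming prints its average CPU load over 10 s; values above 100 mean more than one core was busy.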

I think that by "direct way", @DaneLLL was referring to native profiling.
I haven't tried it with recent releases, but I think you may get more insights from Nsight on the host. I may not be able to help further with this, though.

Unfortunately I was not able to try the Nsight tools on the Nano since we run it headless :S, so first I would need to reinstall my dev environment, and I would rather climb a mountain than reinstall it :D #baddevhabits. I was also not sure whether it is even possible to profile CUDA code that is not executed directly as a binary but, as in my case, lives in a shared library. But I guess if no one has tried this, someone has to be the first… :D
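Regarding the shared-library concern: a CUDA profiler instruments the whole process, so kernels launched from the library that nvivafilter loads inside gst-launch-1.0 should show up like any other CUDA work, and the command-line profilers do not need a display. A minimal sketch, assuming the Nsight Systems CLI (nsys) or nvprof from CUDA 10.2 is installed on the Nano and reachable via PATH (nvprof is typically under /usr/local/cuda/bin). profile_pipeline is a hypothetical wrapper, and only basic options are used; check --help on your release:

```sh
#!/bin/sh
# Sketch: run a gst-launch-1.0 pipeline under a command-line CUDA profiler
# directly on the Jetson (no GUI, works headless).
# ASSUMPTION: nsys or nvprof is installed on the target and on PATH.

# Usage: profile_pipeline <report-name> <command> [args...]
profile_pipeline() {
    report=$1
    shift
    if command -v nsys >/dev/null 2>&1; then
        # Timeline of CUDA API calls/kernels, NVTX ranges and OS runtime calls.
        nsys profile --trace=cuda,nvtx,osrt --output="$report" "$@"
    elif command -v nvprof >/dev/null 2>&1; then
        # Writes <report>.nvvp, which can be opened later in the Visual Profiler.
        nvprof --export-profile "$report.nvvp" "$@"
    else
        echo "profile_pipeline: neither nsys nor nvprof found in PATH" >&2
        return 127
    fi
}
```

When using it, give nvarguscamerasrc a finite num-buffers (for example num-buffers=300) and keep -e, so the pipeline ends on its own and the profiler can write its report, e.g. profile_pipeline nvivafilter_run gst-launch-1.0 -e nvarguscamerasrc num-buffers=300 ... ! fakesink. The report file should then be possible to copy to the Mac and open in the desktop GUI, which sidesteps the remote-connection problems described above.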

I think a good habit would be keeping a live version and managing your deviations from there.
However, I also think that how to profile nvivafilter is a legitimate question.
Hope someone from NVIDIA will provide some way to try this.

Nsight can debug over SSH, but not with public-key authentication. If you're using a password to secure your SSH login (the default), it will probably work.

Hi,
all available tools are listed in:
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/debug_setup.html#

Please check whether any of them fits your use case.