Nsight Compute not working when running LLM demo on Jetson AGX Orin


I would like to use Nsight Compute to measure GPU cache activity (utilization and hit rate) on the Jetson AGX Orin Devkit.
I installed the latest version of the JetPack SDK (v5.1.2), the associated CUDA (v11.4), and the other components on the AGX Orin Devkit using SDK Manager.

As a start, I profiled the following CUDA samples, and the report was output successfully.


Next, I attempted to obtain a GPU profile while the following LLM demo was running.

I ran the LLM demo app with the following command. The app itself works fine, but no GPU profile report is output, and I get the warning “No Kernels were profiled”.

sudo /opt/nvidia/nsight-compute/2022.2.1/ncu --target-processes all ./run.sh $(./autotag text-generation-webui)

Is there any special treatment required to get the GPU profile for this LLM demo using Nsight compute?

Hi, @UNA_H

Are you running your LLM demo app in the container but launching NCU outside of the container? If yes, then it is expected.

Hi veraj,

As you pointed out, the LLM demo app is running inside the container and NCU is running outside the container.
What is the best way to profile the GPU cache for an app running in a container?

Note that when nsys is run with the following command, a report is output, but it does not seem to include any profile information about the GPU cache.

$nsys profile --gpu-metrics-device=all ./panasonic-app([LLM Demo App])

You can map NCU into the container and launch it from inside the container.
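
A minimal sketch of what mapping NCU into the container could look like, assuming that run.sh forwards extra options such as -v to docker run and accepts a command after the image name. The application command (python3 server.py), the report directory, and the report name are hypothetical placeholders, not taken from the demo. The MemoryWorkloadAnalysis section is the one that normally reports cache hit rates; check `ncu --list-sections` inside the container to confirm it is available for your version.

```shell
#!/usr/bin/env bash
# Sketch only: mount the host's Nsight Compute install into the container
# and start the app under ncu *inside* the container.

NCU_DIR="/opt/nvidia/nsight-compute/2022.2.1"    # host install path from the question
REPORT_DIR="${REPORT_DIR:-$PWD/ncu-reports}"     # ASSUMPTION: hypothetical host directory for reports
APP_CMD="python3 server.py"                      # ASSUMPTION: placeholder for the demo's entry point

# Build the command as a string first so it can be reviewed before running.
# \$(...) keeps the autotag call literal until the command is actually executed.
PROFILE_CMD="./run.sh \
-v ${NCU_DIR}:${NCU_DIR} \
-v ${REPORT_DIR}:/reports \
\$(./autotag text-generation-webui) \
${NCU_DIR}/ncu --target-processes all \
--section MemoryWorkloadAnalysis \
-o /reports/llm_profile \
${APP_CMD}"

echo "$PROFILE_CMD"

# After checking the printed command, run it from the jetson-containers directory:
#   mkdir -p "$REPORT_DIR" && eval "$PROFILE_CMD"
```

Because the report is written to a mounted directory, it should remain on the host after the container exits and can then be opened in the Nsight Compute UI.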

