nsight systems not seeing profile ranges when DLA is enabled

cogwheel42 · November 14, 2019, 1:29am

When we compile our project to run a segmentation network on the GPU, all of our profiling frames show up in the Nsight Systems remote profiler. But when we compile it to run the net on the DLA, not only do the NVTX frames stop showing up, but it seems all of the GPU data and much of the CUDA trace data are lost as well.

The whole point of moving our segmentation to the DLA was to see if we could reduce the load on the GPU, but we can’t actually measure that. Is there any way to get Nsight Systems to work correctly here?

AastaLLL · November 14, 2019, 7:21am

Hi,

Could you first check whether TensorRT works when the model is placed on the DLA?
Since not all TensorRT operations are supported by the DLA, you might need to enable allowGPUFallback so that the pipeline can run for certain models.
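
As a rough illustration of that check, the sketch below prints a trtexec command line that builds a model for a DLA core with GPU fallback enabled. It assumes the network is available as an ONNX file; `model.onnx`, the core index, and the helper name `dla_trtexec_cmd` are placeholders, not details from this thread. trtexec ships with TensorRT, and `--fp16` is included because the DLA only runs FP16 or INT8.

```shell
# Hypothetical helper: print a trtexec command line that targets a DLA core
# and lets unsupported layers fall back to the GPU.
# $1 = ONNX model path (placeholder), $2 = DLA core index (defaults to 0).
dla_trtexec_cmd() {
    model="$1"
    core="${2:-0}"
    printf 'trtexec --onnx=%s --useDLACore=%s --allowGPUFallback --fp16\n' \
        "$model" "$core"
}

# Print the command; copy it to the device and run it there.
dla_trtexec_cmd model.onnx 0
```

If the build succeeds without fallback warnings, the whole network fits on the DLA; layers reported as falling back will still show up as GPU work in a profile.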

If the above works in your environment, could you share a screenshot of your Nsight Systems report with us?
You should be able to see the DLA status, as in the attached picture.

Please also note that Nsight Systems only reports the DLA status (idle or busy), not its utilization.

Thanks.

cogwheel42 · November 14, 2019, 6:29pm

Yes, the network fully runs on the DLA. See my other thread for that whole process.

The other thing I forgot to mention is that when I run the program under nvprof, all of the ranges are observed as expected. It’s only when I run nsight systems remotely that things don’t work right.

At the moment, we’re not so concerned with profiling the DLA performance. We just want to verify that the GPU is idle while the network is running on DLA in hopes of pipelining the process.

The reports have a completely different structure. When running on GPU it has CPU, Threads, CUDA, and NVTX at the top level. When running on DLA it has CPU, Processes, and iGPU at the top level. NVTX only shows up under the Process->process name.

And when running on the DLA the diagnostics summary has warnings about “Not all NVTX/CUDA events might have been collected” that don’t appear when running on GPU.

Actually, looking again, it seems the run on DLA stops after 100s whereas the GPU version stops at 200s. It’s like the profiler stops gathering events as soon as the DLA starts running? Or maybe running it under the profiler is causing the process to abort early?

cogwheel42 · November 15, 2019, 6:47pm

Is it possible to get DLA information from nvprof (which is working for us)? Or is that only available in Nsight (which isn’t)?

AastaLLL · November 26, 2019, 8:56am

Hi,

For checking the GPU status, it’s recommended to use tegrastats instead.

sudo tegrastats

Is this sufficient for you?
Thanks.

cogwheel42 · November 26, 2019, 7:26pm

I’m trying to get detailed profiling information on the DLA, not general GPU stats.

AastaLLL · November 27, 2019, 3:06am

Hi,

We don’t support detailed DLA profiling data at this time.
The only information available is whether the DLA is active or idle. Is this sufficient for you?

If yes, you don’t need a profiler for this information.
You can check the device node directly:

cat /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status   #DLA0
cat /sys/devices/platform/host1x/158c0000.nvdla1/power/runtime_status   #DLA1

Ex.

nvidia@jetson-0330618100118:~$ cat /sys/devices/platform/host1x/158c0000.nvdla1/power/runtime_status
active
nvidia@jetson-0330618100118:~$ cat /sys/devices/platform/host1x/158c0000.nvdla1/power/runtime_status
suspended
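
To watch that status over time rather than reading it once, a small polling loop is enough. This is a minimal sketch: the helper names `dla_status` and `sample_dla` are made up for illustration, the node path is the one quoted above and may differ on other Jetson modules or L4T releases, and it relies on GNU `date` and `sleep` as found on the stock Ubuntu image. Sampling can miss short bursts of DLA activity.

```shell
# Print the runtime status stored in a device node, or "unknown" if the node
# cannot be read (for example, because the path differs on this board).
dla_status() {
    node="$1"
    if [ -r "$node" ]; then
        cat "$node"
    else
        echo unknown
    fi
}

# Sample a node repeatedly and print one timestamped line per sample.
# $1 = node path, $2 = number of samples (default 10),
# $3 = delay between samples in seconds (default 0.1).
sample_dla() {
    node="$1"
    count="${2:-10}"
    interval="${3:-0.1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s.%N)" "$(dla_status "$node")"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage on the device, while the inference process is running:
# sample_dla /sys/devices/platform/host1x/15880000.nvdla0/power/runtime_status 50 0.1
```

The timestamps make it possible to line the samples up against application logs, which gives a coarse active/idle timeline without a profiler.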

Thanks.

cogwheel42 · November 28, 2019, 12:20am

I don’t understand this response given my first two messages. Nsight systems completely stops gathering profiling data as soon as anything starts running on the DLA. Even if we don’t get detailed profiling data for the actual work going on in the DLA, we should at least be able to get profiling data that has gaps during the operation of the DLA. And the “Other accelerators” profiling in Nsight does show the DLA.

AastaLLL · December 6, 2019, 7:03am

Hi,

Sorry, I thought you were looking for another workaround to get the DLA status without Nsight Systems.
The suggestion in comment #7 reads the device node directly to get the DLA status.

Another thing we suspect is the amount of memory.
A DL use case usually occupies a lot of device memory, especially a segmentation network.

Is it possible that the device runs out of memory during inference, which causes the system to stall?
Could you check this by monitoring with tegrastats?

sudo tegrastats
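
To make the memory trend easier to read, the tegrastats output can be reduced to its RAM and swap fields. The sketch below assumes lines containing fields of the form `RAM <used>/<total>MB` and `SWAP <used>/<total>MB`; the exact layout varies between L4T releases, so the field names may need adjusting. `mem_from_tegrastats` is a hypothetical helper name.

```shell
# Read tegrastats lines on stdin and print only the RAM and swap usage.
# A field that is not present in a line is reported as "?".
mem_from_tegrastats() {
    awk '{
        ram = "?"; swap = "?"
        # The value follows the field name, e.g. "RAM 2210/7860MB".
        for (i = 1; i < NF; i++) {
            if ($i == "RAM")  ram  = $(i + 1)
            if ($i == "SWAP") swap = $(i + 1)
        }
        print "ram=" ram, "swap=" swap
    }'
}

# Usage on the device, one sample per second:
# sudo tegrastats --interval 1000 | mem_from_tegrastats
```

If used RAM climbs to the total and swap keeps growing right before the profiled process stops, memory exhaustion becomes a likely explanation.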

Thanks.

cogwheel42 · December 23, 2019, 6:14pm

I think you may be right about the memory amount. When running under nvprof with our custom scripts (not using Nsight remote), the system is using all of RAM and almost 8GB of swap space.

Is there any way to verify that it’s exiting due to OOM? I don’t see anything obvious in any of the stdout or error logs.
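
One general Linux approach, not specific to Jetson, is to look in the kernel log: when the kernel's OOM killer terminates a process, it normally logs which process it killed. The sketch below filters log text for the usual messages; `oom_lines` is a hypothetical helper name, and the exact wording of the messages depends on the kernel version.

```shell
# Read kernel log text on stdin and keep only lines that the Linux OOM killer
# typically writes. Exits non-zero when no such line is found.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Usage on the device, right after the process exits unexpectedly:
# dmesg | oom_lines
# journalctl -k | oom_lines    # if the journal keeps kernel messages
```

No matching lines suggests the process was not killed by the OOM killer, although it could still have failed an allocation and exited on its own.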

AastaLLL · December 31, 2019, 3:40am

Hi,

You can monitor the system status with tegrastats.

sudo tegrastats

Thanks.
