DLA activities not shown in Nsight Systems for Jetson Orin

lab2022 · December 16, 2022, 8:38am

Hello,

I am following NVIDIA's DLA tutorial for Orin on GitHub (NVIDIA-AI-IOT/jetson_dla_tutorial: "A tutorial for getting started with the Deep Learning Accelerator (DLA) on NVIDIA Jetson"). The tutorial provides small, simple networks for testing the Orin DLAs.

When I profile trtexec with Nsight Systems, I cannot reproduce the results shown in the tutorial, even though I run the same commands. I can see the DLA submissions, but not the actual DLA activity.

My configuration:
- Platform: Jetson AGX Orin Developer Kit
- JetPack: 5.0.2
- TensorRT: 8.4.0.1
- CUDA: 11.4
- Nsight Systems: 2022.3.3.18-4d5367b
The report shown in the tutorial (where the NvMediaDlaSubmit activity is visible):

The report I obtained (the submissions are probably the tiny purple tasks on the last row):

I launched profiling with the following command:

sudo /opt/nvidia/nsight-systems/nsys profile --trace=cuda,nvtx,cublas,cudla,cusparse,cudnn,nvmedia --accelerator-trace=nvmedia --output=model_bn_int8_dla0_.nvvp /usr/src/tensorrt/bin/trtexec --loadEngine=engines/model_bn_int8_dla0.engine --iterations=10 --idleTime=500 --duration=0 --useSpinWait

Here is the diagnostics summary from Nsight Systems; only 2 NvMedia events were collected:

Information Daemon -00:00.012
Camrtc log-level has been changed from ‘0’ to ‘2’ in order to collect nvmedia events
Profiling has started.
Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_101848900/streams/pid_1849083_stdout.log and stderr.log for program output
Common injection library initialized successfully.
NvMedia injection initialized successfully.
NVTX injection initialized successfully.
cuBLAS symbols found in /usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11 symbol table. No cuBLAS trace will be generated from that library. Was cuBLAS statically linked?
cuDNN injection initialized successfully.
Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
Number of other accelerators events collected: 2 019.
CUDA injection initialized successfully.
Number of NVTX events collected: 1 289.
Number of CUDA events collected: 3 555.
No cuDNN events collected. Does the process use cuDNN?
Number of NvMedia events collected: 2.
Number of CUPTI events produced: 4 480, CUPTI buffers: 20.
The cudaProfilerStart API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
The cudaProfilerStop API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
Camrtc log-level has been restored to ‘0’
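To compare runs without reading the whole log each time, the per-source counts can be pulled out of the diagnostics summary. This is a minimal Python sketch, assuming the summary text has been copied out of the Nsight Systems GUI into a string or file; `parse_event_counts` is a hypothetical helper (not part of Nsight Systems), and the regex only assumes the "Number of ... events collected: N." line format seen in the log above, with digits possibly grouped by a space.

```python
import re
from typing import Dict

# Matches diagnostics lines such as
#   "Number of NvMedia events collected: 2."
#   "Number of CUDA events collected: 3 555."
# Digits may be grouped with a space (or another Unicode space, depending on
# locale), so whitespace is allowed inside the number.
_EVENT_LINE = re.compile(r"^Number of (.+?) events collected:\s*(\d[\d\s]*?)\.?$")


def parse_event_counts(diagnostics: str) -> Dict[str, int]:
    """Return {event source: count} for every 'Number of ... events collected' line."""
    counts: Dict[str, int] = {}
    for raw_line in diagnostics.splitlines():
        match = _EVENT_LINE.match(raw_line.strip())
        if match is None:
            continue  # e.g. "No cuDNN events collected." or unrelated messages
        source, number = match.groups()
        # Remove digit-group separators before converting to int.
        counts[source] = int(re.sub(r"\s", "", number))
    return counts


# Excerpt of the diagnostics summary above.
SUMMARY = """\
Number of other accelerators events collected: 2 019.
Number of NVTX events collected: 1 289.
Number of CUDA events collected: 3 555.
No cuDNN events collected. Does the process use cuDNN?
Number of NvMedia events collected: 2.
"""

if __name__ == "__main__":
    for source, count in parse_event_counts(SUMMARY).items():
        print(f"{source}: {count}")
```

Running it on each summary puts the NvMedia count next to the other sources, which makes it easy to check whether a change of command or JetPack version actually increases the number of NvMedia events collected.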

Finally, I also tried checking for out-of-memory (OOM) errors as suggested in this topic, without success.

Could you please help me?

AastaLLL · December 16, 2022, 9:20am

Hi,

Did you follow the steps shared below?

Thanks

lab2022 · December 16, 2022, 9:35am

Thank you for your quick answer,

Yes, and I encountered the same problem in that section:

At step 4 of the tutorial, here is what I expected to get:

Here is what I got:

with this diagnostics summary:

Information Daemon -00:00.000
Camrtc log-level has been changed from ‘0’ to ‘2’ in order to collect nvmedia events
Profiling has started.
Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_101867829/streams/pid_1868012_stdout.log and stderr.log for program output
Common injection library initialized successfully.
NvMedia injection initialized successfully.
NVTX injection initialized successfully.
cuBLAS symbols found in /usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11 symbol table. No cuBLAS trace will be generated from that library. Was cuBLAS statically linked?
cuDNN injection initialized successfully.
Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
Number of other accelerators events collected: 1 678.
CUDA injection initialized successfully.
Number of NVTX events collected: 1 256.
Number of CUDA events collected: 19 861.
Number of cuDNN events collected: 128.
Number of NvMedia events collected: 2.
cuBLAS symbols found in /usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11 symbol table. No cuBLAS trace will be generated from that library. Was cuBLAS statically linked?
Number of CUPTI events produced: 20 459, CUPTI buffers: 20.
The cudaProfilerStart API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
The cudaProfilerStop API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
Profiling has stopped.
Camrtc log-level has been restored to ‘0’

AastaLLL · December 21, 2022, 7:13am

Hi,

We can see similar behavior with an MNIST model.
We will discuss this with our internal team and share more information with you.

Thanks.

AastaLLL · December 28, 2022, 8:28am

Hi,

Have you applied step 5, which replaces the GroupNorm layers with BatchNorm2d?

If yes, could you share the model with us for comparison?
Thanks.

lab2022 · January 2, 2023, 11:24am

Hello,

The problem persists at step 5 of the tutorial when replacing GroupNorm layers with BatchNorm2d.

At step 5 of the tutorial, here is what I expected to get:

And here is what I get (with BatchNorm2d):

I found the DLA accelerator trace for one inference (the long purple task on the third row), but in a different section from the trtexec thread. However, it seems inconsistent with the accelerator submissions shown in the trtexec thread (the tiny purple tasks on the last row). Once again, my results differ from those in the tutorial.

with this diagnostics summary:

Please find below the ONNX model, the engine, and the profiling report for step 5:

model_bn_int8_dla0_.nvvp.nsys-rep (1.4 MB)
model_bn_int8_dla0.engine (1.8 MB)
model_bn.onnx (5.9 MB)

Thank you very much for your help !

AastaLLL · January 4, 2023, 10:13am

Hi,

Thanks for sharing.

Could you also tell us about your host machine environment?
Is it Ubuntu 20.04?

Thanks.

lab2022 · January 4, 2023, 12:57pm

Hello,

Yes, the host machine runs Ubuntu 20.04.

AastaLLL · January 5, 2023, 9:56am

Hi,

Thanks for confirming.
We are checking this internally and will share more information with you soon.

AastaLLL · January 11, 2023, 8:02am

Hi,

Could you try the command below to see if it helps?

$ sudo nsys profile -t nvtx,nvmedia,osrt --show-output=true --use-agent-api=false --accelerator-trace=nvmedia /usr/src/tensorrt/bin/trtexec ...

Thanks.

lab2022 · January 11, 2023, 4:40pm

Hello,

Thank you very much for your help. Here is what I got with your last command at step 5 of the tutorial:

Some new processes appeared, but the DLA API calls are still only partially detected.

Please find here the associated report : test.nsys-rep (1.4 MB)

AastaLLL · January 12, 2023, 7:15am

Hi,

Thanks for testing.

Would you mind sharing your ONNX model from step 5 (with GroupNorm replaced by BatchNorm2d)?
We want to check it further internally.

Thanks.

lab2022 · January 12, 2023, 8:08am

Hello,

Here is the ONNX model used for step 5 (and for the latest profiling run with the custom command), in which GroupNorm is replaced with BatchNorm2d:

model_bn.onnx (5.9 MB)

AastaLLL · January 17, 2023, 5:38am

Hi,

Thanks for the model.
We are checking this issue with our internal team.
We will share more information with you later.

Thanks.

AastaLLL · February 8, 2023, 7:35am

Hi,

JetPack 5.1 has now been released. Could you try the same steps on JetPack 5.1?

Thanks

system · March 7, 2023, 6:30am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
