DLA activities not shown in Nsight Systems on Jetson Orin

Hello,

I am following the DLA tutorial from NVIDIA's GitHub (NVIDIA-AI-IOT/jetson_dla_tutorial: A tutorial for getting started with the Deep Learning Accelerator (DLA) on NVIDIA Jetson). The tutorial provides small, simple networks for testing the Orin DLAs.

When profiling trtexec with Nsight Systems, I cannot reproduce the results shown in the tutorial, even though I run the same commands. I can only see the DLA submissions, not the actual DLA activity.

  • My configuration:

    • Platform: Jetson AGX Orin Developer Kit
    • JetPack: 5.0.2
    • TensorRT: 8.4.0.1
    • CUDA: 11.4
    • Nsight Systems: 2022.3.3.18-4d5367b
  • The report obtained in the tutorial (where we see the NvMediaDlaSubmit activity):

sudo /opt/nvidia/nsight-systems/nsys profile --trace=cuda,nvtx,cublas,cudla,cusparse,cudnn,nvmedia --accelerator-trace=nvmedia --output=model_bn_int8_dla0_.nvvp /usr/src/tensorrt/bin/trtexec --loadEngine=engines/model_bn_int8_dla0.engine --iterations=10 --idleTime=500 --duration=0 --useSpinWait

  • Here is the diagnostic summary collected from Nsight Systems: only 2 NvMedia events were collected.

Information Daemon -00:00.012
Camrtc log-level has been changed from ‘0’ to ‘2’ in order to collect nvmedia events
Profiling has started.
Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_101848900/streams/pid_1849083_stdout.log and stderr.log for program output
Common injection library initialized successfully.
NvMedia injection initialized successfully.
NVTX injection initialized successfully.
cuBLAS symbols found in /usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11 symbol table. No cuBLAS trace will be generated from that library. Was cuBLAS statically linked?
cuDNN injection initialized successfully.
Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
Number of other accelerators events collected: 2 019.
CUDA injection initialized successfully.
Number of NVTX events collected: 1 289.
Number of CUDA events collected: 3 555.
No cuDNN events collected. Does the process use cuDNN?
Number of NvMedia events collected: 2.
Number of CUPTI events produced: 4 480, CUPTI buffers: 20.
The cudaProfilerStart API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
The cudaProfilerStop API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
Camrtc log-level has been restored to ‘0’

  • Finally, I have also tried to detect an out-of-memory (OOM) condition, as suggested in this topic, but without success.
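To compare runs more easily, the event counts in the diagnostic summary above can be pulled out with a few lines of Python. This is only a minimal sketch: the function name `parse_event_counts` and the `SAMPLE` variable are hypothetical, and it assumes the summary lines keep the "Number of ... events collected/produced: N" wording seen in this thread, with a space as the thousands separator.

```python
import re

# Matches lines such as "Number of NvMedia events collected: 2."
# or "Number of CUPTI events produced: 4 480, CUPTI buffers: 20."
_COUNT_LINE = re.compile(
    r"Number of (.+?) events (?:collected|produced): (\d[\d\s]*)"
)


def parse_event_counts(diagnostic_text):
    """Return {event kind: count} from an nsys diagnostic summary (hypothetical helper)."""
    counts = {}
    for line in diagnostic_text.splitlines():
        match = _COUNT_LINE.search(line)
        if match:
            # Drop the whitespace used as a thousands separator ("2 019" -> 2019).
            counts[match.group(1)] = int(re.sub(r"\s", "", match.group(2)))
    return counts


# Lines copied from the diagnostic summary posted above.
SAMPLE = """\
Number of other accelerators events collected: 2 019.
CUDA injection initialized successfully.
Number of NVTX events collected: 1 289.
Number of CUDA events collected: 3 555.
No cuDNN events collected. Does the process use cuDNN?
Number of NvMedia events collected: 2.
Number of CUPTI events produced: 4 480, CUPTI buffers: 20.
"""

counts = parse_event_counts(SAMPLE)
for kind, value in counts.items():
    print(f"{kind}: {value}")
```

With the summary above, the NvMedia entry comes out as 2, which is the symptom described in this post: the injection library initializes, but almost no NvMedia events are recorded.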

Could you please help me?

Hi,

Is your situation the same as the status shared below?

Thanks

Thank you for your quick answer,

Yes, I encountered the same problem in that specific section:

  • At step 4 of the tutorial, here is what I should get:

  • Here is what I got:

  • with this diagnostic summary:

Information Daemon -00:00.000
Camrtc log-level has been changed from ‘0’ to ‘2’ in order to collect nvmedia events
Profiling has started.
Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_101867829/streams/pid_1868012_stdout.log and stderr.log for program output
Common injection library initialized successfully.
NvMedia injection initialized successfully.
NVTX injection initialized successfully.
cuBLAS symbols found in /usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11 symbol table. No cuBLAS trace will be generated from that library. Was cuBLAS statically linked?
cuDNN injection initialized successfully.
Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call.
Number of other accelerators events collected: 1 678.
CUDA injection initialized successfully.
Number of NVTX events collected: 1 256.
Number of CUDA events collected: 19 861.
Number of cuDNN events collected: 128.
Number of NvMedia events collected: 2.
cuBLAS symbols found in /usr/local/cuda/targets/aarch64-linux/lib/libcublas.so.11 symbol table. No cuBLAS trace will be generated from that library. Was cuBLAS statically linked?
Number of CUPTI events produced: 20 459, CUPTI buffers: 20.
The cudaProfilerStart API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
The cudaProfilerStop API was ignored 1 times due to configuration settings; Note that, when requested, only the first pair of cudaProfilerStart/Stop APIs, after the collection is started, will be effective.
Profiling has stopped.
Camrtc log-level has been restored to ‘0’

Hi,

We can see similar behavior with an MNIST model.
Will discuss this with our internal team and share more information with you.

Thanks.

Hi,

Have you applied step 5 to use the BatchNorm2d layer instead?

If yes, could you share the model with us for comparison?
Thanks.

Hello,

The problem persists at step 5 of the tutorial when replacing GroupNorm layers with BatchNorm2d.

  • At step 5 of the tutorial, here is what I should get:

  • And here is what I get (with BatchNorm2d):

I found the DLA accelerator trace for one inference (the long purple task on the 3rd row) in a section separate from the trtexec thread. However, it does not seem to match the accelerator submissions shown in the trtexec thread (the tiny purple submit tasks on the last row). Once again, my results differ from those in the tutorial.

  • with this diagnostic summary:

  • Please find below the ONNX model, engine, and profiling report used for step 5:

model_bn_int8_dla0_.nvvp.nsys-rep (1.4 MB)
model_bn_int8_dla0.engine (1.8 MB)
model_bn.onnx (5.9 MB)

Thank you very much for your help!

Hi,

Thanks for sharing.

Could you also tell us about your host machine environment?
Is it Ubuntu 20.04?

Thanks.

Hello,

Yes, the host machine is running Ubuntu 20.04.

Hi,

Thanks for confirming.
We are checking this internally and will share more information with you soon.

Hi,

Could you try the below command to see if it helps?

$ sudo nsys profile -t nvtx,nvmedia,osrt --show-output=true --use-agent-api=false --accelerator-trace=nvmedia /usr/src/tensorrt/bin/trtexec ...

Thanks.

Hello,

Thank you very much for your help. Here is what I got with your last command at step 5 of the tutorial:

Some new processes appeared, but the DLA API is still only partially detected.

Please find the associated report here: test.nsys-rep (1.4 MB)

Hi,

Thanks for testing.

Would you mind sharing your ONNX model from after step 5 (the one where GroupNorm has been replaced with BatchNorm2d)?
We want to check it further internally.

Thanks.

Hello,

Here is the ONNX model used for step 5 (and for the last custom profiling run), where GroupNorm is replaced with BatchNorm2d:

model_bn.onnx (5.9 MB)

Hi,

Thanks for the model.
We are checking this issue with our internal team.
Will share more information with you later.

Thanks.

Hi,

Since we have a new JetPack 5.1 release, could you try the same on JetPack 5.1?

Thanks
