Is using the DLA for inference really more energy efficient?

Hi, I'm finding it hard to believe that the DLA is more energy efficient.

I used trtexec to measure the inference time and energy consumption of a model.
I used the ResNet-50 Caffe prototxt, but I removed the last layer (softmax) and renamed the last fully connected layer to prob, to avoid GPU fallback.
The commands I used were something like this:

./trtexec --avgRuns=100 --deploy=../models/Resnet_without_prob.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=0 --useSpinWait
./trtexec --avgRuns=100 --deploy=../models/Resnet_without_prob.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useSpinWait

I measured power consumption with tegrastats and calculated (img/sec)/(total power consumption).
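Roughly, the calculation was like this (a minimal sketch; the rail names and numbers in the sample tegrastats line are illustrative, and the rails reported differ between Jetson models, so the regex may need adjusting for your board):

```python
import re

def parse_rails(line):
    # tegrastats reports each power rail as "NAME current/average" in milliwatts;
    # rail names (VDD_GPU, VDD_SOC, VDDRQ, ...) vary between Jetson models
    return {name: int(cur)
            for name, cur, avg in re.findall(r"(VDD\w*) (\d+)/(\d+)", line)}

def imgs_per_joule(images_per_sec, rails_mw):
    # (img/sec) / (total watts) = images per joule; sum sub-rails only,
    # or use the single input rail alone, to avoid double counting
    return images_per_sec / (sum(rails_mw.values()) / 1000.0)

# illustrative tegrastats-style line (not real measurements)
sample = ("RAM 4722/15692MB GR3D_FREQ 99% "
          "VDD_GPU 3100/3000 VDD_CPU 1200/1150 VDD_SOC 900/880 VDDRQ 700/690")
rails = parse_rails(sample)
print(rails)
print(imgs_per_joule(590.0, rails))
```

Averaging many tegrastats samples over the whole trtexec run, rather than reading a single line, gives a steadier number.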

But using the GPU with FP16 was always more efficient than the DLA with FP16. (I tried MAXN mode and 15W mode, both tested after running sudo jetson_clocks.)

Of course the DLA's own power consumption was low, but the VDDRQ, SOC, and CPU rails also draw power, and since the DLA's inference time was slower, the total energy per image came out worse.

Could you give me an example where I can verify that the DLA is more energy efficient?

Thanks.

=====================================
I’m really sorry.
I found that with other networks, such as GoogLeNet, using the DLA can be more energy efficient.

Thanks.

In my experience, the main benefit of the DLA is when you can run it in 8-bit mode. Assuming your model still performs well at that precision, the DLA can run reasonably fast and efficiently.
If you don't care about the CPU at all, try taking all but one CPU core offline and turning down the memory/GPU clocks. You can actually define your own power profiles in the /etc/nvpmodel.conf file and apply them.
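As an illustration, a custom profile in /etc/nvpmodel.conf looks something like the fragment below. The mode ID 8, the name, and the frequency values here are made up for this example, and the exact clause names vary between Jetson models, so copy one of the stock profiles on your board and edit it rather than typing this verbatim:

< POWER_MODEL ID=8 NAME=CUSTOM_DLA >
CPU_ONLINE CORE_0 1
CPU_ONLINE CORE_1 0
CPU_ONLINE CORE_2 0
CPU_ONLINE CORE_3 0
GPU MAX_FREQ 520000000
EMC MAX_FREQ 1065600000

You can then switch to the custom profile with sudo nvpmodel -m 8 and confirm it with sudo nvpmodel -q.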

Wow! Is it possible to use Xavier's DLA in 8-bit mode?

It's supposed to be, and NVIDIA has previously promised future support in TensorRT, but the published benchmarks only run in 16-bit mode:
https://devtalk.nvidia.com/default/topic/1044598/jetson-agx-xavier/jetson-agx-xavier-deep-learning-inference-benchmarks/

You should be able to use the 8-bit options in TensorRT, though.
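For example, a trtexec run requesting INT8 on the DLA would look something like this (the --int8 and --allowGPUFallback flags exist in trtexec today, but whether the DLA actually accepts INT8 layers depends on your JetPack/TensorRT version; without calibration data, trtexec uses dummy dynamic ranges, so this measures speed, not accuracy):

./trtexec --avgRuns=100 --deploy=../models/Resnet_without_prob.prototxt --int8 --batch=8 --iterations=10000 --output=prob --useDLACore=0 --allowGPUFallback --useSpinWait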

Support for INT8 on the Xavier DLA is coming soon and will be included in the next JetPack release. See this announcement thread:

https://devtalk.nvidia.com/default/topic/1055632/jetson-agx-xavier/new-jetson-software-modules-and-pricing/

Thanks!
I'll wait for the next JetPack!