Classification and/or detection models with DLA support

Hi,

When trying to run Classification and DetectNetV2 on the DLA (Jetson Xavier), I get a lot of:
“Default DLA is enabled but layer XYZ is not running on DLA, falling back to GPU”

Do you provide any models (from the list or elsewhere, either classification or detection) that support running on the DLA?
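
For reference, this is roughly the kind of invocation that produces those messages for me (just a sketch; the prototxt path and output blob name stand in for my own model):

$ # Build and time the model on DLA core 0; --allowGPUFallback lets TensorRT
$ # move unsupported layers to the GPU, which is when the warnings above appear.
$ ./trtexec --deploy=my_model.prototxt --output=prob --fp16 --useDLACore=0 --allowGPUFallback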

Hi dannykario,
Do you mean you run TLT on Xavier directly?
According to the TLT documentation, using the Transfer Learning Toolkit requires the following:

Hardware Requirements
Minimum
4 GB system RAM
4 GB of GPU RAM
Single core CPU
1 GPU
50 GB of HDD space

Recommended
32 GB system RAM
32 GB of GPU RAM
8 core CPU
4 GPUs
100 GB of SSD space

Software Requirements
Ubuntu 18.04 LTS
NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/
docker-ce installed, https://docs.docker.com/install/linux/docker-ce/ubuntu/
nvidia-docker2 installed, instructions: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
NVIDIA GPU driver v410.xx or above
Note: DeepStream 4.0 - NVIDIA SDK for IVA inference https://developer.nvidia.com/deepstream-sdk is recommended.
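
With docker-ce and nvidia-docker2 in place, you would start the TLT container from NGC along these lines (a sketch; the image name and tag are from memory, so please check the exact ones on ngc.nvidia.com):

$ # Log in to the NGC registry with your API key, then run the TLT container.
$ docker login -u '$oauthtoken' -p <your-NGC-API-key> nvcr.io
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:<tag> /bin/bash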

Hi,

Rephrasing:

I am looking for a model (classification or detection) that is able to run on the Xavier DLA cores WITHOUT falling back to the GPU.
I saw in other answers that ResNet50 is capable of that.

Q1: Is this indeed the case?

Q2: If yes, how can I use it? There are two issues here:

– Training with TLT: do I need any special flags, or is the “standard” TLT chain of DetectNetV2 with ResNet50 enough? I’m training on a 2080 under Ubuntu 18.04 and have trained a few models that all work well on the GPU. (Roughly what I run is sketched after this list.)

– Running on Xavier, but WITHOUT DeepStream (unfortunately, I’m not using DS for now).
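
For context, the standard chain I have been running looks roughly like this (a sketch; the spec file, results directory, and key are placeholders for my own setup, and I have not used any DLA-specific flags):

$ # Fine-tune DetectNetV2 (ResNet50 backbone selected in the spec file) on my data.
$ tlt-train detectnet_v2 -e my_spec.txt -r /workspace/results -k <my-NGC-key>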

Thanks for the help!

Hi dannykario,
Thanks for the clarification. Actually, there is no information about DLA in the TLT Getting Started Guide.
We also have not tested a TLT .etlt model, or a TensorRT engine generated from one, on the DLA.
So the result for this case is unknown.

Hi,

Thanks. I looked at the last section of https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks#trtexec:

For DLA (Core 0)
$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=0 --useSpinWait --allowGPUFallback
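
As a side note, my understanding is that dropping --allowGPUFallback should make trtexec fail on any layer the DLA cannot run, instead of silently falling back, so I plan to use a variant like this to verify a pure-DLA model:

$ # Same benchmark, but without GPU fallback: unsupported layers become hard errors.
$ ./trtexec --avgRuns=100 --deploy=resnet50.prototxt --fp16 --batch=8 --iterations=10000 --output=prob --useDLACore=0 --useSpinWait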

Do you know if the “resnet50.prototxt” used there is equivalent to the TLT ResNet50?

Thanks for the help!

Hi dannykario,
No, they should not be equivalent.
TLT is a Python package that enables customers to fine-tune pre-trained models with their own data. Customers can then export these models for TensorRT-based inference on an edge device.
On ngc.nvidia.com there is a ResNet50 pre-trained model, but it serves only as a starting point for customers to train on their own data.
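
If it helps, such pre-trained weights can be fetched with the NGC CLI along these lines (a sketch; the org, model name, and version are placeholders, and the actual listing on ngc.nvidia.com is authoritative):

$ # Download pre-trained TLT weights from the NGC model registry.
$ ngc registry model download-version <org>/<model-name>:<version>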

Hi,

Re-explaining myself:

– I’m using TLT models (classification, detection) as a base for transfer learning and train them on a server with my own data. I prefer TLT, but I might use other training setups if needed.

– Then I deploy on the Xavier.

– I have some performance issues on the Xavier; I suspect the GPU is too heavily loaded and the DLA is not used at all.

– So I am looking for models, usable as a base for transfer learning, that I can run on the DLA without falling back to the GPU.

The ResNet50 you mentioned seems to be for the V100 and T4. Is there an option for Xavier deployment? Or any other relevant model, plus the know-how to train and export it?

Thanks for the help!

Hi dannykario,
For TLT, all the pre-trained models are available only at ngc.nvidia.com. To be clear, they are not specific to any particular deployment device.
The pre-trained models are just pre-trained weights that help customers train on their own data.
You may use a pre-trained model or not; if you do not, simply leave the relevant setting unfilled in the training spec.

For how to train and export an .etlt model, please refer to the TLT user guide.
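
As a rough sketch of the export step (flags can differ between TLT versions, and the paths and key below are placeholders; the user guide is authoritative):

$ # Export the trained .tlt model to an encrypted .etlt for deployment.
$ tlt-export /workspace/results/model.tlt -k <your-NGC-key> -o /workspace/export/model.etlt --export_module detectnet_v2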

Hi,

Sorry, but I am not following. Rephrasing as a very focused and simple question:

Where can I download a model, with or without weights, for either classification or detection, that:

– runs on the DLA on Xavier without falling back to the GPU? (I’m using trtexec to check this.)

Thanks again for the help!

Hi dannykario,
The pre-trained models are not related to “running on the DLA”.

All the pre-trained models are available at ngc.nvidia.com,
but they are just pre-trained weights for TLT training.

After training, the end user can generate a TensorRT engine in order to deploy.
But the TLT user guide does not contain any information about running on the DLA, which means TLT does not commit to supporting it.
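
For completeness, the engine-generation step on the device looks roughly like this with the tlt-converter tool (a sketch; the input dimensions, output node names, and key are placeholders, and building such an engine for the DLA is exactly the part that is not committed to):

$ # On the Xavier: build a TensorRT engine from the exported .etlt model.
$ tlt-converter model.etlt -k <your-NGC-key> -d 3,384,1248 -o output_cov/Sigmoid,output_bbox/BiasAdd -e model.engine -t fp16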

Thanks.