DLA for object detection supported with TF-TRT on Xavier?

I’m trying to use TF-TRT to run inference on object detection networks on the Jetson AGX Xavier Developer Kit. To understand how the different models perform on the Xavier, I benchmarked all of the models from the Object Detection Model Zoo: I downloaded each model and converted it using the code linked below. However, the performance numbers I get do not seem consistent with the DLA inference benchmarks I’ve seen (also linked below). The numbers were collected by timing the session.run() call to TensorFlow (a sketch of the methodology appears after the questions), running on the Xavier in MAXN mode with all clocks maxed (after jetson_clocks.sh). My hypothesis is that either the conversion step or the inference step is not making use of the DLA engines, and that the GPU is handling the entire inference stage. Based on that, I have a few questions:

  1. Is there a way to verify my hypothesis? That is, can I determine whether TF-TRT is executing the model on the DLA or on the GPU?
  2. Assuming the GPU is being used, is there a way to convert models such that the DLA is used as much as possible?
  3. If it is not possible to use the DLA with tf-trt yet, is there an ETA on when support may be enabled?
  4. If it is not possible to use the DLA with tf-trt yet, what is the recommended way of running tensorflow object detection models with the DLA? As this is still early access hardware, with lots of things in flux, I’m not sure what the current best practice is.
  5. Another thing to note: only some of the models could be converted with the conversion script (which just calls trt.create_inference_graph internally), and batch sizes greater than 1 only seem to be supported on SSD topologies. Is this expected?
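
For reference, here is roughly how the timing numbers above were collected. This is a minimal sketch assuming a TF 1.x frozen graph produced by the converter; the file name and tensor names (frozen_graph.pb, image_tensor, detection_boxes, and so on — the standard TF Object Detection API names) should be substituted with your model's actual values. It also counts TRTEngineOp nodes, which is one crude way to approach question 1: each TRTEngineOp is a fused TensorRT subgraph, and everything outside them runs as ordinary TensorFlow ops.

```python
import time
import numpy as np
import tensorflow as tf

# Load the already-converted frozen graph ("frozen_graph.pb" is a placeholder).
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# How much of the graph did TF-TRT actually convert? Each TRTEngineOp node is
# a fused TensorRT subgraph; all remaining nodes stay as ordinary TF ops.
n_trt = sum(1 for n in graph_def.node if n.op == "TRTEngineOp")
print("TRTEngineOp nodes: %d of %d total" % (n_trt, len(graph_def.node)))

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
    # Standard TF Object Detection API tensor names (placeholders).
    inp = graph.get_tensor_by_name("image_tensor:0")
    outs = [graph.get_tensor_by_name(name + ":0") for name in
            ("detection_boxes", "detection_scores", "detection_classes")]

batch = np.random.randint(0, 256, size=(1, 300, 300, 3), dtype=np.uint8)

with tf.Session(graph=graph) as sess:
    for _ in range(10):  # warm-up runs (engine deserialization, autotuning)
        sess.run(outs, feed_dict={inp: batch})
    runs = 50
    start = time.time()
    for _ in range(runs):
        sess.run(outs, feed_dict={inp: batch})
    print("mean latency: %.1f ms" % ((time.time() - start) / runs * 1e3))
```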

Object Detection Model Zoo - https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Conversion code - https://github.com/NVIDIA-AI-IOT/tf_trt_models
Performance numbers - trt_graphs (Google Sheets)
Benchmark 1 - https://developer.nvidia.com/embedded/jetson-agx-xavier-dl-inference-benchmarks
Benchmark 2 - NVIDIA Jetson AGX Xavier Benchmarks - Incredible Performance On The Edge Review (Phoronix)

Hi theholyhades1, DLA is not supported in TF-TRT. To run a TensorFlow model on the DLAs, you would need to use the UFF workflow and import the model with the TensorRT C++ API.

Note that currently, the only networks officially verified on DLA are ResNet-50, GoogLeNet, AlexNet, and LeNet, so you will probably need GPU fallback enabled so that unsupported layers can run on the GPU instead. A sketch of those builder settings follows.
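
This is roughly what the DLA placement settings look like on the TensorRT side once you have a UFF file. A minimal sketch using the TensorRT Python API; the exact knobs have moved between releases (on TensorRT 5 they sit on the builder itself, on later releases on the builder config as shown here), and the file name, input/output names, and input shape are placeholders:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()  # implicit-batch network, as UFF requires
parser = trt.UffParser()

# Placeholder names/shapes -- substitute your model's actual values.
parser.register_input("image_tensor", (3, 300, 300))
parser.register_output("detection_out")
parser.parse("model.uff", network)

builder.max_batch_size = 1
config = builder.create_builder_config()
config.max_workspace_size = 1 << 28
config.set_flag(trt.BuilderFlag.FP16)            # DLA requires FP16 or INT8
config.default_device_type = trt.DeviceType.DLA  # place layers on the DLA...
config.DLA_core = 0                              # ...on core 0 (Xavier has two)
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # unsupported layers fall back to GPU

engine = builder.build_engine(network, config)
```

With GPU_FALLBACK set, TensorRT logs at build time which layers it placed on the DLA and which fell back to the GPU, which is a useful sanity check for how much of the network the DLA can actually run.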

Hi Dusty,

Thanks for the clarification. I’ll try the UFF workflow and post my results. Am I correct in assuming that TF-TRT currently uses only the GPU on the Xavier?

Yes, that is correct. FYI, here is a GitHub issue about it filed against TensorFlow master: https://github.com/tensorflow/tensorflow/issues/23437

BTW, here is a tutorial on using the UFF workflow: https://github.com/NVIDIA-AI-IOT/tf_to_trt_image_classification
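
For anyone following along, the first step of that workflow, converting a frozen TensorFlow graph to a .uff file, boils down to a call into the uff package that ships with TensorRT. A minimal sketch; the file and node names are placeholders:

```python
import uff

# Convert a frozen TensorFlow graph to UFF. The frozen-graph path and the
# output node name are placeholders for your model's actual values.
uff_model = uff.from_tensorflow_frozen_model(
    "frozen_graph.pb",
    output_nodes=["detection_out"],
    output_filename="model.uff",
)
```

Note that object detection graphs usually contain ops (preprocessing, NMS, and so on) that UFF cannot represent directly, so in practice the graph typically needs graphsurgeon preprocessing and TensorRT plugins for those pieces before it will parse.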