Accelerating custom Python code on DRIVE AGX

Hello,

I have been working on a custom perception application on the DRIVE AGX platform using Python, TensorFlow, and PyTorch. The application uses a Logitech USB webcam as the camera input.

In an initial run on this module, I have only been able to get 4-6 FPS in real time, whereas the same code runs at around 15 FPS on a Titan X.

  1. Given that these are different architectures, how can I optimize my code so that it runs in real time on the DRIVE AGX?
  2. Is it possible to use the DLA cores (from a Python implementation) to accelerate this further? How can I speed this up from a hardware perspective?

If there’s a different place I need to post this in, please let me know.

Dear ankit,
Xavier has 1.3 TFLOPS of FP32 compute and 30 DL TOPS. It is possible that you are running in FP32 precision, which makes a difference.
You can use the TensorRT C++ APIs on the board to optimize your model further, and you can also try FP16 or INT8 precision. Some of the layers can be offloaded to the DLA using TensorRT. Please check the TensorRT documentation ( https://docs.nvidia.com/deeplearning/sdk/tensorrt-archived/tensorrt-511rc/tensorrt-developer-guide/index.html ) for more details.
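For reference, the builder-side configuration in the TensorRT 5.x C++ API looks roughly like the sketch below. The function name, batch size, and workspace size are illustrative; network creation and parsing are omitted, and INT8 additionally requires a calibrator. Check the developer guide linked above for the exact usage.

```cpp
#include "NvInfer.h"

using namespace nvinfer1;

// Sketch only: assumes `builder` and `network` were already created with
// createInferBuilder() and populated by a parser (UFF/ONNX/Caffe).
ICudaEngine* buildOptimizedEngine(IBuilder* builder, INetworkDefinition* network)
{
    builder->setMaxBatchSize(1);                // single-image inference
    builder->setMaxWorkspaceSize(1 << 28);      // 256 MB of scratch space

    // Reduced precision: FP16 needs only this flag; INT8 additionally
    // requires a calibrator via builder->setInt8Calibrator().
    if (builder->platformHasFastFp16())
        builder->setFp16Mode(true);

    // Offload supported layers to a DLA core; layers the DLA cannot run
    // fall back to the GPU when GPU fallback is enabled.
    builder->setDefaultDeviceType(DeviceType::kDLA);
    builder->setDLACore(0);
    builder->allowGPUFallback(true);

    return builder->buildCudaEngine(*network);  // caller serializes / deploys this
}
```

The usual workflow is to export the trained model from Python (e.g. to UFF or ONNX), build the engine with a C++ path like the one above, and run inference through the C++ runtime on the board.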

Thanks SivaRamaKrishna!

Is there a way to check what precision my current code is using?

Also, if I’m not wrong, the TensorRT Python API is not supported on automotive platforms. Is that correct? https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html#fntarg_1

Dear raul_16,
If you do not explicitly set FP16 or INT8 mode in TensorRT, it runs in FP32 mode.
The TensorRT Python API is not supported on the DRIVE platform.
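If the question is specifically about what a TensorRT builder is configured for, the 5.x C++ API exposes getters for the precision flags; a minimal sketch (the helper name is mine):

```cpp
#include <iostream>
#include "NvInfer.h"

// Hypothetical helper: report which precision mode a TensorRT 5.x builder is
// configured for. With neither flag set, engines are built in FP32.
void reportBuilderPrecision(nvinfer1::IBuilder* builder)
{
    if (builder->getInt8Mode())
        std::cout << "Builder is configured for INT8\n";
    else if (builder->getFp16Mode())
        std::cout << "Builder is configured for FP16\n";
    else
        std::cout << "Builder will produce an FP32 engine (the default)\n";
}
```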