TensorRT execution granularity

Description

Hello,

I am new to TensorRT and I am trying to build a simple application using the TensorRT Python API. I have an important requirement for this application, though: to be able to execute inference at the granularity of layers, or even individual operations, similar to PyTorch or TensorFlow. My goal is to be able to inject control logic between layers.

I believe I could achieve this with PyTorch, or even with TensorFlow's eager execution mode, but in TensorRT at some point I need to call something like Context.execute_async, which runs the inference as a black box.
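
For concreteness, this is roughly the call sequence I mean (a minimal sketch with pycuda, assuming an implicit-batch engine serialized to a placeholder file "model.trt"; an explicit-batch engine would use execute_async_v2 instead):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# One device buffer per binding (inputs and outputs alike).
# Input data here is just zeros for illustration.
bindings = []
host_bufs = []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = np.zeros(trt.volume(engine.get_binding_shape(i)), dtype=dtype)
    dev = cuda.mem_alloc(host.nbytes)
    if engine.binding_is_input(i):
        cuda.memcpy_htod(dev, host)
    host_bufs.append(host)
    bindings.append(int(dev))

# The entire network runs inside this one call; there is no hook to run
# my own control logic between layers.
context.execute_async(batch_size=1, bindings=bindings,
                      stream_handle=stream.handle)
stream.synchronize()
```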

Is this something that can be done in TRT? If yes, where can I find resources to start from?

Thanks!

Environment

TensorRT Version: 7.1
GPU Type: Volta 512 Cores (Jetson Xavier)
Nvidia Driver Version:
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6

Hi @HazemAbdelhafez ,
There is a wide list of samples to get started with TRT.
You can also check the list of already supported ops here.
In case of unsupported ops, you can create custom layers (plugins).
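
For example, you can list the plugin creators that are already registered with TensorRT from Python (a quick sketch):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
# Load and register the plugins shipped in libnvinfer_plugin.
trt.init_libnvinfer_plugins(logger, "")

# Each creator corresponds to an op you can add to a network
# with network.add_plugin_v2.
for creator in trt.get_plugin_registry().plugin_creator_list:
    print(creator.name, creator.plugin_version)
```
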
To understand inference better, please check the link below.
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#perform_inference_python

Thanks!

Hi, thanks for the resources. However, I checked them prior to posting my question and couldn't find a definitive answer. It seems from the APIs that TRT only allows black-box inference, so you cannot define layers and execute them one by one the way you can with TF eager execution or PyTorch. Is that correct?
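
The closest workaround I can think of is to split the model into several engines and run my own host-side logic between their executions. A rough sketch, assuming two pre-built explicit-batch engines saved as placeholder files "stage1.trt" and "stage2.trt", each with a single float32 input (binding 0) and output (binding 1):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path):
    # Deserialize a pre-built engine file.
    with open(path, "rb") as f, trt.Runtime(LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def run(engine, input_array):
    # Run one engine end to end and return its single output on the host.
    context = engine.create_execution_context()
    stream = cuda.Stream()
    out_shape = tuple(engine.get_binding_shape(1))
    output = np.empty(trt.volume(out_shape), dtype=np.float32)
    d_in = cuda.mem_alloc(input_array.nbytes)
    d_out = cuda.mem_alloc(output.nbytes)
    cuda.memcpy_htod_async(d_in, input_array, stream)
    context.execute_async_v2([int(d_in), int(d_out)], stream.handle)
    cuda.memcpy_dtoh_async(output, d_out, stream)
    stream.synchronize()
    return output.reshape(out_shape)

stage1 = load_engine("stage1.trt")
stage2 = load_engine("stage2.trt")

x = np.ascontiguousarray(np.random.rand(1, 3, 224, 224).astype(np.float32))
mid = run(stage1, x)

# Arbitrary control logic between the two stages runs on the host here.
if mid.mean() > 0.5:
    mid = mid * 2.0

y = run(stage2, np.ascontiguousarray(mid))
```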

Hi @HazemAbdelhafez,
The link below will answer your question:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#profiling

However, you can define custom layers:
https://docs.nvidia.com/deeplearning/tensorrt/sample-support-guide/index.html#plugin_sample
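
In case it helps, here is a rough sketch of wiring a registered plugin into a network with the Python API. The creator name "MyCustomOp" is a placeholder; substitute one of the names printed from the plugin registry, and note that real plugins usually require specific PluginFields:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
inp = network.add_input("input", trt.float32, (1, 3, 224, 224))

registry = trt.get_plugin_registry()
creator = registry.get_plugin_creator("MyCustomOp", "1")  # placeholder name
if creator is None:
    raise RuntimeError("Plugin creator not found; use a registered one.")

# Fields left empty for brevity; fill in what the plugin expects.
plugin = creator.create_plugin("my_op", trt.PluginFieldCollection([]))
layer = network.add_plugin_v2(inputs=[inp], plugin=plugin)
network.mark_output(layer.get_output(0))
```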

Thanks!