I am new to TensorRT and I am trying to build a simple application using the TensorRT Python API. I have an important requirement for this application, though: I need to be able to execute inference at the granularity of layers, or even individual operations, similar to PyTorch/TensorFlow. My goal is to inject control logic between layers.
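To make the kind of layer-level control I'm after concrete, here is a framework-agnostic toy sketch (plain NumPy, made-up weights and threshold): each "layer" is just a Python callable, so arbitrary logic can run between them. This is what I want to achieve, not something TensorRT-specific:

```python
import numpy as np

# Toy two-"layer" network with made-up random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)

def layer1(x):
    # Linear + ReLU
    return np.maximum(x @ W1 + b1, 0.0)

def layer2(h):
    return h @ W2 + b2

def forward_with_control(x, threshold=0.1):
    h = layer1(x)
    # Control logic injected *between* layers: skip the second
    # layer entirely if the intermediate activation is near zero.
    if np.abs(h).max() < threshold:
        return None  # early exit
    return layer2(h)

x = rng.standard_normal(4)
out = forward_with_control(x)
print(out)
```

In PyTorch or TensorFlow eager mode this is trivial because each layer is an ordinary function call; the question is whether TensorRT offers any comparable hook between layers.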
I believe I can achieve this with PyTorch, or with TensorFlow's eager execution mode, but with TensorRT at some point I need to call something like IExecutionContext.execute_async, which runs the entire inference as a black box.
Is this something that can be done in TensorRT? If yes, where can I find resources to start from?
TensorRT Version: 7.1
GPU Type: Volta 512 Cores (Jetson Xavier)
Nvidia Driver Version:
CUDA Version: 10.2
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6