Speeding up Deep Learning Inference Using TensorFlow, ONNX, and TensorRT

Originally published at: https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/

Starting with TensorRT 7.0, the Universal Framework Format (UFF) is being deprecated. In this post, you learn how to deploy TensorFlow-trained deep learning models using the new TensorFlow-ONNX-TensorRT workflow. Figure 1 shows the high-level workflow of TensorRT. Figure 1. TensorRT is an inference accelerator. First, a network is trained using any framework. After a…

I am not able to do:

import engine as eng

I am getting:

ModuleNotFoundError: No module named 'engine'

What should I install for this?

Can you check this? Download the code examples in this post.

In the section “Creating the TensorRT engine from ONNX”, copy the code into a file named engine.py; then you can import that file.
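For reference, a minimal sketch of what that engine.py ends up containing, assuming the TensorRT 7.x Python API (check the post section for the exact code; input shape and paths here are placeholders):

```python
# engine.py -- sketch of building a TensorRT engine from an ONNX file (TRT 7.x API).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, shape=(1, 224, 224, 3)):
    """Parse an ONNX model and build a TensorRT engine with explicit batch."""
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30  # 1 GB of scratch space
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                raise RuntimeError(parser.get_error(0))
        # Fix the input shape (placeholder; use your model's real shape)
        network.get_input(0).shape = shape
        return builder.build_cuda_engine(network)

def save_engine(engine, plan_path):
    """Serialize the engine to a .plan file for later deserialization."""
    with open(plan_path, 'wb') as f:
        f.write(engine.serialize())
```

This requires a machine with an NVIDIA GPU and TensorRT installed; it will not run without them.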


Does the code for doing inference using TensorRT not work with a Flask API?

An error is raised at stream = cuda.Stream() with the message:

pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?

When I tried to add:

cfx = cuda.Device(0).make_context()

do inference

cfx.pop()

new errors show up. Any idea how to solve this?

Hi

I already have the .onnx files for these models: InceptionV1, V3 and V4.

How much will the scripts engine.py, buildEngine.py, and inference.py have to be changed?

Have you done any example with those models?

Thank you

Hi
If you already have .onnx files, you can follow the same workflow; you just need to adapt the scripts to your own models.
Or, for a latency benchmark, you can try the ‘trtexec’ tool; see https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/trtexec/README.md#example-4-running-an-onnx-model-with-full-dimensions-and-dynamic-shapes
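As a sketch of what that README example looks like, an invocation along these lines builds and times an ONNX model with dynamic shapes (the input name "input" and the shapes are placeholders; use the names and dimensions from your own model, and check the README for the flags in your TensorRT version):

```shell
# Benchmark an ONNX model with a dynamic-shape profile (placeholder names/shapes).
trtexec --onnx=model.onnx \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:32x3x224x224 \
        --shapes=input:8x3x224x224
```

This requires the trtexec binary that ships with TensorRT.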


Will your code work with TensorRT 7.1?

I am checking the onnx-tensorrt GitHub repo, and that is where I found this image.

Have you done the same in your code? Or is it better to install a version lower than 7.1 to reuse your code?

Adding info: in this post, we used ONNX 1.6.0 (opset 11).

Thanks, what about the TensorFlow version?


For reference, this is the Dockerfile I used:

FROM nvcr.io/nvidia/tensorflow:20.03-tf1-py3
WORKDIR /workspace
ADD requirements.txt .
RUN pip install -r requirements.txt
# docker build -t tf:20.03-tf1-py3 .
# docker run -it -u $(id -u):$(id -g) -v $(pwd):/workspace --rm tf:20.03-tf1-py3 bash

requirements.txt:


keras
keras2onnx
onnx==1.6.0
pycuda
tf2onnx
tensorrt

Keras==2.3.1
keras2onnx==1.6.0
onnx==1.6.0
pycuda==2019.1.2
tf2onnx==1.6.0

tensorrt 7.0.0.11

These code snippets do not work with TF 2.0. Can you please rectify them? Especially loadResNet.py.


Hello

I am using the TX2, which has memory shared between CPU and GPU.
Is your code using Unified Memory? If not, can you give me some clues on how to implement it using PyCUDA?

Thank you

Hi! We have added the TF2 code example in the post.
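Not the snippet from the post, but as a sketch: with TF2, a SavedModel is typically converted to ONNX via the tf2onnx command-line tool before building the TensorRT engine (the paths here are placeholders; opset 11 matches the ONNX version used in this thread):

```shell
# Convert a TF2 SavedModel directory to an ONNX file (placeholder paths).
python -m tf2onnx.convert --saved-model ./saved_model_dir \
    --opset 11 --output model.onnx
```

This requires the tf2onnx package (pip install tf2onnx) and a SavedModel on disk.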

Has anyone run this? The code has errors:

def load_engine(trt_runtime, plan_path):
   with open(engine_path, 'rb') as f:
       engine_data = f.read()
   engine = trt_runtime.deserialize_cuda_engine(engine_data)
   return engine

engine_path does not exist. Is this supposed to be plan_path?

@loophole64 – Good catch! I’ve made that fix.
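For anyone landing here later, the corrected function simply reads from the parameter it is given:

```python
def load_engine(trt_runtime, plan_path):
    # Read the serialized engine (.plan) from the path passed in,
    # rather than the undefined name engine_path.
    with open(plan_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    return engine
```

Here trt_runtime is a tensorrt.Runtime instance created by the caller.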
