I am learning to use TensorRT. I already have a .engine file. I cannot use “trtexec” because I am running a benchmark, so I need to read some parameters from the kernel (e.g. the GPU temperature).
Where can I find a good example that teaches, line by line, how to run inference with TensorRT in Python (loading the image, loading the labels, loading the .engine file, preprocessing, etc.)? Is there a book or a course (maybe in the Deep Learning Institute)?
You can check this example of TensorRT Python inference:

import tensorrt as trt
import pycuda.driver as cuda

# build and serialize the engine
# (builder, network, runtime and model come from the surrounding script)
engine = builder.build_cuda_engine(network)
buf = engine.serialize()
with open(model.TRTbin, 'wb') as f:
    f.write(buf)

# deserialize the engine
with open(model.TRTbin, 'rb') as f:
    buf = f.read()
    engine = runtime.deserialize_cuda_engine(buf)

# create buffers
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
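To fill those lists you typically loop over the engine's bindings, allocate a page-locked host array and a matching device buffer for each one, and keep the device pointers for execution. The actual TensorRT/pycuda calls need a GPU, so below is only a GPU-free sketch of the size computation that drives the allocation; every binding shape except the sample's [1, 3, 300, 300] input is a made-up placeholder:

```python
import numpy as np

def binding_nbytes(shape, dtype=np.float32):
    # bytes to allocate for one binding: element count times element size
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

# placeholder shapes; with a real engine you would read them via
# engine.get_binding_shape(i) and then allocate with
# cuda.pagelocked_empty(...) and cuda.mem_alloc(...)
shapes = {
    "input": (1, 3, 300, 300),   # image buffer from the sample
    "bbox": (1, 100, 4),         # hypothetical detection outputs
    "conf": (1, 100),
}
sizes = {name: binding_nbytes(s) for name, s in shapes.items()}
print(sizes["input"])  # 1*3*300*300 floats * 4 bytes = 1080000
```

On the real engine, each device pointer returned by cuda.mem_alloc goes into a bindings list that is later passed to the execution context.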
I am analyzing the code in your Git repository. I have some questions:
What is the purpose of the graph surgeon here:
Will I need it with the Inception v1, v2, v3 and v4 models? Those models are for image classification.
Sorry for the late update.
1. The batch size is hardcoded to 1 in this sample.
You can modify it based on your use case here:

2. A binding is an input/output tensor used by the model.
For example, binding index=0 indicates the input image buffer here,
and index=1 and index=2 represent the output bbox and conf layers.
The batch size appears as the first dimension of the buffer.
For example, the input buffer in this sample is [batch, 3, 300, 300].

3. The Python script is used for converting a .pb file into a .uff file for TensorRT.
You can find a general script for the conversion here:
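TensorRT also ships a convert-to-uff command-line tool that wraps the same conversion. As a sketch, the invocation can be assembled like this; the file names and the output-node name ("NMS") are placeholders, and the exact flags can vary between TensorRT versions, so check convert-to-uff --help on your install:

```python
def uff_cmd(pb_path, output_nodes, uff_path):
    # build the convert-to-uff invocation: -O names an output node,
    # -o names the .uff file to write
    node_flags = " ".join("-O " + node for node in output_nodes)
    return "convert-to-uff {} {} -o {}".format(pb_path, node_flags, uff_path)

print(uff_cmd("frozen_graph.pb", ["NMS"], "model.uff"))
# convert-to-uff frozen_graph.pb -O NMS -o model.uff
```

The output-node names are model-specific; you can list them by inspecting the frozen TensorFlow graph before converting.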