I have a TensorRT engine that was converted from an ONNX model with onnx2trt.
When I load this engine and check its max_batch_size, it shows 32.
However, I only want to test a single image, and I cannot change the engine.max_batch_size value. (Even though I already set max_batch_size to 1, the value I print for engine.max_batch_size is different from the max_batch_size I set.)
Because engine.max_batch_size is 32, the wrong buffer sizes get allocated during the allocate_buffers(engine) stage.
In the infer() stage, there is this step:
np.copyto(self.inputs[0].host, img.ravel())
It fails because the sizes do not match:
self.inputs[0].host: 88473600 elements
img.ravel(): 2764800 elements
Since engine.max_batch_size is 32, the host buffer holds 32 * 2764800 = 88473600 elements.
This is where things go wrong for me.
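For reference, my allocate_buffers() follows the buffer-allocation helper from the TensorRT Python samples. A minimal sketch of that pattern (assuming the usual pycuda-based version; HostDeviceMem is the small host/device wrapper from the samples):

import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context
import tensorrt as trt

class HostDeviceMem:
    def __init__(self, host, device):
        self.host = host      # page-locked numpy array
        self.device = device  # device allocation

def allocate_buffers(engine):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        # Size is the per-image volume multiplied by engine.max_batch_size,
        # so with max_batch_size 32: 32 * 2764800 = 88473600 elements.
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append(HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(HostDeviceMem(host_mem, device_mem))
    return inputs, outputs, bindings, stream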
See:
def load_engine(trt_runtime, engine_path):
    # Read the serialized engine and deserialize it with the TensorRT runtime.
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    engine = trt_runtime.deserialize_cuda_engine(engine_data)
    print("Engine.max_batch_size", engine.max_batch_size)
    return engine
Output:
Engine.max_batch_size 32
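As a temporary workaround, it seems I can copy just one image's worth of data into the oversized host buffer and run inference with batch_size=1 (a sketch; self.inputs, self.bindings, self.stream, and self.context are assumed to follow the sample structure above):

import numpy as np

flat = img.ravel()
# Fill only the first batch slot; the remaining 31 slots stay unused.
np.copyto(self.inputs[0].host[:flat.size], flat)
# Ask TensorRT to run only one image, even though the engine allows up to 32.
self.context.execute_async(batch_size=1, bindings=self.bindings,
                           stream_handle=self.stream.handle)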
I have some questions about this:
Why is the default engine.max_batch_size 32?
How do I set engine.max_batch_size? (Not the normal max_batch_size.)
o Linux distro: Ubuntu 18.04
o GPU type: 1060
o Nvidia driver version: 440
o CUDA version: 10.0
o CUDNN version: 7.6.5
o Python version [if using python]: 3.6.9
o Tensorflow and PyTorch version: TF 1.14
o TensorRT version: 7.0.0.11
The default max batch size in onnx2trt is 32. Please refer to the link below:
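For example, to rebuild the engine for single-image inference (file names here are just placeholders):

onnx2trt model.onnx -o model.trt -b 1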
You can either use the -b option to generate an engine with a different max batch size, or use the TensorRT APIs directly to set the max batch size. Please refer to the link below:
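A minimal sketch of the builder-API route (note that builder.max_batch_size only matters for implicit-batch networks; TensorRT 7's ONNX parser builds explicit-batch networks, where the batch size comes from the network's input shape instead):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
builder.max_batch_size = 1  # must be set before building the engine
# ... parse or define the network into `network`, then:
# engine = builder.build_cuda_engine(network)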