I have an Nvidia Xavier and I have managed to convert SSD MobileNet V2 to a .trt engine and run inference by following the steps in the link below: https://github.com/pskiran1/TensorRT-support-for-Tensorflow-2-Object-Detection-Models
I have two inquiries:
- Is it possible to run inference on a batch of images all at once, and how can I do this in Python? The infer.py in the link above only handles a single image at a time.
- Is it possible to run parallel inference with CUDA from Python threads? I tried this but got a broken pipe error. I want to run multiple threads or processes, each doing its own inference.
Thanks
Ayad
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Thanks for the reply.
I still didn't get a clear response to my inquiry, and I don't see how the link above helps me. I tried to do batch inference using a CUDA stream, but I only get results for the first image; the rest of the images come back as zeros. I'm using TensorRT 8.0 on Xavier. Is batch inference possible from Python? I'm using the following code, can you please check it and let me know how I can achieve batch inference?
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda

cuda.init()  # initialize the CUDA driver API before creating a context


class TensorRTInfer:
    """
    Implements inference for the Model TensorRT engine.
    """

    def __init__(self, engine, batch_size):
        """
        :param engine: The deserialized TensorRT engine to run.
        :param batch_size: Number of images the I/O buffers should hold.
        """
        # Load TRT engine and create a dedicated CUDA context and stream
        self.cfx = cuda.Device(0).make_context()
        self.stream = cuda.Stream()
        self.engine = engine
        self.context = self.engine.create_execution_context()
        self.batch_size = batch_size

        # Setup I/O bindings: one device buffer per binding,
        # sized for batch_size times the binding's shape
        self.inputs1 = []
        self.outputs1 = []
        self.allocations1 = []
        for i in range(self.engine.num_bindings):
            name = self.engine.get_binding_name(i)
            dtype = self.engine.get_binding_dtype(i)
            shape = self.engine.get_binding_shape(i)
            size = np.dtype(trt.nptype(dtype)).itemsize * self.batch_size
            for s in shape:
                size *= s
            allocation1 = cuda.mem_alloc(size)
            binding1 = {
                'index': i,
                'name': name,
                'dtype': np.dtype(trt.nptype(dtype)),
                'shape': list(shape),
                'allocation': allocation1,
            }
            self.allocations1.append(allocation1)
            if self.engine.binding_is_input(i):
                self.inputs1.append(binding1)
            else:
                self.outputs1.append(binding1)

        # Host-side output arrays, also enlarged to batch_size along dim 0
        self.outputs2 = []
        for shape, dtype in self.output_spec():
            shape[0] = shape[0] * self.batch_size
            self.outputs2.append(np.zeros(shape, dtype))
        print("done building..")

    def input_spec(self):
        """
        Get the specs for the input tensor of the network. Useful to prepare memory allocations.
        :return: Two items, the shape of the input tensor and its (numpy) datatype.
        """
        return self.inputs1[0]['shape'], self.inputs1[0]['dtype']

    def output_spec(self):
        """
        Get the specs for the output tensors of the network. Useful to prepare memory allocations.
        :return: A list with two items per element, the shape and (numpy) datatype of each output tensor.
        """
        specs = []
        for o in self.outputs1:
            specs.append((o['shape'], o['dtype']))
        return specs

    def h_to_d(self, batch):
        # Copy the whole preprocessed batch to the input device buffer
        self.batch = batch
        cuda.memcpy_htod_async(self.inputs1[0]['allocation'],
                               np.ascontiguousarray(batch), self.stream)

    def destroy(self):
        # Release the CUDA context created in __init__
        self.cfx.pop()

    def d_to_h(self):
        # Copy every output buffer back to host and wait for the copies to finish
        for o in range(len(self.outputs2)):
            cuda.memcpy_dtoh_async(self.outputs2[o], self.outputs1[o]['allocation'], self.stream)
        self.stream.synchronize()
        print(self.outputs2[2])  # debug: print the third output buffer
        return self.outputs2

    def infer_this(self):
        # Run the engine; note that batch_size=1 is passed to execute_async here
        self.cfx.push()
        self.context.execute_async(batch_size=1, bindings=self.allocations1,
                                   stream_handle=self.stream.handle)
        self.cfx.pop()
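For reference, this is roughly how I drive the class (a sketch only; the engine deserialization is abbreviated, and preprocess() and images are placeholders for my own preprocessing code):

    import numpy as np
    import tensorrt as trt

    batch_size = 8
    logger = trt.Logger(trt.Logger.INFO)
    with open("model.trt", "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    trt_infer = TensorRTInfer(engine, batch_size)

    # preprocess() and images are placeholders for my own preprocessing code
    batch = np.stack([preprocess(img) for img in images])  # (batch_size, H, W, C)

    trt_infer.h_to_d(batch)          # copy the whole batch to the device
    trt_infer.infer_this()           # run the engine
    detections = trt_infer.d_to_h()  # only the first image's results look correct
    trt_infer.destroy()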
Looks like your code is not handling batch inference properly.
The link I shared previously shows how to set a batch size greater than 1 dynamically when building the engine.
Please refer to the sample below to run inference on a batch of images.
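As a minimal sketch of the usual explicit-batch flow (an illustration, not the official sample), assuming the engine was rebuilt with a dynamic batch dimension whose optimization profile covers your largest batch, and that it has a single image input; load_engine, engine_path, and the NHWC layout are illustrative assumptions:

    import numpy as np
    import tensorrt as trt
    import pycuda.autoinit  # creates and activates a default CUDA context
    import pycuda.driver as cuda

    TRT_LOGGER = trt.Logger(trt.Logger.INFO)

    def load_engine(engine_path):
        # Deserialize a .trt engine from disk
        with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def infer_batch(engine, batch):
        # batch: contiguous array shaped (N, H, W, C), matching the input binding's
        # dtype and spatial size; N is the actual batch size
        context = engine.create_execution_context()
        stream = cuda.Stream()

        # Fix the dynamic batch dimension to the size of this batch
        input_idx = next(i for i in range(engine.num_bindings)
                         if engine.binding_is_input(i))
        context.set_binding_shape(input_idx, batch.shape)

        # Allocate buffers from the now fully-specified binding shapes
        bindings, host_outputs, dev_outputs, dev_input = [], [], [], None
        for i in range(engine.num_bindings):
            shape = tuple(context.get_binding_shape(i))
            dtype = np.dtype(trt.nptype(engine.get_binding_dtype(i)))
            dev_mem = cuda.mem_alloc(int(np.prod(shape)) * dtype.itemsize)
            bindings.append(int(dev_mem))
            if engine.binding_is_input(i):
                dev_input = dev_mem
            else:
                host_outputs.append(cuda.pagelocked_empty(shape, dtype))
                dev_outputs.append(dev_mem)

        # Copy the whole batch in, run the engine once, copy every output back
        cuda.memcpy_htod_async(dev_input, np.ascontiguousarray(batch), stream)
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        for host, dev in zip(host_outputs, dev_outputs):
            cuda.memcpy_dtoh_async(host, dev, stream)
        stream.synchronize()
        return host_outputs

Called, for example, as outputs = infer_batch(load_engine("model.trt"), batch) with batch being the stacked, preprocessed images. If your engine was instead exported with a fixed batch size, drop the set_binding_shape call and make sure the batch matches that size exactly.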