Run inference on a batch of images & parallel inference using CUDA on Python threads

Description

Hello,

I have an Nvidia Xavier and I’ve managed to convert SSD MobileNet V2 to .trt and run inference following the steps in the link below:
https://github.com/pskiran1/TensorRT-support-for-Tensorflow-2-Object-Detection-Models
I have two inquiries:
- Is it possible to run inference on a batch of images all at once, and how do I do this in Python? The infer.py in the link above only handles a single image at a time.
- Is it possible to run parallel inference using CUDA on Python threads (I tried this but got a broken pipe error)? I want to run multiple threads or processes, each doing an inference.

Thanks
Ayad

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest you use DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.
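
To illustrate the thread-safety rule from the first link: the ICudaEngine can be shared, but each Python thread needs its own IExecutionContext, CUDA stream and buffers. The sketch below is only an illustration under those assumptions (a fixed-shape engine already deserialized into an `engine` variable; input batches and helper names are not from this thread):

import threading
import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

cuda.init()
device_ctx = cuda.Device(0).make_context()   # one CUDA context shared by all threads

def worker(engine, batch, results, idx):
    device_ctx.push()                        # make the shared context current in this thread
    try:
        context = engine.create_execution_context()   # per-thread execution context
        stream = cuda.Stream()                         # per-thread stream
        bindings, outputs = [], []
        for i in range(engine.num_bindings):
            dtype = np.dtype(trt.nptype(engine.get_binding_dtype(i)))
            shape = tuple(engine.get_binding_shape(i))
            dbuf = cuda.mem_alloc(int(np.prod(shape)) * dtype.itemsize)
            bindings.append(int(dbuf))
            if engine.binding_is_input(i):
                # `batch` is assumed to match this binding's shape and dtype
                cuda.memcpy_htod_async(dbuf, np.ascontiguousarray(batch), stream)
            else:
                host = np.zeros(shape, dtype)
                outputs.append((host, dbuf))
        context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
        for host, dbuf in outputs:
            cuda.memcpy_dtoh_async(host, dbuf, stream)
        stream.synchronize()
        results[idx] = [host for host, _ in outputs]
    finally:
        device_ctx.pop()                     # detach from the context before the thread exits

# Example use (illustrative):
# threads = [threading.Thread(target=worker, args=(engine, b, results, i))
#            for i, b in enumerate(batches)]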

Thanks!

Thank you for the links, I will look into them. I just want to know if my inquiries are possible?

Regards
Ayad

Hi,

Please refer to the following link on dynamic shape inputs to specify a dynamic batch size.
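
As a rough illustration of that flow (the ONNX file name, input tensor name and shapes below are examples, not values from this thread): the engine is built with an optimization profile covering the batch sizes you want, and the actual batch is chosen at runtime with set_binding_shape.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:          # assumed ONNX export of the model
    parser.parse(f.read())

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max shapes for the dynamic batch dimension (example values)
profile.set_shape("input_tensor", (1, 300, 300, 3), (8, 300, 300, 3), (16, 300, 300, 3))
config.add_optimization_profile(profile)
serialized_engine = builder.build_serialized_network(network, config)

# At inference time, pick the actual batch size before enqueueing, e.g.:
# context.set_binding_shape(0, (8, 300, 300, 3))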

Thank you.

Thanks for the reply.
I still didn’t get a clear response to my inquiry, and I don’t see how the above link can help me. I tried to do batch inference using a CUDA stream and was only able to get inference for the first image; the rest of the images result in zeros. I’m using TensorRT 8.0 on Xavier. Is batch inference possible using Python? I’m using the following code, can you please check it and let me know how I can achieve batch inference?

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda

cuda.init()


class TensorRTInfer:
    """
    Implements inference for the Model TensorRT engine.
    """

    def __init__(self, engine, batch_size):
        """
        :param engine: The deserialized TensorRT engine.
        :param batch_size: Number of images per inference call.
        """

        # Load TRT engine
        self.cfx = cuda.Device(0).make_context()
        self.stream = cuda.Stream()
        self.engine = engine
        self.batch_size = batch_size
        self.context = self.engine.create_execution_context()

        # Setup I/O bindings
        self.inputs1 = []
        self.outputs1 = []
        self.allocations1 = []

        for i in range(self.engine.num_bindings):
            name = self.engine.get_binding_name(i)
            dtype = self.engine.get_binding_dtype(i)
            shape = self.engine.get_binding_shape(i)

            # Allocating batch_size * volume(shape) only helps if the engine
            # itself was built for that batch size (explicit batch dimension or
            # a dynamic-shape profile); otherwise only the first image is run.
            size = np.dtype(trt.nptype(dtype)).itemsize * batch_size
            for s in shape:
                size *= s
            allocation1 = cuda.mem_alloc(size)

            binding1 = {
                'index': i,
                'name': name,
                'dtype': np.dtype(trt.nptype(dtype)),
                'shape': list(shape),
                'allocation': allocation1,
            }

            self.allocations1.append(allocation1)

            if self.engine.binding_is_input(i):
                self.inputs1.append(binding1)
            else:
                self.outputs1.append(binding1)

        # Host-side output buffers, one per output binding
        self.outputs2 = []
        for shape, dtype in self.output_spec():
            shape[0] = shape[0] * batch_size
            self.outputs2.append(np.zeros(shape, dtype))
        print("done building..")

    def input_spec(self):
        """
        Get the specs for the input tensor of the network. Useful to prepare memory allocations.
        :return: Two items, the shape of the input tensor and its (numpy) datatype.
        """
        return self.inputs1[0]['shape'], self.inputs1[0]['dtype']

    def output_spec(self):
        """
        Get the specs for the output tensors of the network. Useful to prepare memory allocations.
        :return: A list with two items per element, the shape and (numpy) datatype of each output tensor.
        """
        specs = []
        for o in self.outputs1:
            specs.append((o['shape'], o['dtype']))
        return specs

    def h_to_d(self, batch):
        # Copy the whole batch to the device on the same stream used for inference
        self.batch = batch
        cuda.memcpy_htod_async(self.inputs1[0]['allocation'],
                               np.ascontiguousarray(batch), self.stream)

    def destroy(self):
        self.cfx.pop()

    def d_to_h(self):
        for o in range(len(self.outputs2)):
            cuda.memcpy_dtoh_async(self.outputs2[o], self.outputs1[o]['allocation'], self.stream)
        # Wait for the async copies to finish before reading the host buffers
        self.stream.synchronize()
        print(self.outputs2[2])
        return self.outputs2

    def infer_this(self):
        self.cfx.push()
        # execute_async_v2 is the explicit-batch API; the batch size comes from
        # the engine/binding shapes, not from an argument here
        self.context.execute_async_v2(bindings=[int(a) for a in self.allocations1],
                                      stream_handle=self.stream.handle)
        self.cfx.pop()

Any update, please?

Regards

Hi,

It looks like your code is not handling batch inference properly.
The previous link I shared was for giving a batch size (greater than 1) dynamically.
Please refer to the sample below to run inference on a batch of images:
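
(The referenced sample is not reproduced in this thread. As an illustration only, here is a minimal runtime sketch of batched inference with an engine built for a dynamic batch dimension; `images`, its layout, and the helper name are assumptions, not part of the original sample.)

import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # creates and manages a CUDA context

def infer_batch(engine, images):
    # `images` is assumed to be a (N, H, W, C) array matching the input binding
    context = engine.create_execution_context()
    context.set_binding_shape(0, images.shape)      # pick the runtime batch size N
    stream = cuda.Stream()
    bindings, outputs = [], []
    for i in range(engine.num_bindings):
        dtype = np.dtype(trt.nptype(engine.get_binding_dtype(i)))
        shape = tuple(context.get_binding_shape(i)) # resolved shape, incl. batch dim
        dbuf = cuda.mem_alloc(int(np.prod(shape)) * dtype.itemsize)
        bindings.append(int(dbuf))
        if engine.binding_is_input(i):
            cuda.memcpy_htod_async(dbuf, np.ascontiguousarray(images.astype(dtype)), stream)
        else:
            host = np.zeros(shape, dtype)
            outputs.append((host, dbuf))
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host, dbuf in outputs:
        cuda.memcpy_dtoh_async(host, dbuf, stream)
    stream.synchronize()                            # all N results are valid after this
    return [host for host, _ in outputs]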

Thank you.