The TensorRT Python samples include the following code for performing inference:
```python
# This function is generalized for multiple inputs/outputs.
# inputs and outputs are expected to be lists of HostDeviceMem objects.
def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
```
Other frameworks (such as TensorFlow) have data-loading mechanisms that copy the next batch to the GPU while the current batch is still being processed, in order to better utilize the GPU. I couldn't find TensorRT samples that work this way (only ones like the sample above, which block on the stream every batch). How do I implement such a mechanism using TensorRT's Python interface?
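Conceptually, is something like the following double-buffering sketch the right approach? It uses two CUDA streams (one for inference, one for prefetching the next input) and ping-pong device input buffers, so the H2D copy of batch i+1 overlaps with the compute of batch i. This is just a rough sketch of what I have in mind, not working code: `batches`, `input_nbytes`, `out_shape`, and `out_dtype` are placeholders, and I'm assuming an implicit-batch engine with one input and one output binding.

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

def pipelined_inference(context, batches, input_nbytes, out_shape, out_dtype):
    """batches: list of pagelocked numpy arrays, one per batch
    (page-locked host memory is required for truly async copies)."""
    compute = cuda.Stream()  # runs inference + output copies
    copy = cuda.Stream()     # prefetches the next batch's input
    copied = [cuda.Event(), cuda.Event()]  # "H2D into slot i is done"

    # Ping-pong device input buffers: while inference reads one,
    # the next batch is copied into the other.
    d_in = [cuda.mem_alloc(input_nbytes) for _ in range(2)]
    d_out = cuda.mem_alloc(int(np.prod(out_shape)) * np.dtype(out_dtype).itemsize)
    h_out = cuda.pagelocked_empty(out_shape, out_dtype)

    # Prime the pipeline with the first batch.
    cuda.memcpy_htod_async(d_in[0], batches[0], copy)
    copied[0].record(copy)

    results = []
    for i in range(len(batches)):
        slot = i % 2
        # Inference must not start before its input copy has finished.
        compute.wait_for_event(copied[slot])
        context.execute_async(batch_size=1,
                              bindings=[int(d_in[slot]), int(d_out)],
                              stream_handle=compute.handle)
        cuda.memcpy_dtoh_async(h_out, d_out, compute)

        # Enqueue the H2D copy of the *next* batch on the copy stream;
        # it runs concurrently with the inference enqueued above.
        if i + 1 < len(batches):
            nxt = (i + 1) % 2
            cuda.memcpy_htod_async(d_in[nxt], batches[i + 1], copy)
            copied[nxt].record(copy)

        # The synchronize at the end of each iteration also guarantees that
        # the slot we prefetch into next iteration is no longer being read.
        compute.synchronize()
        results.append(h_out.copy())
    return results
```

Is this safe with a single execution context, or is there a more idiomatic way to do this overlap in TensorRT?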
TensorRT Version: 7.0.0