Confusion about TensorRT stream.synchronize() in GPU-only inference

Description

Hi, I’m trying to use TensorRT inference in a GPU-only pipeline:

    def create_output_buffers(self, batch_size):
        # Allocate one empty output tensor per engine binding, with the
        # dtype/device the engine expects; these back the output bindings.
        outputs = [None] * len(self.output_names)
        for i, output_name in enumerate(self.output_names):
            idx = self.engine.get_binding_index(output_name)
            dtype = torch_dtype_from_trt(self.engine.get_binding_dtype(idx))
            if self.final_shapes is not None:
                shape = (batch_size,) + self.final_shapes[i]
            else:
                shape = (batch_size,) + tuple(self.engine.get_binding_shape(idx))
            device = self.torch_device_from_trt(self.engine.get_location(idx))
            output = torch.empty(size=shape, dtype=dtype, device=device)
            outputs[i] = output
        return outputs

    def execute(self, *inputs):
        batch_size = inputs[0].shape[0]

        bindings = [None] * (len(self.input_names) + len(self.output_names))

        # map input bindings: move/cast each input to the device and dtype
        # the engine expects, then record its raw device pointer
        inputs_torch = [None] * len(self.input_names)
        for i, name in enumerate(self.input_names):
            idx = self.engine.get_binding_index(name)

            inputs_torch[i] = inputs[i].to(self.torch_device_from_trt(self.engine.get_location(idx)))
            inputs_torch[i] = inputs_torch[i].type(torch_dtype_from_trt(self.engine.get_binding_dtype(idx)))

            bindings[idx] = int(inputs_torch[i].data_ptr())

        output_buffers = self.create_output_buffers(batch_size)

        # map output bindings
        for i, name in enumerate(self.output_names):
            idx = self.engine.get_binding_index(name)
            bindings[idx] = int(output_buffers[i].data_ptr())

        # take an execution context from the queue, enqueue inference
        # asynchronously on PyTorch's current CUDA stream, then return
        # the context to the queue
        context = self.context.get(timeout=10)
        context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)
        self.context.put(context)

        outputs = output_buffers
        # torch.cuda.current_stream().synchronize()

        return outputs

As shown above, the inputs are torch tensors that already live on the GPU, so the TRT execute function only runs inference; there is no D2H or H2D data transfer.

As I understand it, torch.cuda.current_stream().synchronize() is used to block the CPU until a D2H copy has finished. So I’m confused: is it necessary to call torch.cuda.current_stream().synchronize() here when there is no data transfer between CPU and GPU?
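
To make the question concrete, here is a minimal, TRT-free sketch of the stream-ordering behavior I’m asking about. It is pure PyTorch on a CUDA device; the matmul merely stands in for the execute_async_v2 call above, and the variable names are illustrative:

import torch

assert torch.cuda.is_available()
stream = torch.cuda.current_stream()

x = torch.randn(1024, 1024, device="cuda")

# Producer work enqueued on the current stream
# (stands in for context.execute_async_v2 above).
y = x @ x

# A consumer on the same stream is ordered after the producer by the
# stream itself, so no explicit synchronize() is needed between GPU ops.
z = y.relu()

# A blocking D2H copy (.cpu() with the default non_blocking=False)
# returns only after the copy completes, and the copy is stream-ordered
# after the kernels above, so the host sees finished data even without
# an explicit synchronize().
host = z.cpu()
print(host.sum())

# An explicit synchronize() only makes the CPU wait for the stream,
# e.g. for host-side timing of GPU work.
stream.synchronize()

If that reading is correct, the commented-out synchronize() in execute() above would be redundant for a GPU-only pipeline, but I’d like to confirm whether the same ordering guarantee applies to the execute_async_v2 call.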

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered
