Description
Hi, I’m trying to run TensorRT inference in a GPU-only pipeline:
```python
def create_output_buffers(self, batch_size):
    outputs = [None] * len(self.output_names)
    for i, output_name in enumerate(self.output_names):
        idx = self.engine.get_binding_index(output_name)
        dtype = torch_dtype_from_trt(self.engine.get_binding_dtype(idx))
        if self.final_shapes is not None:
            shape = (batch_size,) + self.final_shapes[i]
        else:
            shape = (batch_size,) + tuple(self.engine.get_binding_shape(idx))
        device = self.torch_device_from_trt(self.engine.get_location(idx))
        output = torch.empty(size=shape, dtype=dtype, device=device)
        outputs[i] = output
    return outputs
```
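For context, `torch_dtype_from_trt` and `torch_device_from_trt` are torch2trt-style helpers that map TensorRT enums to torch equivalents. A minimal sketch of what they are assumed to do (keyed on the enum names here so the sketch runs even without TensorRT installed) is:

```python
import torch

def torch_dtype_from_trt(trt_dtype):
    # Map a trt.DataType value to the matching torch dtype.
    mapping = {
        "INT8": torch.int8,
        "INT32": torch.int32,
        "HALF": torch.float16,
        "FLOAT": torch.float32,
        "BOOL": torch.bool,
    }
    name = getattr(trt_dtype, "name", None) or str(trt_dtype)
    if name not in mapping:
        raise TypeError(f"unsupported TensorRT dtype: {trt_dtype}")
    return mapping[name]

def torch_device_from_trt(location):
    # trt.TensorLocation.DEVICE -> cuda, trt.TensorLocation.HOST -> cpu
    name = getattr(location, "name", None) or str(location)
    return torch.device("cuda") if name == "DEVICE" else torch.device("cpu")
```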
```python
def execute(self, *inputs):
    batch_size = inputs[0].shape[0]
    bindings = [None] * (len(self.input_names) + len(self.output_names))
    # map input bindings
    inputs_torch = [None] * len(self.input_names)
    for i, name in enumerate(self.input_names):
        idx = self.engine.get_binding_index(name)
        inputs_torch[i] = inputs[i].to(self.torch_device_from_trt(self.engine.get_location(idx)))
        inputs_torch[i] = inputs_torch[i].type(torch_dtype_from_trt(self.engine.get_binding_dtype(idx)))
        bindings[idx] = int(inputs_torch[i].data_ptr())
    output_buffers = self.create_output_buffers(batch_size)
    # map output bindings
    for i, name in enumerate(self.output_names):
        idx = self.engine.get_binding_index(name)
        bindings[idx] = int(output_buffers[i].data_ptr())
    context = self.context.get(timeout=10)
    context.execute_async_v2(bindings, torch.cuda.current_stream().cuda_stream)
    self.context.put(context)
    outputs = output_buffers
    # torch.cuda.current_stream().synchronize()
    return outputs
```
As shown above, the input is a torch tensor that is already on the GPU, so the TensorRT execute call only runs inference; there is no D2H or H2D data transfer.
As I understand it, `torch.cuda.current_stream().synchronize()` is used to block the CPU until a D2H copy has finished, so I’m confused: is it necessary to call `torch.cuda.current_stream().synchronize()` here when there is no data transfer between CPU and GPU?
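For what it’s worth, my current understanding (a minimal sketch, not TensorRT-specific, and it falls back to CPU when no GPU is present) is that work enqueued on a single CUDA stream executes in issue order, so later torch ops on `torch.cuda.current_stream()` are already ordered after the enqueued inference kernels, and a sync would only matter once the host reads the results:

```python
import torch

# Work enqueued on one CUDA stream runs in issue order, so a later op
# on the same stream sees an earlier op's results without an explicit
# synchronize(). A sync (implicit or explicit) is only needed when the
# host reads the data, e.g. via .item(), .cpu(), or .numpy().
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.ones(4, device=device)
y = x * 2   # enqueued on the current stream
z = y + 1   # ordered after the multiply on the same stream
# .item() triggers a D2H copy that itself synchronizes the current
# stream with the host before returning.
result = z.sum().item()
print(result)  # -> 12.0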
Environment
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered