During inference, stream.synchronize() is very slow. Is there any way to avoid it, or to reduce its cost?
TensorRT Version: 22.214.171.124
GPU Type: T4
Nvidia Driver Version: 450
CUDA Version: 11.0
CUDNN Version: 8.2.0
Operating System + Version: CENTOS7
Python Version (if applicable): 3.7.19
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
```python
def do_inference(context, bindings, inputs, outputs, stream, batch_size):
    # Transfer input data from the CPU to the GPU.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference asynchronously on the stream.
    context.execute_async(batch_size=batch_size, bindings=bindings,
                          stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Wait for all queued work on the stream to finish.
    stream.synchronize()
    # Return only the host outputs.
    return [out.host for out in outputs]
```
The data transfers between CPU and GPU and the kernel execution are both fast on their own; stream.synchronize() alone accounts for almost 90% of the time spent in this function.
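Note that the memcpy calls and execute_async() all return immediately after enqueuing work, so wall-clock timing around stream.synchronize() tends to absorb the cost of everything queued before it. A minimal sketch for attributing time per stage with CUDA events instead, assuming pycuda; `do_inference_timed` and `stage_percentages` are illustrative names, not part of the original code:

```python
def stage_percentages(ms_by_stage):
    """Given per-stage times in milliseconds, return each stage's share in %."""
    total = sum(ms_by_stage.values())
    return {k: 100.0 * v / total for k, v in ms_by_stage.items()}

def do_inference_timed(context, bindings, inputs, outputs, stream, batch_size):
    # Lazy import so stage_percentages stays importable on a machine without a GPU.
    import pycuda.driver as cuda
    start, after_h2d, after_exec, after_d2h = (cuda.Event() for _ in range(4))

    start.record(stream)
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    after_h2d.record(stream)
    context.execute_async(batch_size=batch_size, bindings=bindings,
                          stream_handle=stream.handle)
    after_exec.record(stream)
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    after_d2h.record(stream)
    stream.synchronize()

    # time_till() returns milliseconds between two recorded events.
    breakdown = {
        "h2d":  start.time_till(after_h2d),
        "exec": after_h2d.time_till(after_exec),
        "d2h":  after_exec.time_till(after_d2h),
    }
    print(stage_percentages(breakdown))
    return [out.host for out in outputs]
```

With event timing the kernel time shows up under "exec" rather than being lumped into the synchronize call, which makes it clearer whether the sync itself is the bottleneck or just the point where queued work is awaited.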
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered