Can't perform inference using the TensorRT Python API

Description

While following the instructions in the TensorRT docs to run inference using the Python API, I am getting a memory allocation error:

pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory
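
My script follows the buffer-allocation pattern from the TensorRT Python samples. Below is a minimal sketch of the relevant step (the engine file name and helper are placeholders, not my exact code); the pinned host allocation is the kind of call that goes through cuMemHostAlloc and raises the MemoryError:

```python
import pycuda.autoinit  # noqa: F401  -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)

def allocate_buffers(engine):
    inputs, outputs, bindings = [], [], []
    stream = cuda.Stream()
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))
        # Pinned (page-locked) host buffer -- this allocation is backed by
        # cuMemHostAlloc and is where the MemoryError appears to come from.
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if engine.binding_is_input(binding):
            inputs.append((host_mem, device_mem))
        else:
            outputs.append((host_mem, device_mem))
    return inputs, outputs, bindings, stream

# "model.engine" is a placeholder for the serialized engine file.
with open("model.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    inputs, outputs, bindings, stream = allocate_buffers(engine)
```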

Environment

TensorRT Version: 7.1.3.4
GPU Type: Titan X
Nvidia Driver Version: 450.51.06
CUDA Version: 11.0
CUDNN Version: 8.0.5.39
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.6
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi @mmaaz60,
Could you please try running your model with trtexec using the --verbose flag and share the logs with us? See the example command below.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
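
For example, something along these lines (substitute your own model file; the path here is a placeholder):

```
trtexec --onnx=model.onnx --verbose
```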

Thanks!