TensorRT error : cuMemcpyDtoHAsync failed: an illegal memory access was encountered

I’m trying to run a batch-inference job with a MobileNetV2 TensorRT engine. I created this engine from the ONNX model (attached). I’m using the following trtexec command for a batch size of 128 images:

!trtexec --workspace=4096 --onnx=mobilenetv2-7.onnx --shapes=input:128x3x224x224 --saveEngine=mobilenet_engine_int8_128.trt --int8 --maxBatch=128

When I run inference with the attached Jupyter notebook, I get the following error:

LogicError Traceback (most recent call last)
in
1 # Warm up:
----> 2 trt_model.predict(dummy_input_batch) # softmax probability predictions for the first 10 classes of the first sample

/mnt/TensorRT/quickstart/IntroNotebooks/onnx_helper.py in predict(self, batch)
67 self.context.execute_async_v2(self.bindings, self.stream.handle, None)
68 # Transfer predictions back
---> 69 cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
70 # Syncronize threads
71 self.stream.synchronize()

LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered

Note: batch sizes of 32 and 64 work fine. I want to use batch sizes of 128 and 256 as well.
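For context, the host and device buffers the notebook allocates have to match the engine bindings' volume at the batch size being run. A quick sanity check of the expected sizes, assuming the standard MobileNetV2 ImageNet head (1000 classes) and float32 tensors:

```python
# Sanity check of expected binding buffer sizes; assumes the standard
# MobileNetV2 ImageNet head (1000 classes) and float32 (4-byte) elements.
FLOAT32_BYTES = 4

def binding_bytes(shape, dtype_bytes=FLOAT32_BYTES):
    """Bytes a TensorRT binding of the given shape needs on host and device."""
    n = 1
    for d in shape:
        n *= d
    return n * dtype_bytes

batch = 128
input_bytes = binding_bytes((batch, 3, 224, 224))   # 77,070,336 bytes
output_bytes = binding_bytes((batch, 1000))         # 512,000 bytes
print(input_bytes, output_bytes)
```

If the buffers were allocated for a smaller batch (say 32) but the context executes with 128, the kernel writes past the allocation, which typically surfaces as exactly this kind of illegal memory access on the next cuMemcpyDtoHAsync.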

Environment

TensorRT Version: 7.2.2.3
GPU Type: A100-40GB
Nvidia Driver Version: 460.39
CUDA Version: 11.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:21.02-tf1-py3

Relevant Files

ONNX model link: models/vision/classification/mobilenet in the onnx/models GitHub repo (master branch)

Jupyter Notebook attached along with the supporting code.

Steps To Reproduce

  1. Download and start the container nvcr.io/nvidia/tensorflow:21.02-tf1-py3
  2. Mount the attached Jupyter notebook and the supporting Python and ONNX files inside the container.
  3. Install and start the Jupyter notebook server.
  4. Open the MobileNetV2 notebook.
  5. Execute the commands in the notebook.
    MobileNet_Debug.zip (12.4 MB)

Hi @hkr1990,

We could reproduce the same error. We are looking into it.
Please allow us some time.

Thank you

Hi @hkr1990,

It looks like you’re using both PyTorch and PyCUDA. It’s better to use PyTorch device tensors directly and drop PyCUDA completely. For reference, here is a similar issue:
https://github.com/NVIDIA/TensorRT/issues/1133#issuecomment-809509799
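A minimal sketch of that suggestion, not a tested drop-in: allocate the input and output as PyTorch CUDA tensors and hand their device pointers straight to execute_async_v2, so PyCUDA never owns the memory. The `context` name and the tensor shapes below are assumptions taken from the notebook.

```python
# Hedged sketch: PyTorch CUDA tensors as TensorRT bindings (no PyCUDA).
# `context` (an IExecutionContext from the deserialized engine) and the
# shapes below are assumptions from the original onnx_helper.py.

def as_binding_ptrs(tensors):
    # execute_async_v2 takes a flat list of device pointers, one per
    # engine binding, in binding order (here: input, then output).
    return [int(t.data_ptr()) for t in tensors]

# import torch
# d_input  = torch.empty((128, 3, 224, 224), dtype=torch.float32, device="cuda")
# d_output = torch.empty((128, 1000), dtype=torch.float32, device="cuda")
# stream   = torch.cuda.current_stream()
# context.execute_async_v2(as_binding_ptrs([d_input, d_output]), stream.cuda_stream)
# result = d_output.cpu().numpy()  # the D2H copy goes through PyTorch, not PyCUDA
```

Since PyTorch sizes the allocations from the tensor shapes, the output buffer automatically matches the batch size, which removes the class of mismatch that produces cuMemcpyDtoHAsync illegal-access errors.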

Thank you.