I’m trying to run a batch inference job with a MobileNetV2 TensorRT engine. I created the engine from the attached ONNX model using the following trtexec command, targeting a batch size of 128 images:
!trtexec --workspace=4096 --onnx=mobilenetv2-7.onnx --shapes=input:128x3x224x224 --saveEngine=mobilenet_engine_int8_128.trt --int8 --maxBatch=128
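For comparison, since --shapes implies an explicit-batch network (and, as far as I understand, --maxBatch only applies to implicit-batch engines), a dynamic-shape build covering all the batch sizes I care about would presumably look like this, assuming the ONNX input tensor is named input:
!trtexec --workspace=4096 --onnx=mobilenetv2-7.onnx --minShapes=input:1x3x224x224 --optShapes=input:128x3x224x224 --maxShapes=input:256x3x224x224 --saveEngine=mobilenet_engine_int8_dynamic.trt --int8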
When I run inference with the attached Jupyter notebook, I get the following error:
LogicError                                Traceback (most recent call last)
&lt;ipython-input-…&gt; in &lt;module&gt;
      1 # Warm up:
----> 2 trt_model.predict(dummy_input_batch) # softmax probability predictions for the first 10 classes of the first sample

/mnt/TensorRT/quickstart/IntroNotebooks/onnx_helper.py in predict(self, batch)
     67         self.context.execute_async_v2(self.bindings, self.stream.handle, None)
     68         # Transfer predictions back
---> 69         cuda.memcpy_dtoh_async(self.output, self.d_output, self.stream)
     70         # Syncronize threads
     71         self.stream.synchronize()

LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered
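From reading onnx_helper.py, my working theory is that the buffers are sized for a smaller batch than the engine actually writes, so execute_async_v2 runs past the end of d_output and the next copy fails. A minimal sketch of sizing the buffers from the engine's own binding shapes (allocate_buffers is my name, not from the notebook):

import pycuda.autoinit  # creates a CUDA context as a side effect
import pycuda.driver as cuda
import tensorrt as trt

def allocate_buffers(engine):
    # Size pinned host and device buffers from the engine's actual binding
    # shapes instead of a hard-coded batch size; a too-small d_output is one
    # way to get exactly the illegal memory access shown above.
    bindings, host_mem, device_mem = [], [], []
    for i in range(engine.num_bindings):
        shape = engine.get_binding_shape(i)                     # e.g. (128, 3, 224, 224)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = cuda.pagelocked_empty(trt.volume(shape), dtype)  # pinned host buffer
        dev = cuda.mem_alloc(host.nbytes)                       # device buffer of the same size
        bindings.append(int(dev))
        host_mem.append(host)
        device_mem.append(dev)
    return bindings, host_mem, device_mem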
Note: batch sizes of 32 and 64 work fine; I want batch sizes of 128 and 256 to work as well.
Environment
TensorRT Version: 7.2.2.3
GPU Type: A100-40GB
Nvidia Driver Version: 460.39
CUDA Version: 11.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:21.02-tf1-py3
Relevant Files
ONNX model link: https://github.com/onnx/models/tree/master/vision/classification/mobilenet/model
Jupyter Notebook attached along with the supporting code.
Steps To Reproduce
- Download and start the container nvcr.io/nvidia/tensorflow:21.02-tf1-py3
- Mount the attached Jupyter notebook and the supporting Python and ONNX files inside the container.
- Install and start the Jupyter notebook server.
- Open the MobileNetV2 notebook.
- Execute the commands in the notebook; the warm-up cell that fails is sketched below.
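For reference, the failing warm-up cell is roughly the following (dummy_input_batch and trt_model are defined in the attached notebook; the input shape is assumed to be NCHW float32):

import numpy as np

dummy_input_batch = np.zeros((128, 3, 224, 224), dtype=np.float32)
trt_model.predict(dummy_input_batch)  # raises the cuMemcpyDtoHAsync illegal memory access at batch 128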
MobileNet_Debug.zip (12.4 MB)