Cuda Error in executeMemcpy: 1 (invalid argument)

Description

I run run-cifar-engine.py (see attached files) to execute the engine arch_00000.trt (also attached) through the TensorRT Python API.
The engine was built from the attached ONNX file with:

trtexec --onnx=arch_00000.onnx --saveEngine=arch_00000.trt

Both the engine build and the trtexec inference run passed.

Upon execution of the following line I get the error below:

35: context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
[TensorRT] ERROR: ../rtSafe/cuda/genericReformat.cu (1294) - Cuda Error in executeMemcpy: 1 (invalid argument)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception

Things I have unsuccessfully tried:

  1. Running it on a different machine:
    Machine 1 is a container using a V100.
    Machine 2 is a Jetson Nano.

  2. I searched and found the following issue on GitHub:
    https://github.com/NVIDIA/TensorRT/issues/421
    As far as I can tell, my inputs and outputs are correctly sized.
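A quick way to back up the claim that the inputs and outputs are correctly sized is to compare every host array with the shape and dtype of its binding. The helper below is a hypothetical debugging sketch (`check_host_buffer` is not a TensorRT or PyCUDA function). It assumes static shapes and that each binding's shape and NumPy dtype were read from the engine beforehand, for example with `engine.get_binding_shape(i)` and `trt.nptype(engine.get_binding_dtype(i))` from the TensorRT 7 Python API. The `(1, 3, 32, 32)` input shape in the usage lines is an assumed CIFAR-sized example, not taken from arch_00000.onnx.

```python
import numpy as np


def check_host_buffer(name, host_array, binding_shape, binding_dtype):
    """Compare a host array with the expected shape/dtype of one binding.

    Hypothetical debugging helper. binding_shape and binding_dtype are assumed
    to come from the engine (static shapes only). Returns a list of problems;
    an empty list means nothing suspicious was found.
    """
    problems = []
    expected_dtype = np.dtype(binding_dtype)
    expected_nbytes = int(np.prod(binding_shape)) * expected_dtype.itemsize
    if host_array.dtype != expected_dtype:
        problems.append(f"{name}: dtype {host_array.dtype}, expected {expected_dtype}")
    if host_array.nbytes != expected_nbytes:
        problems.append(f"{name}: {host_array.nbytes} bytes, expected {expected_nbytes}")
    if not host_array.flags["C_CONTIGUOUS"]:
        problems.append(f"{name}: array is not C-contiguous")
    return problems


# Usage with an assumed CIFAR-sized float32 input binding of shape (1, 3, 32, 32)
x = np.zeros((1, 3, 32, 32), dtype=np.float32)
print(check_host_buffer("input", x, (1, 3, 32, 32), np.float32))  # []
# A float64 array has the right shape but twice the expected number of bytes
print(check_host_buffer("input", x.astype(np.float64), (1, 3, 32, 32), np.float32))
```

A correctly sized buffer is only one requirement for the copy to succeed, so an empty result does not by itself rule out other causes of the error.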

Questions:

  1. Does anyone know why this happens and how to fix it?
    Alternatively:
  2. How do I go about debugging this?

I would be thankful for any advice.

Environment

TensorRT Version: 7.2.3-1+cuda11.1
GPU Type: Nvidia Tesla V100 32GB
Nvidia Driver Version: 465.27
CUDA Version: 11.3
CUDNN Version:
Operating System + Version: Ubuntu 20.04.2 LTS
Python Version (if applicable): 3.8.5
TensorFlow Version (if applicable):
PyTorch Version (if applicable): Model was exported to ONNX from PyTorch v1.5.0
Baremetal or Container (if container which image + tag): Container: nvcr.io/nvidia/tensorrt:21.05-py3

Relevant Files

https://drive.google.com/drive/folders/1hUyXW3nWyH8cEodsqB0LpmmqYAjFVS7u?usp=sharing

Steps To Reproduce

  1. Download the files and change into their directory.

  2. Use Docker or Podman to start the container:

podman run -it --rm -v $(pwd):/workdir -w /workdir nvcr.io/nvidia/tensorrt:21.05-py3
or
docker run -it --rm -v $(pwd):/workdir -w /workdir nvcr.io/nvidia/tensorrt:21.05-py3
  3. Inside the container, run: python3 run-cifar-engine.py

Hi @guenther.meyer,

We were able to reproduce this error. Please allow us some time to work on this issue.

Thank you.

Hi,
Please share the ONNX model and the script, if you have not already, so that we can assist you better.
In the meantime, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the snippet below:

check_model.py

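import sys
import onnx

# Path to the ONNX model, e.g. python3 check_model.py arch_00000.onnx
filename = sys.argv[1]
model = onnx.load(filename)
# Raises onnx.checker.ValidationError if the model is invalid
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.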

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hello,

  1. check_model.py runs without throwing an error.
  2. I used the following command to create the log you requested:
    trtexec --onnx=arch_00000.onnx --saveEngine=arch_00000.trt --verbose > trtexec-Verbose.txt 2>&1
    The log trtexec-Verbose.txt and the ONNX file arch_00000.onnx are available here:
    nvida-min-example - Google Drive

Did you find a solution?

Hi,

There are a few issues in the user’s script:

  1. cuda.memcpy_htod_async(input_memory, input_buffer, stream)
    According to Device Interface - pycuda 2021.1 documentation, src (here input_buffer) must be page-locked memory; see, e.g., pagelocked_empty().
  2. The network has two outputs. The script needs to be updated to keep the output_memory and output_buffer objects of both outputs alive. Otherwise, one of the output allocations is freed while its address is still in use.
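The two fixes above can be sketched as follows. This is not the attached working_script.py, only a minimal sketch under stated assumptions: a TensorRT 7.x engine with static shapes, a single input at binding index 0, and an existing CUDA context (e.g. from `import pycuda.autoinit`). The names `allocate_buffers` and `infer` are hypothetical. Host buffers are allocated with `cuda.pagelocked_empty()`, and every `DeviceAllocation` returned by `cuda.mem_alloc()` is kept in a list, so no buffer whose address is in `bindings` can be freed before `stream.synchronize()` returns.

```python
import numpy as np


def allocate_buffers(engine):
    """Allocate a page-locked host buffer and a device buffer for every binding.

    Hypothetical helper. Assumes static shapes and an existing CUDA context.
    """
    # Imported here so the sketch can be loaded on a machine without a GPU
    import pycuda.driver as cuda
    import tensorrt as trt

    host_bufs, device_bufs, bindings = [], [], []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i))
        dtype = trt.nptype(engine.get_binding_dtype(i))
        # Page-locked host memory, as the async copies require
        host = cuda.pagelocked_empty(size, dtype)
        device = cuda.mem_alloc(host.nbytes)
        host_bufs.append(host)
        # Keeping the DeviceAllocation referenced keeps the device memory allocated
        device_bufs.append(device)
        bindings.append(int(device))
    return host_bufs, device_bufs, bindings


def infer(engine, context, stream, input_array):
    """Run one inference; assumes binding 0 is the only input."""
    import pycuda.driver as cuda

    host_bufs, device_bufs, bindings = allocate_buffers(engine)
    # Copy the input into the page-locked buffer instead of passing a plain array
    np.copyto(host_bufs[0], input_array.ravel())
    cuda.memcpy_htod_async(device_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    outputs = []
    for i in range(engine.num_bindings):
        if not engine.binding_is_input(i):
            cuda.memcpy_dtoh_async(host_bufs[i], device_bufs[i], stream)
            outputs.append(host_bufs[i])
    # device_bufs is still referenced here, so all allocations outlive the stream work
    stream.synchronize()
    return [out.copy() for out in outputs]
```

In a real script the imports would normally sit at the top of the file, and the buffers would be allocated once and reused across inferences rather than on every call.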

Please find a working script here:
working_script.py (2.5 KB)
