Description
I’m using TensorRT to run a Mask R-CNN model and PyTorch to post-process the result. When the inference result contains more than 2 bounding boxes and I print the result (a GPU tensor), it raises an error: “RuntimeError: CUDA error: invalid configuration argument”. However, I can print the tensor after moving it to the CPU. When the inference result contains fewer than 2 bounding boxes, I can print the tensor on both the CPU and the GPU.
Can anyone help?
Environment
The environment is a Docker image: nvcr.io/nvidia/tensorrt:21.09-py3
TensorRT Version: 8.0.3.0
GPU Type: T4
Nvidia Driver Version: 450.51.06
CUDA Version: 11.4
CUDNN Version:
Operating System + Version:
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.10.0+cu113
Baremetal or Container (if container which image + tag):
Steps To Reproduce
from PIL import Image
import numpy as np
import torch
import ctypes
import tensorrt as trt
engine_path = 'engine.fp32'
image_path='COCO_val2014_000000002640.jpg'
original_image = np.array(Image.open(image_path).convert('RGB'))[:, :, ::-1]
dtype = torch.float32
outputs_tensor = {
    "scores": torch.zeros(size=(100, 1), dtype=dtype, device=torch.device('cuda')),
    "boxes": torch.zeros(size=(100, 4), dtype=dtype, device=torch.device('cuda')),
    "labels": torch.zeros(size=(100, 1), dtype=dtype, device=torch.device('cuda')),
    "masks": torch.zeros(size=(100, 1, 28, 28), dtype=dtype, device=torch.device('cuda')),
}
PLUGIN_LIBRARY = 'libmyplugins.so'
ctypes.CDLL(PLUGIN_LIBRARY)
print('init_libnvinfer_plugins success: ',
      trt.init_libnvinfer_plugins(None, "")
      )
with trt.Logger() as logger, trt.Runtime(logger) as runtime:
    with open(engine_path, mode='rb') as f:
        engine_bytes = f.read()
    engine = runtime.deserialize_cuda_engine(engine_bytes)
    context = engine.create_execution_context()
    input_tensor = torch.as_tensor(original_image.astype("float32")).to(torch.device('cuda'))
    bindings = [None] * 5
    bindings[engine.get_binding_index("images")] = input_tensor.contiguous().data_ptr()
    for output_name, output_tensor in outputs_tensor.items():
        idx = engine.get_binding_index(output_name)
        bindings[idx] = output_tensor.data_ptr()
    context.execute_async(1, bindings, torch.cuda.current_stream().cuda_stream)
    print('shape: ', outputs_tensor['scores'].shape)
    print('scores cpu: ', outputs_tensor['scores'].cpu()[:5])  # No error
    # When more than two elements exceed the threshold, the print below raises:
    #   RuntimeError: CUDA error: invalid configuration argument
    # When fewer than two elements exceed the threshold, there is no error.
    print('scores:', outputs_tensor['scores'][:5])
    print(torch.randn((2, 4), device='cuda'))