Description
We are working on a Jetson Xavier NX with JetPack 4.4.
We are trying to implement a TensorRT engine in Python and then use the whole module as a service from C++.
The Python code loads an existing TensorRT model, receives a picture from the C++ code, and runs it through the model.
We already have a similar setup that uses Python code to work with a different computing platform (Coral's Edge TPU).
We tested the Python code standalone (no C++), reading a picture from a file, and it worked as expected.
When we integrate the Python code from C++ using Boost.Python, it crashes while calling pycuda.driver.memcpy_htod_async, printing:
#assertiongridAnchorPlugin.cpp,205
We checked the data format and content, and they are the same in both cases (running standalone Python and running via C++).
Is there a way to understand what this assert means, and what can we do about it?
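For context, the kind of sanity check we run on the incoming buffer looks roughly like this (a simplified sketch, not our exact production code; the helper name is illustrative):

import numpy as np

def describe_buffer(img, width, height):
    # Log basic properties of the raw image buffer handed over by C++,
    # so the standalone and embedded runs can be compared side by side.
    expected = width * height * 3
    print('type: %s, len: %d (expected %d)' % (type(img), len(img), expected))
    arr = np.frombuffer(img, dtype=np.uint8, count=expected)
    # cheap content fingerprint
    print('min: %d, max: %d, checksum: %d' % (arr.min(), arr.max(), int(arr.sum())))

The printed values match in both runs, which is why the assert is puzzling.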
Environment
TensorRT Version:
GPU Type: Jetson Xavier NX (integrated Volta GPU)
Nvidia Driver Version: JetPack 4.4
CUDA Version: 10.2.89
CUDNN Version: 8
Operating System + Version: Ubuntu 18.04 LTS
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
Snippet from the Python code:
import threading

import numpy as np
import pycuda.autoinit  # creates/initializes the CUDA context used by pycuda
import pycuda.driver as cuda
import tensorrt as trt

# methods of our detector class (class definition omitted in this snippet)
def __init__(self, model, labels):
    print('Initializing TensorRT engine...')
    self.labels = {}
    # loop over the class labels file
    for row in open(labels):
        # unpack the row and update the labels dictionary
        (classID, label) = row.strip().split(maxsplit=1)
        self.labels[int(classID)] = label.strip()

    TRT_LOGGER = trt.Logger(trt.Logger.INFO)
    trt.init_libnvinfer_plugins(TRT_LOGGER, '')
    self.runtime = trt.Runtime(TRT_LOGGER)

    self.layout = 7  # size of one detection tuple (index, label, conf, xmin, ymin, xmax, ymax)
    self.height = 300
    self.width = 300
    self.lock = threading.Lock()

    ### create engine ###
    with open(model, 'rb') as f:
        buf = f.read()
        self.engine = self.runtime.deserialize_cuda_engine(buf)

    ### create buffers ###
    self.host_inputs = []
    self.cuda_inputs = []
    self.host_outputs = []
    self.cuda_outputs = []
    self.bindings = []
    self.stream = cuda.Stream()

    # allocate one page-locked host buffer and one device buffer per binding
    for binding in self.engine:
        size = trt.volume(self.engine.get_binding_shape(binding)) * self.engine.max_batch_size
        host_mem = cuda.pagelocked_empty(size, np.float32)
        cuda_mem = cuda.mem_alloc(host_mem.nbytes)
        self.bindings.append(int(cuda_mem))
        if self.engine.binding_is_input(binding):
            self.host_inputs.append(host_mem)
            self.cuda_inputs.append(cuda_mem)
        else:
            self.host_outputs.append(host_mem)
            self.cuda_outputs.append(cuda_mem)

    self.context = self.engine.create_execution_context()
def eval(self, img, width, height):
    # interpret the raw buffer coming from C++ as a uint8 HWC image
    n_bytes = width * height * 3  # renamed so it does not shadow the built-in len()
    image = np.frombuffer(img, dtype=np.uint8, count=n_bytes)
    image = image.reshape(width, height, 3)  # square input (300x300); for non-square images this would need to be (height, width, 3)
    # scale to [-1, 1] and convert HWC -> CHW
    image = (2.0 / 255.0) * image - 1.0
    image = image.transpose((2, 0, 1))
    np.copyto(self.host_inputs[0], image.ravel())

    self.lock.acquire()
    cuda.memcpy_htod_async(self.cuda_inputs[0], self.host_inputs[0], self.stream)
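The snippet ends at the failing call. For reference, the continuation after it follows the usual pycuda/TensorRT async pattern, roughly like this (a sketch, not a verbatim copy of our code):

    # run inference on the stream, copy the result back, then release the lock
    self.context.execute_async(batch_size=1, bindings=self.bindings,
                               stream_handle=self.stream.handle)
    cuda.memcpy_dtoh_async(self.host_outputs[0], self.cuda_outputs[0], self.stream)
    self.stream.synchronize()
    self.lock.release()

    # each detection occupies self.layout consecutive floats in the output buffer
    output = self.host_outputs[0]
    return [output[i:i + self.layout] for i in range(0, len(output), self.layout)]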