Hello,
I am getting an error in my notebook when executing a TensorRT model with threading. It works fine without threading. Can someone help me solve this?
I built the TensorRT engine with TensorRT 8.2.0.5 compiled from source (GitHub) on my Jetson Nano.
I also use the TensorRT 8.2.0.5 Python API compiled on the Jetson Nano.
The engine was created from an ONNX model exported from TensorFlow's SSD-MobileNetV2 320x320 (ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8) with the latest Python script on GitHub.
Again, it works very well when I do not use threading: I can call the engine through the TensorRT API and the model detects objects with their classes, with no error.
There is preprocessing inside the model, and the shapes seem to be dynamic, because "dynamic" appears in the log. Could that be the problem? Do I need to convert the ONNX model to static shapes before converting it with TensorRT?
Thank you very much!
I have uploaded the engine, the notebook, and the ONNX model.
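On the dynamic-shape question: TensorRT reports a dynamic axis as -1 in a binding shape, so a quick check is possible before rebuilding anything. A minimal, stdlib-only sketch (the shape tuples here are hypothetical examples, not read from my engine):

```python
def has_dynamic_axes(shape):
    """True if any axis of a TensorRT binding shape is dynamic (reported as -1)."""
    return any(d == -1 for d in shape)

# Hypothetical shapes: dynamic batch axis vs. fully static
print(has_dynamic_axes((-1, 320, 320, 3)))  # True
print(has_dynamic_axes((1, 320, 320, 3)))   # False
```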
My code is:
model.engine (9.1 MB)
Essai.ipynb (11.8 KB)
model (1).onnx (10.4 MB)
import tensorrt as trt

# Custom logger class for TensorRT
class MyLogger(trt.ILogger):
    def __init__(self):
        trt.ILogger.__init__(self)

    def log(self, severity, msg):
        print("%s : %s" % (severity, msg))
import threading

# Thread wrapper that runs a given function
class myThread(threading.Thread):
    def __init__(self, func):
        threading.Thread.__init__(self)
        self.func = func

    def run(self):
        print("Starting ")
        self.func()
        print("Exiting ")
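As a sanity check, the thread wrapper can be exercised on its own (stdlib only, no TensorRT), which confirms the threading mechanics themselves are fine:

```python
import threading

# Same wrapper as above, reproduced so this snippet runs standalone
class myThread(threading.Thread):
    def __init__(self, func):
        threading.Thread.__init__(self)
        self.func = func

    def run(self):
        self.func()

results = []
t = myThread(lambda: results.append("ran"))
t.start()
t.join()
print(results)  # ['ran']
```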
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import threading
import time

class TRTInference:
    def __init__(self, repertoire_engine):
        # Initialize the TensorRT runtime
        self.logger = MyLogger()
        #self.TRT_LOGGER = trt.Logger(self.logger)
        trt.init_libnvinfer_plugins(self.logger, namespace="")
        self.runtime = trt.Runtime(self.logger)
        # Load the engine
        print("Loading the engine...")
        with open(repertoire_engine, "rb") as f:
            self.engine = self.runtime.deserialize_cuda_engine(f.read())
        # Initialize the CUDA context and the TensorRT execution context
        self.cfx = cuda.Device(0).make_context()
        self.stream = cuda.Stream()
        self.context = self.engine.create_execution_context()
        # Allocate memory for the input
        print("Allocating memory...")
        size_input = trt.volume(self.engine.get_binding_shape(0)) * self.engine.max_batch_size
        self.input_host_mem = cuda.pagelocked_empty(size_input, trt.nptype(trt.float32))
        self.input_device_mem = cuda.mem_alloc(self.input_host_mem.nbytes)
        # Allocate device memory for the outputs
        self.output_device_mem = []
        format_sorties = []
        types_sorties = []
        for i in range(self.engine.num_bindings):
            if not self.engine.binding_is_input(i):
                size_output = trt.volume(self.engine.get_binding_shape(i)) * self.engine.max_batch_size
                output_hm = cuda.pagelocked_empty(size_output, trt.nptype(trt.float32))
                self.output_device_mem.append(cuda.mem_alloc(output_hm.nbytes))
                format_sorties.append(self.engine.get_binding_shape(i))
                types_sorties.append(trt.nptype(self.engine.get_binding_dtype(i)))
        # Collect the GPU addresses of the input/output buffers
        binding_entree = int(self.input_device_mem)
        binding_sorties = []
        for output_ in self.output_device_mem:
            binding_sorties.append(int(output_))
        self.bindings = [binding_entree, binding_sorties[0], binding_sorties[1], binding_sorties[2], binding_sorties[3]]
        # Allocate host memory for the outputs
        self.output_host_mem = []
        for i in range(len(self.output_device_mem)):
            self.output_host_mem.append(np.zeros(format_sorties[i], types_sorties[i]))
        # Input tensor
        self.image = np.zeros((320, 320, 3), dtype=trt.nptype(self.engine.get_binding_dtype(0)))

    # Inference
    def CalculModele(self):
        threading.Thread.__init__(self)
        self.cfx.push()
        # Copy the image into the input tensor
        x = self.image.astype(np.float32)
        x = np.expand_dims(x, axis=0)  # (1,320,320,3)
        np.copyto(self.input_host_mem, x.ravel())
        # Transfer the input to the GPU
        cuda.memcpy_htod(self.input_device_mem, self.input_host_mem)
        # Call the model
        self.context.execute(batch_size=1, bindings=self.bindings)
        # Retrieve the outputs
        for i in range(len(self.output_host_mem)):
            cuda.memcpy_dtoh(self.output_host_mem[i], self.output_device_mem[i])
        self.cfx.pop()

    def destory(self):
        self.cfx.pop()
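The input staging in CalculModele (astype, expand_dims, then ravel into the flat page-locked buffer) can be verified in isolation with plain NumPy, using np.empty as a stand-in for the pinned host buffer, so at least that part can be ruled out:

```python
import numpy as np

image = np.arange(320 * 320 * 3, dtype=np.float32).reshape(320, 320, 3)
host_buf = np.empty(1 * 320 * 320 * 3, dtype=np.float32)  # stand-in for pagelocked_empty

x = np.expand_dims(image.astype(np.float32), axis=0)  # (1, 320, 320, 3)
np.copyto(host_buf, x.ravel())

# The flat buffer holds the NHWC image row-major, so it reshapes back losslessly
assert np.array_equal(host_buf.reshape(1, 320, 320, 3)[0], image)
```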
trt_inference_wrapper = TRTInference(repertoire_engine="model.engine")
Severity.VERBOSE : Registered plugin creator - ::BatchTilePlugin_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::BatchedNMS_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::CoordConvAC version 1
Severity.VERBOSE : Registered plugin creator - ::CropAndResize version 1
Severity.VERBOSE : Registered plugin creator - ::CropAndResizeDynamic version 1
Severity.VERBOSE : Registered plugin creator - ::DecodeBbox3DPlugin version 1
Severity.VERBOSE : Registered plugin creator - ::DetectionLayer_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::EfficientNMS_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::EfficientNMS_Explicit_TF_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::EfficientNMS_Implicit_TF_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::FlattenConcat_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::GenerateDetection_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::GridAnchor_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::GridAnchorRect_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::InstanceNormalization_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::LReLU_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::MultilevelProposeROI_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::DMHA version 1
Severity.VERBOSE : Registered plugin creator - ::NMS_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::NMSDynamic_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::Normalize_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::PillarScatterPlugin version 1
Severity.VERBOSE : Registered plugin creator - ::PriorBox_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::ProposalLayer_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::Proposal version 1
Severity.VERBOSE : Registered plugin creator - ::ProposalDynamic version 1
Severity.VERBOSE : Registered plugin creator - ::PyramidROIAlign_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::Region_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::Reorg_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::ResizeNearest_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::RPROI_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::ScatterND version 1
Severity.VERBOSE : Registered plugin creator - ::SpecialSlice_TRT version 1
Severity.VERBOSE : Registered plugin creator - ::Split version 1
Severity.VERBOSE : Registered plugin creator - ::VoxelGeneratorPlugin version 1
Severity.INFO : [MemUsageChange] Init CUDA: CPU +197, GPU +0, now: CPU 236, GPU 1416 (MiB)
Loading the engine...
Severity.INFO : Loaded engine size: 9 MB
Severity.INFO : [MemUsageSnapshot] deserializeCudaEngine begin: CPU 245 MiB, GPU 1434 MiB
Severity.VERBOSE : Using cublas as a tactic source
Severity.INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +246, now: CPU 404, GPU 1681 (MiB)
Severity.VERBOSE : Using cuDNN as a tactic source
Severity.INFO : [MemUsageChange] Init cuDNN: CPU +241, GPU +354, now: CPU 645, GPU 2035 (MiB)
Severity.INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 645, GPU 2035 (MiB)
Severity.VERBOSE : Deserialization required 6288056 microseconds.
Severity.INFO : [MemUsageSnapshot] deserializeCudaEngine end: CPU 645 MiB, GPU 2035 MiB
Severity.INFO : [MemUsageSnapshot] ExecutionContext creation begin: CPU 838 MiB, GPU 2228 MiB
Severity.VERBOSE : Using cublas as a tactic source
Severity.INFO : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +157, now: CPU 996, GPU 2385 (MiB)
Severity.VERBOSE : Using cuDNN as a tactic source
Severity.INFO : [MemUsageChange] Init cuDNN: CPU +240, GPU +240, now: CPU 1236, GPU 2625 (MiB)
Severity.VERBOSE : Total per-runner device memory is 7467008
Severity.VERBOSE : Total per-runner host memory is 203456
Severity.VERBOSE : Allocated activation device memory of size 29763584
Severity.INFO : [MemUsageSnapshot] ExecutionContext creation end: CPU 1238 MiB, GPU 2661 MiB
Allocating memory...
thread1 = myThread(trt_inference_wrapper.CalculModele)
# Start the new thread
thread1.start()
thread1.join()
trt_inference_wrapper.destory()
print("Exiting Main Thread")
Starting
Severity.ERROR : 1: [pointWiseV2Helpers.h::launchPwgenKernel::532] Error Code 1: Cuda Driver (invalid resource handle)
Exiting
Exiting Main Thread
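For what it's worth, I understand CUDA handles (streams, execution contexts) are generally only valid in the thread/context that was current when they were created, which may be why the call fails from the worker thread. A stdlib-only sketch of the "create the resource in the thread that uses it" pattern, with threading.local standing in for a per-thread CUDA context (all names here are illustrative, not a real API):

```python
import threading

# threading.local gives each thread its own attribute namespace,
# mimicking a resource that must belong to the thread using it.
state = threading.local()
log = []

def setup():
    # Create the "handle" in the current thread
    state.handle = "ctx-%s" % threading.current_thread().name

def infer():
    # Use the handle from the same thread that created it
    log.append(state.handle)

class Worker(threading.Thread):
    def run(self):
        setup()   # per-thread initialization happens inside run()
        infer()

w = Worker(name="worker")
w.start()
w.join()
print(log)  # ['ctx-worker']
```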