As mentioned in the developer guide: "An engine can have multiple execution contexts, allowing one set of weights to be used for multiple overlapping inference tasks."
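If I understand that correctly, one deserialized engine can hand out several execution contexts that all share the same weights, something like this (assuming `engine` is an already-deserialized ICudaEngine):

# Two execution contexts backed by one engine's weights (my understanding)
context_a = engine.create_execution_context()
context_b = engine.create_execution_context()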
Also, I read that they can be used in parallel. Does that mean invoking two execution contexts in parallel, for example performing detection on two different images at the same time using the same model? If yes, how do I do that? (See the sketch at the end of this post for the kind of thing I mean.) Normal threading in Python does not seem to work with TensorRT: I tried putting the inference process in a thread and it kept failing until I did the following:
import threading

import pycuda.driver as cuda
import tensorrt as trt

cuda.init()  # initialize the CUDA driver before make_context()

class MyEngine(object):
    # Initialize the engines
    def __init__(self, engine_paths, classes_paths, source, imgsz, img_label, vid_size, ...):
        # Create a dedicated CUDA context for this engine
        # (make_context() also makes it current on the creating thread)
        self.cfx = cuda.Device(0).make_context()
        ...
        # logger is used for TensorRT logging
        logger = trt.Logger(trt.Logger.WARNING)
        logger.min_severity = trt.Logger.Severity.ERROR
        # runtime is an instance of the TensorRT runtime
        runtime = trt.Runtime(logger)
        trt.init_libnvinfer_plugins(logger, '')  # initialize TensorRT plugins
        self.stream = cuda.Stream()

    def inference(self):
        ...
class myThread(threading.Thread):
    def __init__(self, func, args):
        threading.Thread.__init__(self)
        self.func = func
        self.args = args

    def run(self):
        self.func(*self.args)
class Predictor(MyEngine):
    def __init__(self, engine_paths, classes_paths, source, imgsz, img_label, vid_size, ...):
        super(Predictor, self).__init__(engine_paths, classes_paths, source, imgsz, img_label, vid_size, ...)

# Build the predictor on the main thread, then run inference in a daemon thread
pred = Predictor(engine_paths=engines_paths, classes_paths=classes_paths, ...)
mainThread = myThread(pred.inference, [])
mainThread.daemon = True
mainThread.start()
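One thing I noticed in the threaded TensorRT samples I found is that the worker thread always pushes the CUDA context before doing GPU work and pops it afterwards, roughly like this (a simplified sketch, not my actual code; `run_engine` is a placeholder for the real pre-processing, inference, and post-processing):

def threaded_inference(cfx, run_engine):
    # cfx is the context created with cuda.Device(0).make_context();
    # run_engine is a placeholder callable for the actual TensorRT work
    cfx.push()       # make the CUDA context current on this worker thread
    try:
        run_engine()
    finally:
        cfx.pop()    # release it so other threads can push the same context

Is that push()/pop() pair what makes the threading work?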
I arrived at the code above after a lot of trial and error and after digging through many documents and code samples. I did not understand why it worked, but it did! I'm still learning, so can anyone please explain why it works? Thank you.
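Finally, here is the sketch I mentioned above of what I am ultimately trying to do (a rough sketch of what I imagine, using the TensorRT 8-style API; "model.engine", the image shapes, and the buffer handling are all placeholders, and I don't know if this approach is correct, which is part of my question):

import threading

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt

cuda.init()
dev_ctx = cuda.Device(0).make_context()  # one CUDA context, shared by both threads

logger = trt.Logger(trt.Logger.ERROR)
runtime = trt.Runtime(logger)
with open("model.engine", "rb") as f:  # "model.engine" is a placeholder path
    engine = runtime.deserialize_cuda_engine(f.read())

# Two execution contexts from the same engine, sharing one set of weights
exec_ctx_a = engine.create_execution_context()
exec_ctx_b = engine.create_execution_context()
dev_ctx.pop()  # release the CUDA context from the main thread

def detect(exec_ctx, image):
    dev_ctx.push()  # make the shared CUDA context current on this thread
    try:
        stream = cuda.Stream()  # each thread uses its own stream
        # Not shown: allocate device buffers, copy `image` to the GPU, then
        #   exec_ctx.execute_async_v2(bindings=[...], stream_handle=stream.handle)
        # and copy the detections back.
    finally:
        dev_ctx.pop()  # release it again for the other thread

img_a = np.zeros((1, 3, 640, 640), dtype=np.float32)  # placeholder input shapes
img_b = np.zeros((1, 3, 640, 640), dtype=np.float32)

t_a = threading.Thread(target=detect, args=(exec_ctx_a, img_a))
t_b = threading.Thread(target=detect, args=(exec_ctx_b, img_b))
t_a.start(); t_b.start()
t_a.join(); t_b.join()

I went with one shared CUDA context plus push()/pop() in each thread because that mirrors the self.cfx pattern above, but maybe each thread should own its own context instead?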