Description
- Instead of pycuda, I am using PyTorch tensors for the input and output data.
- When I run the script with multiple processes, several of them always fail during initialization (exit code -9).
This issue may be related to the CUDA context: torch creates its context through the runtime API, while TensorRT creates its context through the driver API.
I have tested many demos, but all of them failed.
Why does the process exit (or perhaps get killed) without raising an exception?
Environment
TensorRT Version: 7.1
GPU Type: 2080Ti
Nvidia Driver Version: 455.45
CUDA Version: 11.0
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7.0+cu110
Baremetal or Container (if container which image + tag): nvcr_io_nvidia_tensorrt_20.09-py3
Relevant Files
import os
import tensorrt as trt
import torch
import multiprocessing as mp

os.environ["CUDA_VISIBLE_DEVICES"] = "4"


class TrtInfer(object):
    def __init__(self, trt_file):
        print(F"trt_file:{trt_file}")
        G_LOGGER = trt.Logger(trt.Logger.ERROR)
        # Deserialize the engine from the plan file passed in as trt_file
        with open(trt_file, "rb") as f, trt.Runtime(G_LOGGER) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        print("build engine succeed")
        assert (self.engine)
        self.context = self.engine.create_execution_context()
        print("build context succeed")
        # Several processes fail here during initialization (exitcode=-9)
        x = torch.cuda.FloatTensor(8)
        print("create torch tensor")
        self.bindings = [None, int(x.data_ptr())]
        print("initial succeed")

    def infer(self, data):
        pass


def init_recognition(thread_id):
    trt_file = r"/data3/deeplearning/models/test.fp16.trt"
    trt_infer = TrtInfer(trt_file)
    print("init_recognition success(thread-{})".format(thread_id))


if __name__ == '__main__':
    mp.set_start_method('spawn')
    handle_process = []
    for i in range(5):
        pro = mp.Process(target=init_recognition, args=(i + 1, ))
        pro.daemon = True
        handle_process.append(pro)
    for p in handle_process:
        p.start()
        # time.sleep(5)
    for p in handle_process:
        p.join()
    for p in handle_process:
        print("exitcode: {}".format(p.exitcode))
    print("FINISH")
Result:
Steps To Reproduce
- You can use any TensorRT model (trt_file).
- Run this script; several processes will fail during initialization (exit code -9).
- If you comment out the line that calls torch.cuda.FloatTensor, the script runs successfully.
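A note on reading the -9: for multiprocessing.Process, a negative exitcode -N means the child was terminated by signal N, so -9 corresponds to SIGKILL on Linux. SIGKILL cannot be caught, which would explain why no Python exception or traceback appears. The sketch below only decodes the exit code; describe_exitcode is a hypothetical helper name, not part of any library. Who sends the signal is not established here. The kernel OOM killer is one common sender, but that is an assumption that would need checking (for example in the kernel log).

```python
import signal


def describe_exitcode(exitcode):
    """Return a readable description of a multiprocessing.Process exit code.

    describe_exitcode is a hypothetical helper written for this report.
    """
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        # A negative value means the child was terminated by signal -exitcode.
        try:
            name = signal.Signals(-exitcode).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal {} ({})".format(-exitcode, name)
    if exitcode == 0:
        return "exited normally"
    return "exited with status {}".format(exitcode)


if __name__ == "__main__":
    # Sample values only; in the script above these would come from p.exitcode.
    for code in (0, 1, -9):
        print("exitcode {}: {}".format(code, describe_exitcode(code)))
```

In the reproduction script, the final loop could print describe_exitcode(p.exitcode) instead of the raw number, which makes it easier to tell a crash inside Python (positive status) from an external kill (negative status).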