TensorRT inference with PyTorch tensors (data_ptr)

Description

  1. Instead of using pycuda, I am using PyTorch tensors for the input and output data.
  2. If I run the script with multiprocessing, several processes always fail to initialize (exit code -9; see the note at the end of this section).

This issue may be related to the CUDA context: torch creates its context through the runtime API, while TensorRT creates its context through the driver API.
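To make that relationship concrete, here is a small inspection sketch (my assumption about the mechanism; pycuda is used only to look at the driver-API side). The runtime API lazily creates the device's primary context, and the driver API can retain that same primary context instead of pushing a new one:

import pycuda.driver as cuda
import torch

cuda.init()
torch.cuda.init()                       # runtime API: creates/retains the primary context

dev = cuda.Device(0)
primary = dev.retain_primary_context()  # driver-API handle to the same primary context
primary.push()                          # make it current for driver-API work (e.g. TensorRT)
# ... driver-API work here would share torch's context ...
primary.pop()
primary.detach()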

I have tested lots of demos, but all of them failed.
Why does the process not throw an exception, but instead quit (or perhaps get killed)?
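Note on the -9: a negative multiprocessing exitcode is the number of the signal that terminated the child, and signal 9 is SIGKILL (commonly sent by the kernel OOM killer). A killed process never gets the chance to raise a Python exception, which would explain the silent exit. A minimal, self-contained demonstration:

import multiprocessing as mp
import os
import signal

def victim():
    # Simulate an external kill: the process gets no chance to raise
    # an exception or print a traceback.
    os.kill(os.getpid(), signal.SIGKILL)

if __name__ == "__main__":
    p = mp.Process(target=victim)
    p.start()
    p.join()
    print(p.exitcode)                        # -9
    print(signal.Signals(-p.exitcode).name)  # SIGKILL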

Environment

TensorRT Version: 7.1
GPU Type: 2080Ti
Nvidia Driver Version: 455.45
CUDA Version: 11.0
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7.0+cu110
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.09-py3

Relevant Files

import os

import tensorrt as trt

import torch
import multiprocessing as mp


os.environ["CUDA_VISIBLE_DEVICES"] = "4"

class TrtInfer(object):
    def __init__(self, trt_file):
        print(F"trt_file:{trt_file}")

        G_LOGGER = trt.Logger(trt.Logger.ERROR)
        with open(trt_file, "rb") as f, trt.Runtime(G_LOGGER) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
            print("engine deserialized successfully")
        assert self.engine
        self.context = self.engine.create_execution_context()
        print("execution context created")

        # several processes fail to initialize here (exitcode -9)
        x = torch.cuda.FloatTensor(8)
        print("torch CUDA tensor created")
        self.bindings = [None, int(x.data_ptr())]  # raw device pointer used as a binding
        print("initialization succeeded")

    def infer(self, data):
        pass


def init_recognition(thread_id):
    trt_file = r"/data3/deeplearning/models/test.fp16.trt"

    trt_infer = TrtInfer(trt_file)
    print("init_recognition success(thread-{})".format(thread_id))


if __name__ == '__main__':
    mp.set_start_method('spawn')

    handle_process = []
    for i in range(5):
        pro = mp.Process(target=init_recognition, args=(i + 1, ))
        pro.daemon = True
        handle_process.append(pro)

    for p in handle_process:
        p.start()
        # time.sleep(5)

    for p in handle_process:
        p.join()

    for p in handle_process:
        print("exitcode: {}".format(p.exitcode))

    print("FINISH")

Result:

Steps To Reproduce

  1. You can use any serialized TensorRT engine as trt_file.
  2. Run this script; several of the processes will fail to initialize (exit code -9).
  3. If you comment out the torch.cuda.FloatTensor line, the script runs successfully. A variant that isolates the torch allocation is sketched below.
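To check whether the kill is tied to TensorRT at all, here is a variant (a sketch; worker and the process count are illustrative) that performs only the torch allocation in each spawned process:

import multiprocessing as mp
import torch

def worker(i):
    x = torch.cuda.FloatTensor(8)  # the same allocation as in TrtInfer
    print("process", i, "allocated", x.numel(), "floats on", x.device)

if __name__ == "__main__":
    mp.set_start_method("spawn")
    procs = [mp.Process(target=worker, args=(i,)) for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([p.exitcode for p in procs])

If this also returns -9 for some processes, the failure would be about per-process CUDA context creation (and its memory cost) rather than TensorRT itself.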

Hi,
Can you try running your model with the trtexec command, and share the --verbose log in case the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the full list of supported operators; in case any operator is not supported, you need to create a custom plugin to support that operation.

Also, we request you to share your model and script, if not already shared, so that we can help you better.

Thanks!

Hi @jolly.ming2005,

This looks more like an issue with PyTorch than with TensorRT.
Could you please try moving the torch.cuda.... call before the engine deserialization?
Or manually import pycuda.autoinit in init_recognition() before the new process starts doing anything. A sketch of both options follows.
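A minimal sketch of both suggestions (load_engine is a hypothetical helper; the reordering is the point, not a verified fix):

import tensorrt as trt
import torch

def load_engine(trt_file):
    # (1) Touch torch's CUDA runtime *before* deserializing the engine,
    #     so TensorRT attaches to the context torch has already created.
    torch.cuda.init()  # or any allocation, e.g. torch.cuda.FloatTensor(8)
    logger = trt.Logger(trt.Logger.ERROR)
    with open(trt_file, "rb") as f, trt.Runtime(logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def init_recognition(thread_id):
    # (2) Alternatively, let pycuda create and activate a CUDA context
    #     at the top of each worker process.
    import pycuda.autoinit  # noqa: F401
    engine = load_engine(r"/data3/deeplearning/models/test.fp16.trt")
    print("init_recognition success (process-{})".format(thread_id))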

Thank you.