TensorRT inference with PyTorch tensors (data_ptr)

Description

  1. Instead of using pycuda, I am using PyTorch tensors for the input and output data.
  2. If I run the script with multiprocessing, several processes always fail to initialize (exit code -9; see the note at the end of this section).

This issue may be related to the CUDA context: torch creates its context through the runtime API, while TensorRT creates its context through the driver API.
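To make that relationship concrete, here is a small inspection sketch (my assumption about the mechanism; pycuda is used only to look at the driver-API side). The runtime API lazily creates the device's primary context, and the driver API can retain that same primary context instead of pushing a new one:

import pycuda.driver as cuda
import torch

cuda.init()
torch.cuda.init()                       # runtime API: creates/retains the primary context

dev = cuda.Device(0)
primary = dev.retain_primary_context()  # driver-API handle to the same primary context
primary.push()                          # make it current for driver-API work (e.g. TensorRT)
# ... driver-API work here would share torch's context ...
primary.pop()
primary.detach()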

I have tested lots of demos, but all of them failed.
Why does the process not throw an exception, but instead quit (or perhaps get killed)?
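Note on the -9: a negative multiprocessing exitcode is the number of the signal that terminated the child, and signal 9 is SIGKILL (commonly sent by the kernel OOM killer). A killed process never gets the chance to raise a Python exception, which would explain the silent exit. A minimal, self-contained demonstration:

import multiprocessing as mp
import os
import signal

def victim():
    # Simulate an external kill: the process gets no chance to raise
    # an exception or print a traceback.
    os.kill(os.getpid(), signal.SIGKILL)

if __name__ == "__main__":
    p = mp.Process(target=victim)
    p.start()
    p.join()
    print(p.exitcode)                        # -9
    print(signal.Signals(-p.exitcode).name)  # SIGKILL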

Environment

TensorRT Version: 7.1
GPU Type: 2080Ti
Nvidia Driver Version: 455.45
CUDA Version: 11.0
CUDNN Version: 8.0.4
Operating System + Version: Ubuntu 18.04.5 LTS
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7.0+cu110
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:20.09-py3

Relevant Files

import os

import tensorrt as trt

import torch
import multiprocessing as mp


os.environ["CUDA_VISIBLE_DEVICES"] = "4"

class TrtInfer(object):
    def __init__(self, trt_file):
        print(F"trt_file:{trt_file}")

        G_LOGGER = trt.Logger(trt.Logger.ERROR)
        with open(trt_file, "rb") as f, trt.Runtime(G_LOGGER) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
            print("engine deserialized successfully")
        assert self.engine
        self.context = self.engine.create_execution_context()
        print("execution context created")

        # several processes fail to initialize here (exitcode -9)
        x = torch.cuda.FloatTensor(8)
        print("torch CUDA tensor created")
        self.bindings = [None, int(x.data_ptr())]  # raw device pointer used as a binding
        print("initialization succeeded")

    def infer(self, data):
        pass


def init_recognition(thread_id):
    trt_file = r"/data3/deeplearning/models/test.fp16.trt"

    trt_infer = TrtInfer(trt_file)
    print("init_recognition success(thread-{})".format(thread_id))


if __name__ == '__main__':
    mp.set_start_method('spawn')

    handle_process = []
    for i in range(5):
        pro = mp.Process(target=init_recognition, args=(i + 1, ))
        pro.daemon = True
        handle_process.append(pro)

    for p in handle_process:
        p.start()
        # time.sleep(5)

    for p in handle_process:
        p.join()

    for p in handle_process:
        print("exitcode: {}".format(p.exitcode))

    print("FINISH")

Result:

Steps To Reproduce

  1. You can use any serialized TensorRT engine as trt_file.
  2. Run this script; several of the processes will fail to initialize (exit code -9).
  3. If you comment out the torch.cuda.FloatTensor line, the script runs successfully. A variant that isolates the torch allocation is sketched below.
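To check whether the kill is tied to TensorRT at all, here is a variant (a sketch; worker and the process count are illustrative) that performs only the torch allocation in each spawned process:

import multiprocessing as mp
import torch

def worker(i):
    x = torch.cuda.FloatTensor(8)  # the same allocation as in TrtInfer
    print("process", i, "allocated", x.numel(), "floats on", x.device)

if __name__ == "__main__":
    mp.set_start_method("spawn")
    procs = [mp.Process(target=worker, args=(i,)) for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([p.exitcode for p in procs])

If this also returns -9 for some processes, the failure would be about per-process CUDA context creation (and its memory cost) rather than TensorRT itself.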

Hi,
Can you try running your model with the trtexec command, and share the --verbose log in case the issue persists?
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

You can refer to the link below for the full list of supported operators; in case any operator is not supported, you need to create a custom plugin to support that operation.

Also, we request you to share your model and script, if not already shared, so that we can help you better.

Thanks!

Hi @jolly.ming2005,

This looks more like an issue with PyTorch than with TensorRT.
Could you please try moving the torch.cuda.... call before the engine deserialization?
Or manually import pycuda.autoinit in init_recognition() before the new process starts doing anything. A sketch of both options follows.
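A minimal sketch of both suggestions (load_engine is a hypothetical helper; the reordering is the point, not a verified fix):

import tensorrt as trt
import torch

def load_engine(trt_file):
    # (1) Touch torch's CUDA runtime *before* deserializing the engine,
    #     so TensorRT attaches to the context torch has already created.
    torch.cuda.init()  # or any allocation, e.g. torch.cuda.FloatTensor(8)
    logger = trt.Logger(trt.Logger.ERROR)
    with open(trt_file, "rb") as f, trt.Runtime(logger) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

def init_recognition(thread_id):
    # (2) Alternatively, let pycuda create and activate a CUDA context
    #     at the top of each worker process.
    import pycuda.autoinit  # noqa: F401
    engine = load_engine(r"/data3/deeplearning/models/test.fp16.trt")
    print("init_recognition success (process-{})".format(thread_id))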

Thank you.