How to use TensorRT with Celery?

Hi everyone,
I am new to TensorRT.

Description

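If I run the TensorRT demo standalone, it works fine. But when I use TensorRT inside a Celery worker, something goes wrong with the PyCUDA context (see the trace below). I know that Celery's default pool creates its worker subprocesses with fork (prefork), not spawn. Is there a way to make TensorRT work in this situation?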

Key code:

celery task.py
import celery.signals

@celery.signals.worker_process_init.connect
def worker_process_init(sender, **kwargs):  # runs once in each forked worker
    import onnx
    import onnx_tensorrt.backend as backend
    import pycuda.driver as cuda
    import pycuda.autoinit  # creates a CUDA context as an import side effect
    model1 = onnx.load("/data/new/tensorRT/ir152_test_op11.onnx")
    engine1 = backend.prepare(model1, device='CUDA:0')  # ERROR is here

Trace:

Segmentation fault: 11

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x39008a) [0x7f47213b208a]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x31b3f46) [0x7f47241d5f46]
[bt] (2) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f477523d4b0]
[bt] (3) /usr/local/lib/python2.7/dist-packages/torch/lib/libcaffe2.so(std::_Hashtable<std::string, std::pair<std::string const, std::pair<std::unordered_set<std::string const*, std::hash<std::string const*>, std::equal_to<std::string const*>, std::allocator<std::string const*> >, std::string> >, std::allocator<std::pair<std::string const, std::pair<std::unordered_set<std::string const*, std::hash<std::string const*>, std::equal_to<std::string const*>, std::allocator<std::string const*> >, std::string> > >, std::__detail::_Select1st, std::equal_to<std::string>, std::hash<std::string>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::clear()+0x79) [0x7f46877112a9]
[bt] (4) /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so(+0x458a7) [0x7f465d4ac8a7]
[bt] (5) /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so(+0x1407a6) [0x7f465d5a77a6]
[bt] (6) /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so(+0xf8837) [0x7f465d55f837]
[bt] (7) /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so(+0xf39ce) [0x7f465d55a9ce]
[bt] (8) /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so(+0xfb6c5) [0x7f465d5626c5]
[bt] (9) /usr/local/lib/python2.7/dist-packages/onnx/onnx_cpp2py_export.so(+0x167e2d) [0x7f465d5cee2d]

PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

[2020-09-18 12:39:15,811: ERROR/MainProcess] Process 'ForkPoolWorker-10' pid:75998 exited with 'signal 6 (SIGABRT)'
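
The PyCUDA message above asks for Context.pop() before the process exits. One way to arrange that with prefork workers is to skip pycuda.autoinit, create the context explicitly inside each forked worker, and pop it when the worker shuts down. The following is a minimal sketch of that idea, not a confirmed fix for the segfault in the trace. WorkerGpuContext, connect_to_celery, and the load_driver parameter are hypothetical names introduced here for illustration; the sketch assumes only the pycuda.driver calls init(), Device(n).make_context(), and Context.pop(), plus Celery's worker_process_init and worker_process_shutdown signals.

```python
import os


def _load_pycuda_driver():
    # Imported lazily so nothing touches CUDA in the Celery parent process.
    # pycuda.autoinit is deliberately not imported: it creates a context
    # as a side effect of the import.
    import pycuda.driver as cuda
    return cuda


class WorkerGpuContext(object):
    """Hypothetical helper: one CUDA context per forked worker process."""

    def __init__(self, device_index=0, load_driver=_load_pycuda_driver):
        self.device_index = device_index
        self._load_driver = load_driver
        self.context = None
        self.owner_pid = None

    def start(self):
        # Initialize the driver and push a new context in *this* process.
        cuda = self._load_driver()
        cuda.init()
        self.context = cuda.Device(self.device_index).make_context()
        self.owner_pid = os.getpid()
        return self.context

    def stop(self):
        # Pop only in the process that created the context, and only once.
        if self.context is not None and self.owner_pid == os.getpid():
            self.context.pop()
        self.context = None
        self.owner_pid = None


gpu = WorkerGpuContext(device_index=0)


def connect_to_celery(state=gpu):
    # Celery is imported here so the helper above stays importable without it.
    from celery.signals import worker_process_init, worker_process_shutdown

    def _on_init(**kwargs):
        state.start()
        # Assumption: the ONNX model is loaded and the TensorRT engine is
        # built here, after the context exists in the worker.

    def _on_shutdown(**kwargs):
        state.stop()

    # weak=False keeps these local handlers alive after this function returns.
    worker_process_init.connect(_on_init, weak=False)
    worker_process_shutdown.connect(_on_shutdown, weak=False)
```

Calling connect_to_celery() once at import time of the tasks module would register both handlers. The important design choice is that no CUDA call runs in the parent before the fork, since a CUDA context created in the parent is not usable in forked children.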

Environment

TensorRT Version: 7.0.0.11
GPU Type: Tesla P4
Nvidia Driver Version: 384.130
CUDA Version: 9.0
CUDNN Version: 7.4.3
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 2.7.12
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):


Hi @ppdoll,
Please refer to the link below to get started with the TensorRT Python API.

Thanks!