How to use TensorRT in a Python multiprocessing environment?

I’ve created a process pool using Python’s multiprocessing.Pool, with an initializer that sets up all the TensorRT state. Here is how the pool is created:

import multiprocessing as mp
def create_pool(model_files, batch_size, num_process):
    # initargs is a single tuple carrying every initializer argument
    _pool = mp.Pool(num_process, init_process, (model_files, batch_size))
    return _pool

Here is my init_process:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

gTrtContext = None
TRT_LOGGER = trt.Logger()

def init_process(model_files, batch_size):
    with trt.Builder(TRT_LOGGER) as builder:
        with builder.create_network() as network:
            with trt.CaffeParser() as parser:
                ......

When execution reaches the line with trt.Builder(TRT_LOGGER) as builder, the following error is reported:
[TensorRT] ERROR: CUDA initialization failure with error 3. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

I’m fairly sure my environment is fine, because TensorRT/samples/python/int8_caffe_mnist/sample.py runs without any problem.

I’ve tried this on Ubuntu 18.04 with CUDA 10.2 and TensorRT 7, and on Ubuntu 16.04 with CUDA 10.1 and TensorRT 6; both fail with the same error.