CuPy error after pushing/popping a PyCUDA context

I am using Python and TensorRT to perform inference with CUDA, and I'd like to use CuPy to preprocess the images that I feed to the TensorRT engine. The preprocessing function, my_function, works fine as long as TensorRT is not run between different calls to it (see code below). More precisely, the issue is not strictly tied to TensorRT itself but to the fact that TensorRT inference has to be wrapped in push and pop operations on the PyCUDA context.

With the code below, the last call to my_function raises the following error:

  File "/home/ubuntu/myfile.py", line 188, in _pre_process_cuda
    img = ndimage.zoom(img, scaling_factor)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/cupyx/scipy/ndimage/interpolation.py", line 482, in zoom
    kern(input, zoom, output)
  File "cupy/core/_kernel.pyx", line 822, in cupy.core._kernel.ElementwiseKernel.__call__
  File "cupy/cuda/function.pyx", line 196, in cupy.cuda.function.Function.linear_launch
  File "cupy/cuda/function.pyx", line 164, in cupy.cuda.function._launch
  File "cupy_backends/cuda/api/driver.pyx", line 299, in cupy_backends.cuda.api.driver.launchKernel
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle

Note: the code below does not include the actual TensorRT inference code; simply pushing and popping a PyCUDA context is enough to trigger the error. (The omitted inference is just the usual push/execute/pop pattern, sketched right below.)
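
To be explicit about what I mean by "wrapped", the omitted inference looks roughly like the sketch below (trt_context would be the result of engine.create_execution_context(); the device buffers and the stream are assumed to already exist, and all names here are placeholders rather than my actual code):

import pycuda.driver as cuda

def trt_infer(trt_context, cfx, host_in, host_out, dev_in, dev_out, stream):
    cfx.push()  # make the PyCUDA context current before running the engine
    cuda.memcpy_htod_async(dev_in, host_in, stream)    # copy input to the GPU
    trt_context.execute_async_v2(bindings=[int(dev_in), int(dev_out)],
                                 stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_out, dev_out, stream)  # copy the result back
    stream.synchronize()
    cfx.pop()  # release the context again
    return host_out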

Code:

import numpy as np
import cv2
import time
from PIL import Image
import requests
from io import BytesIO
from matplotlib import pyplot as plt
import cupy as cp
from cupyx.scipy import ndimage
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit


def my_function(numpy_frame):
    dtype = 'float32'
    img = cp.array(numpy_frame, dtype=dtype)
    # print(img)
    img = ndimage.zoom(img, (0.5, 0.5, 3))
    img = (cp.array(2, dtype=dtype) / cp.array(255, dtype=dtype)) * img - cp.array(1, dtype=dtype)
    img = img.transpose((2, 0, 1))
    img = img.ravel()
    return img


# load image
url = "https://www.pexels.com/photo/109919/download/?search_query=&tracking_id=411xe21veam"
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img = np.array(img)

# initialize tensorrt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)
cfx = cuda.Device(0).make_context()


my_function(img)  # ok
my_function(img)  # ok

# ----- TENSORRT ---------
cfx.push()
# .... tensorrt inference....
cfx.pop()
# ----- TENSORRT ---------

my_function(img)  # <---- error

I also tried it in other ways, but unfortunately with the same result:

cfx.push()
my_function(img)  # ok
cfx.pop()

cfx.push()
my_function(img)  # error
cfx.pop()