Image preprocessing in CUDA

Hi, I have a working inference pipeline that uses a TensorRT model. It looks as follows:

while camera:
    img = camera.getimg()
    img_input = preprocessimage(img)
    result = model(img_input)

This gives about 10 fps, but if I change the preprocess function to a simple resize I get 20 fps.
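
For reference, 10 fps is 100 ms per frame and 20 fps is 50 ms, so, assuming the rest of the loop is unchanged, everything in the preprocessing beyond the resize costs roughly 50 ms per frame. A minimal sketch for timing each stage on its own, to see where that time goes; `time_stage` and `frame_time_ms` are hypothetical helper names, not part of the pipeline above:

```python
import time

def time_stage(fn, *args, repeats=20):
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    fn(*args)  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / repeats

def frame_time_ms(fps):
    """Convert a frame rate into per-frame time in milliseconds."""
    return 1000.0 / fps

# Per-frame cost implied by the two measured frame rates.
extra_ms = frame_time_ms(10) - frame_time_ms(20)
print(f"extra preprocessing cost: {extra_ms:.0f} ms per frame")
```

Calling something like `time_stage(preprocessimage, img)` and `time_stage(model, img_input)` inside the real pipeline would show whether the normalization or the layout conversion dominates.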

This is the preprocess function:

import cv2
import numpy as np

def preprocessimage(image_raw, size=None):
    """
    input:
        image_raw: original image in BGR format (e.g. as read by cv2.imread)
        size: (width, height) of the output; defaults to (512, 512)
    output:
        image: resized, normalized float32 numpy array in NCHW layout
    misc:
        output numpy array is in row-major order
    """
    if size is None:
        size = (512, 512)

    # BGR -> RGB, resize, then scale pixel values to [0, 1]
    image_resized = (
        cv2.resize(cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB), size).astype(np.float32)
        / 255.0
    )

    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = (image_resized - mean) / std  # ImageNet normalization
    image = np.transpose(image, [2, 0, 1])  # HWC to CHW format
    image = np.expand_dims(image, axis=0)  # CHW to NCHW format
    # Convert the image to row-major order, also known as "C order":
    image = np.array(image, dtype=np.float32, order="C")
    return image
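
One thing that may be worth trying before moving to the GPU: `mean` and `std` above are float64 arrays, so the subtraction and division promote the whole image to float64 before the final cast back to float32. A minimal sketch, assuming the input is already a resized HWC uint8 RGB image, that keeps everything in float32 and folds the three steps into one multiply and one add. `normalize_fused`, `SCALE` and `BIAS` are hypothetical names, and whether this is actually faster on a given board is untested:

```python
import numpy as np

# ImageNet statistics, same values as in the function above, kept in float32.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# (x / 255 - mean) / std  ==  x * SCALE + BIAS
SCALE = (1.0 / (255.0 * STD)).astype(np.float32)
BIAS = (-MEAN / STD).astype(np.float32)

def normalize_fused(image_rgb_u8):
    """Normalize an HWC uint8 RGB image and return an NCHW float32 array."""
    x = image_rgb_u8.astype(np.float32)
    x *= SCALE  # in place, broadcast over the channel axis
    x += BIAS
    x = x.transpose(2, 0, 1)[np.newaxis]  # HWC -> CHW -> NCHW
    return np.ascontiguousarray(x)  # row-major copy for the engine input

# Compare against the original formula on a random image.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
out = normalize_fused(img)
ref = ((img.astype(np.float32) / 255.0 - MEAN) / STD).transpose(2, 0, 1)[np.newaxis]
print(np.allclose(out, ref, atol=1e-5))
```

The same scale-and-bias form is also what a GPU kernel would compute per element, so the constants carry over if the normalization is ported later.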

The TensorRT samples only normalize using NumPy. I would prefer to normalize using PyCUDA so I can get a nice speedup.
Is there any sample code with efficient normalization that can run from Python?
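
To make the target concrete, here is a rough sketch of what such a kernel would have to compute, written as a NumPy emulation on the CPU rather than as GPU code. It assumes one thread per output element, an already resized HWC uint8 BGR input, and an NCHW float32 RGB output. `emulate_kernel` is a hypothetical name, and the point is only to check the index arithmetic before porting it to a PyCUDA kernel:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # RGB order
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def emulate_kernel(image_bgr_u8):
    """CPU emulation of a one-thread-per-output-element normalization kernel."""
    h, w, _ = image_bgr_u8.shape
    src = image_bgr_u8.reshape(-1)  # flat HWC buffer, as a kernel would see it
    i = np.arange(3 * h * w)  # one "thread index" per output element
    c = i // (h * w)  # output channel: 0=R, 1=G, 2=B
    y = (i % (h * w)) // w  # output row
    x = i % w  # output column
    src_idx = (y * w + x) * 3 + (2 - c)  # input is BGR, so reverse the channel
    out = (src[src_idx].astype(np.float32) / 255.0 - MEAN[c]) / STD[c]
    return out.reshape(1, 3, h, w)

# Check against the straightforward NumPy version on a small random image.
rng = np.random.default_rng(1)
img_bgr = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
got = emulate_kernel(img_bgr)
img_rgb = img_bgr[..., ::-1].astype(np.float32)
want = ((img_rgb / 255.0 - MEAN) / STD).transpose(2, 0, 1)[np.newaxis]
print(np.allclose(got, want, atol=1e-6))
```

In a real kernel, `i` would come from the block and thread indices, and the body of the function would become the per-thread code. The resize is not covered here.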

PS: I am unable to use DeepStream, since my camera does not only output color images.

Hi,

We do have some examples of CUDA pre-processing, but they are not bound to Python or PyCUDA.

If Python is essential, you could check the OpenCV GPU modules as an alternative:
https://docs.opencv.org/4.4.0/d8/d34/group__cudaarithm__elem.html

Thanks.

Thanks for the response! I will try using PyCUDA with the provided kernels in the future; for now the speed is acceptable.