Image preprocessing in CUDA

Hi, I have a working inference pipeline that uses a TensorRT model. It looks as follows:

    img = camera.getimg()
    img_input = preprocess_img(img)
    result = model(img_input)

This gives about 10 FPS, but if I change the preprocess function to a simple resize I get 20 FPS.

This is the preprocess function:

    import cv2
    import numpy as np

    def preprocess_img(image_raw, size=None):
        """
        image_raw: original image in BGR format (OpenCV channel order)
        returns: resized, normalized image as a float32 numpy array
                 in NCHW layout and row-major order
        """
        image_resized = (
            cv2.resize(cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB), (512, 512)).astype(np.float32)
            / 255.0
        )
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        image = (image_resized - mean) / std  # ImageNet normalization
        image = np.transpose(image, [2, 0, 1])  # HWC to CHW format
        image = np.expand_dims(image, axis=0)  # CHW to NCHW format
        # Convert the image to row-major order, also known as "C order":
        image = np.array(image, dtype=np.float32, order="C")
        return image
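One thing that may be worth checking before moving anything to the GPU: `mean` and `std` above are float64 arrays, so the subtraction and division promote the whole image to float64, and it is only cast back to float32 at the very end. A minimal sketch of a cheaper CPU variant is below. It assumes the image has already been converted to RGB and resized, and the names `SCALE`, `BIAS` and `normalize_fused` are made up for illustration. It folds the `/ 255`, mean and std steps into one per-channel multiply-add that stays in float32. I have not measured how much this helps on my setup.

```python
import numpy as np

# ImageNet statistics, kept in float32 so nothing is promoted to float64.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# (x / 255 - mean) / std == x * SCALE + BIAS, with per-channel constants.
SCALE = (1.0 / (255.0 * STD)).astype(np.float32)
BIAS = (-MEAN / STD).astype(np.float32)


def normalize_fused(image_rgb_u8):
    """Normalize a resized HWC uint8 RGB image and return NCHW float32."""
    image = image_rgb_u8.astype(np.float32)
    image *= SCALE  # in place, broadcasts over the channel axis
    image += BIAS
    # HWC -> CHW, add the batch axis, and force a contiguous C-order buffer.
    return np.ascontiguousarray(image.transpose(2, 0, 1)[np.newaxis])
```

The result should match the original function up to float32 rounding, since the arithmetic is the same expression rearranged.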

The TensorRT samples only normalize using NumPy. I would prefer to normalize with PyCUDA so I can get a nice speedup.
Is there any sample code with efficient normalization that can run from Python?

PS: I am unable to use DeepStream since my camera does not only output color images.


We do have some examples of CUDA pre-processing, but they do not bind to Python or PyCUDA.
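For reference, a normalization kernel can be compiled and launched from PyCUDA roughly as sketched below. This is only an illustration under assumptions, not one of our samples: the kernel `normalize_hwc_to_chw` and the helpers `launch_grid` and `make_gpu_normalizer` are hypothetical names, the input is assumed to be an already resized uint8 RGB image in HWC layout, and resizing and color conversion are left out. It needs a CUDA-capable GPU and PyCUDA installed.

```python
import numpy as np

# Hypothetical kernel for illustration: one thread per pixel, reads
# interleaved HWC uint8 RGB and writes planar CHW float32.
KERNEL_SRC = r"""
__global__ void normalize_hwc_to_chw(const unsigned char *src, float *dst,
                                     int height, int width)
{
    // ImageNet mean and std per RGB channel.
    const float mean[3] = {0.485f, 0.456f, 0.406f};
    const float stdv[3] = {0.229f, 0.224f, 0.225f};
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int pixel = y * width + x;
    for (int c = 0; c < 3; ++c) {
        float v = src[pixel * 3 + c] / 255.0f;
        dst[c * height * width + pixel] = (v - mean[c]) / stdv[c];
    }
}
"""

BLOCK = (16, 16, 1)


def launch_grid(height, width, block=BLOCK):
    """Number of thread blocks needed to cover a height x width image."""
    return ((width + block[0] - 1) // block[0],
            (height + block[1] - 1) // block[1], 1)


def make_gpu_normalizer():
    """Compile the kernel; return a function mapping uint8 HWC to float32 NCHW."""
    # Imported here so this file still imports on a machine without CUDA.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.gpuarray as gpuarray
    from pycuda.compiler import SourceModule

    kernel = SourceModule(KERNEL_SRC).get_function("normalize_hwc_to_chw")

    def normalize(image_rgb_u8):
        height, width, _ = image_rgb_u8.shape
        src = gpuarray.to_gpu(np.ascontiguousarray(image_rgb_u8))
        dst = gpuarray.empty((1, 3, height, width), np.float32)
        kernel(src, dst, np.int32(height), np.int32(width),
               block=BLOCK, grid=launch_grid(height, width))
        return dst  # stays on the GPU; call dst.get() to copy it back

    return normalize
```

Typical use would be `normalize = make_gpu_normalizer()` once at startup, then `normalize(resized_rgb)` per frame. Keeping the result on the GPU avoids a round trip through host memory if the device buffer can be used as the model input, though whether that fits depends on how your inference code manages its buffers.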

If Python is essential, maybe you can check the OpenCV GPU modules as an alternative:


Thanks for the response! I will try using PyCUDA with the provided kernels in the future; for now the speed is acceptable.