Image preprocessing in CUDA

kevin.delnoye · September 10, 2020, 3:48pm

Hi, I have a working inference pipeline that uses a TensorRT model. It looks like this:

while camera:
    img = camera.getimg()
    img_input = preprocess_img(img)
    result = model(img_input)

This runs at about 10 fps, but if I replace the preprocess function with a plain resize I get 20 fps.

This is the preprocess function:

import cv2
import numpy as np

def preprocess_img(image_raw, size=None):
    """
    input:
        original image in BGR format (OpenCV channel order, e.g. from cv2.imread)
    output:
        image numpy: resized, normalized image in NCHW float32 format
    misc:
        output numpy array is in row-major order
    """
    # Fall back to the 512x512 network input when no size is given.
    width, height = size if size is not None else (512, 512)
    image_resized = (
        cv2.resize(cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB), (width, height)).astype(np.float32)
        / 255.0
    )

    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = (image_resized - mean) / std  # ImageNet normalization
    image = np.transpose(image, [2, 0, 1])  # HWC to CHW format
    image = np.expand_dims(image, axis=0)  # CHW to NCHW format
    # Convert the image to row-major order, also known as "C order":
    image = np.array(image, dtype=np.float32, order="C")
    return image

The TensorRT samples only normalize with NumPy. I would prefer to normalize with PyCUDA so I can get a nice speedup.
Is there any sample code with efficient normalization that can run from Python?

PS: I am unable to use DeepStream, since my camera does not output only color images.
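
Before moving the normalization to the GPU, it may be worth checking how much of the cost comes from NumPy temporaries: `mean` and `std` in the function above are float64 arrays, so `(image_resized - mean) / std` runs in double precision and is converted back to float32 afterwards. The following is a minimal sketch, assuming the resize and color conversion have already been done and the input is a uint8 RGB image in HWC layout. `normalize_fused`, `SCALE` and `BIAS` are hypothetical names. It folds the division by 255, the mean subtraction and the division by std into one per-channel multiply-add in float32, and writes directly into an NCHW buffer. Whether this helps on a given Jetson has to be measured.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

# (x / 255 - mean) / std  ==  x * SCALE + BIAS, per channel
SCALE = (1.0 / (255.0 * STD)).astype(np.float32)
BIAS = (-MEAN / STD).astype(np.float32)


def normalize_fused(image_rgb_u8, out=None):
    """Normalize a uint8 HWC RGB image into a float32 NCHW array.

    `out` is an optional preallocated (1, C, H, W) float32 buffer
    (an assumption of this sketch, to avoid a fresh allocation per frame).
    """
    h, w, c = image_rgb_u8.shape
    if out is None:
        out = np.empty((1, c, h, w), dtype=np.float32)
    # View the input as CHW without copying; the ufunc handles the strides.
    chw_view = image_rgb_u8.transpose(2, 0, 1)
    # Scaled copy in float32, written straight into the output buffer.
    np.multiply(chw_view, SCALE[:, None, None], out=out[0], dtype=np.float32)
    # In-place per-channel bias add, no extra temporary.
    out[0] += BIAS[:, None, None]
    return out
```

In a capture loop the buffer could be allocated once and passed as `out=` on every frame; the previous frame's contents are then overwritten, so the result has to be consumed (or copied to the device) before the next call.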

AastaLLL · September 11, 2020, 3:36am

Hi,

We do have an example of CUDA pre-processing, but it has no Python or PyCUDA bindings:

https://github.com/dusty-nv/jetson-inference/blob/master/c/tensorConvert.cu#L180

    cudaError_t cudaTensorNormBGR( void* input, imageFormat format, size_t inputWidth, size_t inputHeight,
                                   float* output, size_t outputWidth, size_t outputHeight,
                                   const float2& range, cudaStream_t stream )
    {
        return launchTensorNorm<true>(input, format, inputWidth, inputHeight, output, outputWidth, outputHeight, range, stream);
    }

    // gpuTensorNormMean
    template<typename T, bool isBGR>
    __global__ void gpuTensorNormMean( T* input, int iWidth, float* output, int oWidth, int oHeight, int stride, float2 scale, float multiplier, float min_value, const float3 mean, const float3 stdDev )
    {
        const int x = blockIdx.x * blockDim.x + threadIdx.x;
        const int y = blockIdx.y * blockDim.y + threadIdx.y;

        if( x >= oWidth || y >= oHeight )
            return;

        const int m  = y * oWidth + x;
        const int dx = ((float)x * scale.x);
        const int dy = ((float)y * scale.y);

If Python is essential, you could look at the OpenCV GPU modules as an alternative:
https://docs.opencv.org/4.4.0/d8/d34/group__cudaarithm__elem.html

Thanks.
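
For anyone porting the linked kernel to PyCUDA, a CPU reference to compare against can be handy. The excerpt above stops right after the index computation, so this sketch only mirrors the visible lines (`m`, `dx`, `dy`) and assumes the rest: that `scale` is input size divided by output size, that sampling is nearest-neighbour, and that the output is planar (CHW). It leaves out the mean/std normalization entirely. `nearest_resize_to_planar` is a hypothetical name, not part of jetson-inference.

```python
import numpy as np


def nearest_resize_to_planar(image_hwc, out_w, out_h):
    """CPU reference: nearest-neighbour resize of an HWC image into CHW float32."""
    in_h, in_w, channels = image_hwc.shape
    # Assumption: scale = input size / output size.
    scale_x = in_w / out_w
    scale_y = in_h / out_h

    # One (x, y) pair per output pixel, like one CUDA thread each.
    xs = np.arange(out_w)
    ys = np.arange(out_h)

    # dx = int(x * scale.x), dy = int(y * scale.y): truncated source coordinates.
    dx = np.minimum((xs * scale_x).astype(np.int64), in_w - 1)
    dy = np.minimum((ys * scale_y).astype(np.int64), in_h - 1)

    # m = y * oWidth + x: flat index of the pixel inside one output plane.
    m = ys[:, None] * out_w + xs[None, :]
    plane = out_w * out_h

    out = np.empty(channels * plane, dtype=np.float32)
    for c in range(channels):
        # Assumption: planar output, channel c starts at offset c * plane.
        out[c * plane + m] = image_hwc[dy[:, None], dx[None, :], c]
    return out.reshape(channels, out_h, out_w)
```

Comparing a GPU result against this with `np.allclose` on a few small images is one way to catch layout mistakes (HWC vs. CHW, swapped width and height) early.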

kevin.delnoye · September 11, 2020, 1:21pm

Thanks for the response! I will try using PyCUDA with the provided kernels in the future; for now the speed is acceptable.
