Setup Information
• Hardware Platform - GPU
• DeepStream Version - 5.0 (using Docker image nvcr.io/nvidia/deepstream:5.0-20.07-triton)
• TensorRT Version - 7.0.0-1+cuda10.2
• NVIDIA GPU Driver Version (valid for GPU only) - Driver Version: 455.32.00 CUDA Version: 11.1
• Issue Type - Question
Problem/Use case:
I am working on using a PyTorch model with the Triton Inference Server. However, in the PyTorch code, a particular transform is applied to the input image before it is fed to the model for classification.
I have referred to the following 2 posts:
But I still couldn’t understand the explanation/math equations mentioned in these 2 posts.
The image transformations in PyTorch are as below:
self.mean = [0.485, 0.456, 0.406]
self.std = [0.229, 0.224, 0.225]
normalize = transforms.Normalize(mean=self.mean, std=self.std)
test_transform = transforms.Compose([
    transforms.Resize(cfg.resize),
    transforms.ToTensor(),
    normalize,
])
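For reference, here is a small numeric sketch (my own, using NumPy rather than torchvision) of what the Compose above does: ToTensor scales uint8 pixels from [0, 255] down to [0, 1], and Normalize then applies (x - mean) / std per channel.

```python
import numpy as np

# mean/std from the transform above, reshaped for per-channel broadcasting
mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

img = np.random.randint(0, 256, size=(3, 8, 8), dtype=np.uint8)  # CHW image
x = img.astype(np.float64) / 255.0   # ToTensor: [0, 255] -> [0, 1]
normalized = (x - mean) / std        # Normalize: per-channel mean/std
```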
I would like to reflect the above transformations (Resize & Normalize) similarly in the DeepStream pipeline with the Triton Inference Server (nvinferserver). Hence, I filled in the preprocess configuration as below:
preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    maintain_aspect_ratio: 1
    normalize {
        scale_factor: 0.017353
        channel_offsets: [123.69, 116.31, 103.52]
    }
}
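As a sanity check, here is a small sketch of my own, under the assumption that nvinferserver's normalize block computes y = scale_factor * (x - channel_offset) on input pixels in [0, 255], showing the output ranges the configured values would produce:

```python
# Values from the normalize block above
scale_factor = 0.017353
channel_offsets = [123.69, 116.31, 103.52]

# Assumed preprocessing: y = scale_factor * (x - offset), x in [0, 255]
output_ranges = [
    (scale_factor * (0 - off), scale_factor * (255 - off))
    for off in channel_offsets
]
# channel 0 comes out to roughly [-2.146, 2.279]
```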
I would assume that the range of input pixels to this network is [0, 255]. I have two main transformations needing guidance and help.
A. Normalization
For PyTorch, the pixel values before applying std and mean are in [0, 1] (reference). Upon applying the Normalize transform, I expect a range of [-2.146, 2.279] for channel 0, [-2.018, 2.407] for channel 1, and [-1.796, 2.628] for channel 2.
By using the equation provided by @AastaLLL in this post, the normalized pixel range is totally different from the normalized pixel range in PyTorch: substituting the said mean and factor gives [-0.393065, 0.39616] for channel 0.
So I tried a different approach, which is to reverse-calculate the net-scale-factor and mean from the following equations, using the expected output range and the range of input values.
e.g. Channel 0:
norm_pix = net-scale-factor * (x - mean)
eqn 1: -2.146 = net-scale-factor * (0 - mean)
eqn 2: 2.279 = net-scale-factor * (255 - mean)
Expanding:
-2.146 = net-scale-factor * (-mean)
2.279 = 255 * (net-scale-factor) + net-scale-factor * (-mean)
Subtracting eqn 1 from eqn 2:
255 * (net-scale-factor) = 4.425
Solving the linear equations, I would get net-scale-factor = 0.017353; solving for mean gives me 123.69.
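To double-check the reverse calculation, the same solve can be sketched per channel in Python (my own sketch; it derives the exact expected ranges from mean/std rather than the rounded values above):

```python
# torchvision Normalize parameters
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

results = []
for m, s in zip(mean, std):
    # Exact Normalize output range for inputs in [0, 1]
    lo, hi = (0 - m) / s, (1 - m) / s
    # Assumed preprocessing: norm_pix = k * (x - offset), x in [0, 255]
    #   eqn 1: lo = k * (0 - offset)
    #   eqn 2: hi = k * (255 - offset)
    k = (hi - lo) / 255      # eqn 2 minus eqn 1
    offset = -lo / k         # back-substitute into eqn 1
    results.append((k, offset))
```

Worked out symbolically, this gives k = 1 / (255 * std) and offset = 255 * mean per channel, i.e. roughly (0.017125, 123.675), (0.017507, 116.28), and (0.017429, 103.53) — so the factor actually differs slightly per channel because the stds differ.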
B. Image Resize
I am not particularly sure about this aspect. The nearest properties in the pre-processing block that I could try are below:
frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
frame_scaling_filter: 1
# frame scaling using Bilinear (check NvBufSurfTransform_Inter enum)
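For what it's worth, my current understanding (an assumption, not verified) is that with nvinferserver the resize target itself is not set in the preprocess block at all: frames are scaled to the network input dimensions that Triton reads from the model's config.pbtxt, and frame_scaling_filter only selects the interpolation method. A sketch, with a hypothetical tensor name and dims standing in for cfg.resize:

```
input [
  {
    name: "input__0"        # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]   # resize target C x H x W, e.g. if cfg.resize = 224
  }
]
```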
Questions
My questions would be:
- Is the assumption of an input pixel range of [0, 255] to the pre-processing block valid?
- In section A (Normalization), is solving the linear equations to get the net-scale-factor and mean an appropriate approach?
- For transforms.Resize(cfg.resize), which part of the preprocessing properties should I modify to reflect the resize operation?
Any feedback and guidance on reflecting this entire transform operation is very much appreciated.
Thanks!