**Setup Information**

• Hardware Platform - GPU

• DeepStream Version - 5.0 (using Docker image `nvcr.io/nvidia/deepstream:5.0-20.07-triton`)

• TensorRT Version - 7.0.0-1+cuda10.2

• NVIDIA GPU Driver Version (valid for GPU only) - Driver Version: 455.32.00 CUDA Version: 11.1

• Issue Type - Question

**Problem/Use case:**

I am working on using a PyTorch model with Triton Inference Server. However, the PyTorch code applies a particular transform to the input image before it is fed to the model for classification.

I have referred to the following 2 posts:

But I still couldn’t understand the explanation/math equations in those 2 posts.

The image transformations in PyTorch are as below:

```
self.mean = [0.485, 0.456, 0.406]
self.std = [0.229, 0.224, 0.225]
normalize = transforms.Normalize(mean=self.mean, std=self.std)
test_transform = transforms.Compose([
    transforms.Resize(cfg.resize),
    transforms.ToTensor(),
    normalize,
])
```
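As a sanity check of what this transform produces (a pure-Python sketch, no torch needed): `ToTensor` maps pixels to [0, 1], and `Normalize` then computes `(x - mean) / std` per channel, so the output extremes can be computed directly:

```python
# Extremes of (x - mean) / std for x in [0, 1], per channel
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

for c, (m, s) in enumerate(zip(means, stds)):
    lo, hi = (0.0 - m) / s, (1.0 - m) / s
    print(f"channel {c}: [{lo:.3f}, {hi:.3f}]")
# channel 0: [-2.118, 2.249]
# channel 1: [-2.036, 2.429]
# channel 2: [-1.804, 2.640]
```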

I would like to reflect the above transformations (Resize & Normalize) in a DeepStream pipeline with Triton Inference Server (nvinferserver). Hence, I filled in the configuration values as below:

```
preprocess {
  network_format: IMAGE_FORMAT_RGB
  tensor_order: TENSOR_ORDER_LINEAR
  maintain_aspect_ratio: 1
  normalize {
    scale_factor: 0.017125
    channel_offsets: [123.675, 116.28, 103.53]
  }
}
```

I would assume that the range of input pixels to this network is [0, 255]. There are two main transformations for which I need guidance and help.

**A. Normalization**

For PyTorch, the pixel values before applying the std and mean are in [0, 1] (reference). Upon applying the `Normalize` transform, I expect a range of [-2.118, 2.249] for channel 0, [-2.036, 2.429] for channel 1, and [-1.804, 2.640] for channel 2.

By using the equation provided by @AastaLLL in this post, the normalized pixel range comes out totally different from the PyTorch one: substituting the stated mean and factor gives roughly [-0.393065, 0.39616] for channel 0.

So I tried another approach: reverse-calculating the `net-scale-factor` and `mean` from the following equation, using the expected output range and the input range.

e.g. Channel 0:

```
norm_pix = net-scale-factor * (x - mean)
eqn 1: -2.118 = net-scale-factor * (0 - mean)
eqn 2:  2.249 = net-scale-factor * (255 - mean)
Subtracting eqn 1 from eqn 2:
255 * net-scale-factor = 2.249 + 2.118 = 4.367
```

Solving this pair of linear equations gives net-scale-factor = 0.017125 (which is 1/(255 × 0.229)) and mean = 123.675 (which is 255 × 0.485). Note that since the per-channel stds differ, each channel would strictly need its own scale factor (0.017125, 0.017507, 0.017429), whereas `scale_factor` takes a single scalar.
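The same algebra can be checked numerically: (x/255 - mean)/std = (1/(255·std)) · (x - 255·mean), so the per-channel scale is 1/(255·std) and the offset is 255·mean. A pure-Python sketch (the variable names are mine, not nvinferserver parameters):

```python
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

for m, s in zip(means, stds):
    scale = 1.0 / (255.0 * s)   # per-channel net-scale-factor
    offset = 255.0 * m          # per-channel offset in the 0-255 domain
    for x in (0, 64, 128, 255):
        pytorch_norm = (x / 255.0 - m) / s      # ToTensor + Normalize
        deepstream_norm = scale * (x - offset)  # scale * (x - offset) form
        assert abs(pytorch_norm - deepstream_norm) < 1e-9
    print(f"scale={scale:.6f}, offset={offset:.3f}")
# scale=0.017125, offset=123.675
# scale=0.017507, offset=116.280
# scale=0.017429, offset=103.530
```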

**B. Image Resize**

I am not particularly sure about this aspect. The nearest properties in the pre-processing block that I could find are:

```
frame_scaling_hw: FRAME_SCALING_HW_DEFAULT
frame_scaling_filter: 1  # bilinear scaling (see the NvBufSurfTransform_Inter enum)
```
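For reference, `transforms.Resize` with a single int resizes the shorter edge to that size while keeping the aspect ratio, whereas a (h, w) tuple forces an exact size. A pure-Python sketch of that size arithmetic (`resize_output_size` is my own helper for illustration, not a torchvision API):

```python
def resize_output_size(w, h, size):
    """Mimic torchvision Resize output-size semantics, returning (w, h)."""
    if isinstance(size, int):  # scale the shorter edge, keep aspect ratio
        if w <= h:
            return size, int(size * h / w)
        return int(size * w / h), size
    return size[1], size[0]    # (h, w) tuple -> exact target size

print(resize_output_size(640, 480, 256))        # (341, 256)
print(resize_output_size(640, 480, (224, 224))) # (224, 224)
```

If this reading is right, the shorter-edge (int) case is the one that `maintain_aspect_ratio: 1` would need to emulate, with `frame_scaling_filter` selecting the interpolation.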

**Questions**

My questions are:

- Is the assumption of an input pixel range of [0, 255] to the pre-processing block valid?
- In section *A. Normalization*, is solving the linear equations for `net-scale-factor` and `mean` an appropriate approach?
- For `transforms.Resize(cfg.resize)`, which pre-processing property should I modify to reflect the resize operation?

Any feedback and guidance on reproducing this entire transform in the DeepStream pipeline is very much appreciated.

Thanks!