What does infer-dims do?

nvinfer first converts and scales each input frame to match the network input, here a network width and height of 224x224; see the FAQ on how nvinfer works for details.
For RGB or BGR input the channel count is 3; for grayscale input it is 1.
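
As a rough illustration, assuming the value is written in channel;height;width order (for example `infer-dims=3;224;224` in the nvinfer config file), the three numbers can be read as below. `parse_infer_dims` is a hypothetical helper written for this post, not part of DeepStream; it only shows how the channel count and the 224x224 network size fit together.

```python
from typing import Tuple


def parse_infer_dims(value: str) -> Tuple[int, int, int]:
    """Split an infer-dims style string into (channels, height, width).

    Assumes channel;height;width order, e.g. "3;224;224".
    """
    parts = [p.strip() for p in value.split(";") if p.strip()]
    if len(parts) != 3:
        raise ValueError(f"expected 3 values, got {len(parts)}: {value!r}")
    channels, height, width = (int(p) for p in parts)
    # 3 channels for RGB/BGR input, 1 channel for grayscale input.
    if channels not in (1, 3):
        raise ValueError(f"unexpected channel count: {channels}")
    return channels, height, width


channels, height, width = parse_infer_dims("3;224;224")
print(f"channels={channels}, network size={width}x{height}")
```

With a grayscale model the same helper would be called with `"1;224;224"`.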