Creating a Grayscale model and Augmentation Pipeline

We are currently working on creating a grayscale model using the latest version of TAO. I have successfully created the model within TAO, but its performance is rather poor compared to its RGB counterpart. Our data is heavily based on geometric shape and shouldn’t be affected by color information. Because of that, I have a few questions about grayscale processing on the TAO side:

  • We are currently converting our RGB images to grayscale offline. However, there is a channel_output parameter in the config for the online augmentation pipeline. Does TAO also perform some kind of grayscale conversion on the image if we set channel_output to 1, and if so, which grayscale function is used?

  • If TAO is performing a grayscale conversion, will turning off online augmentation stop this internal conversion, or will it happen regardless when we set image_type to GRAY_SCALE and image_order to l?

  • Is PNG suggested specifically because of the lossy nature of JPG, or is there a different reason?

The network is a Faster RCNN with a ResNet-34 backbone, trained on a 4xA100 server.

To train a grayscale model, you can prepare the grayscale dataset and then refer to the spec file (default_spec_resnet18_grayscale.txt) in the faster_rcnn notebook.

As stated above, the issue is not that we cannot train a grayscale model. Our concern is that, when the online augmentation pipeline is used with channel_output set to 1, TAO may apply its own grayscale function on top of the one we already apply offline. We would like to keep using online augmentation, but if that is the case we will have to switch to offline augmentation. The grayscale method we use is highly specific to our dataset, and applying another, unknown method for merging color channels would harm our performance.
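To make the concern concrete, here is a minimal sketch of what a second channel merge would do to an image that has already been converted. Everything in it is an assumption for illustration: the BT.601 luma weights stand in for whatever a framework might use (nothing in this thread confirms what TAO applies, if anything), and `custom_weights` is a hypothetical placeholder for a dataset-specific method. It illustrates one general property: if the three channels of a pixel are already equal, any weighted merge whose weights sum to 1 returns the same value.

```python
# Assumed weights for illustration only (ITU-R BT.601 luma).
# Not confirmed as the function TAO uses.
BT601_WEIGHTS = (0.299, 0.587, 0.114)


def to_gray(pixel, weights=BT601_WEIGHTS):
    """Merge one (R, G, B) pixel into a single intensity in 0..255."""
    r, g, b = pixel
    wr, wg, wb = weights
    value = round(wr * r + wg * g + wb * b)
    return min(255, max(0, value))


def convert_image(rgb_rows, weights=BT601_WEIGHTS):
    """Apply the weighted merge to every pixel of a row-major image."""
    return [[to_gray(p, weights) for p in row] for row in rgb_rows]


def replicate(gray_rows):
    """Expand a single-channel image to three equal channels."""
    return [[(v, v, v) for v in row] for row in gray_rows]


# A tiny 2x2 RGB image.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (128, 64, 32)]]

# Hypothetical stand-in for a dataset-specific offline conversion.
custom_weights = (0.5, 0.25, 0.25)
gray_once = convert_image(rgb, custom_weights)

# Simulate a second, different merge applied to the already-gray image.
gray_twice = convert_image(replicate(gray_once))

print(gray_once == gray_twice)
```

Under these assumptions the second merge leaves the offline result unchanged. A difference would only appear if the second conversion ran on the original RGB data, or used weights that do not sum to 1.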

For faster_rcnn, only random horizontal flip is implemented in online augmentation.
There is no spatial or color augmentation beyond that, so it will not harm your current performance.
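As a rough illustration of why a horizontal flip cannot alter grayscale values, here is a minimal sketch of a random horizontal flip on a single-channel image and its bounding boxes. This is not TAO's implementation. The function names, the `(x_min, y_min, x_max, y_max)` box convention in pixel-edge coordinates, and the flip probability `p` are all assumptions made for the example.

```python
import random


def hflip_image(rows):
    """Mirror a row-major image left to right; pixel values are untouched."""
    return [list(reversed(row)) for row in rows]


def hflip_boxes(boxes, width):
    """Mirror (x_min, y_min, x_max, y_max) boxes across the vertical axis."""
    return [(width - x_max, y_min, width - x_min, y_max)
            for (x_min, y_min, x_max, y_max) in boxes]


def random_hflip(rows, boxes, rng, p=0.5):
    """Flip image and boxes together with probability p (assumed default)."""
    if rng.random() < p:
        width = len(rows[0])
        return hflip_image(rows), hflip_boxes(boxes, width)
    return rows, boxes


image = [[10, 20, 30],
         [40, 50, 60]]
boxes = [(0, 0, 1, 2)]  # a box covering the leftmost column

flipped_image, flipped_boxes = random_hflip(image, boxes, random.Random(0), p=1.0)
print(flipped_image)
print(flipped_boxes)
```

The flip only reorders pixels within each row, so the set of intensity values is identical before and after.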

Thank you. Just for clarification, do you mean that regardless of RGB or Grayscale, Faster RCNN only supports horizontal flipping? Or is this only when using Grayscale?

This applies to both RGB and grayscale.

So the only online augmentation available for any Faster RCNN model in TAO is Horizontal Flipping? Is this not documented somewhere?

We will improve the documentation.