Is there any comprehensive documentation on preprocessing for the NGC models? E.g. the TrafficCamNet model page simply states:
> Input
> RGB Image 960 X 544 X 3 (W x H x C)
I think this can be rather misleading for several reasons:
For example, there is an official post by @Morganh about using tlt-converter to build engines from the .etlt files on NGC. That post uses CHW format, which contradicts the NGC page. One could argue that the NGC docs list the dimensions in column-major order, but that is not stated either.
Furthermore, the input is not simply a raw RGB image: it needs to be normalized according to the ImageNet pixel statistics. There are several conventions, and buried in the forums there is a post by @Morganh about the correct one. However, that post only references third-party libraries and gives no direct information about the intended procedure.
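For reference, this is the convention I keep running into (torchvision-style: scale to [0,1], subtract the per-channel ImageNet mean, divide by the per-channel std). Whether this is what TrafficCamNet actually expects, as opposed to e.g. a plain net-scale-factor as seen in some DeepStream configs, is exactly the open question; the mean/std values below are the usual ImageNet ones and are an assumption, not confirmed for this model:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One common normalization convention (assumed, not confirmed for
// TrafficCamNet): v/255, then per-channel (v - mean) / std using the
// standard ImageNet statistics.
std::vector<float> normalizeImagenet(const std::vector<unsigned char>& rgb,
                                     std::size_t pixels) {
    static constexpr std::array<float, 3> kMean{0.485f, 0.456f, 0.406f};
    static constexpr std::array<float, 3> kStd{0.229f, 0.224f, 0.225f};
    std::vector<float> out(rgb.size());
    for (std::size_t i = 0; i < pixels; ++i)
        for (std::size_t k = 0; k < 3; ++k) {
            float v = rgb[i * 3 + k] / 255.0f;          // [0,255] -> [0,1]
            out[i * 3 + k] = (v - kMean[k]) / kStd[k];  // per-channel z-score
        }
    return out;
}
```

Even a single sentence in the model card saying which of these conventions applies (and in which channel order) would remove all the guesswork.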
Because I could not find explicit documentation, I am now reverse engineering the jetson-inference GitHub repo by @dusty_nv, which implements CUDA kernels to preprocess the input correctly. Those kernels seem to target an earlier version of DetectNet though, so I am not completely confident.
As you know, deep learning models do not throw errors when the inference image statistics fail to match the training statistics. They simply stop working, or worse, silently perform badly. I therefore believe it is paramount to have documentation that states the preprocessing steps exactly.
For reference, I am implementing object detection as part of a larger C++ computer vision pipeline via TensorRT, and do not have access to third-party libraries.