I have already gone through the docs and the deepstream-app code several times. What I found is that we pass the network and its configuration to the nvinfer plugin. The plugin evaluates the net, parses the results, and fills the Metadata structure, which is the output of the plugin.
I am interested in ‘how does nvinfer evaluate the ResNet10 model?’ and ‘how does it parse the output?’. With this knowledge I will be able to evaluate the model outside of deepstream-app and compare its results to ours.
From the resnet10.prototxt file we can see that the network accepts input of shape (batchsize=1, channels=3, height=368, width=640) and has 2 output layers. The first has shape (batchsize=1, 4, 23, 40) and the second has shape (batchsize=1, 16, 23, 40). The first probably contains probabilities/confidences and the second bboxes. I don’t know how to parse these 2 outputs and can’t find any information about it.
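To make the guess above concrete, here is a minimal Python sketch of the reading it implies. Only the shapes come from resnet10.prototxt. The one-confidence-map-per-class layout, the grouping of the 16 channels as 4 values per class, the 0.5 threshold, and the names `grid_stride` and `candidate_cells` are assumptions for illustration, not nvinfer's actual parser. The box values are returned raw, because how they map to pixel coordinates is exactly the part I don't know.

```python
# Shapes taken from resnet10.prototxt; everything else is an assumption.
NET_H, NET_W = 368, 640      # network input height and width
GRID_H, GRID_W = 23, 40      # spatial size of both output layers
NUM_CLASSES = 4              # assumed: one confidence map per class
VALUES_PER_BOX = 4           # assumed: 16 channels = 4 classes x 4 box values


def grid_stride():
    """Input pixels covered by one output grid cell, as (stride_y, stride_x)."""
    return NET_H // GRID_H, NET_W // GRID_W


def candidate_cells(conf, bbox, threshold=0.5):
    """Yield (class_id, row, col, score, raw_box) for cells at or above threshold.

    conf: flat list of 4*23*40 floats in (class, row, col) order.
    bbox: flat list of 16*23*40 floats in (channel, row, col) order, assuming
          channels class_id*4 .. class_id*4+3 belong to that class.
    raw_box holds the 4 undecoded network values for that cell.
    """
    plane = GRID_H * GRID_W
    for class_id in range(NUM_CLASSES):
        for row in range(GRID_H):
            for col in range(GRID_W):
                cell = row * GRID_W + col
                score = conf[class_id * plane + cell]
                if score < threshold:
                    continue
                raw_box = [
                    bbox[(class_id * VALUES_PER_BOX + k) * plane + cell]
                    for k in range(VALUES_PER_BOX)
                ]
                yield class_id, row, col, score, raw_box


if __name__ == "__main__":
    plane = GRID_H * GRID_W
    # Dummy outputs: all zeros except one confident cell for class 2.
    conf = [0.0] * (NUM_CLASSES * plane)
    bbox = [0.0] * (NUM_CLASSES * VALUES_PER_BOX * plane)
    conf[2 * plane + 5 * GRID_W + 7] = 0.9
    print(grid_stride())
    print(list(candidate_cells(conf, bbox)))
```

With these shapes each grid cell covers 16 x 16 input pixels (368 / 23 and 640 / 40), which fits the idea that every cell reports a confidence and a box for the image region it sits on.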
I am new to this area, so maybe I am missing something.