Issues during GroundingDino inference

• Hardware: L40S
• Network Type: Grounding DINO
• TLT Version: TAO 5.5

During training of the Grounding DINO model, the metrics all appear normal, but during actual inference, different prompt combinations seem to have a significant impact on the results.

What could be the reason for this?

For example, when the only prompt is “text”, the bounding box for “text” appears, but if a “dotted text” prompt is added, the bounding box for “text” no longer shows up ([“text”] vs. [“text”, “dotted text”]). What might be causing this? During evaluation, the metrics look normal.

Please try a different prompt, for example “text with dotted”.

Maybe I didn’t explain my question clearly, sorry about that. My question is that during training, everything works fine on the evaluation set, but at inference time different prompt combinations have a significant impact on the results. For example, using [‘1’, ‘2’, ‘3’] yields good results, but using [‘1’, ‘2’] or [‘1’, ‘2’, ‘3’, ‘4’] leads to poorer performance. I have already pre-trained the model, so what could be the main reason for this variation in performance across different prompt sets? Thanks~
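One plausible explanation: Grounding DINO joins all prompts into a single caption (e.g. “text . dotted text .”) and scores each predicted box against the caption’s tokens, so the prompt list changes both the text features and the phrase-to-token grouping that produces the final per-class confidence. The toy sketch below is purely illustrative (the `assign_boxes` helper and the token scores are made up and are not the TAO implementation); it shows how a prompt that shares a token, such as “dotted text” sharing “text”, can capture boxes that previously matched “text” alone or push their scores below the confidence threshold.

```python
# Toy sketch (not TAO/Grounding DINO source code) of how a prompt list turns
# into per-box class scores: each prompt's score is taken as the max score
# over the tokens that prompt contains, the box is assigned to the best
# prompt, and conf_threshold decides what survives.

from typing import Dict, List, Tuple

def assign_boxes(
    prompts: List[str],
    box_token_scores: List[Dict[str, float]],
    conf_threshold: float = 0.3,
) -> List[Tuple[int, str, float]]:
    """Assign each box to its best-scoring prompt, then apply the threshold."""
    detections = []
    for box_id, token_scores in enumerate(box_token_scores):
        best_prompt, best_score = None, 0.0
        for prompt in prompts:
            # Score for a prompt = max score over the tokens it contains.
            score = max(token_scores.get(tok, 0.0) for tok in prompt.split())
            if score > best_score:
                best_prompt, best_score = prompt, score
        if best_prompt is not None and best_score >= conf_threshold:
            detections.append((box_id, best_prompt, best_score))
    return detections

# One box whose decoder query responds strongly to the token "text" and even
# more strongly to "dotted" (illustrative numbers, not real model output).
boxes = [{"text": 0.42, "dotted": 0.55}]

print(assign_boxes(["text"], boxes))                 # box kept as "text"
print(assign_boxes(["text", "dotted text"], boxes))  # box re-assigned to "dotted text"
```

In the two-prompt case the same box is re-assigned to “dotted text”, which is consistent with the “text” detections disappearing once the second prompt is added.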

Please try to set a lower conf_threshold in the inference spec file.
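Lowering the threshold can matter because extra prompts may dilute a box’s confidence without removing the box itself. The snippet below is a minimal, self-contained illustration (the scores and the `filter_detections` helper are hypothetical, not TAO code) of how the same raw detections appear or disappear depending on where the cutoff sits; in TAO this cutoff corresponds to the conf_threshold field in the inference spec file.

```python
# Minimal illustration (not TAO code) of the effect of conf_threshold on the
# final output: the same raw detections survive or get dropped purely
# depending on where the cutoff sits. Scores are made up.

from typing import List, Tuple

def filter_detections(
    detections: List[Tuple[str, float]], conf_threshold: float
) -> List[Tuple[str, float]]:
    """Keep only detections whose confidence reaches the threshold."""
    return [(label, score) for label, score in detections if score >= conf_threshold]

# Suppose the "text" box scores 0.42 with a single prompt but only 0.27 once
# "dotted text" is added and takes part of the match (illustrative numbers).
raw = [("text", 0.27), ("dotted text", 0.55)]

print(filter_detections(raw, conf_threshold=0.3))  # "text" box is dropped
print(filter_detections(raw, conf_threshold=0.2))  # "text" box reappears
```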

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
