Issues during Grounding DINO inference

• Hardware: L40S
• Network Type: Grounding DINO
• TLT Version: TAO 5.5

During training of the Grounding DINO model, all metrics look normal, but at inference time the results change significantly depending on which prompts are combined.

What could be the reason for this?

For example, with the single prompt “text”, a bounding box for “text” is returned. When a second prompt, “dotted text”, is added, the bounding box for “text” no longer appears ([“text”] vs. [“text”, “dotted text”]). What might be causing this? The evaluation metrics look normal.

Please try a different prompt, for example “text with dotted”.
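When comparing prompt lists, it may help to check which words the prompts share. The sketch below assumes the category names are joined into one caption separated by " . ", as in the open-source Grounding DINO; whether TAO 5.5 builds its caption the same way is not confirmed in this thread. The helpers `build_caption` and `shared_words` are hypothetical names used only for illustration, not part of any TAO API.

```python
def build_caption(categories):
    """Join category phrases into one caption, e.g. 'text . dotted text .'.

    Assumption: mirrors the " . " separator convention of the open-source
    Grounding DINO; the TAO pipeline may differ.
    """
    return " . ".join(c.strip().lower() for c in categories) + " ."


def shared_words(categories):
    """Return the words that occur in more than one category phrase."""
    owners = {}
    for idx, phrase in enumerate(categories):
        for word in set(phrase.lower().split()):
            owners.setdefault(word, set()).add(idx)
    return sorted(w for w, idxs in owners.items() if len(idxs) > 1)


# Compare the prompt lists discussed in this thread.
for prompts in (["text"], ["text", "dotted text"], ["text", "text with dotted"]):
    print(build_caption(prompts), "| shared words:", shared_words(prompts))
```

If a word such as "text" appears in several phrases, one possible explanation (a hypothesis, not a confirmed diagnosis) is that the phrases compete for the same boxes, so trying prompt sets with and without overlapping words is a cheap way to narrow the problem down.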