• Hardware: L40s
• Network Type: Grounding DINO
• TAO Version: TAO 5.5
During training of the Grounding DINO model, the metrics all look normal, but at actual inference time, different prompt combinations seem to have a significant impact on the results.
What could be the reason for this?
For example, with the single prompt ["text"], bounding boxes for "text" appear; but after adding a "dotted text" prompt (["text"] vs ["text", "dotted text"]), the bounding boxes for "text" no longer show up. What might be causing this? During evaluation, the metrics still look normal.
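One plausible mechanism (a hypothesis, not a confirmed diagnosis of the TAO pipeline): open-vocabulary detectors like Grounding DINO typically join all prompts into a single caption (e.g. "text . dotted text .") and then assign each predicted box to the phrase span whose tokens it activates most strongly. Because the phrase "dotted text" contains the token "text", a box that fires on "text" can end up assigned to the longer phrase once it is added, so the "text" label disappears. The toy sketch below (made-up scores, hypothetical `assign_phrase` helper, not TAO code) illustrates how that reassignment can happen:

```python
# Toy illustration of span-based label assignment in a grounded detector.
# All token scores are invented for demonstration purposes only.

def assign_phrase(token_scores, phrase_spans):
    """Assign a box to the phrase whose token span has the highest mean score.

    token_scores: per-caption-token activation scores for one predicted box.
    phrase_spans: {phrase: (start, end)} token index ranges within the caption.
    """
    best_phrase, best_score = None, float("-inf")
    for phrase, (start, end) in phrase_spans.items():
        score = sum(token_scores[start:end]) / (end - start)
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase

# Caption with one prompt, "text": tokens = ["text"]
print(assign_phrase([0.62], {"text": (0, 1)}))  # -> text

# Caption with both prompts, "text . dotted text .":
# tokens = ["text", ".", "dotted", "text", "."]
# The same box also activates on the second occurrence of "text",
# which now lies inside the "dotted text" span, so that span wins.
scores = [0.62, 0.0, 0.58, 0.70, 0.0]
spans = {"text": (0, 1), "dotted text": (2, 4)}
print(assign_phrase(scores, spans))  # -> dotted text
```

If this is what is happening, re-running inference with prompts that do not share tokens (e.g. renaming one class) or comparing raw per-token logits between the two prompt lists should make the effect visible.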