NanoOWL inference takes more time than the official GitHub benchmarks show

Thank you for updating the Jetson Gen AI Lab demos!

I tried running NanoOWL on a Jetson AGX Orin 64GB. When I use owl_image_encoder_patch32.engine, each inference takes about 40~60 ms, but the official GitHub page says it reaches 95 FPS, which would be roughly 10 ms per inference.


NanoOWL official GitHub performance figures:

When I change the model to owlv2-base-patch16-ensemble.engine (image size = 900, patch size = 16), likewise converted to an FP16 engine, the inference time becomes almost ten times what OWL-ViT takes; the average is over 400 ms:

Image encoding seems to take most of the time, yet the encoder parts of OWL-ViT and OWLv2 do not look very different, and before converting to TensorRT engines, the original Transformers OWL-ViT and OWLv2 inference times were almost the same.
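A back-of-envelope token count suggests part of this gap is expected even if the architectures look similar. This is my own estimate, assuming the standard 768-pixel OWL-ViT base/32 input and the usual ViT scaling (roughly linear in token count for the MLP/projection layers, roughly quadratic for self-attention):

```python
# Rough ViT cost comparison. Assumptions: OWL-ViT base/32 uses a 768x768
# input (the Transformers default), and the OWLv2 engine here uses
# 900x900 with patch size 16, as described above.
owlvit_tokens = (768 // 32) ** 2   # 24 x 24 = 576 patch tokens
owlv2_tokens = (900 // 16) ** 2    # 56 x 56 = 3136 patch tokens

linear_ratio = owlv2_tokens / owlvit_tokens   # MLPs/projections scale ~linearly
attention_ratio = linear_ratio ** 2           # self-attention scales ~quadratically

print(f"tokens: {owlvit_tokens} vs {owlv2_tokens}")
print(f"linear cost ratio: ~{linear_ratio:.1f}x, attention cost ratio: ~{attention_ratio:.1f}x")
```

The observed ~10x slowdown sits between the ~5.4x linear ratio and the ~30x attention ratio, so much of the gap may simply be the much larger patch-token count at 900/16, independent of TensorRT.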
I split the NanoOWL pipeline into several stages, and image encoding takes most of the time:

The measured inference time includes only predictor.encode_text() and predictor.predict().
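For reference, the per-stage timing above can be reproduced with a sketch like the following. The stage functions here are hypothetical stand-ins (time.sleep placeholders) for the real predictor.encode_text() / predictor.predict() calls; with real GPU models you would also call torch.cuda.synchronize() before each clock reading so asynchronous CUDA work is counted:

```python
import time

def time_stage(fn, warmup=3, iters=10):
    """Average wall-clock time of fn in milliseconds.

    With real GPU models, call torch.cuda.synchronize() right before
    each time.perf_counter() reading so pending CUDA kernels are included.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0

# Hypothetical stand-ins for the real predictor stages.
def encode_text():
    time.sleep(0.001)   # text encoding is cheap

def encode_image():
    time.sleep(0.040)   # the stage that dominates in the measurements above

timings = {
    "encode_text": time_stage(encode_text),
    "encode_image": time_stage(encode_image),
}
print(timings)
```

Timing each stage separately like this makes it easy to confirm whether the image encoder engine, rather than text encoding or post-processing, is the bottleneck.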

The power mode is already MAXN. What should I do to reach the FPS performance the official GitHub shows?




Have you also maximized the clocks?


Thank you for your reply!

I used jtop (the CTRL tab) to change ‘Jetson Clocks’ from ‘inactive’ to ‘running’, but nothing changed.

I’m not sure whether this ‘Jetson Clocks’ toggle is the ‘clock’ you mentioned?


Yes, jetson_clocks is what we meant.
We need to give it a try internally and will share more info with you later.
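As a sketch, these are the usual commands to check the power mode and pin the clocks directly on the device (run on the Orin itself; sudo is assumed, and the script guards against being run on a non-Jetson machine):

```shell
#!/bin/sh
# Pin Jetson clocks to maximum (equivalent to the jtop 'Jetson Clocks' toggle).
if command -v jetson_clocks >/dev/null 2>&1; then
    sudo nvpmodel -q          # confirm the current power mode (expect MAXN)
    sudo jetson_clocks        # lock CPU/GPU/EMC clocks at their maximum
    sudo jetson_clocks --show # verify the clocks are actually pinned
else
    echo "jetson_clocks not found: run this on the Jetson"
fi
```

Note that jetson_clocks does not persist across reboots, so it is worth re-checking with `jetson_clocks --show` before each benchmark run.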