NanoOWL inference takes more time than the official GitHub repo shows

Hello,
Thank you for updating the Jetson Gen AI Lab demos!

I tried to run nanoowl on a Jetson AGX Orin 64GB. When I use owl_image_encoder_patch32.engine, each inference takes about 40~60 ms, but the official GitHub says it reaches 95 FPS, i.e. roughly 10 ms per inference?

[image: my benchmark output]

NanoOWL official GitHub performance:

When I change the model to owlv2-base-patch16-ensemble.engine (image size = 900, patch size = 16), the inference time becomes almost ten times what OWL-ViT takes; the average inference time is about 400+ ms:
[image: OWLv2 benchmark output]
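For reference, a rough patch-count comparison; the input sizes below are assumptions based on the Hugging Face configs (768 for owlvit-base-patch32, 960 for owlv2-base-patch16; the run above used 900):

owlvit_patches = (768 // 32) ** 2      # 576 image tokens
owlv2_patches = (960 // 16) ** 2       # 3600 image tokens
print(owlv2_patches / owlvit_patches)  # ~6.25x more tokens, and self-attention cost
                                       # grows faster than linearly with token count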

It seems image encoding takes most of the time, but the encoder parts of OWL-ViT and OWLv2 do not look that different; before converting to TensorRT engines, the original transformers OWL-ViT and OWLv2 inference times are almost the same.
I split nanoowl into several parts, and encoding the image takes most of the time:
[image: per-stage timing breakdown]

The inference time only includes predictor.encode_text() and predictor.predict().
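A minimal sketch of that split, using only the two calls named here (exact signatures per the OwlPredictor source); the image-encoding plus decoding cost is roughly the difference between the two timings:

import time

tic = time.time()
text_encodings = predictor.encode_text(["mouse"])   # text encoder only
t_text = time.time() - tic

tic = time.time()
output = predictor.predict(image=image, text=["mouse"], threshold=0.1)  # full pipeline
t_full = time.time() - tic

print(f"encode_text: {t_text*1000:.1f} ms, full predict: {t_full*1000:.1f} ms")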

The power mode is MAXN now. What should I do to reach the FPS the official GitHub shows?

---------------------------same question, originally in Chinese---------------------

Running nanoowl on an Orin 64GB with owl_image_encoder_patch32.engine, following the official GitHub example code, each frame takes about 40+ ms, but the official GitHub says this model can reach 95 FPS.

After replacing the engine with OWLv2 (image_size=900, patch_size=16), also converted to an FP16 engine, each frame takes almost ten times as long as owlvit-vitB/32, even though, before TensorRT acceleration, the transformers-library tests show OWL-ViT and OWLv2 taking almost the same time per inference.

The measured inference time only includes predictor.encode_text() and predictor.predict(); image reading and writing are not included.

The Orin power mode is already MAXN.

What should I change to reach the official 95 FPS?

Hi

Have you also maximized the clocks?

Thanks.

Thank you for your reply!

I used the jtop CTRL page to change ‘Jetson Clocks’ from ‘inactive’ to ‘running’, but nothing changed.

I’m not sure whether this ‘Jetson Clocks’ is the ‘clock’ you mentioned?

Hi,

Yes, jetson_clocks is what we meant.
We need to give it a try internally and will update you with more info later.

Thanks.

Hi,

Thanks for your patience.

Which sample do you use?
We tested the GitHub NVIDIA-AI-IOT/nanoowl repo, but there is no info (detect min time…) like you mentioned.

Thanks.

Hi,

Thank you for your reply!

I use the sample from the GitHub NVIDIA-AI-IOT/nanoowl repo; the detect info is calculated using the Python time module.
The sample I used is roughly like this:

import time
from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owlvit-base-patch32-image-encoder.engine"
)

# image is a PIL.Image loaded beforehand
tic = time.time()
output = predictor.predict(image=image, text=["mouse"], threshold=0.1)
toc = time.time()

I used 70+ 640x480 RGB images to run the inference, took the minimum of toc-tic as the detect min time, and the average of toc-tic as the detect mean time.
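A sketch of that loop (the image folder path is hypothetical; predictor is set up as in the snippet above):

import glob
import time
import PIL.Image

times = []
for path in sorted(glob.glob("images/*.jpg")):   # ~70 RGB frames, 640x480
    image = PIL.Image.open(path)
    tic = time.time()
    output = predictor.predict(image=image, text=["mouse"], threshold=0.1)
    times.append(time.time() - tic)

times = times[1:]   # drop the first sample (warmup)
print(f"detect mean time: {sum(times)/len(times):.4f} s")
print(f"detect min time: {min(times):.4f} s")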

As shown in this image:

If I use the average inference time to calculate the FPS, the FPS should be about 18 FPS (1/0.055, the detect mean time) for OWL-ViT (ViT-B/32).

The FPS is calculated as 1/(average time used for inference per image); am I wrong?

How do you calculate the FPS for the nanoowl model? Could you give us an official script to test the FPS?

Thanks!

Hi,

Could you try to benchmark the model with a loop?
It’s expected that the initial launch of the GPU kernel will take longer.

For example

# warmup
for i in range(10):
    output = predictor.predict(image=image, text=["mouse"], threshold=0.1)

tic = time.time()
for i in range(100):
    output = predictor.predict(image=image, text=["mouse"], threshold=0.1)
toc = time.time()
# calculate the average time over 100 iterations
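(A variant of the same loop with explicit synchronization, in case any GPU work is still queued when the clock is read; nanoowl runs on PyTorch, so torch.cuda.synchronize() should be available.)

import time
import torch

torch.cuda.synchronize()            # make sure the warmup work has finished
tic = time.perf_counter()
for i in range(100):
    output = predictor.predict(image=image, text=["mouse"], threshold=0.1)
torch.cuda.synchronize()            # wait for any outstanding GPU work
toc = time.perf_counter()
print(f"average inference time: {(toc - tic) / 100 * 1000:.1f} ms")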

Thanks.

Hi,

Thank you for your reply!

Actually, the inference time is already calculated with a loop, and the first inference result is discarded, just like the warmup process.

I tested the script with my method, using an RGB image of size (320, 240):

And I also tested with the script you gave, with the same RGB image size (320, 240):

There does not seem to be much difference between the two results?

The Power mode is MAXN, and the Jetson Clocks is running.

Waiting for your reply!

Hi,

Are you able to share the profiling source with us?
That way we can use the same profiler and rule out any differences.

Thanks.

Hi,

from nanoowl.owl_predictor import OwlPredictor
import PIL.Image
import time

predictor = OwlPredictor(
    "google/owlvit-base-patch32",
    image_encoder_engine="data/owl_image_encoder_patch32.engine"
)

image = PIL.Image.open("./000000001503.jpg")

# warmup: discard the first runs so engine/kernel initialization is not timed
for i in range(10):
    output = predictor.predict(image=image, text=["mouse"], threshold=0.1)

# timed loop
tic = time.time()
for i in range(100):
    output = predictor.predict(image=image, text=["mouse"], threshold=0.1)
toc = time.time()

avg_time = (toc - tic) / 100
print(f'\nthe average inference time is: {avg_time}')

# (earlier variant, removed for clarity: per-iteration timing into a time_list
#  with detect mean/min/max statistics)

Thank you~

Hi,

Thanks, we need to reproduce this issue internally.
We will update you with more info later.

Looking forward to your results~

Hi,

In your sample, predict() runs on one image at a time.
But since the TensorRT engine is built for batch 32, it should be able to run 32 images per inference.

In that case, the FPS should increase a lot.
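For illustration, the arithmetic behind this (the latency value below is a placeholder, not a measurement):

batch_size = 32
batch_latency = 0.10                           # hypothetical: 100 ms per batch-32 engine execution
throughput_fps = batch_size / batch_latency    # 320 images/s even though each call still takes 100 ms
per_image_ms = batch_latency / batch_size * 1000
print(f"throughput: {throughput_fps:.0f} FPS, amortized per-image cost: {per_image_ms:.2f} ms")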

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.