Why does MobileNetV2 inference take so much time?

Hi,

I tried to measure the mobilenet_v2 inference time on Xavier (batch size = 1).

The result was

Experiments: cpu
START: mobilenet
3.234218406677246sec
Experiments: cuda
START: mobilenet
0.27671399116516116sec

When using only the CPU for inference, it took 3.23 seconds, which I think is much longer than expected.

In Fig. 6 of this paper, https://arxiv.org/pdf/2005.05085.pdf, MobileNetV2 takes less than 1 second on several other devices.

So why does Xavier-PyTorch-CPU need so much more time to run mobilenet_v2 inference than the devices/environments above?

(P.S. I ran ‘sudo jetson_clocks’ before measuring the inference time.)

Any advice will be very appreciated.

Thanks in advance.

YJ.

The following code is what I used to measure inference time on Xavier.

import time
import torch
import torchvision

def download_models():
    models = {}
    models["mobilenet"] = torchvision.models.mobilenet_v2(pretrained=True)
    return models

if __name__ == "__main__":
    models = download_models()

    for device in ['cpu', 'cuda']:
        print("Experiements:", device)

        for name, model in models.items():
            print("START:", name)
            model = model.to(device)
            model.eval()
            n = 20
            total_time = 0

            for i in range(n):
                img = torch.rand(1, 3, 224, 224)

                if device == 'cuda':
                    img = img.cuda()
                start = time.time()
                output = model(img)
                end = time.time()
                total_time += (end - start)

            print(total_time/n)
            

Hi @yjkim2, can you run sudo jetson_clocks beforehand?

Also, ignore the first couple of iterations of the benchmark, as PyTorch may take longer to start up and load all the code the first time a model is executed. So it is recommended to do an un-timed warm-up period first. You also probably want to run more iterations than just 20, say a few hundred.
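
Something along these lines is what I mean (a rough sketch, assuming the same 224x224 random input as your script; the torch.no_grad() and torch.cuda.synchronize() calls are additional suggestions for accurate GPU timing, since CUDA kernel launches are asynchronous):

import time
import torch

def benchmark(model, device, warmup=10, iters=200):
    model = model.to(device)
    model.eval()

    with torch.no_grad():
        # Warm-up: not timed, so the first-run setup cost is excluded
        for _ in range(warmup):
            img = torch.rand(1, 3, 224, 224, device=device)
            model(img)

        if device == 'cuda':
            torch.cuda.synchronize()

        total_time = 0.0
        for _ in range(iters):
            img = torch.rand(1, 3, 224, 224, device=device)
            start = time.time()
            model(img)
            if device == 'cuda':
                # Wait for the GPU to finish before stopping the clock
                torch.cuda.synchronize()
            total_time += time.time() - start

    return total_time / iters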

I’m not sure how “PyTorch Mobile” from the above chart differs from normal PyTorch, or to what extent ARM CPU optimizations are enabled in the PyTorch build. The primary use case of PyTorch on Jetson is CUDA/cuDNN.

Hi @dusty_nv, Thank you for your reply.

Actually, I ran sudo jetson_clocks before running this code.

I tried printing the inference time on every iteration, but it still shows 2.4–4.4 seconds.

Also, I tried running more iterations (100); the average inference time is still about 3 seconds.

I also measured the googlenet inference time, which is well under 1 second.
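
(For reference, googlenet can be added to the same download_models() harness with one extra line; this is a hypothetical illustration of how the comparison was run, not my exact script:)

    # Hypothetical addition to download_models() for the googlenet comparison
    models["googlenet"] = torchvision.models.googlenet(pretrained=True)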

I still wonder why mobilenet_v2 inference takes so much time.

Thanks.

From looking at the PyTorch Mobile website, it looks like PyTorch Mobile uses XNNPACK/QNNPACK and may use torch.jit or libtorch (C++) for more optimized inference of the model. XNNPACK/QNNPACK isn’t enabled in the PyTorch wheels that I provide because of build errors when those are enabled. Our PyTorch wheels are built for CUDA/cuDNN performance.
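
If you want to check what a given wheel was actually built with, something like this should work (a sketch; the exact attributes available depend on the PyTorch version):

import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("cuDNN version:", torch.backends.cudnn.version())
print("XNNPACK enabled:", torch.backends.xnnpack.enabled)
print("Quantized engines:", torch.backends.quantized.supported_engines)
# Full build configuration string (compiler flags, BLAS backend, etc.)
print(torch.__config__.show())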