Jetson AGX Xavier shows unstable inference time

I am using a Jetson AGX Xavier (32GB) to run inference on a CRNN model. The model can be found here

I am using FP32 mode for inference. The results are accurate compared to PyTorch inference, but the inference time varies drastically when the same batch is processed multiple times.

Setup -

The folder demo2 contains 500 images on which inference is run in batches of 32. crnn.trt is the TensorRT engine file generated from the ONNX model (crnn.onnx), which was exported from PyTorch using the above-mentioned repository.

Code -

        import sys
        from torch.utils.data import DataLoader
        import time
        # standard inference module shipped with Jetson
        from inference import allocate_buffers, get_engine, do_inference
        import pickle
        import cv2
        import numpy as np
        import dataset_trt
        from PIL import Image
        from dataset_trt import LoadImages, Pad

        data = {}

        # Loads images after resizing, in batches of 32, without shuffling
        detectloader = DataLoader(
            LoadImages(transform=Pad(100, 32, 'whole'),
                       image_files_dir='ocr_recognition/data/demo2/'),
            batch_size=32, shuffle=False)

        engine = get_engine('ocr_recognition/crnn.onnx', 'ocr_recognition/crnn.trt')

        # inference is run 3 times over the 500 images (i.e. 3 epochs)
        for i in range(3):
            print(i)
            for filename, image in detectloader:
                image = image.numpy()
                # a new execution context and new buffers are created for every batch
                with engine.create_execution_context() as context:
                    inputs, outputs, bindings, stream = allocate_buffers(engine)
                    inputs[0].host = image
                    # only the do_inference call is timed
                    t1 = time.time()
                    output = do_inference(context, bindings, inputs, outputs, stream)
                    print(time.time() - t1)

Output of the above print statements (time in seconds) -

0
0.903141975402832
0.11624932289123535
0.1136171817779541
0.09180784225463867
0.09296345710754395
0.09299278259277344
0.08098387718200684
0.05849766731262207
0.0542445182800293
0.04758453369140625
0.047617435455322266
0.04759478569030762
0.04703259468078613
0.04756903648376465
0.047551631927490234
0.04757833480834961
0.04760384559631348
0.04701590538024902
0.04694771766662598
1
0.047575950622558594
0.04764819145202637
0.04700660705566406
0.047077178955078125
0.04758882522583008
0.04157447814941406
0.041478633880615234
0.04138970375061035
0.04059720039367676
0.04153728485107422
0.04144930839538574
0.04150533676147461
0.041487932205200195
0.04096221923828125
0.04104185104370117
0.041502952575683594
0.040570974349975586
0.041085004806518555
0.04155445098876953
2
0.04109358787536621
0.04118943214416504
0.04118227958679199
0.03726077079772949
0.03625345230102539
0.035823822021484375
0.03543353080749512
0.036386966705322266
0.03538227081298828
0.03543853759765625
0.03527665138244629
0.03669238090515137
0.03589630126953125
0.036310672760009766
0.03545975685119629
0.03621697425842285
0.03635263442993164
0.03638172149658203
0.03630399703979492

As you can see, the times in the first epoch are drastically higher. The same batches take different times across epochs. Moreover, these times depend heavily on the run: I get different times (ranging from 150 ms down to 35 ms) for the same batch across runs. Isn't this pretty odd? Am I screwing up somewhere?

Did you set nvpmodel to MAXN and activate jetson_clocks before running inference?

$ sudo nvpmodel -m 0
$ sudo ${HOME}/jetson_clocks.sh

Hi,

As dkreutz mentioned, please set the device into performance mode and check it again.

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Thanks.
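When checking again, it may help to separate warm-up iterations from steady-state ones, since the timings posted earlier keep dropping for many batches before they settle. The following is only a minimal sketch: `benchmark`, `summarize`, the warm-up count of 10, and the stand-in workload `fake_inference` are all assumptions made for illustration and are not part of the `inference` module or of TensorRT. In the real script, `run_batch` would wrap the `do_inference(...)` call.

```python
import statistics
import time


def benchmark(run_batch, batches, warmup=10):
    """Time run_batch(batch) for each batch, discarding the first `warmup` calls."""
    timings = []
    for index, batch in enumerate(batches):
        start = time.perf_counter()  # monotonic, high-resolution timer
        run_batch(batch)
        elapsed = time.perf_counter() - start
        if index >= warmup:
            timings.append(elapsed)
    return timings


def summarize(timings):
    """Return median, minimum and maximum latency in milliseconds."""
    ms = [t * 1000.0 for t in timings]
    return {
        "median_ms": statistics.median(ms),
        "min_ms": min(ms),
        "max_ms": max(ms),
    }


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a wrapper around do_inference(...)
    def fake_inference(batch):
        return sum(x * x for x in batch)

    batches = [list(range(1000)) for _ in range(60)]
    print(summarize(benchmark(fake_inference, batches, warmup=10)))
```

Comparing the median before and after enabling the performance mode gives a steadier picture than reading individual per-batch prints.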

Running jetson_clocks did it. Thanks. Is there any way of setting the clock rates programmatically using Python (other than running a .sh file)? Do you expose any API to set them to max?

Hi,

Please find this sample for information:

Thanks.

Thanks!
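For reference, one way to apply the two commands from this thread inside a Python script is to call them through `subprocess`. This is only a sketch and not the linked sample: it still shells out to the `nvpmodel` and `jetson_clocks` tools rather than using a dedicated API, and the helper names `performance_mode_commands` and `set_max_performance` are hypothetical. It assumes both tools are on the PATH and that `sudo` can run without an interactive password prompt.

```python
import subprocess


def performance_mode_commands(mode_id=0, use_sudo=True):
    """Build the argument lists for the two commands used in this thread."""
    prefix = ["sudo"] if use_sudo else []
    return [
        prefix + ["nvpmodel", "-m", str(mode_id)],  # mode 0 is MAXN on this board
        prefix + ["jetson_clocks"],                 # then pin the clocks
    ]


def set_max_performance(mode_id=0, use_sudo=True, runner=subprocess.run):
    """Run each command in order; check=True raises CalledProcessError on failure."""
    for command in performance_mode_commands(mode_id, use_sudo):
        runner(command, check=True)
```

Calling `set_max_performance()` once at the start of the inference script, before the engine is loaded, would mirror running the two commands by hand. The `runner` parameter exists only so the function can be exercised without touching the device.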