Hello AastaLLL,
After executing the command $ sudo jetson_clocks, the inference time stays fixed at its minimum value for every subsequent inference, but the first inference still takes much longer. Do you have any suggestions for this?
I wrote a program that runs segmentation and detection AI models on an image and logs the time each inference takes. From the logs, taking the detection model as an example, the model loads fine:
[TRT] Loaded engine size: 21 MiB
The model has been loaded.
However, the first inference time is much longer than subsequent inference times:
m_xTRTDetectorBase InferenceModel time: 1430.8ms
The second and subsequent inferences take approximately 22-24 ms:
m_xTRTDetectorBase InferenceModel time: 22.289ms
Do you have any suggestions?
Full log from 2024-04-10:
-----:-----:load segamentation model OCRSegmentModel.engine
-----:-----:…InitModel START…
[TRT] Registered plugin creator - torch2trt::GridAnchor_TRT version 1
[TRT] Registered plugin creator - torch2trt::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - torch2trt::NMS_TRT version 1
[TRT] Registered plugin creator - torch2trt::Reorg_TRT version 1
[TRT] Registered plugin creator - torch2trt::Region_TRT version 1
[TRT] Registered plugin creator - torch2trt::Clip_TRT version 1
[TRT] Registered plugin creator - torch2trt::LReLU_TRT version 1
[TRT] Registered plugin creator - torch2trt::PriorBox_TRT version 1
[TRT] Registered plugin creator - torch2trt::Normalize_TRT version 1
[TRT] Registered plugin creator - torch2trt::ScatterND version 1
[TRT] Registered plugin creator - torch2trt::RPROI_TRT version 1
[TRT] Registered plugin creator - torch2trt::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - torch2trt::BatchedNMSDynamic_TRT version 1
[TRT] Registered plugin creator - torch2trt::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - torch2trt::CropAndResize version 1
[TRT] Registered plugin creator - torch2trt::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - torch2trt::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - torch2trt::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - torch2trt::EfficientNMS_TFTRT_TRT version 1
[TRT] Registered plugin creator - torch2trt::Proposal version 1
[TRT] Registered plugin creator - torch2trt::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - torch2trt::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - torch2trt::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - torch2trt::Split version 1
[TRT] Registered plugin creator - torch2trt::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - torch2trt::InstanceNormalization_TRT version 1
[TRT] [MemUsageChange] Init CUDA: CPU +224, GPU +0, now: CPU 285, GPU 2750 (MiB)
[TRT] Loaded engine size: 31 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +162, now: CPU 445, GPU 2913 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +239, now: CPU 686, GPU 3152 (MiB)
[TRT] Deserialization required 2524450 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +29, now: CPU 0, GPU 29 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 686, GPU 3152 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 686, GPU 3152 (MiB)
[TRT] Total per-runner device persistent memory is 17122304
[TRT] Total per-runner host persistent memory is 55952
[TRT] Allocated activation device memory of size 8193024
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +24, now: CPU 0, GPU 53 (MiB)
input
mask
-----:-----:…InitModel END…
-----:-----:load detection model OCRDetectModelAuto.engine
-----:-----:…InitModel START…
[TRT] Plugin creator already registered - torch2trt::GridAnchor_TRT version 1
[TRT] Plugin creator already registered - torch2trt::GridAnchorRect_TRT version 1
[TRT] Plugin creator already registered - torch2trt::NMS_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Reorg_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Region_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Clip_TRT version 1
[TRT] Plugin creator already registered - torch2trt::LReLU_TRT version 1
[TRT] Plugin creator already registered - torch2trt::PriorBox_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Normalize_TRT version 1
[TRT] Plugin creator already registered - torch2trt::ScatterND version 1
[TRT] Plugin creator already registered - torch2trt::RPROI_TRT version 1
[TRT] Plugin creator already registered - torch2trt::BatchedNMS_TRT version 1
[TRT] Plugin creator already registered - torch2trt::BatchedNMSDynamic_TRT version 1
[TRT] Plugin creator already registered - torch2trt::FlattenConcat_TRT version 1
[TRT] Plugin creator already registered - torch2trt::CropAndResize version 1
[TRT] Plugin creator already registered - torch2trt::DetectionLayer_TRT version 1
[TRT] Plugin creator already registered - torch2trt::EfficientNMS_TRT version 1
[TRT] Plugin creator already registered - torch2trt::EfficientNMS_ONNX_TRT version 1
[TRT] Plugin creator already registered - torch2trt::EfficientNMS_TFTRT_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Proposal version 1
[TRT] Plugin creator already registered - torch2trt::ProposalLayer_TRT version 1
[TRT] Plugin creator already registered - torch2trt::PyramidROIAlign_TRT version 1
[TRT] Plugin creator already registered - torch2trt::ResizeNearest_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Split version 1
[TRT] Plugin creator already registered - torch2trt::SpecialSlice_TRT version 1
[TRT] Plugin creator already registered - torch2trt::InstanceNormalization_TRT version 1
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 679, GPU 3144 (MiB)
[TRT] Loaded engine size: 21 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 683, GPU 3149 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 684, GPU 3149 (MiB)
[TRT] Deserialization required 162880 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +21, now: CPU 0, GPU 74 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 683, GPU 3149 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 684, GPU 3149 (MiB)
[TRT] Total per-runner device persistent memory is 16474624
[TRT] Total per-runner host persistent memory is 140144
[TRT] Allocated activation device memory of size 175657472
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +183, now: CPU 0, GPU 257 (MiB)
center
data
offset
size
-----:-----:…InitModel END…
image show zoom value = 0.418537
setToCenter
m_fImageStartX = -16.1366
m_fImageStartX = -16.1366
--------->segamentation InferenceModel time : 43.186ms
128X416
time:1417.09
>>> m_xTRTDetectorBase InferenceModel time : 1430.8ms
--------->segamentation InferenceModel time : 37.337ms
128X416
time:9.257
>>> m_xTRTDetectorBase InferenceModel time : 22.289ms
--------->segamentation InferenceModel time : 38.425ms
128X416
time:8.876
>>> m_xTRTDetectorBase InferenceModel time : 24.096ms