Data persistence on Jetson NX

I’m using a Jetson NX with JetPack 4.6.1 to run AI inference (PyTorch + CenterNet), with camera triggers for image capture. I’ve noticed that when the system sits idle for a while and is then triggered to capture images, inference on the first captured image consistently takes longer than on subsequent images. I suspect the GPU may be actively releasing resources while idle.

Is there a similar driver-persistence feature on the Jetson platform that can be enabled to keep inference times consistent for every captured image? Thank you.

Link to reference: Driver Persistence :: GPU Deployment and Management Documentation

Hi,

Have you locked the clocks with jetson_clocks?
By default, Jetson uses dynamic frequency scaling, so the GPU may start at a lower clock after a period of idle.

You can try locking the clocks to their maximum as a test:

$ sudo jetson_clocks

Thanks.

Hello AastaLLL,

Thanks for your reply. The default power mode is already “Mode 20W 6CORE”.
Is that the maximum clock setting?

Thanks

Hello AastaLLL,

After executing $ sudo jetson_clocks, the inference time holds steady at the minimum value for every subsequent inference. However, the first inference still takes longer. Do you have any suggestions for this?

I wrote a program that runs a segmentation model and a detection model on an image and logs the time taken for inference. Taking the detection model as an example, from the logs:

[TRT] Loaded engine size: 21 MiB
The model has been loaded.

However, the first inference takes much longer than the subsequent ones:

m_xTRTDetectorBase InferenceModel time: 1430.8ms

The second and subsequent inferences take only around 22–24 ms:

m_xTRTDetectorBase InferenceModel time: 22.289ms

Do you have any suggestions?

20240410
-----:-----:load segamentation model OCRSegmentModel.engine
-----:-----:…InitModel START…
[TRT] Registered plugin creator - torch2trt::GridAnchor_TRT version 1
[TRT] Registered plugin creator - torch2trt::GridAnchorRect_TRT version 1
[TRT] Registered plugin creator - torch2trt::NMS_TRT version 1
[TRT] Registered plugin creator - torch2trt::Reorg_TRT version 1
[TRT] Registered plugin creator - torch2trt::Region_TRT version 1
[TRT] Registered plugin creator - torch2trt::Clip_TRT version 1
[TRT] Registered plugin creator - torch2trt::LReLU_TRT version 1
[TRT] Registered plugin creator - torch2trt::PriorBox_TRT version 1
[TRT] Registered plugin creator - torch2trt::Normalize_TRT version 1
[TRT] Registered plugin creator - torch2trt::ScatterND version 1
[TRT] Registered plugin creator - torch2trt::RPROI_TRT version 1
[TRT] Registered plugin creator - torch2trt::BatchedNMS_TRT version 1
[TRT] Registered plugin creator - torch2trt::BatchedNMSDynamic_TRT version 1
[TRT] Registered plugin creator - torch2trt::FlattenConcat_TRT version 1
[TRT] Registered plugin creator - torch2trt::CropAndResize version 1
[TRT] Registered plugin creator - torch2trt::DetectionLayer_TRT version 1
[TRT] Registered plugin creator - torch2trt::EfficientNMS_TRT version 1
[TRT] Registered plugin creator - torch2trt::EfficientNMS_ONNX_TRT version 1
[TRT] Registered plugin creator - torch2trt::EfficientNMS_TFTRT_TRT version 1
[TRT] Registered plugin creator - torch2trt::Proposal version 1
[TRT] Registered plugin creator - torch2trt::ProposalLayer_TRT version 1
[TRT] Registered plugin creator - torch2trt::PyramidROIAlign_TRT version 1
[TRT] Registered plugin creator - torch2trt::ResizeNearest_TRT version 1
[TRT] Registered plugin creator - torch2trt::Split version 1
[TRT] Registered plugin creator - torch2trt::SpecialSlice_TRT version 1
[TRT] Registered plugin creator - torch2trt::InstanceNormalization_TRT version 1
[TRT] [MemUsageChange] Init CUDA: CPU +224, GPU +0, now: CPU 285, GPU 2750 (MiB)
[TRT] Loaded engine size: 31 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +158, GPU +162, now: CPU 445, GPU 2913 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +241, GPU +239, now: CPU 686, GPU 3152 (MiB)
[TRT] Deserialization required 2524450 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +29, now: CPU 0, GPU 29 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 686, GPU 3152 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 686, GPU 3152 (MiB)
[TRT] Total per-runner device persistent memory is 17122304
[TRT] Total per-runner host persistent memory is 55952
[TRT] Allocated activation device memory of size 8193024
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +24, now: CPU 0, GPU 53 (MiB)
input
mask
-----:-----:…InitModel END…
-----:-----:load detection model OCRDetectModelAuto.engine
-----:-----:…InitModel START…
[TRT] Plugin creator already registered - torch2trt::GridAnchor_TRT version 1
[TRT] Plugin creator already registered - torch2trt::GridAnchorRect_TRT version 1
[TRT] Plugin creator already registered - torch2trt::NMS_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Reorg_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Region_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Clip_TRT version 1
[TRT] Plugin creator already registered - torch2trt::LReLU_TRT version 1
[TRT] Plugin creator already registered - torch2trt::PriorBox_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Normalize_TRT version 1
[TRT] Plugin creator already registered - torch2trt::ScatterND version 1
[TRT] Plugin creator already registered - torch2trt::RPROI_TRT version 1
[TRT] Plugin creator already registered - torch2trt::BatchedNMS_TRT version 1
[TRT] Plugin creator already registered - torch2trt::BatchedNMSDynamic_TRT version 1
[TRT] Plugin creator already registered - torch2trt::FlattenConcat_TRT version 1
[TRT] Plugin creator already registered - torch2trt::CropAndResize version 1
[TRT] Plugin creator already registered - torch2trt::DetectionLayer_TRT version 1
[TRT] Plugin creator already registered - torch2trt::EfficientNMS_TRT version 1
[TRT] Plugin creator already registered - torch2trt::EfficientNMS_ONNX_TRT version 1
[TRT] Plugin creator already registered - torch2trt::EfficientNMS_TFTRT_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Proposal version 1
[TRT] Plugin creator already registered - torch2trt::ProposalLayer_TRT version 1
[TRT] Plugin creator already registered - torch2trt::PyramidROIAlign_TRT version 1
[TRT] Plugin creator already registered - torch2trt::ResizeNearest_TRT version 1
[TRT] Plugin creator already registered - torch2trt::Split version 1
[TRT] Plugin creator already registered - torch2trt::SpecialSlice_TRT version 1
[TRT] Plugin creator already registered - torch2trt::InstanceNormalization_TRT version 1
[TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 679, GPU 3144 (MiB)
[TRT] Loaded engine size: 21 MiB
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 683, GPU 3149 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 684, GPU 3149 (MiB)
[TRT] Deserialization required 162880 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +21, now: CPU 0, GPU 74 (MiB)
[TRT] Using cublas as a tactic source
[TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 683, GPU 3149 (MiB)
[TRT] Using cuDNN as a tactic source
[TRT] [MemUsageChange] Init cuDNN: CPU +1, GPU +0, now: CPU 684, GPU 3149 (MiB)
[TRT] Total per-runner device persistent memory is 16474624
[TRT] Total per-runner host persistent memory is 140144
[TRT] Allocated activation device memory of size 175657472
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +183, now: CPU 0, GPU 257 (MiB)
center
data
offset
size
-----:-----:…InitModel END…
image show zoom value = 0.418537
setToCenter
m_fImageStartX = -16.1366
m_fImageStartX = -16.1366
--------->segamentation InferenceModel time : 43.186ms

128X416
time:1417.09
>>> m_xTRTDetectorBase InferenceModel time : 1430.8ms

--------->segamentation InferenceModel time : 37.337ms

128X416
time:9.257
>>> m_xTRTDetectorBase InferenceModel time : 22.289ms

--------->segamentation InferenceModel time : 38.425ms

128X416
time:8.876
>>> m_xTRTDetectorBase InferenceModel time : 24.096ms

Hi,

The first inference triggers some one-time initialization and module loading, so it is expected to take longer.
You can add a few warmup inference loops at startup to avoid this.
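A minimal sketch of the warmup pattern, in Python for illustration. The `infer` and `dummy_input` names here are placeholders (not from the original program) standing in for the model's inference call and a blank frame of the correct shape; the idea is just to run a few throwaway inferences at startup so one-time costs are paid before the first real camera trigger.

```python
import time

def warm_up(infer, dummy_input, n_iters=3):
    """Run a few throwaway inferences so one-time costs (CUDA context
    creation, cuBLAS/cuDNN initialization, lazy allocations) are paid
    up front. Returns the per-iteration times in milliseconds."""
    times_ms = []
    for _ in range(n_iters):
        start = time.perf_counter()
        infer(dummy_input)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms

# Stand-in inference function for the example; on the Jetson this
# would be the TensorRT execution call on a zero-filled input tensor.
def fake_infer(frame):
    return frame

if __name__ == "__main__":
    print(warm_up(fake_infer, "dummy-frame"))
```

Calling this once right after the engines are deserialized, before the first camera trigger, should move the ~1.4 s first-inference cost into application startup.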

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.