How to shorten the image process time

Please provide the following info (tick the boxes after creating this topic):
Software Version
DRIVE OS 6.0.6
[.] DRIVE OS 6.0.5
DRIVE OS 6.0.4 (rev. 1)
DRIVE OS 6.0.4 SDK
other

Target Operating System
[.] Linux
QNX
other

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-300)
DRIVE AGX Orin Developer Kit (940-63710-0010-200)
DRIVE AGX Orin Developer Kit (940-63710-0010-100)
DRIVE AGX Orin Developer Kit (940-63710-0010-D00)
DRIVE AGX Orin Developer Kit (940-63710-0010-C00)
DRIVE AGX Orin Developer Kit (not sure its number)
[.] other

SDK Manager Version
1.9.2.10884
[.] other

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager
native Ubuntu Linux 20.04 Host installed with DRIVE OS Docker Containers
native Ubuntu Linux 18.04 Host installed with DRIVE OS Docker Containers
[.] other

I am using the NvMedia APIs for image processing. For example, I first use the NvMedia2D APIs to convert a YUV422 image to YUV420, then use the NvMediaLDC APIs to do distortion correction, and finally use the NvMediaIJPE APIs to compress the image into JPEG format.

Following the sample code, I need to call NvSciSyncWait between the operations above (NvMedia2D / NvMediaLDC / NvMediaIJPE). However, using NvSciSync significantly increases the processing time (reaching 60-70ms in the case of multiple cameras running in parallel), which is unacceptable.
Perf stats: about 10ms for a single pipeline; if 7 cameras are used in parallel, each camera takes about 50ms to process.

When using the NPPI library, I can usually use the same CUDA context for multiple image processing operations and only call cudaStreamSynchronize once at the end. With NvMedia, however, it seems necessary to call NvSciSyncWait between each step, which makes the overall image processing time longer. Is there a way to avoid this and reduce the time spent on synchronization?
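To illustrate the CUDA-side pattern I am referring to, here is a rough sketch (the three wrapper functions are hypothetical placeholders for the actual NPPI/CUDA work; they only enqueue work on the stream, and only the final cudaStreamSynchronize blocks):

```c
#include <cuda_runtime.h>

/* Hypothetical wrappers for the NPPI/CUDA steps; each one enqueues its work
 * on the given stream and returns immediately. */
extern void convert_yuv422_to_yuv420(cudaStream_t stream);
extern void undistort(cudaStream_t stream);
extern void encode_jpeg(cudaStream_t stream);

void process_frame_cuda(cudaStream_t stream)
{
    /* All operations are queued back-to-back on one stream; stream ordering
     * alone guarantees the dependencies between them. */
    convert_yuv422_to_yuv420(stream);
    undistort(stream);
    encode_jpeg(stream);

    /* A single blocking call at the very end of the pipeline. */
    cudaStreamSynchronize(stream);
}
```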

Here is the sample code: code sample.txt (6.9 KB)

Dear @zhixin.zhou,
As your pipeline makes use of different engines, AFAIK the NvSciSync wait call in each stage makes sure we wait until the operation is finished by that engine. I will check internally and let you know if any call can be avoided.

processing time (reaching 60-70ms in the case of multiple cameras running in parallel), which is unacceptable.

May I know what the acceptable range for your use case is? Also, do you have any perf statistics for each operation in the pipeline?

Dear @SivaRamaKrishnaNV
Thank you for the reply.
Here are the perf statistics. Achieving a processing time of below 30ms while simultaneously handling around 7-8 camera feeds is acceptable for me.

Hello, any update?

Dear @zhixin.zhou,
NvSciSyncFenceWait() is needed only after IJPE; it is not required after each stage.

We need to ensure that the dependencies between the stages are established correctly via fences. The EOF fence from Stage X needs to be fed to Stage X + 1 as a pre-fence. We also need to ensure that Stage X waits for Stage X + 1 to consume its output before reusing the buffer for another task (using the EOF fence from Stage X + 1).
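As a rough sketch of the chaining (the submit_* helpers below are hypothetical placeholders for your per-engine code; in each NvMedia API the pre-fence goes in through the corresponding InsertPreNvSciSyncFence call and the EOF fence comes out through GetEOFNvSciSyncFence, so please check the exact function names and signatures in the headers of your DRIVE OS release):

```c
#include "nvscisync.h"

/* Hypothetical per-stage helpers: each inserts 'pre' as a pre-fence (if not
 * NULL), submits the operation to its engine, and returns the EOF fence of
 * that submission in 'eof'. */
extern void submit_2d_convert(const NvSciSyncFence *pre, NvSciSyncFence *eof);
extern void submit_ldc(const NvSciSyncFence *pre, NvSciSyncFence *eof);
extern void submit_jpeg_encode(const NvSciSyncFence *pre, NvSciSyncFence *eof);

void process_frame(NvSciSyncCpuWaitContext waitCtx)
{
    NvSciSyncFence fence2d  = NvSciSyncFenceInitializer;
    NvSciSyncFence fenceLdc = NvSciSyncFenceInitializer;
    NvSciSyncFence fenceJpg = NvSciSyncFenceInitializer;

    /* Stage 1: YUV422 -> YUV420 on NvMedia2D (pass the capture EOF fence
     * instead of NULL if the frame comes with one). */
    submit_2d_convert(NULL, &fence2d);

    /* Stage 2: distortion correction; the 2D EOF fence becomes the LDC
     * pre-fence, so the dependency is resolved between the engines. */
    submit_ldc(&fence2d, &fenceLdc);

    /* Stage 3: JPEG encode; same pattern, LDC EOF -> IJPE pre-fence. */
    submit_jpeg_encode(&fenceLdc, &fenceJpg);

    /* The only CPU wait: on the EOF fence of the last stage. */
    NvSciSyncFenceWait(&fenceJpg, waitCtx, -1 /* infinite timeout */);

    NvSciSyncFenceClear(&fence2d);
    NvSciSyncFenceClear(&fenceLdc);
    NvSciSyncFenceClear(&fenceJpg);
}
```

Note that if the intermediate buffers are recycled across frames, the producing stage must not overwrite a buffer before the consuming stage has finished with it, i.e. use the downstream EOF fence from the previous frame as an additional pre-fence (or wait on it) before reusing that buffer.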


OK, thanks. I will try.

So that means the NvSciSyncFence can be shared across the engines?
For example, for this pipeline: NvMedia2D -> NvMediaLDC -> NvMediaIJPE, I can get the EOF fence of NvMedia2D and feed it to NvMediaLDC, then get the EOF fence of NvMediaLDC and feed it to NvMediaIJPE, and at last wait on the EOF fence of NvMediaIJPE.
Is my understanding correct?

Dear @zhixin.zhou,
Yes.
