NvMediaImageLock() takes over 35 ms to return

Software Version
DRIVE OS Linux 5.2.6
DRIVE OS Linux 5.2.6 and DriveWorks 4.0
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other

Target Operating System
Linux
QNX
other

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)
other

SDK Manager Version
1.7.0.8846
other (1.1.0-6343)

Host Machine Version
native Ubuntu 18.04
other

Hi all,

We use the NvSIPL framework to capture VUYX-format images from our camera, an IMX390 (2K, 30 FPS).
Our code is basically based on “samples/nvmedia/nvsipl/test/camera/*”.

Following the sample "CNvSIPLConsumer.hpp", we call NvMediaImageLock() in OnFrameAvailable().

Without background processes, NvMediaImageLock() returns within 10 ms.
But with background processes running (especially image processing, such as distortion correction on the GPU), it takes over 35 ms, even though enough CPU resources are available. As a result, we cannot receive images from SIPL at 30 FPS.

We can’t see inside NvMediaImageLock(), so we can’t find what is making it slower.
Could you tell me what NvMediaImageLock() does and what might be making it worse?

We set our camera’s SurfFormatAttr as follows:
NVM_SURF_FMT_DEFINE_ATTR(surfFormatAttrs);
NVM_SURF_FMT_SET_ATTR_YUV(surfFormatAttrs, VUYX, NONE, PACKED, UINT, 8, PL);

Is there any room for performance improvement in the above settings?
For example, are PACKED and PL the best choices?

Dear @Tabito.Suzuki,
Is it possible to test on DRIVE OS 5.2.6? Do you notice this issue consistently on every frame, or only once in a while?

Could you tell me what NvMediaImageLock does, what might be making it worse?

Using the SIPL APIs increases CPU load, so if the background processes also need more CPU, they could affect the performance of your application. Could you check the CPU load using top with a shorter refresh interval, with and without the background process? Also, please check how much CPU load the background process alone needs.

Dear @SivaRamaKrishnaNV

Thanks for your reply.

We notice this issue on every frame. The background processes' loads are 40-50% per core (htop). We use 2 cameras per Xavier, so we assume these processes don’t disturb SIPL.

Without the background process, SIPL’s load is about 50% per camera.
With the background process, however, SIPL’s load strangely increases to 90% per camera. What is happening in SIPL?

Using Nsight Systems, we can see a heavy ioctl load (up to 90%) in OnFrameAvailable, in contrast to NvMediaImageLock’s light load (less than 5%).

We can’t change OS version easily, since the change would bring several effects to our project.

Dear @Tabito.Suzuki,
Could you share steps to reproduce this issue? Let me check on my side.

I don’t see NvMediaImageLock() being called in OnFrameAvailable() in the sample application. Can you elaborate?

Do you mean the issue is now about an ioctl() call in OnFrameAvailable() rather than NvMediaImageLock()?
If so, could you point out which API function calls ioctl() indirectly?

As @SivaRamaKrishnaNV suggested, if possible, please provide a way to replicate this problem with our sample application.

Dear @SivaRamaKrishnaNV, @VickNV
Thank you for your reply.
I am preparing steps to reproduce this issue. Please give me some time.

I didn’t see NvMediaImageLock() is called in OnFrameAvailable() in the sample application. Can you elaborate on it?

NvMediaImageLock() is called along the following path:

OnFrameAvailable()⇒(CFileWriter)m_pFileWrite->WriteBufferToFile()⇒NvMediaImageLock()

Dear @SivaRamaKrishnaNV , @VickNV

We evaluated NvMediaImageLock’s latency with ‘pseudo’ image processing processes (30 FPS) running on the same Xavier.

XavierA: a 30 FPS capturing process (based on CFileWriter, which includes NvMediaImageLock) and N ‘pseudo’ image processing processes
XavierB: not used

Without the image processing processes, NvMediaImageLock returns within 5 ms (sometimes over 20 ms).
With the image processing processes, on the other hand, NvMediaImageLock often takes over 20 ms.

The ‘pseudo’ image processing program is as follows:

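#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <iostream>

using namespace std;
using namespace std::chrono;

// One 1936x1216 frame at 4 bytes per pixel (VUYX).
constexpr uint32_t kFrameBytes = 1936 * 1216 * 4;

// Eight frame-sized buffers (~75 MB in total), written in rotation.
struct SIPLDataSoA {
  uint8_t rawbuf[8][kFrameBytes];
};
SIPLDataSoA data;
SIPLDataSoA *p = &data;

// Computes how long to sleep so that each loop iteration lasts `interval` us.
class Timer {
  steady_clock::time_point start_;  // monotonic, unaffected by clock adjustments
  uint64_t interval_ = 0;           // target period in microseconds

 public:
  explicit Timer(uint64_t interval) : interval_(interval) {}
  void Start() { start_ = steady_clock::now(); }
  uint64_t CalcSleep() const {
    const uint64_t elapsed = static_cast<uint64_t>(
        duration_cast<microseconds>(steady_clock::now() - start_).count());
    // Compare before subtracting so an overrun yields 0 instead of
    // unsigned wraparound (a huge sleep).
    return (elapsed < interval_) ? interval_ - elapsed : 0;
  }
};

int main() {
  constexpr uint64_t kMax = 1000000000;
  Timer timer(33000);  // ~30 FPS
  for (uint64_t i = 0; i < kMax; i++) {
    timer.Start();
    const uint8_t ii = static_cast<uint8_t>(i % 8);
    // CPU-side "image processing": write every byte of one frame buffer.
    for (uint32_t j = 0; j < kFrameBytes; ++j) {
      p->rawbuf[ii][j] = static_cast<uint8_t>(j);
    }
    if (i % 30 == 0) cout << i << " done" << endl;
    usleep(static_cast<useconds_t>(timer.CalcSleep()));
  }
}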

Is there any way to prevent this latency degradation?

Dear @Tabito.Suzuki,
Thank you for sharing your analysis. It is known that the NvMediaImageLock() call involves CPU computation.

Are you running the nvsipl_camera sample in parallel with the above process to reproduce this? Also, how did you time NvMediaImageLock() (e.g., with clock() or gettimeofday())?

Dear @SivaRamaKrishnaNV

Are you running nvsipl_camera sample in parallel to above process to reproduce this?

We haven’t tried to reproduce this with nvsipl_camera in place of our 30 FPS capturing process.

how did you time the NvMediaImageLock() (like clock() or gettimeofday())?

We use std::chrono::system_clock.

Dear @Tabito.Suzuki,
Could you put a timer around the NvMediaImageLock() call in the sample and run your image processing operation in parallel, to confirm whether the same behavior is seen in the sample too?

Dear @SivaRamaKrishnaNV

Could you check keeping a timer around NvMediaImageLock() call and check running your image processing operation in parallel to confirm if the same be noticed in the sample too.

No problem. I will do that.