Lowering run time of PyrLK optical flow in VPI

While running the PyrLK optical flow algorithm of VPI, I found something weird.
The algorithm itself takes less than 1 millisecond to run, closer to 0.5 ms, but accessing the results on the CPU with rlock_cpu() can take up to 10 times as long, sometimes as much as 6 milliseconds. I would like some help accessing the data faster, if that is possible.
VPI version: 2.3.9
Language: Python
Code sample:

while True:
    ret, cvFrame = inVideo.read()                    # (1)
    frame = vpi.asimage(...)                         # (2) convert frame to grayscale using the CUDA backend
    curFeatures, status = optflow(frame)             # (3)
    with curFeatures.rlock_cpu():                    # (4)
        feature_data = tuple(curFeatures.cpu()[0])

Run time per step:

  1. 3 ms
  2. 0.2 ms (<200 µs)
  3. 0.5 ms (<500 µs)
  4. 2.5-4 ms

I found no other way to access the data inside the feature array, and using the lock takes too long. Does anyone know a better way of doing it?
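A minimal sketch of how per-step timings like the ones above can be collected so that occasional spikes (such as the lock occasionally reaching 6 ms) are reported next to the median. `StepTimer` is a hypothetical helper, not part of VPI, and the `time.sleep` calls only stand in for the real steps. One assumption worth keeping in mind when reading such numbers: if the CUDA work is executed asynchronously, a step that only submits work can look fast while the first step that waits for the results on the CPU (here, the lock) absorbs the remaining GPU time.

```python
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Hypothetical helper: collects wall-clock durations (ms) per named step."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed block raises.
            self.samples[name].append((time.perf_counter() - start) * 1000.0)

    def summary(self):
        """Return {step: (median_ms, max_ms)} so spikes stay visible."""
        return {
            name: (statistics.median(vals), max(vals))
            for name, vals in self.samples.items()
        }


if __name__ == "__main__":
    timer = StepTimer()
    for _ in range(20):
        # Stand-ins for the loop steps. In the real loop these would wrap
        # inVideo.read(), vpi.asimage(...), optflow(frame) and the rlock_cpu() block.
        # Assumption: if GPU work runs asynchronously, its cost may appear
        # in whichever step first waits for the results.
        with timer.step("1 read"):
            time.sleep(0.003)
        with timer.step("4 lock"):
            time.sleep(0.001)
    for name, (med, worst) in timer.summary().items():
        print(f"{name}: median {med:.2f} ms, max {worst:.2f} ms")
```

Reporting the maximum alongside the median makes it easier to tell a consistently slow step from one that only spikes on some frames.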

Hi,

Have you maximized the device’s performance first?
You can find a script below:

https://docs.nvidia.com/vpi/2.3/algo_performance.html#maxout_clocks

Thanks.
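A minimal sketch for checking whether the CPU cores are actually running at their maximum clocks, for example before and after running the clocks script. It assumes a Linux target exposing the standard cpufreq sysfs files (`scaling_cur_freq` and `cpuinfo_max_freq`, both in kHz). It only covers CPU cores; the script may also raise other clocks (such as the GPU's), which this sketch does not inspect. The function names are hypothetical.

```python
from pathlib import Path


def cpu_clock_report(root="/sys/devices/system/cpu"):
    """Return {core_name: (current_kHz, max_kHz)} for every core exposing cpufreq.

    Assumes the standard Linux cpufreq sysfs layout:
    <root>/cpuN/cpufreq/scaling_cur_freq and cpuinfo_max_freq (values in kHz).
    """
    report = {}
    for cpu_dir in sorted(Path(root).glob("cpu[0-9]*")):
        freq_dir = cpu_dir / "cpufreq"
        try:
            cur_khz = int((freq_dir / "scaling_cur_freq").read_text().strip())
            max_khz = int((freq_dir / "cpuinfo_max_freq").read_text().strip())
        except (OSError, ValueError):
            continue  # offline core, no cpufreq support, or unreadable file
        report[cpu_dir.name] = (cur_khz, max_khz)
    return report


def all_at_max(report, tolerance=0.98):
    """True if every reported core runs at >= tolerance * its maximum clock."""
    return bool(report) and all(cur >= tolerance * mx for cur, mx in report.values())


if __name__ == "__main__":
    rep = cpu_clock_report()
    for name, (cur, mx) in rep.items():
        print(f"{name}: {cur / 1000:.0f} MHz of {mx / 1000:.0f} MHz")
    print("all cores at max:", all_at_max(rep))
```

A small tolerance is used because the current frequency can sit one step below the nominal maximum even when clocks are pinned.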

Interesting,
it lowered that specific run time to about the same as the optical flow time, so steps 3 and 4 now both take about 0.5-0.7 ms. Thank you!

Hi,

Just want to double-check:
do you get the expected performance after running the clocks.sh script?

Thanks.
