I’m trying to come up with a good way of doing motion detection in video images. I can successfully capture the optical flow metadata from every buffer, but I’m having a hard time making sense of the values.
What I understand from the (very sparse?) docs is that every block of 4x4 pixels (by default) generates an (x, y) vector pair that describes the movement of THAT 4x4 block, right? So if the vector says (-15, 0), I would say this block has moved 15 pixels to the left, and if it says (200, 10), it has moved 200 pixels to the right and 10 pixels down, assuming (x0, y0) is the top-left corner.
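To make that block interpretation concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the helper names `grid_shape` and `block_motion` are hypothetical, it assumes a 4x4 grid on a 1920x1080 frame with the origin at the top-left corner, and it takes (dx, dy) already converted to pixels (the raw metadata values may need scaling first).

```python
# Sketch: relate each entry of a flow-vector grid to the pixel block it describes.
# Assumptions (hypothetical, for illustration): 4x4 grid size, origin at the
# top-left corner (+x right, +y down), vectors already in pixel units.


def grid_shape(width: int, height: int, grid: int = 4) -> tuple[int, int]:
    """Number of block columns and rows, rounding up for partial blocks."""
    cols = (width + grid - 1) // grid
    rows = (height + grid - 1) // grid
    return cols, rows


def block_motion(col: int, row: int, dx: float, dy: float, grid: int = 4):
    """Top-left pixel of block (col, row) and where that corner moved to."""
    x0, y0 = col * grid, row * grid
    return (x0, y0), (x0 + dx, y0 + dy)


if __name__ == "__main__":
    print(grid_shape(1920, 1080))  # (480, 270)
    # A vector of (-15, 0) on block (100, 50): 15 pixels to the left.
    print(block_motion(100, 50, -15.0, 0.0))  # ((400, 200), (385.0, 200.0))
```

With this reading, a 1920x1080 frame yields a 480x270 grid of vectors, and any single move should stay within a few hundred or at most a couple of thousand pixels.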
But in reality I encounter values like (6400., 128.). The video has a resolution of 1920x1080, so how could a pixel (block) move by 6400 pixels? Or do these values mean something else entirely?
With optical flow technology in such widespread use, there must be some documentation about this somewhere, right?
I’m amazed I can’t find anything. All documentation is about the technical side of things, which is fine. But what does the data MEAN exactly? What does a vector of (-6400, 128) actually say?
Hi @willemvdkletersteeg, I already reached out to the people who might be able to help here and point you to some resources, but people at NVIDIA tend to be busy.
It would help me if you could point me to the (as you say) “sparse” docs that got you started on this, where you found the basic description of the generated vector pairs. There is quite a selection of video and OF docs at NVIDIA; knowing where you are at might already allow me to point you to additional resources, or to find the people who wrote the doc and ask at the source.
These were helpful in a technical sense. They made it possible to obtain optical flow metadata in our application.
Then I started reading this article:
That one is more about the performance side of things. It was informative and a good read, but it still doesn’t explain what the metadata values actually mean.
Also:
This did not give me a definitive answer either. Or maybe I’m missing something; of course I’m not perfect, and I have only quickly scanned some documents rather than reading everything.
So in essence: I understand optical flow conceptually, but I need some sort of reference for the actual numbers in the metadata. What is the scale of these numbers? What should I expect in which situations? Is it concrete pixel/pixel-group movement, or is it some form of relative number?
I think it’s a fairly easy question. It seems like basic information, so that’s why I’m pretty frustrated I can’t seem to find an answer to it.
At least now I won’t link you to any of the above, which is what I initially intended :-)
I will try to find the right people and point this lack in our documentation out to them.
But please be patient a little longer; I can’t promise how fast I’ll get feedback.
Quick update: did you download the OF SDK and check out the Programming Guide? It says:
Flow vector is represented by a 32-bit value with each horizontal and vertical component being a 16-bit value. The lowest 5 bits holding fractional value, followed by a 10-bit integer value and the most significant bit being a sign bit.
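As it happens, following that description, 6400>>5 == 200, and likewise 128>>5 == 4: dropping the 5 fractional bits (i.e. dividing by 32) turns your (6400, 128) into (200, 4), which fits comfortably inside a 1920x1080 frame.
I know, the explanation could be a bit clearer, but this would explain the confusion.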
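For anyone decoding these values, here is a minimal Python sketch of that fixed-point conversion. It assumes the sign bit means ordinary two's complement (as with a C `short`), so the signed value is simply divided by 2^5 = 32; the function name `flow_to_pixels` is hypothetical and not part of the SDK.

```python
# Sketch: convert one raw flow component (16-bit fixed point, 5 fractional
# bits) to pixels. Assumption: two's complement sign, as with a C `short`.

FRACTION_BITS = 5
SCALE = 1 << FRACTION_BITS  # 32


def flow_to_pixels(raw: int) -> float:
    """Decode one horizontal or vertical component into (fractional) pixels.

    Accepts an already-signed value (e.g. -6400) or the raw unsigned 16-bit
    pattern (e.g. 0xE700), which is reinterpreted as signed first.
    """
    if not -0x8000 <= raw <= 0xFFFF:
        raise ValueError(f"not a 16-bit value: {raw}")
    if raw >= 0x8000:  # unsigned pattern with the sign bit set
        raw -= 0x10000
    return raw / SCALE


if __name__ == "__main__":
    print(flow_to_pixels(6400), flow_to_pixels(128))  # 200.0 4.0
    print(flow_to_pixels(-6400))  # -200.0
    print(flow_to_pixels(0xE700))  # -200.0 (same bits as -6400)
    print(flow_to_pixels(16))  # 0.5
```

Dividing instead of shifting keeps the sub-pixel part, and it also avoids a trap with negative values: in Python `-1 >> 5` is -1 (it floors), whereas `-1 / 32` is -0.03125. Under the top-left-origin convention from the question, (-6400, 128) would then read as 200 pixels left and 4 pixels down.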
And in the SDK package you will also find a sample object tracker, NvOFTracker. I would think that code should also shed some light on the exact flow vector properties.
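Since the original goal was motion detection, here is a rough Python sketch of how the decoded vectors could feed a simple per-frame detector. It is not how NvOFTracker works; it assumes `vectors` is a list of (flowx, flowy) pairs as signed 16-bit fixed-point integers with 5 fractional bits, one per grid block, and the function names and thresholds are hypothetical, illustrative values rather than tuned ones.

```python
# Sketch of a simple motion detector on top of the flow metadata.
# Assumptions (hypothetical): `vectors` holds (flowx, flowy) pairs as signed
# 16-bit fixed-point integers (5 fractional bits), one pair per grid block;
# the default thresholds are illustrative, not tuned values.
import math

SCALE = 32  # 2**5: five fractional bits


def moving_fraction(vectors, min_pixels: float = 2.0) -> float:
    """Fraction of blocks whose displacement exceeds `min_pixels`."""
    if not vectors:
        return 0.0
    moving = sum(
        1 for fx, fy in vectors if math.hypot(fx / SCALE, fy / SCALE) > min_pixels
    )
    return moving / len(vectors)


def has_motion(vectors, min_pixels: float = 2.0, min_fraction: float = 0.01) -> bool:
    """Flag a frame when enough blocks moved more than `min_pixels`."""
    return moving_fraction(vectors, min_pixels) >= min_fraction


if __name__ == "__main__":
    still = [(0, 0), (16, -16), (32, 0), (0, 8)]  # all under 2 px
    moving = [(6400, 128), (-6400, 128), (0, 0), (16, 0)]
    print(moving_fraction(still), has_motion(still))  # 0.0 False
    print(moving_fraction(moving), has_motion(moving))  # 0.5 True
```

The per-block pixel threshold is meant to ignore small sub-pixel jitter (e.g. from noise or compression), and the fraction threshold keeps a few isolated spurious blocks from triggering a detection; both would need tuning on real footage.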