State of affairs with computer vision on Jetson devices

I wanted to start this topic to reach out to the community, get some feedback from others, and maybe find some inspiration. I picked up a Jetson Nano a few months ago, and the hardware is very impressive - but the software, in my opinion, leaves something to be desired.

My application is simple: detect large movements in the camera picture, then save pictures or video for some time afterwards. The general idea is to catch some nice birds flying to our bird feeder! The aim is a self-contained, performant application that uses the full capabilities of the Jetson and leaves room later on for inference and other interesting tasks.
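To make the idea concrete, here is a minimal sketch of the logic I have in mind: plain frame differencing plus a "keep saving for a while" counter. It is pure Python on nested lists of grayscale values, so it does not use any NVIDIA API, and it only shows the decision logic, not a performant implementation. The names `motion_fraction` and `frames_to_save`, and all threshold values, are made up for illustration.

```python
from typing import Iterable, List, Sequence

# A frame is a grid of grayscale pixel values (0..255), one inner sequence per row.
Frame = Sequence[Sequence[int]]


def motion_fraction(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Return the fraction of pixels whose brightness changed by more than pixel_threshold."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    total = sum(len(row) for row in curr)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
        if abs(a - b) > pixel_threshold
    )
    return changed / total


def frames_to_save(
    frames: Iterable[Frame],
    area_threshold: float = 0.05,  # hypothetical: "large movement" = 5% of pixels changed
    hold_frames: int = 3,          # hypothetical: keep saving this many frames after a trigger
    pixel_threshold: int = 25,
) -> List[int]:
    """Return the indices of frames to save: each triggering frame plus hold_frames after it."""
    saved: List[int] = []
    remaining = 0
    prev = None
    for index, frame in enumerate(frames):
        if prev is not None and motion_fraction(prev, frame, pixel_threshold) >= area_threshold:
            # Large movement: (re)start the countdown, including the current frame.
            remaining = hold_frames + 1
        if remaining > 0:
            saved.append(index)
            remaining -= 1
        prev = frame
    return saved
```

On the Jetson, the differencing step would ideally run on the GPU or another hardware block, and "save" would mean handing the frame to an encoder. That hand-off is exactly the part I am struggling with.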

I’ve been working on this project for like… weeks now? And I keep finding myself in situations where the APIs created by NVIDIA are great in themselves, but they just don’t plug together. As the sources are not available, and the documentation is thin on real-world use cases, I find myself debugging endlessly. I know I’m doing things the HarderWay™, but I would still expect to eventually find a way.

VisionWorks:

I started with VisionWorks (not yet knowing that it was deprecated), and actually liked it quite a lot.

  • I could pull in images (though not through libargus)
  • I could build a decent and performant pipeline
  • I could easily display intermediate images
  • But then I realized that the library is deprecated
  • And I realized that there is no easy way to export images to other systems, or to use the hardware blocks to encode JPEGs

The other guys:

So after I realized that VisionWorks was not going to work, I found on the forum that there are mainly three ways of doing things:

  • Use OpenCV

    • Pro: I have used it before
    • Con: seemed hard to integrate with EGL and NvBuffers
    • Con: display performance is not great
    • Con: the Jetson has dedicated hardware that OpenCV does not use (e.g. VIC)
  • Use VPI

    • Pro: can wrap lots of input stream types
    • Pro: should be very specific to Jetson, very performance oriented
    • Con: although advertised as the successor of VisionWorks, it has a much narrower focus, and moving to it is not a simple port (though I recently found that, combined with CUDA NPP, you can technically recreate large portions of VisionWorks)
    • Con: I haven’t found an out-of-the-box display mechanism
    • Con: it imports all the things, but the documentation says nothing about how to export them
  • Use DeepStream

    • Pro/Con: it’s based on GStreamer, which can be a blessing or a curse
    • Pro: you can build reusable components in it
    • Con: you have to understand the entirety of GStreamer, and handle hundreds of lines of boilerplate to do even the simplest things
    • Personal con: it’s still hard for me to conceive an architecture with DS, especially as I have dynamic elements (e.g. the filter should forward the image only when it detects movement?). Most probably doable, it just needs weeks and weeks of studying and debugging.

In general, my feeling is that every time I start to use an NVIDIA-provided solution, I run into a dead end. The libraries are more than happy to accept my data, but they provide nothing to get that data back out again.
It feels to me that the libraries are developed in a very siloed way, with no coordination or synergy between them. I’m sad about this, as this device is absolutely great for its price point, and it has insane potential.

So, what are your impressions about developing on Jetson?