Stereo Vision from Scratch

Hi, I am trying to build a stereo vision solution that can run in real time (at least 15 FPS, preferably more than 20) on a system-on-a-chip, and the Jetson TX1 seems like my best bet. Regrettably, existing stereo cameras, for example the StereoLabs ZED and the e-con Systems Tara, fall short in range (near-field and far-field, respectively) to such a degree that using them for evaluation is impossible. Furthermore, the field of view of most systems I have seen is too narrow.
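The near-field/far-field trade-off mentioned above comes from the rectified stereo relation Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels): the largest disparity the matcher searches sets the nearest measurable depth, and the smallest resolvable disparity sets the farthest. Below is a minimal sketch, assuming an ideal pinhole model and a rectified pair; the 90° FOV, 12 cm baseline and 128-pixel search range in the demo are hypothetical values, not specifications of any camera mentioned in this thread.

```python
import math


def focal_length_px(image_width_px: float, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from image width and horizontal FOV."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Z = f * B / d for a rectified stereo pair."""
    return f_px * baseline_m / disparity_px


def depth_range(image_width_px, hfov_deg, baseline_m, max_disparity_px, min_disparity_px=1.0):
    """(nearest, farthest) depth in metres for a given disparity search range."""
    f = focal_length_px(image_width_px, hfov_deg)
    near = depth_from_disparity(f, baseline_m, max_disparity_px)
    far = depth_from_disparity(f, baseline_m, min_disparity_px)
    return near, far


def depth_error(depth_m, f_px, baseline_m, disparity_step_px=0.25):
    """Approximate depth uncertainty dZ = Z^2 / (f * B) * dd (derivative of Z = f*B/d)."""
    return depth_m ** 2 / (f_px * baseline_m) * disparity_step_px


if __name__ == "__main__":
    # Hypothetical rig: 1280 px wide, 90 deg HFOV, 12 cm baseline, 128-px search range
    near, far = depth_range(1280, 90.0, 0.12, 128)
    f = focal_length_px(1280, 90.0)
    print(f"f = {f:.0f} px, near = {near:.2f} m, far = {far:.1f} m")
    print(f"depth error at 10 m (0.25 px steps): {depth_error(10.0, f, 0.12):.2f} m")
```

Widening the FOV at a fixed image width lowers f and therefore pulls in both limits, while a longer baseline pushes the far limit out but also moves the near limit away, which is why wide FoV, near range and far range pull against each other.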

Because of this, I am investigating the feasibility of building a stereo camera from scratch, and I was mainly wondering whether anyone here has tried this. If so, I would very much appreciate some pointers or warnings about pitfalls.

I also have a few specific questions:
What would be the most reasonable interface for the cameras: GigE, USB3 or FireWire? I would definitely go for two identical cameras, so as far as I have gathered I would need a PCIe card for an extra port or two in any case. To achieve both a wide FoV and fair near- and far-field range, the cameras would need sensors of more than 0.8 Mpixel, and I think they would need to deliver at least 30 FPS.
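Whether GigE, USB3 or FireWire suffices can be checked with a quick bandwidth estimate: uncompressed data rate = width × height × FPS × bits per pixel, times two cameras. Below is a minimal sketch, assuming 1280×720 (about 0.92 Mpixel) at 30 FPS and either 8-bit raw/mono or 16-bit YUV 4:2:2 output; the link rates are nominal signalling rates, and real usable throughput is lower after protocol overhead.

```python
# Nominal signalling rates in Mbit/s; usable throughput is lower in practice.
LINK_RATES_MBIT_S = {
    "FireWire 800": 800,
    "Gigabit Ethernet": 1000,
    "USB 3.0": 5000,
}


def stream_mbit_s(width: int, height: int, fps: float, bits_per_pixel: int) -> float:
    """Uncompressed video data rate of one camera in Mbit/s."""
    return width * height * fps * bits_per_pixel / 1e6


def stereo_mbit_s(width, height, fps, bits_per_pixel, n_cameras=2):
    """Combined data rate of n identical cameras."""
    return n_cameras * stream_mbit_s(width, height, fps, bits_per_pixel)


if __name__ == "__main__":
    for bpp, label in ((8, "8-bit raw/mono"), (16, "16-bit YUV 4:2:2")):
        need = stereo_mbit_s(1280, 720, 30, bpp)
        for link, rate in LINK_RATES_MBIT_S.items():
            print(f"{label}: {need:.0f} Mbit/s = {100 * need / rate:.0f}% of nominal {link}")
```

With both cameras sharing one link, 16-bit output already uses most of a nominal gigabit, whereas USB 3.0 leaves considerably more headroom.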

What depth map rate could be expected at such resolutions? The only concrete depth map rate I have been able to find for any Jetson board is this one, citing 7 FPS on a TK1 at WVGA resolution (~0.4 Mpixel) without “performance optimizations” (such as CUDA). Both that rate and that resolution are obviously too low in my case, but with optimizations and a TX1 instead of a TK1, I reckon that more than 15 FPS might be possible?
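A rough way to frame that gap: block matching and SGM do work roughly proportional to W × H × D per frame (D being the disparity search range), and for the same field of view and depth range D in pixels grows with image width. The sketch below, a back-of-the-envelope estimate only, assumes WVGA means 800×480 (the cited figure only says ~0.4 Mpixel) and a target of 1280×720 at 15 FPS.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels per second the matcher has to process."""
    return width * height * fps


def required_speedup(base, target, disparity_scales_with_width=True):
    """Rough speed-up factor needed to go from `base` to `target`.

    base, target: (width, height, fps) tuples.
    Matching work is roughly proportional to W * H * D per frame; for the same
    field of view and depth range, the disparity range D (in pixels) grows in
    proportion to image width.
    """
    factor = pixel_rate(*target) / pixel_rate(*base)
    if disparity_scales_with_width:
        factor *= target[0] / base[0]
    return factor


if __name__ == "__main__":
    tk1_wvga = (800, 480, 7)   # assumed WVGA size for the cited 7 FPS figure
    goal = (1280, 720, 15)     # ~0.92 Mpixel at 15 FPS
    print(f"pixel throughput only: x{required_speedup(tk1_wvga, goal, False):.1f}")
    print(f"including larger disparity range: x{required_speedup(tk1_wvga, goal):.1f}")
```

This ignores memory bandwidth, sub-pixel refinement and post-filtering, and says nothing about how much CUDA and the newer board actually deliver; it only sizes the gap the optimizations would have to close.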

Lastly, if there is some stereo camera that I might have missed, I am all ears. I have listed most of what I have already looked at below.

Thank you for your time!

(Stereo cameras I know about: BumbleBee2, ZED, Tara, DUO3D, Nerian, Ensenso, Leopard modules and probably a few I cannot recall at the moment)

(The only camera I have found that fits my specifications is Point Grey’s Bumblebee2; however, given its FireWire interface, its dated software with poor compatibility, and the few accounts from people who have made it work, it seems like a nightmare to get up and running. The price of 20.000 €, though manageable, does not help either.)

You should be able to get 15 FPS at 1280×720 on the TX1 by using OpenCV optimizations. You can use any wide-angle (CS-mount) lens on both the See3CAM_10CUG and the See3CAM_11CUG.

Thank you for your answer; the frame rate sounds promising. I am now also looking into getting a Jetson TX2 instead, so that should help further.

Based on your camera recommendations, I take it that you suggest USB3 as the interface. I will do some further research into the camera hardware this week and keep your cameras in mind :)


I have been able to run the VisionWorks SGBM stereo vision sample at 30 FPS with two webcams. I know SGBM isn’t the most accurate, so maybe you could run a neural network for the stereo matching instead. I believe a few networks have achieved more than 30 FPS on a Titan X, so that might translate to around 10 FPS on a TX2?
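SGBM (semi-global block matching) is OpenCV’s variant of Hirschmüller’s semi-global matching (SGM). To make the idea concrete, here is a deliberately simplified NumPy sketch of SGM: a per-pixel absolute-difference cost, aggregated along four scanline directions with the usual small/large smoothness penalties P1 and P2, followed by winner-takes-all. It is not the VisionWorks or OpenCV implementation (OpenCV’s StereoSGBM, for instance, matches blocks, uses a Birchfield-Tomasi cost and by default 5 directions), and the function names and parameter values are illustrative only. The Python loop over columns makes it far too slow for real-time use; it only shows the structure that GPU implementations parallelize.

```python
import numpy as np


def matching_cost(left, right, max_disp):
    """Per-pixel absolute-difference cost volume C[y, x, d].

    Left pixel (y, x) is compared with right pixel (y, x - d); disparities whose
    match would fall outside the right image get a large constant cost.
    """
    h, w = left.shape
    L = left.astype(np.float32)
    R = right.astype(np.float32)
    cost = np.full((h, w, max_disp), 255.0, dtype=np.float32)
    for d in range(max_disp):
        cost[:, d:, d] = np.abs(L[:, d:] - R[:, : w - d])
    return cost


def aggregate_path(cost, P1, P2, reverse=False):
    """SGM recursion along axis 1 (left-to-right, or right-to-left if reverse):

    Lr(p, d) = C(p, d) + min(Lr(p-r, d), Lr(p-r, d+-1) + P1, min_k Lr(p-r, k) + P2)
               - min_k Lr(p-r, k)
    """
    h, w, D = cost.shape
    agg = np.empty_like(cost)
    xs = range(w - 1, -1, -1) if reverse else range(w)
    prev = None
    for x in xs:
        if prev is None:
            agg[:, x, :] = cost[:, x, :]
        else:
            prev_min = prev.min(axis=1, keepdims=True)   # (h, 1)
            minus = np.full_like(prev, np.inf)           # Lr(p-r, d-1)
            minus[:, 1:] = prev[:, :-1]
            plus = np.full_like(prev, np.inf)            # Lr(p-r, d+1)
            plus[:, :-1] = prev[:, 1:]
            best = np.minimum(np.minimum(prev, minus + P1),
                              np.minimum(plus + P1, prev_min + P2))
            agg[:, x, :] = cost[:, x, :] + best - prev_min
        prev = agg[:, x, :]
    return agg


def sgm_disparity(left, right, max_disp=16, P1=10.0, P2=120.0):
    """Winner-takes-all disparity after summing four scanline directions."""
    cost = matching_cost(left, right, max_disp)
    cost_t = cost.transpose(1, 0, 2)   # swap y and x to run the vertical paths
    total = aggregate_path(cost, P1, P2) + aggregate_path(cost, P1, P2, reverse=True)
    total += aggregate_path(cost_t, P1, P2).transpose(1, 0, 2)
    total += aggregate_path(cost_t, P1, P2, reverse=True).transpose(1, 0, 2)
    return total.argmin(axis=2)


if __name__ == "__main__":
    # Synthetic rectified pair: random texture shifted by a known disparity
    rng = np.random.default_rng(0)
    true_d = 5
    right = rng.integers(0, 256, size=(40, 60)).astype(np.uint8)
    left = rng.integers(0, 256, size=(40, 60)).astype(np.uint8)
    left[:, true_d:] = right[:, :-true_d]   # left(x) = right(x - d)
    disp = sgm_disparity(left, right, max_disp=16)
    print("fraction correct:", (disp[:, 16:] == true_d).mean())
```

P1 penalizes one-pixel disparity changes (slanted surfaces) and P2 larger jumps (depth discontinuities); the subtraction of min_k Lr keeps the aggregated values bounded along long paths.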

Alright, thank you for your comment. Top accuracy is actually not very important (in spite of the relatively high resolution I mentioned). I haven’t really decided on an exact stereo matching algorithm yet, but SGBM seems like a decent compromise, so your result with it is reassuring! What resolution did you use?