Jetson as a more capable RPi replacement for running a trained NN?

I’m pretty new to programming (coming from a stats background), so please forgive any ignorance on my part. I also don’t have ready access to a community of people working on similar problems where I live.

I have trained a CNN classifier and I’m looking to load the model and its weights onto a standalone device as a proof of concept. I need to move the model off my local machine and onto a standalone piece of kit, and single-board computers look like an attractive way to go. My goal is to run the classifier on images captured in real time.

Originally I was looking at the Raspberry Pi (mainly because I’ve heard of it before and have some Arduino experience), but after browsing forums it seems there could be a few problems:

  1. FPS rate
  2. Time to do classification
  3. the definition of the images / low resolution

After a bit more research, the NVIDIA Jetson TX1 looks like the best option for moving this forward in terms of performance and price.

I’m wondering:
Are my assumptions OK?
Have I missed any alternatives? (I saw the Fathom USB, but it doesn’t appear to be in production.)
Does anyone have experience using the Jetson with Python/Keras? It looks like there is support for Caffe, but I haven’t seen much for Keras.
I have googled around this and the approach seems feasible in theory, but I have limited experience and am looking for some feedback.

The imaging devices that are supported on the TX1 are expensive, so this depends on how deep your pockets are. Even a TK1 would offer better performance than the RPi, and it is cheaper than the TX1. I’d suggest looking at what is actually supported on the TX1 and deciding for yourself before you start buying anything. Good luck.

Here is the NVIDIA DL solution for your reference,

Also, here is more info to help you get oriented,

Some brief comments on your questions:

  1. FPS rate
    We usually aim to maintain 30 fps, since that is a common video stream rate. That covers object detection or classification. If you also need to track objects from frame to frame, there is additional overhead, and whether you can still hold 30 fps depends on your processing power.

  2. Time to do classification
    Again, if 30 fps is the target, you have about 33.3 ms (1000 ms / 30) to classify each frame.

  3. the definition of the images / low resolution
    A fairly standard video frame size is 1080p, which is 1920x1080. That is a large image to process, so a common NN input size is 960x540, i.e. half the width and height, or 1/4 of the original pixel count. This should be sufficient as input to the later layers: convolution, then downsampling (such as max pooling) to a much smaller image size, and finally your output classification.
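
To make the numbers above concrete, here is a minimal Python sketch of the per-frame time budget and the downscaled input size. It is only an illustration: `dummy_classify` is a hypothetical stand-in for your own model's prediction call, and the helper names are made up for this example. Timings measured on your desktop will not match what you get on a Jetson or RPi, so run the same check on the target board.

```python
import time


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def mean_latency_ms(classify, frames):
    """Average wall-clock time of classify(frame) over the given frames."""
    start = time.perf_counter()
    for frame in frames:
        classify(frame)
    return (time.perf_counter() - start) * 1000.0 / len(frames)


def downscale(width, height, factor):
    """Frame size after dividing each side by an integer factor."""
    return width // factor, height // factor


def dummy_classify(frame):
    # Hypothetical stand-in: replace with your model's predict call.
    return sum(frame) % 2


budget = frame_budget_ms(30)
w, h = downscale(1920, 1080, 2)
pixel_ratio = (w * h) / (1920 * 1080)

# Fake "frames" (lists of zeros) just to exercise the timing helper.
latency = mean_latency_ms(dummy_classify, [[0] * 1000 for _ in range(100)])

print(f"budget per frame at 30 fps: {budget:.1f} ms")
print(f"downscaled input: {w}x{h} ({pixel_ratio:.2f} of the pixels)")
print(f"stand-in classifier: {latency:.3f} ms per frame, "
      f"fits budget: {latency <= budget}")
```

If the measured latency on the target board is above the budget, the usual options are a lower frame rate, a smaller input size, or a faster board.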