cuDNN deep learning with a cluster of Nvidia Jetson TK1/TX1 boards?

I was able to install cuDNN with the Caffe framework on a single Jetson TX1 board and on a single Jetson TK1 board. However, how do I extend this to a cluster? I have two Jetson TX1 boards (configured as an MPI cluster), plus an older cluster consisting of six Jetson TK1 boards.

How do I use the computational power of several dev boards in the cluster for deep learning? Has anyone done that with cuDNN and Caffe?

