Does it make sense to compile TensorFlow for the Jetson TX2 with MPI support?
I am running distributed inference on two Jetsons and observe significant network delay that cancels out the performance gain.
TensorFlow offers a couple of optimizations to speed up network communication, but none of them seems directly applicable to Jetsons. I've tried 'grpc+verbs' and 'grpc+gdr' so far.
Am I missing something?