Performance difference of TensorRT versus NVCaffe+cuDNN

The jetson-inference repository on GitHub recommends doing net surgery to translate a trained Caffe network to TensorRT for real-time inference on the Jetson TX2:

The same repository also includes instructions for building NVCaffe with 16-bit cuDNN support:

I’m not using one of the three pre-defined models in TensorRT; I’m using a custom network topology that I’ve already trained.
Would there be a performance benefit from translating this Caffe model to TensorRT for inference on the Jetson, or would I see roughly the same performance using NVCaffe?

(My network is fully convolutional and uses a deconvolution layer during inference, which may not even be supported in TensorRT yet?)

So I compiled NVCaffe and tried to load my model. Apparently, although Caffe has several years of maturity behind it, NVCaffe doesn’t …

Error parsing text-format caffe.NetParameter: 264:14: Message type "caffe.LayerParameter" has no field named "crop_param".
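One possible workaround, assuming the parse failure is only the unknown crop_param field (the Crop layer type itself may still fail later), is to strip that sub-block from the deploy prototxt before handing it to NVCaffe. A minimal brace-matching sketch; the file names in the usage comment are placeholders:

```python
import re

def strip_param_block(text: str, field: str) -> str:
    """Remove every `field { ... }` block (balanced braces) from prototxt text."""
    out = []
    i = 0
    pat = re.compile(r'\b' + re.escape(field) + r'\s*\{')
    while True:
        m = pat.search(text, i)
        if not m:
            out.append(text[i:])
            break
        out.append(text[:m.start()][i:])
        # Skip past the matching closing brace of this block.
        depth, j = 1, m.end()
        while depth and j < len(text):
            if text[j] == '{':
                depth += 1
            elif text[j] == '}':
                depth -= 1
            j += 1
        i = j
    return ''.join(out)

# Hypothetical usage: clean a deploy file before loading it in NVCaffe.
# with open('deploy.prototxt') as f:
#     cleaned = strip_param_block(f.read(), 'crop_param')
# with open('deploy_nvcaffe.prototxt', 'w') as f:
#     f.write(cleaned)
```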

Any updates on the performance comparison between Caffe and TensorRT?
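Lacking published numbers, one way to get an answer for a custom topology is to time both backends on the same input. A minimal harness sketch, where `predict` is a stand-in for whatever runs one forward pass (e.g. an NVCaffe `net.forward()` or a TensorRT execution-context call; the usage lines are hypothetical):

```python
import time

def benchmark(predict, n_warmup=10, n_runs=100):
    """Return average per-inference latency in milliseconds.

    `predict` is any zero-argument callable that runs one forward pass;
    the warm-up iterations let clocks and caches settle before timing.
    """
    for _ in range(n_warmup):
        predict()
    t0 = time.perf_counter()
    for _ in range(n_runs):
        predict()
    return (time.perf_counter() - t0) / n_runs * 1e3

# Hypothetical usage comparing the two backends:
# ms_caffe = benchmark(lambda: caffe_net.forward())
# ms_trt   = benchmark(lambda: run_tensorrt_inference())
# print(f"caffe: {ms_caffe:.2f} ms  tensorrt: {ms_trt:.2f} ms")
```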