OpenCV code 4x slower on TX1 than on TK1

I installed the latest version of JetPack on the TX1 and the latest version on the TK1. When I run the code on my TK1 I get a steady 20 fps, but on the TX1 it drops to 2-4 fps.

Is there some optimization setting for OpenCV on the TX1 that I'm not aware of? I'm not sure how to narrow down what could be causing this problem. I would appreciate any help.

https://devtalk.nvidia.com/default/topic/901337/cuda-7-0-jetson-tx1-performance-and-benchmarks/

The third post down gives some info.
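
To narrow down where the time goes, it may help to time each stage of the frame loop separately on both boards and compare the numbers. Below is a minimal sketch of that idea, not a fix. `StageTimer`, `print_opencv_config`, and the stage names are hypothetical helpers made up for illustration, and the `time.sleep` calls are placeholders for the real capture and processing code, which was not posted. The only OpenCV calls used are `cv2.useOptimized()` and `cv2.getBuildInformation()`, and they are guarded so the script still runs where `cv2` is not installed.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # total seconds spent in each stage
        self.counts = defaultdict(int)    # number of times each stage ran

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean_ms(self):
        """Average time per call for each stage, in milliseconds."""
        return {n: 1000.0 * self.totals[n] / self.counts[n] for n in self.totals}

    def slowest(self):
        """Name of the stage with the highest average time."""
        means = self.mean_ms()
        return max(means, key=means.get)


def print_opencv_config():
    """Print how the installed OpenCV was built; returns False if cv2 is missing."""
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable in this environment")
        return False
    print("optimized code paths enabled:", cv2.useOptimized())
    print(cv2.getBuildInformation())
    return True


if __name__ == "__main__":
    print_opencv_config()
    timer = StageTimer()
    for _ in range(5):
        with timer.stage("capture"):
            time.sleep(0.001)  # placeholder for the real frame grab
        with timer.stage("process"):
            time.sleep(0.005)  # placeholder for the real OpenCV calls
    for name, ms in timer.mean_ms().items():
        print(f"{name}: {ms:.2f} ms per frame")
    print("slowest stage:", timer.slowest())
```

Running the same script on the TK1 and the TX1 and comparing the per-stage averages and the two build-information dumps should show whether the gap comes from one particular stage or from a difference in how OpenCV was built on each board.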