Need help selecting a GPU to accelerate the OpenCV CUDA implementation of optical flow


We do industrial inspection, and one of our projects uses the OpenCV CUDA library. We want to speed up an optical flow algorithm and need help choosing the right GPU.

Currently we use OpenCV's CUDA implementation of Farneback optical flow to calculate dense flow between two frames of roughly 1000x3500 pixels. With the algorithm parameters set according to our application's needs, we get the following processing times on two platforms:

Laptop: i7-7700HQ 2.8 GHz CPU and GTX 1050 Ti (4 GB) mobile GPU
Processing time: 0.24 seconds

Desktop: i7-4790 3.8 GHz CPU and GTX 960 (4 GB) desktop GPU
Processing time: 0.21 seconds
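
For reference, a minimal sketch of how timings like these could be collected reproducibly. It assumes the Python bindings are in use; `benchmark` and `dummy_flow` are hypothetical names, and `dummy_flow` only stands in for the real optical flow call so that the sketch runs without a GPU.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=10):
    """Return the median wall-clock time of fn() in seconds."""
    # Warm-up calls absorb one-time costs such as context creation
    # and first-use memory allocations, so they are not timed.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def dummy_flow():
    # Hypothetical stand-in for the real optical flow call.
    sum(i * i for i in range(10_000))


median_s = benchmark(dummy_flow)
print(f"median time per call: {median_s:.4f} s")
```

In a real measurement, `fn` would wrap the frame upload, the Farneback call, and the download of the result, so that any asynchronous GPU work has finished before the timer stops. Timing the upload and download separately from the flow computation would also show how much of the total a faster GPU could actually reduce.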

We want the processing time to be about 0.07 seconds, which is roughly a 3x speedup on the desktop.
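
The required speedup follows directly from the measured times and the target. A small sketch of that arithmetic, using only the numbers given above (the variable and function names are illustrative):

```python
TARGET_S = 0.07  # desired processing time in seconds

# Measured processing times in seconds, taken from the figures above.
measured_s = {
    "laptop (GTX 1050 Ti mobile)": 0.24,
    "desktop (GTX 960)": 0.21,
}


def required_speedup(measured, target):
    """Factor by which the measured time must shrink to reach the target."""
    return measured / target


speedups = {name: required_speedup(t, TARGET_S) for name, t in measured_s.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

This gives a factor of 3 for the desktop and about 3.4 for the laptop.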

Would a better GPU (such as a GTX 1080 Ti or GTX 1070) help us get there?

I have seen comparisons between GPUs, but I am not sure how the number of CUDA cores or the memory bandwidth translates into a performance boost for OpenCV's optical flow calculation.

Any help is much appreciated.