For your initial goal, a simple correlation along a representative pixel column and row should work. CUDA is fast enough to perform such a correlation directly, even without an FFT.
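To illustrate the idea on the CPU side: here is a minimal sketch of estimating a 1D shift between two intensity profiles (a representative row or column) via direct cross-correlation. The helper name `estimate_shift_1d` is mine, not from any library; a CUDA version would parallelize the same sum.

```python
import numpy as np

def estimate_shift_1d(ref_line, cur_line):
    """Estimate the integer shift of cur_line relative to ref_line
    via direct (non-FFT) cross-correlation.
    Hypothetical helper for illustration, not an existing API."""
    # Remove the mean so constant brightness does not bias the peak.
    ref = ref_line - ref_line.mean()
    cur = cur_line - cur_line.mean()
    corr = np.correlate(cur, ref, mode="full")
    # The lag of the correlation peak is the estimated displacement.
    return int(np.argmax(corr)) - (len(ref) - 1)
```

The same routine applied to a column gives the vertical component, so a single row/column pair yields an (x, y) translation estimate.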
If you also want to track rotation, the optical flow code from the link below may be able to generate an estimated displacement per pixel, which you could then try to match against a combined rotation/displacement model.
The translational displacement would then be obtained by averaging the per-pixel displacement values in the X and Y directions.
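That averaging step is straightforward; a sketch, assuming the flow field is an (H, W, 2) array of per-pixel (dx, dy) values as many optical flow routines return (the function name is my own):

```python
import numpy as np

def mean_translation(flow):
    """Estimate global translation from a dense flow field of shape
    (H, W, 2) by averaging the per-pixel (dx, dy) displacements.
    Illustrative helper; 'flow' would come from an optical flow routine."""
    dx = float(flow[..., 0].mean())
    dy = float(flow[..., 1].mean())
    return dx, dy
```

If parts of the image rotate or contain outliers, a median instead of the mean would be more robust.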
For matching against a rotation matrix, I can imagine a search in parameter space, similar to how a (generalized) Hough transform identifies lines and other shapes in a 2D image. Your parameters are the x, y coordinates of the center of rotation plus the rotation angle, so that is a three-dimensional parameter space to search. It is certainly not a simple problem.
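A brute-force version of that three-dimensional search could look like the sketch below: for each candidate center (cx, cy) and angle theta, predict the per-pixel displacement a pure rotation would cause and score it against the observed flow field. All function names here are hypothetical; a real implementation would use a coarse-to-fine grid rather than an exhaustive one.

```python
import numpy as np

def score_rotation(flow, cx, cy, theta):
    """Sum of squared residuals between an observed flow field (H, W, 2)
    and the displacement predicted by rotating by theta about (cx, cy).
    Illustrative brute-force scoring, not an optimized kernel."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    c, s = np.cos(theta), np.sin(theta)
    # Predicted displacement: rotate (pixel - center), then subtract pixel.
    rx = c * (xs - cx) - s * (ys - cy) + cx - xs
    ry = s * (xs - cx) + c * (ys - cy) + cy - ys
    return float(((flow[..., 0] - rx) ** 2 + (flow[..., 1] - ry) ** 2).sum())

def search_rotation(flow, centers, angles):
    """Exhaustive search over candidate (cx, cy) centers and angles;
    returns the best-scoring (cx, cy, theta) triple."""
    best = None
    for cx, cy in centers:
        for th in angles:
            s = score_rotation(flow, cx, cy, th)
            if best is None or s < best[0]:
                best = (s, cx, cy, th)
    return best[1:]
```

The scoring of each candidate is independent, so this is exactly the kind of embarrassingly parallel workload that maps well onto a CUDA grid.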
Are there any better and simpler methods for estimating the rotation matrix, given two camera images?