I think it’s unlikely that anyone can give you a complete answer unless you either provide a complete code example or someone decides to write a lengthy tutorial for you.
The only “build options” specific to RTX that come to mind are the ones you already mentioned: it’s always good practice to compile for the architecture(s) you intend to run on, e.g. compute_75/sm_75 for a cc7.5 GPU.
Here is a brief tutorial. I would say there are two general categories of issues here.
Those arising from machine calculation methods. Comparing float to double, or computing with versus without FMA contraction, are a couple of examples. Such differences can easily give rise to differences in results, whether we are talking about comparing CPU to GPU or some other comparison. In addition, more complex operations such as sin(), cos(), and other “library” math functions may simply be implemented differently on two different “machines”, giving rise to different results.
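To make the first category concrete, here is a small C++ sketch (plain CPU code, not a CUDA kernel) showing that the same expression can round differently depending on precision and on whether the multiply and add are fused. The function names are made up for illustration; std::fma stands in for what an FMA instruction does on either a CPU or a GPU.

```cpp
#include <cassert>
#include <cmath>

// Multiply then add with two separate roundings. Storing the product in a
// volatile variable keeps the compiler from contracting a*b + c into an FMA.
double unfused_mul_add(double a, double b, double c) {
    volatile double product = a * b;  // first rounding: product to double
    return product + c;               // second rounding: sum to double
}

// Fused multiply-add: a*b + c computed with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Add 0.1 ten times in single precision.
float sum_tenths_float() {
    float s = 0.0f;
    for (int i = 0; i < 10; ++i) s += 0.1f;
    return s;
}

// Add 0.1 ten times in double precision; the result is not exactly 1.0,
// and it differs from the single-precision result above.
double sum_tenths_double() {
    double s = 0.0;
    for (int i = 0; i < 10; ++i) s += 0.1;
    return s;
}
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, unfused_mul_add returns 0.0 while fused_mul_add returns -2^-60 (about -8.7e-19): the exact product 1 - 2^-60 is rounded to 1.0 before the add in the first case but not in the second. Whether a plain a*b + c gets contracted depends on the compiler and its flags (e.g. nvcc’s -fmad, GCC’s -ffp-contract).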
Those arising from order of operations (algorithm calculation methods). Floating-point operations don’t always have all the properties of the basic math operations we learned about in grade school/middle school; in particular, floating-point addition is not associative. If you compare a serial algorithm/realization to a parallel one, it’s often the case that the math doesn’t get done in exactly the same order, and this can give rise to differences. A possible item to read here is this floating point whitepaper.
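The non-associativity can be seen in a few lines of C++. This is a CPU-only sketch; the function names are illustrative, and pairwise_sum is just one possible tree order, not what any particular CUDA reduction does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Floating-point addition is not associative: regrouping the same three
// operands can change the result.
double add_left_first(double a, double b, double c)  { return (a + b) + c; }
double add_right_first(double a, double b, double c) { return a + (b + c); }

// Serial realization: accumulate strictly left to right.
double serial_sum(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Tree (pairwise) realization, similar in spirit to a parallel reduction:
// sum each half of [lo, hi) independently, then combine the two partials.
double pairwise_sum(const std::vector<double>& v, std::size_t lo, std::size_t hi) {
    assert(lo < hi);  // the range [lo, hi) must be non-empty
    if (hi - lo == 1) return v[lo];
    std::size_t mid = lo + (hi - lo) / 2;
    return pairwise_sum(v, lo, mid) + pairwise_sum(v, mid, hi);
}
```

For the inputs 1e20, -1e20 and 1.0, add_left_first returns 1.0 while add_right_first returns 0.0, because the 1.0 is lost when it is added to -1e20 first. Likewise, for the vector {1.0, 1e-16, 1e-16, 1e-16}, serial_sum returns exactly 1.0 (each 1e-16 is below half an ulp of 1.0 and is rounded away), while pairwise_sum(v, 0, 4) returns slightly more than 1.0, because two of the small terms are added to each other before meeting the large one.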
Because the two GPUs you mention have different sizes (e.g. differing numbers of SMs), a nice ninja-tuned parallel reduction that scopes out the size of the GPU it is running on, uses e.g. a grid-stride loop to size the grid to match that GPU, and then does a parallel reduction will likely give at least slightly different results depending on which GPU it runs on. The data gets partitioned differently for each grid size, so the additions happen in a different order.
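Here is a CPU model of that effect, assuming a simple grid-stride partition. grid_stride_sum and num_workers are hypothetical names: num_workers stands in for the total thread count a size-aware kernel would derive from the SM count, and the final combine is a plain serial loop rather than whatever tree or atomic scheme a real kernel would use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a grid-stride reduction. Worker w accumulates
// elements w, w + num_workers, w + 2*num_workers, ... into its own partial
// sum; the partials are then combined in worker order.
double grid_stride_sum(const std::vector<double>& v, std::size_t num_workers) {
    assert(num_workers > 0);
    std::vector<double> partial(num_workers, 0.0);
    for (std::size_t w = 0; w < num_workers; ++w)
        for (std::size_t i = w; i < v.size(); i += num_workers)
            partial[w] += v[i];
    // Combine the per-worker partial sums (order matters here too).
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

For a vector holding 1.0 followed by seven copies of 1e-16, one worker or eight workers both give exactly 1.0, while two workers give a value slightly above 1.0, since one worker’s partial collects four of the small terms before they meet the 1.0. Same data, same code, different “GPU size”, different answer.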
I doubt that is actually the issue in your case; I don’t know what the issue is in your case. You’re welcome to discuss it further, but I’m unlikely to provide any further response unless you provide a short, complete example that demonstrates the issue.
There are many questions on various forums that fit this general description (CPU/GPU differences); here is one example. Yes, I’m aware that one doesn’t specifically focus on GPU-GPU differences. Here is one that discusses GPU-GPU result differences in a general way. I’m sure you can find others.