I am working on an algorithm for research and will publish it if the results are good. I developed the initial code on my laptop, which has a 950M (compute capability 5.0). It has little device memory, however, so the CPU version of the code runs faster than the GPU version. I then tried running the code on a K40c and a K40m (both compute capability 3.5). I used the same dataset in all three cases, yet the computed results differ from one GPU to another. Is this common? Is my algorithm behaving differently depending on which GPU I run it on? I don't have much experience programming CUDA, but I assume this should not be the case.
Please suggest how I can go about identifying and fixing the problem.
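
To make "different" more concrete, the outputs could be compared element by element against the CPU version with a tolerance instead of checking for exact equality. Below is a minimal host-side sketch of such a check, assuming the results are `float` arrays already copied back from the device. The name `count_mismatches` and the tolerance values are hypothetical, chosen for illustration and not taken from my actual code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: counts elements where the GPU result differs from the
// CPU reference by more than a mixed absolute/relative tolerance.
// The default tolerances are assumptions and should be tuned to the algorithm.
std::size_t count_mismatches(const std::vector<float>& cpu_ref,
                             const std::vector<float>& gpu_out,
                             float rel_tol = 1e-5f,
                             float abs_tol = 1e-6f) {
    // A size mismatch means the runs are not comparable; report every element.
    if (cpu_ref.size() != gpu_out.size()) {
        return std::max(cpu_ref.size(), gpu_out.size());
    }
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < cpu_ref.size(); ++i) {
        const float diff = std::fabs(cpu_ref[i] - gpu_out[i]);
        const float limit = abs_tol + rel_tol * std::fabs(cpu_ref[i]);
        // The negated comparison also counts NaN as a mismatch.
        if (!(diff <= limit)) {
            if (mismatches < 10) {  // print only the first few offenders
                std::printf("index %zu: cpu=%g gpu=%g\n", i, cpu_ref[i], gpu_out[i]);
            }
            ++mismatches;
        }
    }
    return mismatches;
}
```

If every mismatch sits just outside the tolerance, the cause may simply be floating-point ordering or rounding differences between devices. Large or NaN differences would point more towards a bug such as a race condition, uninitialized memory, or an unchecked CUDA error.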