Runtime trouble moving legacy code from CUDA 6.5 to 8.0

Just to close this off: in the end I did identify a race condition, which presumably just never caused trouble on the older hardware/software. Finding it required compiling with CUDA 10 and running cuda-memcheck's racecheck tool there (the code causes even more problems under CUDA 10, but it runs far enough for racecheck to get through the relevant errors). I do not know why cuda-memcheck broke with CUDA 8 but not with CUDA 10, but there we go. Thanks for the help!