I am having a problem when I use texture memory in my MPI program, but only when I run with more than one rank per GPU. For simplicity while debugging, I run 2 ranks on 1 GPU on a single host. I suspect the problem has to do with using a global variable for the texture reference: the CUDA driver may be getting confused somehow, so that both ranks end up sharing the same texture, which would be incorrect.
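For reference, here is a minimal sketch of the usage pattern I'm describing; the names, sizes, and the data-fill step are placeholders rather than my actual code:

```
// Minimal sketch (placeholder names) -- legacy texture-reference API,
// built with nvcc plus the MPI wrappers.
#include <mpi.h>
#include <cuda_runtime.h>

// File-scope texture reference: a global in each rank's process, bound
// below to that rank's own device buffer.
texture<float, 1, cudaReadModeElementType> texRef;

__global__ void readKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(texRef, i);  // fetch through the global reference
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);  // both ranks deliberately share GPU 0 in my debug setup

    const int n = 1 << 20;
    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    // ... fill d_in with this rank's data ...

    cudaBindTexture(NULL, texRef, d_in, n * sizeof(float));  // bind per process

    readKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaUnbindTexture(texRef);
    cudaFree(d_in);
    cudaFree(d_out);
    MPI_Finalize();
    return 0;
}
```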
The results come out different on each run, which is consistent with a race to initialize the texture.
cuda-memcheck reports no issues, even though sharing a texture should cause out-of-bounds accesses for one of the ranks. I'm guessing that's because out-of-range texture fetches wrap around (or clamp) instead of faulting?
I did verify that the indices are computed correctly.
I don't see the issue when I use global memory instead of the texture.
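For contrast, the global-memory variant I tested looks roughly like this (again with placeholder names); the input pointer is passed as a kernel argument, so no file-scope global is involved:

```
// Same read via a plain global-memory load; the input pointer is an explicit
// kernel argument, so there is no file-scope texture reference to bind.
__global__ void readKernelGlobal(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}
```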
Can anyone confirm or deny that it is possible for the driver to get confused between global texture references across MPI ranks on the same host?