Using texture memory and MPI

I am having a problem when I use texture memory in my MPI program, but only when I run with more than one rank per GPU. For simplicity in debugging, I run two ranks on one GPU on a single host. I suspect the problem has to do with using a global variable for the texture reference: the CUDA driver somehow gets confused and both ranks end up sharing the same texture, which would be incorrect.
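Roughly, the setup looks like this (a simplified sketch with placeholder names, not my actual code), using the legacy texture reference API:

```
// File-scope texture reference: this is the "global variable" I mean.
texture<float, 2, cudaReadModeElementType> texRef;

__global__ void lookup(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D(texRef, x + 0.5f, y + 0.5f);
}
```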

The results come out different each run, which is consistent with a race to initialize the texture.
cuda-memcheck reports no issues, even though this should cause some out-of-bounds accesses for one of the ranks; I guess that is because textures wrap around?
I did verify that the indices are computed correctly.
I don't have the issue when using global memory.

Can anyone confirm or deny that it is possible for the driver to get confused by a global texture reference across MPI ranks on the same host?

I doubt that this is possible. Each MPI rank is a separate process with its own CUDA context; they can't even read each other's memory (without using the new IPC API)!
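For completeness, here is a minimal sketch of what sharing device memory between ranks with the IPC API would look like, assuming rank 0 allocates and ships the handle over MPI; names are illustrative and error handling is omitted:

```
#include <cuda_runtime.h>
#include <mpi.h>

// Rank 0 allocates and shares a device buffer; rank 1 maps it via IPC.
void share_buffer(int rank, size_t bytes)
{
    if (rank == 0) {
        void *d_buf = NULL;
        cudaMalloc(&d_buf, bytes);
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, d_buf);
        MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else {
        cudaIpcMemHandle_t handle;
        MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        void *d_peer = NULL;
        cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
        // ... use d_peer, then cudaIpcCloseMemHandle(d_peer) ...
    }
}
```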

There are many other possible explanations for a race condition. You can quickly prove to yourself that the texture is not the problem by replacing the texture fetches with global memory loads.
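For example, a sketch of that check (with made-up names matching the kernel above):

```
// Same indexing as the texture kernel, but reading from a plain device
// pointer. If the results still differ from run to run, the texture
// path is not the culprit.
__global__ void lookup_global(const float *in, float *out,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = in[y * width + x];  // was: tex2D(texRef, x + 0.5f, y + 0.5f)
}
```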

That's what I thought! Thanks for the sanity check. I've now gone back and found that I have an issue with one MPI rank: my tex2D(…) lookup isn't giving me the correct value…

It turns out my texture dimensions didn't meet the alignment requirement for tex2D, so I was getting the wrong value back.

I had to allocate the memory for the texture with cudaMallocPitch and copy the data with cudaMemcpy2D, and then it worked fine.
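Roughly, the working setup looks like this (a simplified sketch with placeholder names, again using the legacy texture reference API):

```
// cudaMallocPitch pads each row so the pitch meets the 2D texture
// alignment requirement; cudaMemcpy2D copies the tightly packed host
// rows into that padded layout; cudaBindTexture2D then binds the
// pitched pointer to the file-scope texture reference.
texture<float, 2, cudaReadModeElementType> texRef;  // file-scope, as before

void setup_texture(const float *h_data, int width, int height)
{
    float *d_data = NULL;
    size_t pitch = 0;  // row stride in bytes, chosen by CUDA

    cudaMallocPitch((void **)&d_data, &pitch, width * sizeof(float), height);

    cudaMemcpy2D(d_data, pitch,                  // padded device rows
                 h_data, width * sizeof(float),  // packed host rows
                 width * sizeof(float), height,
                 cudaMemcpyHostToDevice);

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    size_t offset = 0;
    cudaBindTexture2D(&offset, texRef, d_data, desc, width, height, pitch);
}
```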