Identical code on multiple GPUs attached to the same board: how to do P2P memory access?


I hope you are doing well. I am using a small piece of code (which I could not paste here, as the post submission kept failing) as a testbench to find out whether I can use the peer-to-peer (P2P) memory access routines to solve a problem across 8 GPUs on a single machine.

My code is called from a Fortran code in which I set the device according to the MPI rank assigned to a node. The Fortran code then calls a C routine that enumerates the GPUs in the system and determines which peers each one can access. Each GPU (uniquely identified by its MPI rank) then initiates a transfer to the next neighbor for which peer access is possible.

What I notice is that, instead of copying from the peer, each GPU seems to access its own pointer. I guess this is because all GPUs run the same code and hence use the same pointer names. Is this true?

If so, how can code that runs identically on each GPU (and therefore has the same variable names on each GPU) access peer memory?
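For reference, here is a minimal sketch of what the pointer exchange in a setup like the one described above could look like. It is a hypothetical illustration, not the attached testbench: with one MPI rank per GPU, each process has its own address space, so the neighbor's device pointer must be communicated explicitly, for example via CUDA IPC handles (assuming a 64-bit Linux system with UVA; the function and variable names are invented for this sketch).

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

/* Exchange device pointers around a ring of MPI ranks (one GPU per rank).
 * d_local is this rank's own device buffer; on return, *d_peer points at
 * the previous neighbor's buffer, mapped into this process. */
void exchange_peer_ptr(float *d_local, float **d_peer, int rank, int nranks)
{
    cudaIpcMemHandle_t mine, theirs;
    cudaIpcGetMemHandle(&mine, d_local);          /* handle for my buffer */

    int next = (rank + 1) % nranks;               /* ring neighbors */
    int prev = (rank + nranks - 1) % nranks;
    MPI_Sendrecv(&mine,   sizeof(mine),   MPI_BYTE, next, 0,
                 &theirs, sizeof(theirs), MPI_BYTE, prev, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Map the neighbor's buffer into this process. The variable names are
     * the same in every rank, but the mapped pointer value is specific to
     * this process, so the copy really does read the peer's memory. */
    cudaIpcOpenMemHandle((void **)d_peer, theirs,
                         cudaIpcMemLazyEnablePeerAccess);
}
```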

Kindly suggest/advise.
Thanks in advance.
With kind regards,
nvidia_query_p2p.txt (6.29 KB)


I am attaching the code for my testbench. Could someone please suggest whether it is possible to use peer-to-peer memory access on GTX 580 GPUs (4 of them connected to the same IOH) when identical code runs on each GPU (so the pointers have the same names on every GPU, as opposed to the different names used in the CUDA SDK example)?

In anticipation of a reply, thanks in advance.

rohit (4.14 KB) (1.15 KB)

First of all, you need to build and run this application on a 64-bit machine. On Windows you need a TCC-mode driver (for Tesla cards), or you can use a Linux OS with the NVIDIA driver.

Check that your devices have peer-to-peer access using the function cudaDeviceCanAccessPeer().
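A minimal sketch of that check over all visible devices might look like this (standard CUDA runtime API; nothing here is specific to the attached testbench):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            /* can == 1 if device i can directly access device j's memory */
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU %d -> GPU %d : %s\n", i, j, can ? "P2P" : "no P2P");
        }
    }
    return 0;
}
```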

Then enable peer access on each GPU to the other one (note that cudaDeviceEnablePeerAccess() applies to the currently selected device, so you must call cudaSetDevice() before each call):

cudaSetDevice(gpuid[0]);
cudaDeviceEnablePeerAccess(gpuid[1], 0);
cudaSetDevice(gpuid[1]);
cudaDeviceEnablePeerAccess(gpuid[0], 0);

The copy can then be done with:
cudaMemcpy(g1, g0, buf_size, cudaMemcpyDefault);

After usage, disable peer access with cudaDeviceDisablePeerAccess() (this also unregisters memory for the non-UVA case).
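The steps above can be sketched end to end as follows (a minimal illustration, assuming dev0 and dev1 have already been verified as P2P-capable; error checking is omitted for brevity, and the function name is invented):

```cuda
#include <cuda_runtime.h>

void p2p_copy_demo(int dev0, int dev1, size_t buf_size)
{
    float *g0, *g1;

    /* Allocate on each device and enable peer access in both directions.
     * cudaDeviceEnablePeerAccess() acts on the current device. */
    cudaSetDevice(dev0);
    cudaMalloc(&g0, buf_size);
    cudaDeviceEnablePeerAccess(dev1, 0);   /* dev0 may access dev1 */

    cudaSetDevice(dev1);
    cudaMalloc(&g1, buf_size);
    cudaDeviceEnablePeerAccess(dev0, 0);   /* dev1 may access dev0 */

    /* With UVA, cudaMemcpyDefault infers the direction from the pointer
     * values; cudaMemcpyPeer(g1, dev1, g0, dev0, buf_size) is the
     * explicit alternative that also works without UVA. */
    cudaMemcpy(g1, g0, buf_size, cudaMemcpyDefault);

    /* Tear down: disable peer access and free both buffers. */
    cudaSetDevice(dev0);
    cudaDeviceDisablePeerAccess(dev1);
    cudaFree(g0);

    cudaSetDevice(dev1);
    cudaDeviceDisablePeerAccess(dev0);
    cudaFree(g1);
}
```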