Need an example of how to disable NVLink

I am running a test comparing NVLink enabled vs. disabled.

I only know that I need to write some value into a register, but I don't know how, or the exact procedure for disabling NVLink.

https://devtalk.nvidia.com/default/topic/1043497/cuda-programming-and-performance/dgx-1-using-pcie-only-instead-of-nvlink/

Has any feature been added for this lately? I was wondering whether there is an implicit way to have the CUDA driver transparently switch from NVLink to GPUDirect RDMA. I am looking to do some benchmarking between the two using CUDA-aware MPI (the Open MPI implementation).

There is no way to disable NVLink. If Peer activity is enabled between two GPUs that are directly connected by NVLink, then NVLink will be used for peer transfers. There are no alternatives.

What happens if I call cudaDeviceDisablePeerAccess? Will the driver honor that setting and bypass NVLink? Will it also bypass PCIe peer-to-peer?

Yes, it will honor it for the process/context that called it (wouldn't it be broken otherwise?). That doesn't mean it will apply to other activity from, say, other processes. In that case, for transfers within that process/context, the data would flow as described in the previously linked thread. I also wouldn't be surprised if it returned an error code if you had not previously called the enable function, but I haven't tested that. It certainly would not make much sense to call the disable function without having previously called the enable function.
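
For reference, here is a minimal sketch of the enable/disable sequence being discussed. The device numbering (0 and 1) and the minimal error handling are illustrative only:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int canAccess = 0;
        // Check whether device 0 can directly access device 1's memory.
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);
        if (!canAccess) { printf("no peer path between devices 0 and 1\n"); return 0; }

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);   // peer transfers may now go over NVLink/PCIe P2P

        // ... peer copies or kernels dereferencing device-1 pointers would go here ...

        // Disabling only affects this process/context. As noted above, calling
        // this without a prior enable is expected to return an error code.
        cudaError_t err = cudaDeviceDisablePeerAccess(1);
        printf("cudaDeviceDisablePeerAccess returned: %s\n", cudaGetErrorString(err));
        return 0;
    }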

Cool. And just to confirm: if I were to use a managed allocation (using USM) and do a memcpy (kind=cudaMemcpyDefault) between allocations on different devices, you're saying the driver WILL do an intermediate copy back to the host instead of a direct GPU-to-GPU DMA transfer when peer functionality is deactivated?
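
For concreteness, the scenario described might look like the sketch below. The device IDs, sizes, and the use of cudaMemAdvise to tie each managed allocation to a different GPU are my assumptions; whether the copy goes over NVLink, PCIe P2P, or bounces through the host is the driver's decision, as discussed below.

    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 1 << 20;
        float *a = nullptr, *b = nullptr;
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);

        // Hint a different preferred location for each managed allocation,
        // as a rough analogue of "allocations on different devices".
        cudaMemAdvise(a, bytes, cudaMemAdviseSetPreferredLocation, 0);
        cudaMemAdvise(b, bytes, cudaMemAdviseSetPreferredLocation, 1);

        // With cudaMemcpyDefault the driver infers the transfer direction;
        // the actual data path is chosen by the driver/managed-memory system.
        cudaMemcpy(b, a, bytes, cudaMemcpyDefault);

        cudaFree(a);
        cudaFree(b);
        return 0;
    }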

Since the forum limits replies and suggests editing the previous submission instead, let me elaborate here on the response below:

3:55pm: Source and destination are managed memory. I would hope that managed memory does exactly what application code would do to determine the best copy behavior, prioritizing the most optimal path (NVLink, PCIe P2P, or host copies). Are you saying there is a considerable performance disparity between managed and unmanaged memory? If so, do you recommend not using USM?

The managed memory system may make its own decisions about what to do. I wouldn't use managed memory if I were interested in the most precise control over copy behavior. For a complicated scenario like that, I'm not going to try to read a single English sentence and assume that my picture of the code is the same as yours. To give one example, it's not clear to me whether you are talking about a single managed allocation alongside non-managed allocations, or whether all the allocations in question are managed. In any case, I probably won't be able to respond further here.

Do you have a working example of it, using NVBit?

Hi Andrei,

Unfortunately, I don't have an example, because we stopped pursuing the NVBIT route due to an easier option using Open MPI. Open MPI has flags to turn this off when using CUDA-aware MPI: you can pass the -mca btl_smcuda_use_cuda_ipc 0 flag.
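
A launch line would look something like this (the rank count and application name are placeholders); setting the MCA parameter to 0 disables CUDA IPC in the smcuda BTL, so intra-node GPU-to-GPU traffic is staged through host memory instead:

    mpirun -np 2 -mca btl_smcuda_use_cuda_ipc 0 ./my_cuda_aware_app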

Maybe the driver can disable NVLink. Try the following steps:

  1. touch /etc/modprobe.d/nvidia.conf
  2. echo "options nvidia NVreg_NvLinkDisable=1" >> /etc/modprobe.d/nvidia.conf
  3. Reboot (or unload and reload the nvidia kernel module) so the option takes effect.
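
If the module option takes effect, one way to sanity-check it (just a sketch; the output format varies by driver version) is to query the link status and topology after the reboot:

    # should report the links as inactive when NVLink is disabled
    nvidia-smi nvlink --status

    # GPU-to-GPU connectivity matrix (NVLink vs. PCIe paths)
    nvidia-smi topo -m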