GPUDirect - for host-to-host RDMA using CUDA, does the output from cudaDeviceCanAccessPeer need to be 1?

Question: For host-to-host RDMA using CUDA (GPUDirect) to work, does the output from cudaDeviceCanAccessPeer need to be 1?
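To be concrete, this is the call in question. A minimal sketch of how it is queried with the CUDA runtime, assuming two device ordinals (0 and 1) visible to the same process; note that each of my hosts has only a single GPU, so there is no local ordinal 1 to pass:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Minimal sketch: ask whether device 0 can directly access the memory
 * of device 1. The API is defined over device ordinals visible to the
 * local host/process; the ordinals here are placeholders. */
int main(void)
{
    int canAccess = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (err != cudaSuccess) {
        /* On a single-GPU box this fails with an invalid-device error. */
        fprintf(stderr, "cudaDeviceCanAccessPeer: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("canAccessPeer(0 -> 1) = %d\n", canAccess);
    return 0;
}
```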

– Background –

I’m trying to do RDMA from GPU memory to GPU memory between two hosts.

The HCA is a ConnectX-5 VPI, set up with an Ethernet fabric / openibd.

Output of lspci -t: see the attached lspciv.txt.

The computers are just Dell OptiPlex 9020s with the latest BIOS.

The OS is RHEL 7.6 (kernel 3.10.0-957.el7.x86_64),

nv_peer_memory_1.0-8 (compiled from -master),

cuda_10.1.168_418.67_linux,

OFED 4.6-1.0.1.1

The GPUs are Quadro K600s (compute capability 3.0).
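As I understand it, the role of nv_peer_memory in this stack is that, once the module is loaded, a device pointer from cudaMalloc can be handed directly to ibv_reg_mr, letting the HCA DMA to and from GPU memory. A minimal sketch of just that registration step, assuming a protection domain pd from the usual verbs setup (ibv_alloc_pd and friends omitted):

```c
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

/* Sketch: register GPU memory with the HCA for GPUDirect RDMA.
 * Assumes the nv_peer_mem kernel module is loaded; `pd` comes from a
 * normal ibv_alloc_pd() setup (omitted here). */
static struct ibv_mr *register_gpu_buffer(struct ibv_pd *pd, size_t len,
                                          void **gpu_buf_out)
{
    void *gpu_buf = NULL;
    if (cudaMalloc(&gpu_buf, len) != cudaSuccess)
        return NULL;

    /* With nv_peer_mem loaded, ibv_reg_mr accepts the device pointer
     * directly; the peer-memory client pins the GPU pages for the HCA. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (mr == NULL) {
        cudaFree(gpu_buf);
        return NULL;
    }
    *gpu_buf_out = gpu_buf;
    return mr;
}
```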

The following is a full dump of the GPU capabilities (same GPU in both boxes); a sketch of the query that produces it follows the dump:

Name = Quadro K600

uuid = 0x441B3084

luid[0] = 0x0

luid[1] = 0x0

luid[2] = 0x0

luid[3] = 0x0

luid[4] = 0x0

luid[5] = 0x0

luid[6] = 0x0

luid[7] = 0x0

luidDeviceNodeMask = 0

totalGlobalMem = 1029963776

sharedMemPerBlock = 49152

regsPerBlock = 65536

warpSize = 32

memPitch = 2147483647

maxThreadsPerBlock = 1024

maxThreadsDim[3] = 1024,1024,64

maxGridSize[3] = 2147483647,65535,65535

clockRate = 875500 kHz

totalConstMem = 65536

major compute capability = 3

minor compute capability = 0

textureAlignment = 512

texturePitchAlignment = 32

deviceOverlap = 1

multiProcessorCount = 1

kernelExecTimeoutEnabled = 0

integrated = 0

canMapHostMemory = 1

computeMode = 0

maxTexture1D = 65536

maxTexture1DMipmap = 16384

maxTexture1DLinear = 134217728

maxTexture2D[2] = 65536,65536

maxTexture2DMipmap[2] = 16384,16384

maxTexture2DLinear[3] = 65000,65000,1048544

maxTexture2DGather[2] = 16384,16384

maxTexture3D[3] = 4096,4096,4096

maxTexture3DAlt[3] = 2048,2048,16384

maxTextureCubemap = 16384

maxTexture1DLayered[2] = 16384,2048

maxTexture2DLayered[3] = 16384,16384,2048

maxTextureCubemapLayered[2] = 16384,2046

maxSurface1D = 65536

maxSurface2D[2] = 65536,32768

maxSurface3D[3] = 65536,32768,2048

maxSurface1DLayered[2] = 65536,2048

maxSurface2DLayered[3] = 65536,32768,2048

maxSurfaceCubemap = 32768

maxSurfaceCubemapLayered[2] = 32768,2046

surfaceAlignment = 512

concurrentKernels = 1

ECCEnabled = 0

pciBusID = 5

pciDeviceID = 0

pciDomainID = 0

tccDriver = 0

asyncEngineCount = 1

unifiedAddressing = 1

memoryClockRate = 891000

memoryBusWidth = 128

l2CacheSize = 262144

maxThreadsPerMultiProcessor = 2048

streamPrioritiesSupported = 0

globalL1CacheSupported = 0

localL1CacheSupported = 1

sharedMemPerMultiprocessor = 49152

regsPerMultiprocessor = 65536

managedMemory = 1

isMultiGpuBoard = 0

multiGpuBoardGroupID = 0

hostNativeAtomicSupported = 0

singleToDoublePrecisionPerfRatio = 24

pageableMemoryAccess = 0

concurrentManagedAccess = 0

computePreemptionSupported = 0

canUseHostPointerForRegisteredMem = 0

cooperativeLaunch = 0

cooperativeMultiDeviceLaunch = 0

sharedMemPerBlockOptin = 49152

pageableMemoryAccessUsesHostPageTables = 0

directManagedMemAccessFromHost = 0
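(For reference, the dump above is essentially a straight read of the cudaDeviceProp struct; a minimal sketch of that query, printing a few of the fields listed:)

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Sketch: read cudaDeviceProp for device 0 and print a few of the
 * fields that appear in the dump above. */
int main(void)
{
    struct cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("Name = %s\n", prop.name);
    printf("totalGlobalMem = %zu\n", prop.totalGlobalMem);
    printf("major compute capability = %d\n", prop.major);
    printf("minor compute capability = %d\n", prop.minor);
    printf("unifiedAddressing = %d\n", prop.unifiedAddressing);
    return 0;
}
```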

Attachment: lspciv.txt (36.5 KB)

This seems to be answered here:

https://devtalk.nvidia.com/default/topic/1057575/gpudirect-question-cudadevicecanaccesspeer-information/