Are there any plans to update the OFED driver in the NVIDIA containers?
ERROR: Detected MOFED driver 4.7-1.0.0, but this container has version 4.4-1.0.0.
Unable to automatically upgrade this container.
Use of RDMA for multi-node communication will be unreliable.
NOTE: MOFED driver was detected, but nv_peer_mem driver was not detected.
Multi-node communication performance may be reduced.
The 19.11 release does include 4.7.
(PS: You’ll still want to get nv_peer_mem installed if you’re doing multi-node. If you’re not doing multi-node, then the MOFED driver version doesn’t matter either.)
I do have nvidia_peer_memory-1.0-8.x86_64 installed.
rpm -qa | grep nvidia_peer
Driver Version: 418.87.01
CUDA Version: 10.1
The module isn't loaded for some reason… :( Let me fix that and pull 19.11 to test. Thanks.
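
For anyone hitting the same thing: having the RPM installed does not mean the kernel module is loaded, and the container's check appears to look for the loaded module. Below is a minimal sketch of how one might check this on the host. The helper name module_loaded is made up for illustration, and the modprobe hint assumes the package installs a module named nv_peer_mem, as the container message suggests; adjust it for your setup.

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Hypothetical helper: succeeds if NAME is the first field of some line in a
# /proc/modules-style listing. FILE defaults to /proc/modules.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Only run the check where /proc/modules is readable (i.e. on Linux).
if [ -r /proc/modules ]; then
    if module_loaded nv_peer_mem; then
        echo "nv_peer_mem is loaded"
    else
        # Assumption: the module can be loaded with modprobe on this host.
        echo "nv_peer_mem is not loaded; try: sudo modprobe nv_peer_mem"
    fi
fi
```

The match is against the exact module name, so a package name like nvidia_peer_memory will not count as a loaded module.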