NCCL error on multiple machines: transport/p2p.cu :515 WARN failed to open CUDA IPC handle : 30 unknown error

Hi all,
I am trying to use NCCL to train a network.
It works well on one machine.
But it fails when I run on two machines with 8 Tesla P40s each.
The error message is:
transport/p2p.cu :515 WARN failed to open CUDA IPC handle : 30 unknown error
INFO init.cu:484->1
INFO init.cu:542->1
‘unhandled cuda error’

However, if I disable the P2P and SHM transports, it works, though performance drops a lot.
Can anyone help me fix this problem?
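
For reference, here is a minimal sketch of the workaround. It assumes the transports are disabled through NCCL's environment variables NCCL_P2P_DISABLE and NCCL_SHM_DISABLE. The helper name is only for illustration, and NCCL_DEBUG=INFO is added just to get more detailed logs:

```python
import os


def nccl_env_without_p2p_shm(base_env=None):
    """Return a copy of the environment with NCCL's P2P and SHM transports disabled.

    Hypothetical helper for illustration; it only sets environment variables
    and does not call into NCCL itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_P2P_DISABLE"] = "1"  # disable the peer-to-peer (CUDA IPC) transport
    env["NCCL_SHM_DISABLE"] = "1"  # disable the shared-memory transport
    env["NCCL_DEBUG"] = "INFO"     # ask NCCL for more verbose logging
    return env


if __name__ == "__main__":
    env = nccl_env_without_p2p_shm()
    for key in ("NCCL_P2P_DISABLE", "NCCL_SHM_DISABLE", "NCCL_DEBUG"):
        print(f"{key}={env[key]}")
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the training job on each machine. The variables have to be set before the process initializes NCCL.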

Thanks!