Deploying FL on multiple computers with NVFlare

I am trying to run NVFlare in a realistic setup with multiple computers. After the provisioning steps, I started the server, clients, and admin from their startup packages. The server started, but the client and admin computers reported a gRPC communication error:

2022-01-05 21:37:08,624 - Communicator - ERROR - Action: client_registration grpc communication error. retry: 1500, First start till now: 0.0013239383697509766 seconds.
2022-01-05 21:37:08,624 - Communicator - ERROR - Could not connect to server: ourserverdomain:8765 Setting flag for stopping training. failed to connect to all addresses

I tried listing the listening ports on the server with nmap, and it showed 127.0.1.1:8002, which suggests the server is listening only on localhost and not for other computers. This makes me wonder whether the current NVFlare supports running a realistic scenario or only a POC (proof of concept). Please help me solve this problem, thank you.
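
For reference, the connection failure in the log can be reproduced outside NVFlare with a plain TCP check run from a client machine. This is only a minimal sketch using the Python standard library. `can_connect` is a hypothetical helper name, the host and ports in the comment are the ones from the log and nmap output above, and nothing here uses an NVFlare API.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Name-resolution failures, refused connections, and timeouts are all
    subclasses of OSError, so they are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (values taken from the log and nmap output in this post):
#   can_connect("ourserverdomain", 8765)
#   can_connect("ourserverdomain", 8002)
```

If this returns False from a client machine while the server process is running, the cause is likely at the network level (name resolution, firewall, or the address the server is bound to).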

Thanks for your interest in NVIDIA FLARE, and welcome to the forums!

When provisioning a realistic setup with multiple computers, the server name defined in project.yml should be the fully qualified domain name (FQDN) where the server can be reached via DNS. See the default project.yml linked below. You would replace “example.com” with your server’s domain name.
Provisioning in NVIDIA FLARE — NVIDIA FLARE 2.0 documentation

If you do not have a DNS entry for your server, you can use the server hostname. In this case, you need to add an entry to the /etc/hosts file on each client and admin machine to associate the server’s IP address with this hostname:

<server IP address> <server hostname>

This will allow you to connect from client to server using just the hostname rather than a fully qualified domain name.
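
To confirm that the /etc/hosts entry took effect, one option is to check what the server name resolves to on the client or admin machine. A minimal sketch with the Python standard library follows. `resolve_addresses` and `is_loopback` are hypothetical helper names, not part of NVFlare. A name that resolves only to a loopback address such as 127.0.1.1 points back at the local machine rather than at the server.

```python
import ipaddress
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses the local resolver
    (including /etc/hosts) gives for hostname, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """Return True if address is a loopback address (127.0.0.0/8 or ::1)."""
    # Drop an IPv6 scope id such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%", 1)[0]).is_loopback


# Example: pass the same server name that appears in project.yml.
#   for addr in resolve_addresses("<server hostname>"):
#       print(addr, "loopback" if is_loopback(addr) else "routable")
```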

Please let me know if you have any questions - happy to help troubleshoot!

-Kris

Hi,

I have exactly the same issue. Did you manage to solve it?

Gonzalo Quintana

Hello Gonzalo,

I had the same problem when I was trying to use only the server hostname in the project.yml. When I used the FQDN, the connection was successfully established.

Hi Cassie,

Thank you very much for your answer.

Could you please tell me how you obtained the FQDN of your host?

When I run the command “hostname --fqdn”, I get the same hostname that I put in the project.yml file (and that I have in the /etc/hosts file). This is why I only had the hostname in the project.yml file.
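
For anyone comparing the two values, here is a minimal Python sketch of the same check. It assumes only the standard library. `looks_fully_qualified` and `describe_host` are hypothetical helper names, and the "contains a domain part" test is a rough heuristic, not how NVFlare itself validates names.

```python
import socket


def looks_fully_qualified(name):
    """Heuristic: a name counts as fully qualified if it has at least two
    non-empty dot-separated labels, e.g. "server.example.com"."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def describe_host():
    """Report the short hostname and the FQDN as seen by the local resolver."""
    fqdn = socket.getfqdn()
    return {
        "hostname": socket.gethostname(),
        "fqdn": fqdn,
        "qualified": looks_fully_qualified(fqdn),
    }


# Example: print(describe_host())
```

If "fqdn" comes back identical to the bare hostname, the machine has no domain part configured locally, which matches what “hostname --fqdn” reports here.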

Thanks in advance!

Gonzalo

Hi,
In my case, I asked my organization’s IT engineer to create a subdomain for my PC under the organization’s domain so that it would have an FQDN.
Networking is not my area, so I don’t know whether there are other solutions for this.

Hi Cassie,

OK, thanks for your answer!

Would you mind telling me which version of NVFlare you are using?

Best,

Gonzalo