Clara Train federated learning - server launches correctly but client cannot connect

Hi all,

I’m trying to launch a federated training run using Clara Train v4, but I cannot establish a server-client connection.

I generated the server and client packages using the provision tool and the following project.yml file:

project.yml (2.3 KB)

I can launch the server correctly (as shown by the terminal output):

server.txt (441 Bytes)

However, when I launch the client and try to connect it to the server, it fails:

client_error.txt (941 Bytes)

I launch the client and server Docker containers with the following files:

client_docker.txt (753 Bytes)
server_docker.txt (583 Bytes)

Do you have any idea what could be going wrong?

Thanks in advance,

Gonzalo Quintana

Hi Gonzalo,

Welcome to the forums, and thanks for your interest in Clara Train!

In your project.yml file, you are using MyServer for the cn: in the server configuration section, but the client error message shows your client is trying to connect to lxbuc-ama16:8004.

For the client to connect to the server, the client needs to be able to resolve the hostname or fully qualified domain name (FQDN) provided in the server’s cn: entry. If your client systems can connect to the server using the lxbuc-ama16 hostname, then just use that for the cn: in the project.yml server config. If not, you can edit the /etc/hosts file on the client systems to associate the server’s IP address with whatever hostname you provide for the cn:, like:

<server IP address> <server hostname used for cn:>

This will allow you to connect from client to server using whatever hostname you provide rather than a fully qualified domain name.
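
If it helps, here is a minimal Python sketch for checking both steps from the client side: first that the hostname resolves, then that a TCP connection to the port actually opens. The helper name check_server is made up for illustration and is not part of Clara Train or NVFlare; it only uses the standard socket module.

```python
import socket


def check_server(host, port, timeout=3.0):
    """Resolve `host`, then try to open a TCP connection to host:port.

    Returns (addresses, connected): the addresses the name resolved to,
    and whether a TCP connection could be opened.
    This is a hypothetical diagnostic helper, not an NVFlare API.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve at all (DNS or /etc/hosts problem).
        return [], False

    addresses = sorted({info[4][0] for info in infos})
    try:
        # create_connection tries each resolved address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return addresses, True
    except OSError:
        # The name resolves, but nothing accepts connections on that port
        # (closed port, firewall, or a container port that is not published).
        return addresses, False
```

Calling check_server("lxbuc-ama16", 8004) from inside the client container (the hostname and port from the client error message) should return a non-empty address list and True if the server is reachable. An empty list points at name resolution; a non-empty list with False points at the port or the network path.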

-Kris

Hi Kris,

Thank you very much for your answer.

There was an error in the files I uploaded. The “cn” field in project.yml was actually already set to “lxbuc-ama16”, but the connection still cannot be established.

I’ve added a line with the server IP address and the server hostname to the /etc/hosts file, but it still doesn’t work. I also used the ping command to verify that the client can reach the server by its hostname, and it can.

I’m stuck on this issue and running out of ideas. Could something be wrong with how I’m running the Docker containers?

Best,

Gonzalo

Hi,

I have inspected the ports the server is listening on with the command “netstat -tulpn | grep LISTEN” and I get the following result:

I can see that the server is listening on ports 8003 and 8004 on all IP addresses (which seems OK). However, what surprises me is that the sockets are listed as tcp6.

Is this expected, or should NVFlare use TCP4?
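
For context on the tcp6 question: on Linux, a socket bound to the IPv6 wildcard address is listed as tcp6 by netstat, but it normally also accepts IPv4 connections unless the IPV6_V6ONLY option is set. Whether the NVFlare server binds its ports this way is an assumption here, not something confirmed in this thread. A minimal sketch with a made-up function name that demonstrates the general behaviour on the local machine:

```python
import socket


def ipv4_reaches_ipv6_listener():
    """Check whether an IPv4 client can connect to a socket bound to '::'.

    Returns True or False for the connection result, or None if IPv6 or
    dual-stack sockets are not available on this machine.
    Hypothetical demo only; it does not touch the NVFlare server.
    """
    if not socket.has_ipv6:
        return None
    try:
        server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        return None

    with server:
        try:
            # 0 = dual-stack: accept IPv4 (as IPv4-mapped addresses) too.
            server.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            server.bind(("::", 0))  # IPv6 wildcard, ephemeral port
        except OSError:
            return None
        server.listen(1)
        port = server.getsockname()[1]

        try:
            # Connect over plain IPv4 to the "tcp6" listener.
            with socket.create_connection(("127.0.0.1", port), timeout=2.0):
                return True
        except OSError:
            return False
```

If this returns True, a tcp6 entry in the netstat output does not by itself mean that IPv4 clients are locked out.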

Thanks in advance for all your help.

Best,

Gonzalo