Federated learning with secure_train set to true

Hi Team,
I can successfully train a model on client-side data with NVIDIA Clara federated learning when secure_train is set to false. But when secure_train is set to true, with SSL certificates on the server and the client created by following the instructions at this link (https://docs.nvidia.com/clara/tlt-mi/clara-train-sdk-v3.0/nvmidl/additional_features/federated_learning.html#instructions-on-how-to-create-the-self-signed-ssl-certificate-authority-and-server-clients-certificates), the client is not able to connect to the server.

Moreover, I couldn't find any documentation on the web that helps with setting up secure federated learning.

Setup for Federated learning:

  • Cloud platform: GCP
  • 1 server and 1 client in the same geographical region and subnet.

Hi
Thanks for your interest in Clara Train. I am glad you managed to get started with FL.

For your issue: you are on the right track in generating the certificates, but you might have missed a step in the certificate creation. As a first debugging step, I would recommend that you use the sample FL MMAR from NGC: https://ngc.nvidia.com/catalog/models/nvidia:med:clara_mri_fed_learning_seg_brain_tumors_br16_t1c2tc_no_amp/files It already has the certificates in the resources folder.
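If it helps narrow things down, here is a minimal sketch (not part of Clara Train) that uses only the Python standard library to try a TLS handshake against the server with your own CA and client certificates. The file names, the port, and the function name try_tls_handshake are assumptions made for illustration. The FL server speaks gRPC, so the sketch offers the "h2" ALPN protocol, but it only tests the handshake and says nothing about the FL protocol itself.

```python
import socket
import ssl


def try_tls_handshake(host, port, ca_file, cert_file=None, key_file=None, timeout=5.0):
    """Attempt a TLS handshake and return (ok, message).

    Hypothetical helper for debugging: ca_file is the self-signed root CA,
    cert_file/key_file are the optional client certificate and private key.
    """
    try:
        # Trust only the given CA; hostname checking stays enabled by default.
        context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
        if cert_file:
            # Present a client certificate, as a mutual-TLS server would expect.
            context.load_cert_chain(certfile=cert_file, keyfile=key_file)
        # gRPC runs over HTTP/2, so offer "h2" during the handshake.
        context.set_alpn_protocols(["h2"])
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, f"handshake ok, negotiated {tls.version()}"
    except ssl.SSLCertVerificationError as exc:
        # Typical causes: wrong CA, or the name in the certificate does not match host.
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # Covers missing files, refused connections, timeouts and other TLS errors.
        return False, f"connection or TLS error: {exc}"


# Example call (all values are placeholders, adjust to your setup):
# print(try_tls_handshake("fl-server.example.com", 8002,
#                         "rootCA.pem", "client.crt", "client.key"))
```

A "certificate verification failed" message usually points at the certificates themselves (wrong CA, or a server name that does not match), while a plain connection error points at networking, for example a firewall rule between the two machines.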

We have recognized that some steps in the FL process, such as generating certificates, are a bit hard. Therefore, we have simplified the process by adding a provisioning tool. This will be in the upcoming release of Clara Train V3.1, targeted for the end of October.
So please stay tuned.

Hi Aharoni,
Thanks for the information. Will be waiting for V3.1.
Just one more question: when doing a secure train, is it necessary to have a domain name, or can we use the IP address of the server machine?

Hi

To test multiple clients and a server within the same Docker container, you should use localhost. For a realistic setup, with each client running in a different Docker container on the same or a different physical machine, you should use the server's full name.

In V3.0 you could use IPs; however, in V3.1 you can NOT use IPs, as the server name would be part of the SSL signature.
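As a quick sanity check before generating certificates, a small sketch like the one below can flag a server address that is a raw IP rather than a name. It uses only the Python standard library; the function names and the allow_ip switch are made up for illustration and are not part of Clara Train.

```python
import ipaddress


def is_ip_address(server_address: str) -> bool:
    """Return True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(server_address)
        return True
    except ValueError:
        return False


def check_server_address(server_address: str, allow_ip: bool = False) -> str:
    """Hypothetical helper: allow_ip=True mirrors V3.0, allow_ip=False mirrors V3.1."""
    if is_ip_address(server_address) and not allow_ip:
        return "use the server's full name instead of an IP"
    return "ok"


print(check_server_address("10.128.0.2"))             # flagged: raw IP
print(check_server_address("fl-server.example.com"))  # accepted: a name
```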

Hope that helps