Federated learning - brats segmentation


We are trying to run federated learning with the brats_segmentation model shown here: https://ngc.nvidia.com/models/ea-nvidia-clara-train:clara_pt_brain_mri_segmentation

The MMAR does not have the config_fed_client.json and config_fed_server.json.

We took these two files from Clara Train 3.1 MMARs and adapted them.
Attached: config_fed_server.json (1.5 KB), config_fed_client.json (719 Bytes)

We get the following error when starting the server:

Starting Admin Server flc1 on Port 8003
Server has been started.
2021-02-21 16:06:00,581 - ClientManager - INFO - Client: New client org1-a@ joined. Sent token: 8489aace-7fa6-48a8-bd30-4cf780fa7200.  Total clients: 1
2021-02-21 16:06:19,481 - ClientManager - INFO - Client: New client org1-b@ joined. Sent token: a71751ee-d132-4df9-a43c-0520bf50b6ce.  Total clients: 2
Check server status.
Error processing config /workspace/startup/../run_20/mmar_server/config/config_train.json: local variable 'trainer' referenced before assignment
Traceback (most recent call last):
  File "server/sai.py", line 368, in start_server_training
  File "utils/wfconf.py", line 163, in configure
  File "utils/wfconf.py", line 158, in configure
  File "utils/wfconf.py", line 154, in _do_configure
  File "apps/fed_learn/fl_conf.py", line 197, in finalize_config
UnboundLocalError: local variable 'trainer' referenced before assignment
FL server execution exception: local variable 'trainer' referenced before assignment
2021-02-21 16:07:53,145 - BaseServer - INFO - Stopping server training...
2021-02-21 16:07:53,146 - ServerModelManager - INFO - closing the model manager
2021-02-21 16:07:53,146 - BaseServer - INFO - Round time: 158 second(s).

A couple of questions:

Q1. What are the valid configurations for config_fed_server.json and config_fed_client.json for this brats_segmentation MMAR? How do we resolve the above error?

Q2. How do we decide which pre_processors and post_processors have to be used for a particular MMAR?


Any clue on this?

Thanks for your interest in the Clara Train SDK.
It seems the error is related to config_train.json. Can you attach the full MMAR you are using?
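
To illustrate why this kind of error usually points at the config: in Python, an UnboundLocalError is raised when a function reads a local variable that was only assigned inside a branch that never ran. The sketch below is not the Clara Train source. `finalize_config_sketch` and the `expected_section` key are hypothetical names, chosen only to mirror the shape of the traceback. It shows how a config that lacks a section the code expects could produce this exception.

```python
def finalize_config_sketch(config):
    """Hypothetical stand-in for a config finalizer (NOT the SDK code)."""
    # `trainer` is bound only when the (hypothetical) section is present.
    if "expected_section" in config:
        trainer = {"name": config["expected_section"].get("name", "default")}
    # If the section is missing, `trainer` was never assigned, so reading
    # it here raises UnboundLocalError.
    return trainer


def try_finalize(config):
    """Run the sketch and report the exception type instead of crashing."""
    try:
        return finalize_config_sketch(config)
    except UnboundLocalError as exc:
        return f"UnboundLocalError: {exc}"


# A config with the section works; one without it fails.
print(try_finalize({"expected_section": {"name": "demo"}}))
print(try_finalize({"some_other_section": {}}))
```

If something like this is happening, the fix would be in the contents of config_train.json rather than in the server or client FL configs, which is why seeing the full MMAR helps.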

We also recommend going through the notebook examples in the NoteBooks/FL folder (master branch) of the NVIDIA/clara-train-examples repository on GitHub.
They provide an easy setup for testing FL in a single Docker container.
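
If it helps while comparing against a working example, one quick check is to diff the top-level keys of two config files, for instance your config_train.json against the one from an MMAR that is known to work. This is a generic JSON sketch, not an SDK tool. `top_level_key_diff` and the paths in the usage comment are placeholders.

```python
import json
from pathlib import Path


def top_level_key_diff(path_a, path_b):
    """Return (keys only in A, keys only in B) for two JSON object files."""
    data_a = json.loads(Path(path_a).read_text())
    data_b = json.loads(Path(path_b).read_text())
    if not isinstance(data_a, dict) or not isinstance(data_b, dict):
        raise ValueError("both files must contain a JSON object at top level")
    keys_a, keys_b = set(data_a), set(data_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)


# Example usage (placeholder paths, adjust to your own layout):
# only_mine, only_ref = top_level_key_diff(
#     "my_mmar/config/config_train.json",
#     "reference_mmar/config/config_train.json",
# )
# print("only in mine:", only_mine)
# print("only in reference:", only_ref)
```

A key that shows up only in the reference config is a candidate for the section your run is missing, though nested differences would need a deeper comparison than this.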

For Q2 (pre_processors and post_processors): unless you want to write your own handler, you should stick with the out-of-the-box ones. Is there anything special you want to do?

Also, please make sure you are using the latest SDK, v3.1.01.