Contribution of clients when aggregation weight is set to 0 and 1

Please explain the following scenario:

Aggregation weights applied:
org1-a: 0
org1-b: 1
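
A weight of 0 means org1-a's update contributes nothing to the weighted average, and a weight of 1 means the aggregate is exactly org1-b's update. A minimal sketch of this kind of weighted FedAvg-style aggregation (variable names and the flat-dict model format are assumptions for illustration, not the actual Clara/NVFLARE implementation):

```python
# Hypothetical sketch of weighted model aggregation (not the actual
# Clara Train / NVFLARE code): each client's update is scaled by its
# aggregation weight, then normalized by the total weight.
def aggregate(client_updates, weights):
    """client_updates: {client: {param_name: value}}, weights: {client: float}"""
    total = sum(weights.values())
    agg = {}
    for client, update in client_updates.items():
        w = weights[client] / total
        for name, value in update.items():
            agg[name] = agg.get(name, 0.0) + w * value
    return agg

# With org1-a weighted 0 and org1-b weighted 1, the aggregate
# equals org1-b's update exactly.
result = aggregate(
    {"org1-a": {"w": 10.0}, "org1-b": {"w": 2.0}},
    {"org1-a": 0.0, "org1-b": 1.0},
)
```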

org1-a data applied on org1-a model gets the following results

  • {'org1-a': {'org1-a': {'validation': {'validation mean dice': 0.4593188166618347, 'validation loss': -0.08476769179105759}}}}

org1-b data applied on org1-a model gets the following results

  • {'org1-b': {'org1-a': {'validation': {'validation mean dice': 0.32660916447639465, 'validation loss': -0.23998698592185974}}}}

org1-a data applied on org1-b model gets the following results

  • {'org1-a': {'org1-b': {'validation': {'validation loss': -0.08627358824014664, 'validation mean dice': 0.4237763285636902}}}}

org1-b data applied on org1-b model gets the following results

  • {'org1-b': {'org1-b': {'validation': {'validation loss': -0.10886269062757492, 'validation mean dice': 0.36573535203933716}}}}

org1-a data applied on server model gives the following results:

  • {'org1-a': {'server': {'validation': {'validation mean dice': 0.31541159749031067, 'validation loss': -0.08959338814020157}}}}

org1-b data applied on server model gives the following results:

  • {'org1-b': {'server': {'validation': {'validation loss': -0.13982881605625153, 'validation mean dice': 0.22080902755260468}}}}

As the results above show, org1-b data on the org1-b model gives a validation mean dice of 0.36573535203933716, while org1-b data on the org1-a model gives 0.32660916447639465.

Why does org1-b data on the server model give a lower validation mean dice (0.22080902755260468), even though we set org1-b's aggregation weight to 1 (so the server model should be entirely dependent on, i.e. a replica of, the org1-b model)?

Does the server model start with some predefined base model which we are not aware of?

Hi. This can happen because the client model validated here is the one selected for the best *local* validation score, whereas the server model evaluated here is simply the latest state of the model when FL training completed. You can use the ModelSelectionHandler (v3.1) or InTimeModelSelectorHandler (v4.0) on the server to obtain a “best” FL model, i.e. the global checkpoint that achieved the highest average validation score across all clients.
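
The idea behind such a selector can be sketched roughly like this (a hypothetical illustration of the mechanism, not the actual handler code): after each round, the server averages the clients' reported validation scores and snapshots the global model whenever that average improves.

```python
# Hypothetical sketch of "in-time" best-model selection: keep the
# global checkpoint with the highest mean client validation score.
class BestModelTracker:
    def __init__(self):
        self.best_score = float("-inf")
        self.best_model = None

    def update(self, global_model, client_val_scores):
        avg = sum(client_val_scores.values()) / len(client_val_scores)
        if avg > self.best_score:
            self.best_score = avg
            self.best_model = dict(global_model)  # snapshot this round
        return self.best_score

tracker = BestModelTracker()
tracker.update({"w": 1.0}, {"org1-a": 0.45, "org1-b": 0.36})
tracker.update({"w": 2.0}, {"org1-a": 0.31, "org1-b": 0.22})
# The first round's model is kept, since its mean dice (0.405)
# is higher than the second round's (0.265).
```

Without such a handler, the server simply ends up with whatever state the global model was in at the final round, which is why it can score worse than a client's best local checkpoint.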

FYI, the clients always start training from the model initialized by the server. Even if a client joins later, it will pull the server model of the current round and fine-tune from it during local training.

Thanks hroth3hm8y,

So how and where do we set FL to use InTimeModelSelectorHandler? Can we specify it in the config?

Here it is.
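
For reference, such handlers are registered in the server-side FL configuration (e.g. `config_fed_server.json`). The fragment below is only a sketch of what that entry might look like; the exact key names, handler path, and arguments depend on your Clara Train version, so please check its documentation rather than copying this verbatim:

```json
{
  "handlers": [
    {
      "name": "InTimeModelSelectorHandler",
      "args": {}
    }
  ]
}
```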