Thank you for providing this forum.
Our team has been trying out the federated learning framework provided by NVIDIA. We are trying to deploy the sample notebook provided for spleen segmentation on a GCP compute instance. However, our training crashes after one federated round. The instance uses the n1-standard-32 machine type with 2 NVIDIA Tesla P4 GPUs. We have the following questions:
Is this a common problem?
Is there a machine type or GPU size that you would recommend?
We have been struggling with this for a week. Your response would be highly appreciated.