Hi, I’m trying to run MaxDiffusion (SDXL) on GCP and it OOMs. I tried ICI DP=8 and FSDP=8 with per_device_batch_size=1, and both OOM on a single H100 node (8xH100). SDXL is a relatively small model with around 4B parameters, so it should not OOM on a single node.
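For reference, here is the rough back-of-the-envelope estimate behind that expectation. It is only a sketch under assumptions that may not match the actual MaxDiffusion config: fp32 weights and gradients, an Adam-style optimizer with two moments per parameter, and activations and XLA workspace ignored. The helper name `training_state_gib` is just for illustration.

```python
# Rough per-device memory estimate for training state only
# (weights + gradients + two Adam moments).
# Assumptions (NOT taken from the actual config): fp32 everywhere,
# Adam-style optimizer, activations and XLA workspace ignored.

GIB = 1024 ** 3


def training_state_gib(num_params, bytes_per_param=4, shards=1):
    """Return GiB per device for weights, gradients, and two Adam moments."""
    weights = num_params * bytes_per_param
    grads = num_params * bytes_per_param
    adam_moments = 2 * num_params * bytes_per_param
    return (weights + grads + adam_moments) / shards / GIB


num_params = 4e9  # the "around 4B" figure quoted above

dp_gib = training_state_gib(num_params, shards=1)    # pure DP: full replica per GPU
fsdp_gib = training_state_gib(num_params, shards=8)  # FSDP=8: state sharded 8 ways

print(f"DP=8:   ~{dp_gib:.1f} GiB per GPU before activations")
print(f"FSDP=8: ~{fsdp_gib:.1f} GiB per GPU before activations")
```

Under these assumptions, pure DP needs roughly 60 GiB per GPU for training state alone, which could be tight on an 80 GB H100, but FSDP=8 needs only about 7.5 GiB per GPU. Since train_text_encoder=false, the trainable part is presumably smaller than 4B, so I would treat both numbers as upper bounds.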
Here is the repo:
Here is the config & command:
export CHECKPOINTS=gs://{MY_GCS_BUCKET}/maxdiffusion_gpu/config_only/models--stabilityai--stable-diffusion-xl-base-1.0
export CHECKPOINTS_LOCAL_DIR=/tmp/maxdiffusion_gpu/config_only
export CHECKPOINTS_LOCAL=/tmp/maxdiffusion_gpu/config_only/models--stabilityai--stable-diffusion-xl-base-1.0
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/compat:$LD_LIBRARY_PATH &&
pip install .[training] &&
mkdir -p $CHECKPOINTS_LOCAL_DIR &&
gsutil -m cp -R $CHECKPOINTS $CHECKPOINTS_LOCAL_DIR & $CHECKPOINTS_LOCAL/unet &&
python -m src.maxdiffusion.train_sdxl src/maxdiffusion/configs/base_xl.yml \
  hardware=gpu \
  run_name=$RUN_NAME \
  output_dir=gs://$OUTPUT_PATH \
  train_new_unet=true \
  train_text_encoder=false \
  cache_latents_text_encoder_outputs=true \
  max_train_steps=20 \
  ici_fsdp_parallelism=1 \
  pretrained_model_name_or_path=$CHECKPOINTS_LOCAL \
  per_device_batch_size=1