Greetings,
I am currently training a transformer model from the ground up on a single A100 with 40 GB of GPU memory.
I am planning to use the new DGX Spark to run my notebook.
I have read that fitting the model in VRAM is the most important factor, and today I am capped at ~38 GB reserved on the single A100.
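To put the VRAM question in numbers, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb of about 16 bytes per parameter for mixed-precision training with Adam (bf16/fp16 weights and gradients, plus fp32 master weights and the two Adam moments), and an arbitrary 8 GB set aside for activations. Real activation memory depends on batch size, sequence length and activation checkpointing, so treat the output as an order-of-magnitude check, not a guarantee. The function names are just illustrative.

```python
# Rule-of-thumb assumption: mixed-precision training with Adam keeps about
# 16 bytes of state per parameter:
#   2 (bf16/fp16 weights) + 2 (bf16/fp16 grads)
#   + 4 (fp32 master weights) + 4 (Adam m) + 4 (Adam v)
# Activations, framework overhead and fragmentation are NOT included.
BYTES_PER_PARAM_MIXED_ADAM = 2 + 2 + 4 + 4 + 4


def model_state_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Approximate GB needed for weights, gradients and optimizer state."""
    return num_params * bytes_per_param / 1e9


def max_params_for_budget(budget_gb: float, activation_reserve_gb: float,
                          bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> int:
    """Largest parameter count whose model state fits after reserving room for activations."""
    usable_gb = max(budget_gb - activation_reserve_gb, 0.0)
    return int(usable_gb * 1e9 // bytes_per_param)


if __name__ == "__main__":
    # Compare a 24 GB card with the ~38 GB currently usable on the A100.
    for budget in (24.0, 38.0):
        n = max_params_for_budget(budget, activation_reserve_gb=8.0)
        print(f"{budget:.0f} GB budget -> ~{n / 1e9:.2f}B params (8 GB kept for activations)")
    print(f"1B-param model state: ~{model_state_gb(1_000_000_000):.0f} GB")
```

Under these assumptions a 24 GB budget tops out around 1B parameters and ~38 GB around 1.9B, before any tricks like optimizer-state offloading or 8-bit optimizers.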
Does anyone know if the Spark box is comparable to a 4090 (24 GB) or similar?
If so, it would just mean using more, smaller batches with gradient accumulation as part of the training optimization.
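For reference, a minimal sketch of what gradient accumulation does, using a toy 1-D linear regression in plain Python (no framework) so it runs anywhere. The batch is split into equal micro-batches, each micro-batch gradient is scaled by 1/accum_steps, and the optimizer takes one step per effective batch, which gives the same update as the full batch. In PyTorch the same pattern is usually written as `(loss / accum_steps).backward()` for each micro-batch, then `optimizer.step()` and `optimizer.zero_grad()` once every `accum_steps` micro-batches. The toy model, data and learning rate here are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs for a toy 1-D regression


def mse_grad(w: float, b: float, batch: Batch) -> Tuple[float, float]:
    """Gradient of the mean squared error of y_hat = w * x + b over `batch`."""
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def accumulated_grad(w: float, b: float, batch: Batch, accum_steps: int) -> Tuple[float, float]:
    """Process `batch` as `accum_steps` equal micro-batches and accumulate gradients.

    Each micro-batch gradient is divided by `accum_steps`, so the sum equals the
    full-batch mean gradient while only one micro-batch is processed at a time.
    """
    if len(batch) % accum_steps != 0:
        raise ValueError("this sketch assumes equal-sized micro-batches")
    size = len(batch) // accum_steps
    gw_acc, gb_acc = 0.0, 0.0
    for i in range(accum_steps):
        micro = batch[i * size:(i + 1) * size]
        gw, gb = mse_grad(w, b, micro)
        gw_acc += gw / accum_steps
        gb_acc += gb / accum_steps
    return gw_acc, gb_acc


def train(batch: Batch, accum_steps: int, lr: float = 0.3, steps: int = 500) -> Tuple[float, float]:
    """Plain SGD with one parameter update per effective (accumulated) batch."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = accumulated_grad(w, b, batch, accum_steps)
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = [(i / 8, 3.0 * (i / 8) + 1.0) for i in range(8)]  # y = 3x + 1
    full = mse_grad(0.0, 0.0, data)
    accum = accumulated_grad(0.0, 0.0, data, accum_steps=4)
    print("gradients match:", all(math.isclose(a, c, rel_tol=1e-9) for a, c in zip(full, accum)))
    print("trained (accum_steps=4):", train(data, accum_steps=4))
```

The trade-off is wall-clock time rather than model quality: peak memory scales with the micro-batch size, but each optimizer step now needs `accum_steps` forward/backward passes.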
I'm wondering if anyone has tried this already, and what your take is.
I confirmed my order today and have yet to receive the unit. I am also reconsidering whether the 4k I spent was wise, given the VRAM constraint.