Training transformer on DGX Spark

Greetings,

I am currently training a transformer model from the ground up on a single A100 with 40 GB of VRAM.
I am planning to use the new DGX Spark to run my notebook.
From what I have read, fitting the model in VRAM is the most important constraint, and today I am capped at ~38 GB reserved on a single A100.

Does anyone know if the GPU in the Spark box is a 4090 (24 GB) or similar?
If so, it would just mean using smaller batches with gradient accumulation as part of the training optimization.
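
For reference, here is a minimal sketch of the gradient accumulation idea in plain Python. It is not my actual training code: the one-parameter least-squares model, the toy data, and the helper names `grad_mse` and `accumulated_grad` are made up for illustration. It only shows that summing micro-batch gradients, each scaled by 1/number of micro-batches, gives the same gradient as one large batch (assuming equal-sized micro-batches).

```python
# Toy model: y_hat = w * x, loss = mean((w * x - y)^2) over the batch.
# All names and data here are hypothetical, for illustration only.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, each scaled by 1/num_micro_batches."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    num_micro = len(xs) // micro_batch_size
    total = 0.0
    for i in range(0, len(xs), micro_batch_size):
        g = grad_mse(w, xs[i:i + micro_batch_size], ys[i:i + micro_batch_size])
        # Same role as dividing the loss by the number of accumulation steps.
        total += g / num_micro
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
w = 0.5

full = grad_mse(w, xs, ys)                                # one batch of 8
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)   # 4 micro-batches of 2
print(full, accum)
```

In a real framework the optimizer step would run once after the last micro-batch, so peak activation memory depends on the micro-batch size while the update matches the larger batch.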

I am wondering if anyone has tried this already and what your take is.

I confirmed my order today and have yet to receive the unit. I am also reconsidering whether the 4k spent was wise given the VRAM constraint.

Hi

Spark has 128 GB of coherent unified system memory, shared between the CPU and GPU. This architecture allows the DGX Spark to load and run large models directly without the overhead of system-to-VRAM data transfers.
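
To get a rough feel for what fits, here is a back-of-the-envelope sketch. It assumes mixed-precision training with Adam at about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 4 for fp32 master weights, 4 + 4 for the two Adam moments), and it ignores activations, framework overhead, and the share of unified memory taken by the OS and CPU. The function names are hypothetical, and the real footprint depends on your setup.

```python
def training_state_gb(n_params, bytes_per_param=16):
    """Approximate weights + gradients + optimizer state, in GB (1 GB = 1e9 bytes).

    bytes_per_param=16 is an assumed rule of thumb for mixed-precision Adam;
    activations are NOT included.
    """
    return n_params * bytes_per_param / 1e9


def fits(n_params, memory_gb, bytes_per_param=16):
    """True if the estimated training state alone fits in memory_gb."""
    return training_state_gb(n_params, bytes_per_param) <= memory_gb


for n_params in (1_000_000_000, 3_000_000_000, 7_000_000_000):
    est = training_state_gb(n_params)
    print(f"{n_params / 1e9:.0f}B params: ~{est:.0f} GB state, "
          f"fits 40 GB: {fits(n_params, 40)}, fits 128 GB: {fits(n_params, 128)}")
```

Treat the result as a lower bound: activation memory grows with batch size and sequence length and can easily dominate.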

Thanks. How should I expect the DGX Spark to perform compared with an A100 with 40 GB of VRAM?

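Expect it to be slower than the A100, mainly because of the slower memory type: Spark uses LPDDR5X unified memory (roughly 273 GB/s of bandwidth), while the A100 40GB uses HBM2 (roughly 1.5 TB/s). There are several benchmark reviews on YouTube that discuss this.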