Memory/initialization improvements?

So, I’ve gotten the hang of the DGX Spark, and the storage amount is incredible! But what’s the point of so much storage if you’re limited to NIM Docker containers with the NVIDIA API for memory efficiency?

When downloading your own models (not Ollama 🙄), it appears vLLM takes up most of the compute, and if you add additional workloads in the background, like VLM models, they have a hard time running together, which causes memory errors.

Are there DGX Spark-compatible models that are quantized enough to run together, or small enough that they load fast when swapped in from idle?

Summary:

It would be nice to run more than one model at a time; development is slow when I have to unload one model just to load another.
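For context, here's the rough back-of-envelope I use to guess whether two models can share the Spark's 128 GB of unified memory. All the numbers (bit widths, KV cache sizes, overhead) are just my assumptions, not measurements:

```python
# Rough memory budget for co-locating two models on a 128 GB
# unified-memory machine (DGX Spark). All sizes are assumptions.

def model_footprint_gb(params_b: float, bytes_per_param: float,
                       kv_cache_gb: float, overhead_gb: float = 2.0) -> float:
    """Estimate weights + KV cache + runtime overhead, in GB.

    params_b: parameter count in billions
    bytes_per_param: ~2.0 for FP16, ~0.5 for 4-bit quantization
    """
    return params_b * bytes_per_param + kv_cache_gb + overhead_gb

TOTAL_GB = 128  # DGX Spark unified memory

# Hypothetical pairing: an 8B LLM and a 7B VLM, both 4-bit quantized.
llm = model_footprint_gb(8, 0.5, kv_cache_gb=8)
vlm = model_footprint_gb(7, 0.5, kv_cache_gb=6)

print(f"LLM ~{llm:.0f} GB + VLM ~{vlm:.1f} GB = ~{llm + vlm:.1f} GB")
print("fits" if llm + vlm < 0.9 * TOTAL_GB else "too tight")
```

One thing that bit me: vLLM pre-allocates ~90% of GPU memory by default, so even a small model hogs everything. Lowering its `gpu_memory_utilization` setting (e.g. to 0.4 per server) seems to let two instances co-exist, at the cost of a smaller KV cache.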

If anyone is confused about my question, please feel free to ask for further clarity.