Memory/initialization improvements?

So, I’ve gotten the hang of the DGX Spark, and the storage amount is incredible! But what’s the point of so much storage if you’re limited to NIM Docker containers with the NVIDIA API for memory efficiency?

When downloading your own models (not Ollama 🙄), it appears vLLM takes up most of the compute, and if you add additional workloads in the background, like VLM models, they have a hard time running together, which causes memory errors.

Are there DGX Spark-compatible models that are quantized enough to run together, or small enough that they load fast when swapped in from idle?

Summary:

It would be nice to run more than one model at a time; development is slow when I have to unload one model just to load another.
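For context, here's the rough back-of-envelope I use to guess whether two models can share the Spark's 128 GB of unified memory. All the numbers (bit widths, KV cache sizes, overhead) are just my assumptions, not measurements:

```python
# Rough memory budget for co-locating two models on a 128 GB
# unified-memory machine (DGX Spark). All sizes are assumptions.

def model_footprint_gb(params_b: float, bytes_per_param: float,
                       kv_cache_gb: float, overhead_gb: float = 2.0) -> float:
    """Estimate weights + KV cache + runtime overhead, in GB.

    params_b: parameter count in billions
    bytes_per_param: ~2.0 for FP16, ~0.5 for 4-bit quantization
    """
    return params_b * bytes_per_param + kv_cache_gb + overhead_gb

TOTAL_GB = 128  # DGX Spark unified memory

# Hypothetical pairing: an 8B LLM and a 7B VLM, both 4-bit quantized.
llm = model_footprint_gb(8, 0.5, kv_cache_gb=8)
vlm = model_footprint_gb(7, 0.5, kv_cache_gb=6)

print(f"LLM ~{llm:.0f} GB + VLM ~{vlm:.1f} GB = ~{llm + vlm:.1f} GB")
print("fits" if llm + vlm < 0.9 * TOTAL_GB else "too tight")
```

One thing that bit me: vLLM pre-allocates ~90% of GPU memory by default, so even a small model hogs everything. Lowering its `gpu_memory_utilization` setting (e.g. to 0.4 per server) seems to let two instances co-exist, at the cost of a smaller KV cache.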

If anyone is confused about my question, please feel free to ask for further clarity.