I wrote a custom node that uses the fastsafetensors library to load .safetensors files from storage directly into VRAM, bypassing intermediate copies. This works exactly as expected on DGX Spark, loading the BF16 version of FLUX.2 in only a few seconds: