Image to Video Generation using the Spark vs. PC

Hey everyone, I’m trying to troubleshoot something with image-to-video generation on my DGX Spark and wanted to see if anyone here has run into the same thing.

I’m testing WAN 2.2 image-to-video natively on the Spark, not through ComfyUI, and I’m seeing a pretty big performance gap. A simple 5-second I2V generation is taking 30+ minutes on the Spark and it eats a huge amount of memory while it runs.

For comparison, I loaded ComfyUI on my PC and used the same reference image with the prompt, “woman puts on a red jacket.” On my PC, ComfyUI finished the 5-second video in about 3 minutes and followed the prompt correctly. The PC also didn’t seem to use anywhere near the same amount of memory. My PC has an Intel i9-14900K, MSI Ventus RTX 5070 12GB GDDR7, 32GB DDR5 6000MHz RAM, a Samsung 9100 PRO 4TB PCIe 5.0 NVMe drive, and an Acer FA200 4TB PCIe 4.0 NVMe drive.

That’s what has me confused. The DGX Spark has far more unified memory available, but the native generation path is much slower and appears to consume memory aggressively, while my desktop finishes the same general WAN 2.2 I2V test much faster through ComfyUI.

Has anyone here compared native WAN 2.2 I2V generation on DGX Spark against ComfyUI on a desktop RTX GPU? I’m trying to figure out if this is expected behavior because of the native/Diffusers-style pipeline, model loading, scheduler/settings, CPU/GPU offload, unified memory behavior, or if I may have something misconfigured on the Spark.

I’m especially curious whether ComfyUI is using a more optimized workflow or acceleration path, and whether anyone has found a native Spark setup that gets closer to ComfyUI performance.

Not trying to bash the Spark at all. I’m just trying to understand the right way to run I2V efficiently on it and whether native generation is currently the wrong path compared to ComfyUI.

DGX Spark uses a unified memory architecture, so workflows designed for low‑VRAM GPUs in WAN 2.2 cannot be applied directly without modification. For example, block swap is necessary on a 5070, but on DGX Spark it causes a significant performance drop.

It doesn’t appear to be an apples-to-apples comparison. My understanding of your configuration is:

  • DGX Spark: native WAN 2.2 pipeline
  • RTX 5070 PC: WAN 2.2 through ComfyUI

Is it possible to compare either:

  • WAN 2.2 in ComfyUI on both Spark and 5070, same workflow/settings, or
  • native WAN 2.2 on both machines, same model/resolution/frames/steps/precision/offload settings.
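If it helps, here is a minimal sketch of one way to check that two runs really match before comparing timings. Everything in it is hypothetical: the `RunConfig` fields simply mirror the settings listed above, and the example values are placeholders, not anyone's actual settings from this thread.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical record of the knobs that should match between two runs.
    frontend: str     # e.g. "comfyui" or "native"
    model: str
    width: int
    height: int
    num_frames: int
    steps: int
    precision: str
    offload: str


def config_diff(a: RunConfig, b: RunConfig) -> dict:
    """Return {field: (a_value, b_value)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


# Placeholder values for illustration only; fill in what each machine really ran.
spark = RunConfig("native", "wan2.2-i2v", 1280, 720, 81, 40, "bf16", "none")
pc = RunConfig("comfyui", "wan2.2-i2v", 832, 480, 81, 20, "fp8", "block_swap")

mismatches = config_diff(spark, pc)
for name, (spark_value, pc_value) in mismatches.items():
    print(f"{name}: spark={spark_value} pc={pc_value}")
```

An empty diff means the two timings are comparable; anything that prints is a variable to eliminate first.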

Memory bandwidth is the main factor: Spark has ~273 GB/s of LPDDR5X (shared CPU+GPU), your 5070 has ~672 GB/s GDDR7. Video diffusion is heavily bandwidth-bound, so a ~2.5× gap is expected.

But 3 min vs 30+ min is more than that — bandwidth alone should put Spark at ~7–10 min. The rest is probably the pipeline. ComfyUI has years of optimizations (tiled VAE, sequential offload that overlaps with compute, attention slicing). A “native” Diffusers run keeps everything resident, and on Spark’s 128 GB unified memory that becomes a memory-bandwidth penalty rather than a fit problem.
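To make the arithmetic explicit, here is a rough back-of-the-envelope sketch. It assumes runtime scales inversely with memory bandwidth, which is a simplification, and it uses only the approximate figures quoted in this thread.

```python
# Approximate figures quoted in the thread (not measured here).
SPARK_BW_GBPS = 273.0        # DGX Spark LPDDR5X, shared CPU+GPU
RTX5070_BW_GBPS = 672.0      # RTX 5070 GDDR7
PC_MINUTES = 3.0             # ComfyUI run on the PC
SPARK_OBSERVED_MINUTES = 30.0  # native run on the Spark ("30+")

# Assumption: a purely bandwidth-bound workload slows down by the bandwidth ratio.
bandwidth_ratio = RTX5070_BW_GBPS / SPARK_BW_GBPS
expected_spark_minutes = PC_MINUTES * bandwidth_ratio

# Whatever is left over is not explained by bandwidth alone.
unexplained_factor = SPARK_OBSERVED_MINUTES / expected_spark_minutes

print(f"bandwidth ratio:        {bandwidth_ratio:.2f}x")
print(f"expected Spark runtime: {expected_spark_minutes:.1f} min")
print(f"unexplained slowdown:   {unexplained_factor:.1f}x")
```

Under that assumption the Spark should finish in roughly 7.4 minutes, which leaves about a 4x slowdown for the pipeline or configuration to account for.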

Try running the same WAN 2.2 workflow through ComfyUI on the Spark itself. If it lands in the 7–15 min range, the difference was the pipeline, not the hardware. If it’s still 30+ min, that’s a separate issue (kernel fallbacks, missing sm_121 ops) worth its own thread.
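If you stay on the native path while testing, a per-phase timer can show whether the time goes to model loading, denoising, or VAE decode. This is only a sketch using the standard library: `load_model`, `denoise`, and `decode` are hypothetical stand-ins for whatever your script actually calls, not real WAN 2.2 or Diffusers functions.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def phase(name):
    """Accumulate wall-clock seconds spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


# Hypothetical stand-ins; replace the bodies with your real pipeline calls.
def load_model():
    time.sleep(0.01)


def denoise():
    time.sleep(0.03)


def decode():
    time.sleep(0.01)


with phase("load"):
    load_model()
with phase("denoise"):
    denoise()
with phase("decode"):
    decode()

total = sum(timings.values())
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {seconds:7.3f}s  {100 * seconds / total:5.1f}%")
```

One caveat: CUDA work in PyTorch is asynchronous, so with a real pipeline you would want to call `torch.cuda.synchronize()` before each phase ends, otherwise the time can be attributed to the wrong phase.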

Spark’s value isn’t peak speed on bandwidth-bound workloads — it’s running things you can’t fit on a 12 GB card. Different tool for a different job.

My personal advice: switch the model to LTX 2.3. It’s much faster, and the temporal consistency is finally reaching that cinematic “sweet spot” we’ve been waiting for.

The code to run these video models has been highly optimized for block swapping, and highly NOT optimized for Spark. Plus, the larger pool of unified Spark memory is relatively slow.

Try running the same ComfyUI workflow on both machines… I suspect you will wind up with 3 minutes vs. 10 minutes.