Qwen3.5-122B-A10B on single Spark: up to 51 tok/s (v2.1 — patches + quick-start + benchmark)

Will you be looking into https://atlasinference.io/#models ? Since it's built for Spark and RTX specifically and is a lot smaller, things like call overhead could be addressed more easily / might not become an issue :)