Apparently it's about 2x faster than FlashInfer.
Has anybody already tested it with eugr's vLLM Spark container, or is this even relevant for us?
Thanks.
Requirements: “SM90 or above”, so it isn't specific to SM90. In fact, there's good reason to believe this will help us; Hopper-era FP8 optimizations deliver basically identical scaling on GB10.
lol, pretty big. I'm currently wrapping up some unrelated projects, so I can't be as active on the project and forums as I'd like, but I'll have more time in the coming weeks :)
Thanks, bookmarked it. Unfortunately my DGX Spark is tied up with a fine-tuning run for the next week, so I’ll take a proper look once it’s free. I’ll try to review it on my Mac in the meantime, though I’m not sure that’ll be enough to tell whether it’s worth integrating into the DGX Spark setup.