Just saw this (GitHub - Tencent/hpc-ops: High Performance LLM Inference Operator Library). It's still in its infancy, so it lacks a lot of the features and supported quants of the big-boy alternatives, but the reported performance gains look nice… hopefully those would also carry over to our sparks.
Thanks for informing the community. Hopefully it can help some people with their projects.