cosinus
As for the prices:
As for the “biggest LLM” - maybe first check out what you can expect when running different LLMs over here:
using the well-known eugr vLLM tools (they make running models in a cluster much easier).
And if you want some more speed (not always guaranteed), have a look over here:
llama.cpp is handy for single-Spark use; for agentic use, vLLM should be the better fit.
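For concreteness, the split above might look roughly like this (a sketch only; the model paths and names are illustrative placeholders, and exact flags depend on the versions you have installed):

```shell
# llama.cpp: lightweight single-node serving straight from a GGUF file,
# good for one Spark on its own
llama-server -m ./model.gguf --port 8080

# vLLM: OpenAI-compatible server with batching, better for agentic
# workloads; tensor parallelism splits the model across GPUs
vllm serve some-org/some-model --tensor-parallel-size 2
```

Both expose an HTTP API, so agent frameworks that speak the OpenAI protocol can point at either endpoint.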
And if you have too much money or a lot of YouTube subscribers:
AFAIR Alex ran Kimi K2 and Qwen3.5 397B - you just need 8 Sparks.