Best Mix of Models/Services on a Single Spark?

Thank you all for the work and discussion from the community; it's been very helpful in improving the usability of my Spark. At the moment I do a lot of spinning containers up and down, reconnecting to my Spark to run demos for people I'm working with, etc.

I’d like to use the Spark in conjunction with a Pi 5 or other lightweight hardware to work on systems that can scale when run in the cloud and pointed at standard APIs, or be used in small-scale cases where privacy and data ownership are a concern.

In your experience working with the Spark so far, how would you try to achieve:

  • API endpoint to enable a chat interface (like Open-WebUI or LibreChat), preferably with vision capability and some prompt-injection security
  • API endpoint for various applications that use text generation (Open Deep Research, Speakr, opencode, etc.; this could probably be the same endpoint as chat)
  • Document intelligence endpoint (something like docling) for chat and RAG
  • ASR with segmentation, probably WhisperX (it didn’t appear to have an ARM/CUDA build, and I haven’t had a chance to make that happen, but it’s the best one I’ve found so far)
  • Preferably with API keys and a way to track utilization
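For the chat and text-generation items above, most of the listed frontends speak the OpenAI chat-completions schema, so whatever serves the models only needs to expose that one shape. A minimal sketch of the request payload a vision-capable chat would send (the endpoint address, key, and model name here are placeholders, not anything from this thread):

```python
import json

API_BASE = "http://spark.local:4000/v1"  # hypothetical gateway address
API_KEY = "sk-local-example"             # per-user key, so usage can be tracked

def vision_chat_payload(prompt: str, image_url: str, model: str = "local-vlm") -> dict:
    """Build a chat request mixing text and an image, per the OpenAI vision message schema."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = vision_chat_payload("What is in this image?", "https://example.com/cat.png")
print(json.dumps(payload, indent=2))
```

Anything that accepts this payload at `{API_BASE}/chat/completions` with a bearer key would cover both the chat and generic text-generation endpoints at once.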

Thanks!

vllm-playground looks really promising, but I haven’t played with it yet. They just integrated vllm-omni support, I believe.


I use:

  • llama-swap to switch inference engines/models on the fly
  • LiteLLM Proxy as a single OpenAI-compatible endpoint/gateway: it routes calls to models on the Spark, my other servers, and cloud models, with fallback. It also supports Claude Code out of the box and can act as a proxy, and it keeps utilization stats and tracks costs (where applicable).
  • OpenWebUI for chat/RAG/tool calling
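To make the LiteLLM part concrete, here is a rough sketch of what its proxy config might look like for this setup; the model names, addresses, and keys are all hypothetical placeholders, not my actual config:

```yaml
# LiteLLM proxy config.yaml -- hypothetical names/addresses
model_list:
  - model_name: spark-chat              # name clients request
    litellm_params:
      model: openai/local-vlm           # OpenAI-compatible backend (llama-swap) on the Spark
      api_base: http://spark.local:8080/v1
      api_key: "none"
  - model_name: cloud-fallback
    litellm_params:
      model: gpt-4o-mini                # cloud model used as fallback

general_settings:
  master_key: sk-example-master         # enables per-key auth and usage tracking
```

With something like this, OpenWebUI and the other apps all point at the one LiteLLM endpoint, and per-key usage shows up in its tracking.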