Could someone with an AGX Orin 64GB try Ollama with --verbose on a large model like dolphin-mixtral and report how many tokens/sec they get?
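For reference, a minimal sketch of the kind of run being asked for, assuming ollama is already installed and the default dolphin-mixtral tag (roughly a 26GB 8x7B quantization) is what gets pulled:

```bash
# pull the model first (large download)
ollama pull dolphin-mixtral

# run with --verbose so ollama prints timing statistics after each response;
# the "eval rate" line in that output is the generation speed in tokens/sec
ollama run dolphin-mixtral --verbose
```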
@i_love_nvidia there is now an ollama container in jetson-containers, with images available for JP5 and JP6 (a run example follows the tags below):
- jetson-containers/packages/llm/ollama (dev branch): https://github.com/dusty-nv/jetson-containers/tree/dev/packages/llm/ollama
dustynv/ollama:r36.2.0
dustynv/ollama:r35.4.1
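A minimal sketch of launching one of these images, assuming the JetPack 6 (L4T r36.2.0) tag and the standard NVIDIA container runtime on the device:

```bash
# start the prebuilt ollama container (JetPack 6 / L4T r36.2.0 image)
docker run --runtime nvidia -it --rm --network host dustynv/ollama:r36.2.0

# alternatively, the jetson-containers run helper picks the matching tag
# for the installed JetPack version automatically:
# jetson-containers run $(autotag ollama)
```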
Note that ollama uses llama.cpp under the hood though, so it is not expected to give the best performance (roughly half of what MLC/TVM achieves on the Benchmarks page of the NVIDIA Jetson Generative AI Lab).