LLMs token/sec

Can someone with an AGX Orin 64GB try Ollama with --verbose on a large model like Dolphin Mixtral and report how many tokens/sec they get?
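For anyone willing to try, a minimal invocation might look like the sketch below. The `dolphin-mixtral` tag is my assumption based on the Ollama model library naming; substitute whatever tag you actually pull. The `--verbose` flag on `ollama run` prints timing statistics after each response.

```bash
# Pull and chat with the model, printing timing stats after each response.
# "dolphin-mixtral" is an assumed tag from the Ollama library; adjust as needed.
ollama run dolphin-mixtral --verbose

# With --verbose, each response is followed by stats along the lines of:
#   prompt eval rate: NN.NN tokens/s
#   eval rate:        NN.NN tokens/s
# The "eval rate" line is the generation speed we're asking about.
```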

@i_love_nvidia there is now an ollama container in jetson-containers, with images available for JP5 and JP6.
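A minimal sketch of getting that container running, assuming you've cloned the jetson-containers repo and installed its tools (paths and setup steps may differ on your system):

```bash
# Clone the repo and install the jetson-containers CLI tools
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# autotag selects the image matching your installed JetPack version (JP5 or JP6)
jetson-containers run $(autotag ollama)
```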

ollama uses llama.cpp under the hood, though, so it's not expected to give the best performance (roughly half the performance of MLC/TVM that you see on Benchmarks - NVIDIA Jetson Generative AI Lab).
