LLMs token/sec

Can someone with an AGX Orin 64GB try Ollama with --verbose on a large model like Dolphin Mixtral and report how many tokens/sec they get?
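For anyone willing to try, a minimal invocation might look like the sketch below. The `dolphin-mixtral` tag is my assumption based on the Ollama model library naming; substitute whatever tag you actually pull. The `--verbose` flag on `ollama run` prints timing statistics after each response.

```bash
# Pull and chat with the model, printing timing stats after each response.
# "dolphin-mixtral" is an assumed tag from the Ollama library; adjust as needed.
ollama run dolphin-mixtral --verbose

# With --verbose, each response is followed by stats along the lines of:
#   prompt eval rate: NN.NN tokens/s
#   eval rate:        NN.NN tokens/s
# The "eval rate" line is the generation speed we're asking about.
```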

@i_love_nvidia there is now an ollama container in jetson-containers, with images available for JP5 and JP6.
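A minimal sketch of getting that container running, assuming you've cloned the jetson-containers repo and installed its tools (paths and setup steps may differ on your system):

```bash
# Clone the repo and install the jetson-containers CLI tools
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# autotag selects the image matching your installed JetPack version (JP5 or JP6)
jetson-containers run $(autotag ollama)
```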

ollama uses llama.cpp under the hood, though, so it's not expected to give the best performance (roughly half the performance of MLC/TVM that you see on Benchmarks - NVIDIA Jetson Generative AI Lab).
