Jetson Orin Nano insanely slow inference speed?

I have the Jetson Orin Nano 8GB, and all I can figure is that I must be doing something horribly wrong.

Running inference with Ollama on even the smallest models (phi3, for example) is insanely slow, to the point of being totally unusable. This is while trying to use the GPU. I am getting maybe 1 token every 10 to 15 seconds. Meanwhile, running the same model on a Raspberry Pi, I get about the same speed as I get with GPT-4 on their website. Am I doing something wrong, do I possibly have a defective unit, or am I completely misunderstanding the capabilities of the unit?

I have tried this with R35.5.3 and R36.2. I just saw that R36.3 is out, so I will give that a go, but in the meantime any guidance is appreciated.

Same problem for me. I tried the Ollama container as in the tutorial, and it is slower than on a regular PC.


Hi,

Please follow our sample below:

The Orin Nano is expected to reach ~16 tokens/sec with the llama2 7B model.

Thanks.
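
For anyone comparing their own numbers against the figures in this thread, here is a minimal Python sketch for turning Ollama's timing fields into tokens/sec. It assumes you have the eval_count and eval_duration values (the latter in nanoseconds) that Ollama reports in its generate response. The function name and the sample values are made up for illustration and are not measurements from any device.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput from a token count and a duration in nanoseconds."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    # Convert nanoseconds to seconds, then divide tokens by seconds.
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical values: 1 token in 10 s, roughly the slow case described above.
slow_rate = tokens_per_second(eval_count=1, eval_duration_ns=10_000_000_000)

# Hypothetical values: 160 tokens in 10 s, matching the ~16 tokens/sec figure.
expected_rate = tokens_per_second(eval_count=160, eval_duration_ns=10_000_000_000)

print(f"slow case:     {slow_rate:.2f} tokens/sec")
print(f"expected case: {expected_rate:.2f} tokens/sec")
```

A gap of that size (two orders of magnitude) is far more than a tuning difference, so it would be worth confirming that the model is actually running on the GPU rather than falling back to the CPU.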