Which container - the Jetson container or the NVIDIA container?

I barely know anything about Docker, beyond having successfully downgraded to v27 for my Jetson Orin Nano SDK. No error messages so far, but I need guidance on the recommended one, please.

The instructions at https://www.jetson-ai-lab.com/tutorial_openwebui.html worked fine. I am able to download Ollama-compliant models, but the query responses are returned as a paragraph’s worth of gray bars with no actual text; then, several minutes later (in some instances), the bars morph into the proper text of the answer. What is the error on my part? Thanks.

Regards.
P.S.
Sorry for the two-part question, but perhaps I messed up somewhere in deploying Open WebUI with the pre-built Docker container. Thanks for your understanding.

Hi,

Docker v28.0.1+ works on Jetson, so you don’t need to downgrade anymore.

Do you hit this issue consistently?
It’s expected that the first test will be slower, as it might need to download the model or initialize the environment.
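A quick way to verify both points (a sketch; the container name open-webui is an assumption based on the tutorial, so adjust it to whatever name you used):

```bash
# Confirm the installed Docker version; v28.0.1 or newer should work on Jetson
docker --version

# Follow the container logs to see whether it is still downloading or initializing
docker logs -f open-webui
```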

Thanks.

Hello @AastaLLL,

Allow me to back up a little on the Docker situation before I answer your specific question, because that might shed light on some self-inflicted issues at my end.

After receiving my Nano SDK (from Seeed in late January; I mention this specifically because there may be some doubts about this SKU based on constrained-supply threads in this forum, but the box clearly stated SDK):

  • Flashed the microSD card, booted, and completed the standard Ubuntu install
  • Rebooted the obligatory 3 times (including applying JetPack 6)
  • Happy to see MAXN SUPER
  • Couldn’t run Docker v28; read the relevant Jetson guides and successfully deployed Docker v27; none the wiser, I have stuck with v27 (and thanks to the Jetson developer team for guiding me into using Docker after all these years of challenges on other single-board computers)
  • Ollama is running as a service outside Docker; it runs rather well in CLI mode, too
  • Open WebUI is running in a container using instructions from https://www.jetson-ai-lab.com/tutorial_openwebui.html
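For reference, the Open WebUI container was launched along the lines of the tutorial. A minimal sketch (the image tag, volume name, and OLLAMA_BASE_URL value follow Open WebUI’s published defaults; adjust to your setup):

```bash
# Run Open WebUI on the host network so it can reach the Ollama service
# listening on 127.0.0.1:11434; the UI is then served on port 8080
docker run -d --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```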

Sorry for the long-winded preamble, but back to your original question:

  • Yes, the issue is fully repeatable
  • My limited knowledge tells me that the Ollama response itself is fast, judging by how quickly the gray bars (standing in for text) fill the response paragraph; see the timing sketch after this list
  • Is the rendering (by JavaScript?) the issue?
  • After an extended lapse of time (over 5 minutes, guessing here), partial text is revealed; after even more time, more text is revealed. All responses are correct (the queries are about dimensions of objects in the solar system) and include math formulae, but even simple text-only responses show the same delays
  • Since I was in headless mode, I suspected that DNS (or other network latency) could be an issue, because I had made some network changes and I have Pi-hole/Unbound/OPNsense in the resolution path; however, local DNS queries are never pushed out to the WAN
  • So I turned to local access directly on the Jetson (using the default 127.0.0.1:8080 address), but the manifestation did not change: gray bars instead of text, and then, after several minutes, a few lines reveal themselves
  • My simple conclusion is that the access method is not the issue; it is something to do with Open WebUI. I am not sure this is a bug, so I have refrained from posting an issue at their GitHub site - will do if you advise as such
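As promised above, one way I could separate backend latency from UI rendering (a sketch on my part, assuming Ollama’s default port 11434; the model name and prompt are illustrative):

```bash
# Time a raw, non-streaming generation against the local Ollama API.
# If this returns quickly while Open WebUI still shows gray bars,
# the delay is in the UI rendering, not in the model itself.
time curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "gemma:latest", "prompt": "What is the diameter of Mars?", "stream": false}'
```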

Bottom line: I don’t have any specific Docker version dependencies. I simply want some UI on the Jetson for self-paced LLM learning. There is no arm64/aarch64 build of MSTY.

Thanks again.

Regards.

Hi,

Do you mean that each time a query is sent, the same behavior occurs (gray bars -> (some minutes) -> proper text)?
If so, would you mind sharing a video or picture that captures the issue you are facing?

Thanks.

Sure, let me get my ducks in a row and I’ll upload some screenshots (since the issue is mostly static) with approximate latency numbers. Thanks.

Regards.

Hello @AastaLLL,

I feel that it is related to my choice of models, which is perfectly understandable.

llama3.2 (3.2b?) has no issue; it even seems to look ahead before I hit the Send key. On the other hand, gemma:latest (9b) continues to display the issue (see attached screenshot) for approximately a minute on this query before displaying the response. Other queries hang for much longer. I didn’t check the parameter size of either the llama or the gemma model (I just chose the first one in the list; the numbers in parentheses above should be correct).
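For anyone wondering how to check the sizes I glossed over, the Ollama CLI can report them (a sketch; the model tag is just the one I happened to pull):

```bash
# List locally pulled models with their on-disk sizes
ollama list

# Show details, including the parameter count, for a specific model
ollama show gemma:latest
```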

Please consider the issue resolved. I will simply have to be careful about my choice of models (especially on the Nano). Now that I understand I have to try out different models and compile my own list of usable ones, I am satisfied with the status quo. I appreciate your nudge in the right direction.

Regards.

P.S.
llama does display gray bars, but it is over in a flash.

One other observation:

There is no issue with gemma when running the same query through Ollama directly on the Nano. The issue manifests only when using Open WebUI.
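For completeness, the direct run that showed no issue looked like this (a sketch; the prompt is illustrative):

```bash
# Same model, same query, but through the Ollama CLI instead of Open WebUI
ollama run gemma:latest "What is the diameter of Mars?"
```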

Regards.

Hi,

Yes, the issue is related to the model size.
We usually recommend trying models smaller than 4B on the Orin Nano.

Gemma 9B is too big for the Orin Nano.
But there is a Gemma 2B model; could you give it a try?
Is the Gemma you ran with Ollama also the 9B version?
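Pulling the smaller variant looks like this (a sketch; gemma:2b is the 2B tag in the Ollama model library, and the prompt is illustrative):

```bash
# Pull and run the 2B Gemma variant, which fits the Orin Nano's memory budget better
ollama pull gemma:2b
ollama run gemma:2b "What is the diameter of Mars?"
```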

You can find our testing of different models in the link below:

Thanks.

OK. Thanks for the heads-up on the Models page. That is a great list; it will take some time to step through (given all the quantization granularity, too).

The gemma3:4b didn’t show any improvement. As you have indicated, I will have to go down a size or two. I will continue to step lower but I have sufficient guidance from you on what steps I need to perform. Thanks.