Hi Alex,
I’m assuming this is referring to an issue with the Hybrid RAG example project? If so, the support megathread is here.
You are viewing the Service logs. Select the dropdown on the left-hand side of the Output widget, scroll down, and select “Chat” under Applications. You can then view the proper logs there.
It looks like you are running into an issue starting the local inference server. Here are some common issues we see users face:
- Authentication. Make sure you are either using an ungated model (nvidia/Llama3-ChatQA-1.5-8B) or have your Hugging Face token configured as a project secret, with access granted to the gated models you want to run. (The first sketch after this list shows one quick way to verify token access.)
- Out of memory. Make sure you are running locally on a GPU-enabled system and that no other processes on your GPU are competing for memory while the model loads. (The second sketch below shows one way to check.)
- Timeout. The most common issue we see is the application timing out on server start. The default timeout is 90s, but depending on your hardware you may need more time. You can extend it in code: open code/chatui/chat_client.py, scroll to the bottom, and raise the timeout value (the last sketch below shows the general shape).
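If you want to rule the token in or out quickly, here is a minimal sketch that checks whether your token can see a gated model. The environment variable name and the model ID are assumptions for illustration; use whatever your project secret is actually named and the model you actually want to run.

```python
# Minimal sketch: check whether a Hugging Face token can access a gated model.
# HUGGING_FACE_HUB_TOKEN and the model ID below are illustrative assumptions,
# not the project's actual names.
import os
from huggingface_hub import HfApi

api = HfApi(token=os.environ.get("HUGGING_FACE_HUB_TOKEN"))
try:
    info = api.model_info("meta-llama/Meta-Llama-3-8B-Instruct")
    print(f"Token can access: {info.id}")
except Exception as err:
    print(f"Access check failed: {err}")
```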
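For the out-of-memory case, this quick check lists what else is holding GPU memory before you start the model. It just shells out to nvidia-smi, which I am assuming is on PATH in your environment.

```python
# Quick check: list processes currently using GPU memory via nvidia-smi.
# Assumes nvidia-smi is available on PATH.
import subprocess

result = subprocess.run(
    ["nvidia-smi",
     "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv"],
    capture_output=True, text=True, check=True,
)
# Any process listed besides your own server is competing for VRAM.
print(result.stdout)
```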
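For the timeout, the wait loop at the bottom of code/chatui/chat_client.py is roughly this shape. This is an illustrative sketch under my own names and defaults, not the project's actual code; the point is just that raising the timeout constant gives the server longer to come up.

```python
# Illustrative sketch of a server-start wait loop with a configurable timeout.
# Names and values are assumptions; the real code in
# code/chatui/chat_client.py will differ.
import time
import requests

SERVER_TIMEOUT = 300  # raise this from the 90s default on slower hardware

def wait_for_server(url: str, timeout: int = SERVER_TIMEOUT) -> bool:
    """Poll the local inference server until it responds or time runs out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).ok:
                return True
        except requests.exceptions.ConnectionError:
            pass  # server not up yet; keep polling
        time.sleep(2)
    return False
```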