[SUPPORT] Workbench Example Project: Hybrid RAG

Thank you for your response.
Again, there is an issue with running models on the NVIDIA GPU: RuntimeError: No HIP GPUs Available.
I’ve been able to successfully download both the ungated nvidia/Llama3-ChatQA-1.5-8B and the gated mistralai/Mistral-7B-Instruct-v0.2 models, but whenever I attempt to start the server, I encounter the following error:

RuntimeError: No HIP GPUs are available rank=0
xxxx-xx-xxxxx:45:29.577451Z ERROR text_generation_launcher: Shard 0 failed to start
xxxx-xx-xxxxx:45:29.577477Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I’m using an NVIDIA GeForce RTX 4060, and it seems the server is attempting to initialize HIP (which is typically for AMD GPUs) rather than CUDA (please correct me if I am wrong). Could you suggest what steps I can take to resolve this, and which models I should download that will work best with my NVIDIA GPU?

Hi, thanks for reaching out. The screenshot and the output logs both tell me the model weights are still being pulled down. These are very large; perhaps you are facing a slow network connection?

I also see a Detected system rocm message in the output logs. Can you verify you are running with NVIDIA GPUs and not AMD? Thanks!

And yes, local RAG was a completely separate project whose functionality is a subset of this project; it has since been replaced by this one.

Hi, yes, following up on my previous comment above: from the output logs of the Hugging Face TGI server, it looks like it is detecting AMD hardware.

Detected system rocm

Typically with NVIDIA GPUs, you would see Detected system cuda instead. If running CPU only, it would say Detected system cpu. Are you absolutely sure you have no other GPU hardware on this system? Can you run nvidia-smi to see the GPUs, or open the settings and check the dedicated GPU(s) in the system information?

2024-09-24 08:58:07.311 | INFO | text_generation_server.utils.import_utils:<module>:75 - Detected system rocm
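
If it helps narrow things down, here is a minimal diagnostic you could run from inside the project container to see which backend the installed PyTorch build reports; TGI’s “Detected system …” message appears to come from similar checks. This is just a troubleshooting sketch, not part of the project code:

```python
# Diagnostic sketch: print which GPU backend this PyTorch build was compiled for.
import torch

print("cuda available:", torch.cuda.is_available())
print("torch.version.cuda:", torch.version.cuda)  # non-None on CUDA builds
print("torch.version.hip:", torch.version.hip)    # non-None on ROCm builds

if torch.cuda.is_available():
    # On a ROCm build this can still report a device name via HIP,
    # so the two version fields above are the more telling signal.
    print("device 0:", torch.cuda.get_device_name(0))
```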


Yes, I’m using an NVIDIA GPU; you can see the nvidia-smi output in the screenshot.
When I try to download a model from the Hybrid RAG: Chat UI, it says ‘Detected system rocm’.
Also, there’s a warning message that says ‘Could not import SGMV kernel from Punica, falling back to loop.’
Just to note, my internet connection is pretty good; I currently have a 40 Mbps connection.

Hello admin,
I’m still awaiting a response, as I need to proceed with my project using the local GPU. Any guidance would be greatly appreciated. I suspect the issue might stem from the code we pulled, possibly due to misconfiguration or fallback logic inside it, though I’m not entirely sure. I would be grateful if you could kindly clarify or provide any insights.

Hi,

Yeah, we haven’t really faced this issue before because this is not expected behavior. Hugging Face should detect cuda on your system if you are running with a genuine NVIDIA GPU. I could see it detecting cpu if something was wrong with detecting the GPU, but detecting rocm, especially when you don’t have any AMD GPUs on the system, feels like a bug to me.

Unfortunately, we are downstream from Hugging Face and do not deal with Hugging Face’s TGI container or repo directly. I suggest filing a bug with their repo here and having their engineers take a look at the issue.


After reinstalling the OS, AI Workbench, and the Hybrid-RAG project, CUDA is now detected. I’m unsure why it wasn’t recognized the first time, but everything is working fine now. The NVIDIA GPU is genuine. Thank you for your generous support!


You can work around this by doing the following:

  • Stop the project container if running
  • Open Environment > Scripts > postBuild.sh
  • Add “fastapi==0.112.2” to the end of this line in the file (I’ve already made this change in the upstream repo; a quick way to verify the pin afterwards is sketched after this list)
  • Clear cache and rebuild
  • Start the chat app
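
If you want to confirm the pin took effect after the rebuild, a quick check from inside the project container could look like this (a diagnostic sketch only, not part of the project code):

```python
# Verify that the fastapi version pinned in postBuild.sh is what actually got installed.
import fastapi

print(fastapi.__version__)  # expected: 0.112.2
```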

This fixed my issues back when I came across this solution, but as of yesterday the same error has started showing up again and I can’t figure out why. Since the last time it worked, I made very few changes to the code, and I made sure not to touch the sensitive parts. It would be great if anyone could help me with this.

Error:

{"level":"error","error":"(Container project-arishtha) bash -lc \"PROXY_PREFIX=\\\"/projects/Arishtha/applications/Gradio\\\"; PROXY_PREFIX=\\\"/projects/Arishtha/applications/Gradio\\\" cd /project/code &&  PROXY_PREFIX=$PROXY_PREFIX python3 client.py\": signal: killed","project_name":"Arishtha","application_name":"Gradio","time":"2024-10-01T16:50:30+05:30","message":"failed to execute launch command."}

Logs:
workbench.log (970.2 KB)

Minutes after showing the errors, when I go back to Workbench, it shows this:

It doesn’t open the web browser even after checking the box for Auto Launch, and port 8080 in the browser does not display anything at all.

Hi, so it turns out this issue is reproducible and I think I’ve pinned down the problem according to the issue here.

We previously just pulled the latest tag in this project, but it appears to be improperly linked on the Hugging Face side. We’ve since pinned the base container to a specific version (2.3.0) in the upstream repo until this gets resolved.

Hi, this is the support thread for a specific example project. For issues with your own project, I would recommend starting your own thread.

If you open Output and then select Gradio from the dropdown, do you see any errors in that log file upon startup?

Here is a breakdown of one Gradio example I have up on a project as a reference. Just as a side note:

  • the $PROXY_PREFIX environment variable is configured automatically by AI Workbench.
  • You do not see anything on http://localhost:8080 because AI Workbench’s reverse proxy handles the routing to port 10000 (see the sketch below).
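
For illustration, here is a minimal sketch of how a Gradio app typically honors that prefix behind the proxy. The names, the echo function, and the port below are assumptions for this example, not necessarily what client.py in your project does:

```python
# Minimal sketch: run a Gradio app behind AI Workbench's reverse proxy.
import os
import gradio as gr

# PROXY_PREFIX is injected automatically by AI Workbench (see the note above).
proxy_prefix = os.environ.get("PROXY_PREFIX", "")

# Placeholder app: echoes its text input back.
demo = gr.Interface(fn=lambda text: text, inputs="text", outputs="text")

# The reverse proxy routes external traffic to this internal port, which is
# why hitting http://localhost:8080 directly shows nothing.
demo.launch(server_name="0.0.0.0", server_port=10000, root_path=proxy_prefix)
```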

From your description, it appears the start command runs, gets timed out somehow, but then succeeds at a later time when the UI naturally refreshes since the health check turns green in your screenshot. Is this accurate? And if you click the link next to the list of running apps, does it open anything in the browser?


When launching the Gradio app, the first try gives an error like the one I just posted above, but on the second try it launches successfully. I have tried this almost 5 times now, and it does that every single time. I can’t currently figure out what is causing it.

I think I found the issue with my project. Whenever I launch the Gradio application, Workbench loads the transformer models (my project has around 4 of those), which takes quite a long time. This probably causes Gradio to fail; by the time I go for the second try, the models have been loaded and cached.
What I did to overcome this was to simply cache the models during the build process, using postBuild.bash.
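
For anyone hitting the same thing, here is a hedged sketch of what that pre-caching step could look like: a small Python script that postBuild.bash invokes (for example, python3 /project/code/cache_models.py — the filename is just an illustration). The model IDs below are placeholders; substitute whichever ones your app actually loads:

```python
# Pre-download model weights at build time so the Gradio app does not stall
# (and get killed) on its first launch while pulling them.
from huggingface_hub import snapshot_download

MODELS = [
    "nvidia/Llama3-ChatQA-1.5-8B",         # placeholder IDs; replace with
    "mistralai/Mistral-7B-Instruct-v0.2",  # the models your project uses
]

for repo_id in MODELS:
    # Populates the Hugging Face cache inside the container image.
    snapshot_download(repo_id=repo_id)
```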


(10/02) Updated deep link landing page