Thank you for your response.
Again, there is an issue with Running Models on NVIDIA GPU - RuntimeError: No HIP GPUs Available
I’ve been able to successfully download both the ungated nvidia/Llama3-ChatQA-1.5-8B and the gated mistralai/Mistral-7B-Instruct-v0.2 models, but whenever I attempt to start the server, I encounter the following error:
RuntimeError: No HIP GPUs are available rank=0
xxxx-xx-xxxxx:45:29.577451Z ERROR text_generation_launcher: Shard 0 failed to start
xxxx-xx-xxxxx:45:29.577477Z INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
I’m using an NVIDIA GeForce RTX 4060, and it seems the server is attempting to initialize HIP (which is typically for AMD GPUs) rather than CUDA (please correct me if I am wrong). Could you suggest what steps I can take to resolve this, and which models I should download that will work best with my NVIDIA GPU?
Hi, thanks for reaching out. The screenshot and the output logs both tell me the model weights are still being pulled down. These are very large; perhaps you are facing a slow network connection?
I also see Detected system rocm in the output logs. Can you verify you are running with NVIDIA GPUs and not AMD? Thanks!
And yes, local RAG was a completely separate project whose functionality is a subset of this project's; it has since been replaced by this one.
Hi, yes, following up on my previous comment above: the output logs of the Hugging Face TGI server show that it is detecting AMD hardware.
Detected system rocm
Typically with NVIDIA GPUs, you would see Detected system cuda instead. If running CPU only, it would say Detected system cpu. Are you absolutely sure you have no other GPU hardware on this system? Can you run nvidia-smi to list the GPUs, or open the settings and check the dedicated GPU(s) in the system information?
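If it helps, one quick sanity check inside the project container is to see which GPU backend the installed PyTorch build reports. This is only a minimal diagnostic sketch (TGI does its own hardware detection, so treat it as a hint rather than the exact check TGI performs), assuming PyTorch is available in the environment:

import torch

# torch.version.cuda is set on CUDA builds; torch.version.hip is set on ROCm builds
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)  # e.g. "12.x" on a CUDA wheel, None otherwise
print("HIP build:", torch.version.hip)    # non-None only on a ROCm wheel
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))

If the HIP line is non-None while you only have an NVIDIA card, the environment somehow ended up with a ROCm build, which would line up with the No HIP GPUs are available error.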
Yes, I’m using NVIDIA GPUs. You can see the nvidia-smi output in the screenshot.
When I try to download a model from the Hybrid RAG: Chat UI, it says ‘Detected system rocm’.
Also, there’s a warning message that says ‘Could not import SGMV kernel from Punica, falling back to loop.’
Just to note, my internet connection is pretty good; I currently have a 40 Mbps connection.
Hello admin,
I’m still awaiting a response, as I need to proceed with my project using the local GPU. I suspect the issue might stem from the code we pulled, possibly due to misconfiguration or fallback logic inside it, though I’m not entirely sure. Any guidance or clarification would be greatly appreciated.
Yeah, we haven’t really faced this issue before because this is not expected behavior. Hugging Face should detect cuda on your system if you are running with a genuine NVIDIA GPU. I could see it detecting cpu if something was wrong with detecting the GPU, but detecting rocm especially when you don’t have any AMD GPUs on the system feels like a bug to me.
Unfortunately, we are downstream from Hugging Face and do not deal with Hugging Face’s TGI container or repo directly. I suggest filing a bug with their repo here and having their engineers take a look at the issue.
After reinstalling the OS, AI Workbench, and the Hybrid-RAG project, CUDA is now detected. I’m unsure why it wasn’t recognized the first time, but everything is working fine now. The NVIDIA GPU is genuine. Thank you for your generous support.
Add “fastapi==0.112.2” to the end of this line in the file (I’ve already made this change in the upstream repo)
Clear cache and rebuild
Start the chat app
This fixed my issues back when I came across this solution, but the same error has started showing up again since yesterday and I can’t figure out why. Since the last time it worked I have made very few changes to the code, and I made sure not to touch the sensitive parts. It would be great if anyone could help me with this.
Error:
{"level":"error","error":"(Container project-arishtha) bash -lc \"PROXY_PREFIX=\\\"/projects/Arishtha/applications/Gradio\\\"; PROXY_PREFIX=\\\"/projects/Arishtha/applications/Gradio\\\" cd /project/code && PROXY_PREFIX=$PROXY_PREFIX python3 client.py\": signal: killed","project_name":"Arishtha","application_name":"Gradio","time":"2024-10-01T16:50:30+05:30","message":"failed to execute launch command."}
It doesn’t open the web browser even after checking the box for Auto Launch, and port 8080 in the browser does not display anything at all.
Hi, so it turns out this issue is reproducible and I think I’ve pinned down the problem according to the issue here.
We previously just pulled the latest tag in this project, but it appears to be improperly linked on the Hugging Face side. We’ve since pinned the base container to a specific version (2.3.0) in the upstream repo until this gets resolved.
Here is a breakdown of one Gradio example I have up on a project as a reference. Just as a side note:
The $PROXY_PREFIX environment variable is configured automatically by AI Workbench.
You do not see anything on http://localhost:8080 because AI Workbench’s reverse proxy handles the routing to port 10000.
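For reference, a launch call that cooperates with that routing typically looks roughly like the sketch below. This is only an illustrative sketch, not the project's actual client.py; the port 10000 and the placeholder UI are assumptions here:

import os
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("Hello from AI Workbench")  # placeholder UI

demo.launch(
    server_name="0.0.0.0",                          # listen on all interfaces inside the container
    server_port=10000,                              # the port the Workbench proxy forwards to
    root_path=os.environ.get("PROXY_PREFIX", "/"),  # so generated URLs include the proxy prefix
)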
From your description, it appears the start command runs, times out somehow, but then succeeds later when the UI naturally refreshes, since the health check turns green in your screenshot. Is this accurate? And if you click the link next to the list of running apps, does it open anything in the browser?
When launching the Gradio app, the first try gives an error like the one I just posted above, but on the second try it launches successfully. I have tried this almost 5 times now and it does the same thing every single time. I can’t currently figure out what is causing the issue.
I think I found the issue with my project. Whenever I launch the Gradio application, the workbench loads the transformer models (my project has around 4 of them), which takes quite a long time. This probably causes Gradio to fail on the first try; by the time I make the second attempt, the models have already been loaded and cached.
What I did to overcome this was simply to cache the models during the build process, using postBuild.bash.
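In case it is useful to anyone else, the warm-up step can be as small as a Python script that pre-downloads the weights into the Hugging Face cache and is called from postBuild.bash (e.g. with python3 cache_models.py). The script name and model IDs below are placeholders rather than the exact ones in my project:

# cache_models.py - pre-download model weights at build time so the Gradio app starts quickly
from huggingface_hub import snapshot_download

MODELS = [
    "sentence-transformers/all-MiniLM-L6-v2",  # placeholder embedding model
    "nvidia/Llama3-ChatQA-1.5-8B",             # placeholder generation model
]

for repo_id in MODELS:
    # Populates the local HF cache; later from_pretrained() calls then load from disk
    snapshot_download(repo_id=repo_id)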