I am currently working on the Hybrid Retrieval-Augmented Generation (RAG) quickstart project using NVIDIA AI Workbench. I followed the steps outlined in the documentation, but I encountered an issue during the “Setup RAG Backend” step.
Error Details: At the step where the backend setup was polling the inference server, it got stuck with the following error:
Polling inference server. Awaiting status 200; trying again in 5s.
curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)
Max attempts reached: 30. Server may have timed out. Stop the container and try again.
Steps to Reproduce:
Cloned the hybrid RAG project from the NVIDIA GitHub repo.
Configured the NVCF_RUN_KEY and attempted to set up the backend via the Gradio Chat App.
At the “Set Up RAG Backend” step, the build was triggered but failed with the above error.
System Details:
libcurl version: /opt/conda/lib/libcurl.so.4
NVIDIA AI Workbench installed on local machine
Using conda environment
Followed all prerequisite steps as per the quickstart guide
Troubleshooting Attempts:
Verified libcurl is installed and correctly linked.
Tried reinstalling libcurl within the conda environment.
Checked and ensured that /opt/conda/lib is in the LD_LIBRARY_PATH.
Attempted to manually link the system version of libcurl.
Unfortunately, none of these steps resolved the issue.
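For reference, the checks I ran looked roughly like the following (exact paths are from my environment and may differ on yours):

```bash
# Check which libcurl the curl binary actually resolves at runtime
ldd "$(which curl)" | grep libcurl

# Check what is currently on the library search path
echo "$LD_LIBRARY_PATH"

# One way to reinstall libcurl inside the active conda environment
conda install -c conda-forge libcurl --force-reinstall
```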
Could you please assist in diagnosing and resolving this issue? I am also happy to provide additional logs or details if needed.
The problem here lies in the version that libcurl.so.4 is linked to. libcurl.so.4.8 is not the version curl was originally linked against, and curl treats this mismatch as a security issue. You must add the directory containing libcurl.so.4.7 to LD_LIBRARY_PATH.
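As a rough illustration (the exact location of libcurl.so.4.7 depends on your distro; /usr/lib/x86_64-linux-gnu is a common spot on Ubuntu):

```bash
# Locate a libcurl.so.4.7.x on the system (the path varies by distro)
find / -name "libcurl.so.4.7*" 2>/dev/null

# Put the directory that contains it ahead of conda's lib directory
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

# Confirm curl now resolves the expected libcurl
ldd "$(which curl)" | grep libcurl
```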
If your logs appear to progress similarly to what you see here, it may simply be that your system is starting the inference server normally, just more slowly than expected. If this is the case, you can increase your MAX_ATTEMPTS by editing this line here.
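For context, the polling logic is essentially a retry loop along these lines (a simplified sketch, not the project's exact code; the health-check URL and variable names are placeholders):

```bash
MAX_ATTEMPTS=30   # raise this if the inference server just needs longer to warm up
attempt=0
while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
  # placeholder health endpoint; the real script polls the inference server's own URL
  status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:9090/health)
  if [ "$status" = "200" ]; then
    echo "Inference server is up."
    break
  fi
  echo "Polling inference server. Awaiting status 200; trying again in 5s."
  sleep 5
  attempt=$((attempt + 1))
done
if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
  echo "Max attempts reached: $MAX_ATTEMPTS. Server may have timed out. Stop the container and try again."
fi
```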
If there is another error causing the message you see, that error should similarly be captured in the logs. If so, let us know what you see so we can help address accordingly. Hope this helps!
Hi,
I’m experiencing the same issue as Mr. @Malay_Kumar.
After reviewing the conversation and the solution provided by Mr. @bfurtaw, I followed these steps:
I navigated to AI Workbench > Environment > Variable > Add and added a new environment variable with the following details:
Name: LD_LIBRARY_PATH
Value: /usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
After clicking “Add,” it prompted me to restart. However, even after waiting for more than half an hour, the issue still persists, and there are no changes reflected in the chat.
You can refer to the attached screenshot for more details.
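In case it helps anyone comparing notes, a quick sanity check from a terminal inside the project container is something like this (not an official step, just a way to see whether the variable was actually applied):

```bash
# Should list /usr/lib/x86_64-linux-gnu ahead of /opt/conda/lib
echo "$LD_LIBRARY_PATH"

# Should resolve libcurl from the system path rather than conda's copy
ldd "$(which curl)" | grep libcurl
```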
Did your ad hoc solution work? If it did, could you please share the details, the outcome, and the container pull address you used?
In my case, I didn’t need to take that approach, as I simply followed the instructions in the documentation for setting the environment variables (see the attached screenshot).
You can easily see it listed under the environment variables. However, I still receive the same verbose output.
But it looks like attempting to spin up a NIM to run locally causes an error. I believe the issue @Malay_Kumar is referring to is that first step of setting up the RAG backend, so your issue may be unrelated.
For the local NIM, the logs suggest the container is trying to start:
Attempting to run the NIM Container.
followed by the container hash.
The attempt to reach the container appears to time out. Can you verify which GPU is on this system? According to the NIM for LLM docs here, in order to run on a non-optimized configuration, you need a GPU with >=24GB VRAM.
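A quick way to check the GPU and its VRAM from a terminal (plain nvidia-smi, nothing project-specific):

```bash
# Reports the GPU name and total memory; a non-optimized NIM config generally wants >=24GB
nvidia-smi --query-gpu=name,memory.total --format=csv
```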
You can typically confirm any issues in a NIM container by checking the container logs.
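For example, assuming Docker is the container runtime and the NIM container at least attempted to start (the container name here is a placeholder):

```bash
# List all containers, including ones that exited immediately
docker ps -a

# Inspect the NIM container's logs to see why it failed or stalled
docker logs <nim-container-id>
```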
Thank you for your thoughtful response. There are no NIM containers created, as seen in the screenshot. Although the image has been pulled, there’s no sign of any container being created.
The documentation for the local microservice does not mention prerequisites or specify which NIM container a user should choose based on their requirements. Is it possible to optimize the configuration for systems with limited VRAM?
Mine is in the single digits: currently I have a GPU with 8GB of VRAM.
I would appreciate any suggestions for a NIM model that works with 8GB of VRAM, along with whether it is possible to optimize or apply memory-saving techniques to run these models efficiently within that limit.
Unfortunately, on only 8GB of VRAM you are likely not going to be able to use a NIM running locally.
Here are a few options:
If you would like to run inference locally, you can try the Hugging Face TGI local inference option, select a smaller model like Phi-3 Mini, and enable 4-bit quantization to minimize the VRAM footprint (see the rough sketch after this list).
Continue to use the cloud endpoints.
Set up a NIM on a system with a larger GPU that can handle a NIM container for inference, and use that NIM as a remote NIM endpoint.
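As a rough sketch of why the first option can fit in 8GB: Phi-3 Mini has about 3.8B parameters, so 4-bit weights come to roughly 3.8B × 0.5 bytes ≈ 1.9GB, leaving headroom for the KV cache and activations. Outside of the Workbench UI, a standalone TGI run with 4-bit quantization looks roughly like this (image tag, port, and model ID are illustrative, not the project's exact setup):

```bash
# Illustrative only: Hugging Face TGI serving a small model with 4-bit (NF4) quantization
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id microsoft/Phi-3-mini-4k-instruct \
  --quantize bitsandbytes-nf4
```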