I am currently working on the Hybrid Retrieval-Augmented Generation (RAG) quickstart project using NVIDIA AI Workbench. I followed the steps outlined in the documentation, but I encountered an issue during the “Setup RAG Backend” step.
Error Details: At the step where the backend setup was polling the inference server, it got stuck with the following error:
Polling inference server. Awaiting status 200; trying again in 5s.
curl: /opt/conda/lib/libcurl.so.4: no version information available (required by curl)
Max attempts reached: 30. Server may have timed out. Stop the container and try again.
Steps to Reproduce:
Cloned the hybrid RAG project from the NVIDIA GitHub repo.
Configured the NVCF_RUN_KEY and attempted to set up the backend via the Gradio Chat App.
At the “Set Up RAG Backend” step, the build was triggered but failed with the above error.
System Details:
libcurl version: /opt/conda/lib/libcurl.so.4
NVIDIA AI Workbench installed on local machine
Using conda environment
Followed all prerequisite steps as per the quickstart guide
Troubleshooting Attempts:
Verified libcurl is installed and correctly linked.
Tried reinstalling libcurl within the conda environment.
Checked and ensured that /opt/conda/lib is in the LD_LIBRARY_PATH.
Attempted to manually link the system version of libcurl.
Unfortunately, none of these steps resolved the issue.
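For reference, the checks I ran looked roughly like the following (exact paths are from my environment and may differ on yours):

```bash
# Check which libcurl the curl binary actually resolves at runtime
ldd "$(which curl)" | grep libcurl

# Check what is currently on the library search path
echo "$LD_LIBRARY_PATH"

# One way to reinstall libcurl inside the active conda environment
conda install -c conda-forge libcurl --force-reinstall
```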
Could you please assist in diagnosing and resolving this issue? I am also happy to provide additional logs or details if needed.
The problem here lies in the version that libcurl.so.4 is linked to. libcurl.so.4.8 is not the version curl was originally linked against, and curl treats this mismatch as a security issue. You must add the directory containing libcurl.so.4.7 to LD_LIBRARY_PATH.
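As a rough illustration (the exact location of libcurl.so.4.7 depends on your distro; /usr/lib/x86_64-linux-gnu is a common spot on Ubuntu):

```bash
# Locate a libcurl.so.4.7.x on the system (the path varies by distro)
find / -name "libcurl.so.4.7*" 2>/dev/null

# Put the directory that contains it ahead of conda's lib directory
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

# Confirm curl now resolves the expected libcurl
ldd "$(which curl)" | grep libcurl
```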
If your logs appear to progress similarly to what you see here, it may simply be that your system is starting the inference server normally, just more slowly than expected. If this is the case, you can increase your MAX_ATTEMPTS by editing this line here.
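For context, the polling logic is essentially a retry loop along these lines (a simplified sketch, not the project's exact code; the health-check URL and variable names are placeholders):

```bash
MAX_ATTEMPTS=30   # raise this if the inference server just needs longer to warm up
attempt=0
while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
  # placeholder health endpoint; the real script polls the inference server's own URL
  status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:9090/health)
  if [ "$status" = "200" ]; then
    echo "Inference server is up."
    break
  fi
  echo "Polling inference server. Awaiting status 200; trying again in 5s."
  sleep 5
  attempt=$((attempt + 1))
done
if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
  echo "Max attempts reached: $MAX_ATTEMPTS. Server may have timed out. Stop the container and try again."
fi
```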
If there is another error causing the message you see, that error should similarly be captured in the logs. If so, let us know what you see so we can help address accordingly. Hope this helps!
Hi,
I’m experiencing the same issue as Mr. @Malay_Kumar.
After reviewing the conversation and the solution provided by Mr. @bfurtaw, I followed these steps:
I navigated to AI Workbench > Environment > Variable > Add and added a new environment variable with the following details:
Name: LD_LIBRARY_PATH
Value: /usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
After clicking “Add,” it prompted me to restart. However, even after waiting for more than half an hour, the issue still persists, and there are no changes reflected in the chat.
You can refer to the attached screenshot for more details.
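In case it helps anyone comparing notes, a quick sanity check from a terminal inside the project container is something like this (not an official step, just a way to see whether the variable was actually applied):

```bash
# Should list /usr/lib/x86_64-linux-gnu ahead of /opt/conda/lib
echo "$LD_LIBRARY_PATH"

# Should resolve libcurl from the system path rather than conda's copy
ldd "$(which curl)" | grep libcurl
```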
Did your ad hoc solution work? If it did, could you please share the details, the outcome, and the container pull address you used?
In my case, I didn’t need to take that approach, as I simply followed the instructions in the documentation for setting the environment variables (see the attached screenshot).
You can easily see it listed under the environment variables. However, I still receive the same verbose output.
But it looks like attempting to spin up a NIM to run locally causes an error. I believe the issue @Malay_Kumar is referring to is that first step of setting up the RAG backend, so your issue may be unrelated.
For the local NIM, the logs suggest the container is trying to start:
Attempting to run the NIM Container.
followed by the container hash.
The attempt to reach the container appears to time out. Can you verify which GPU is on this system? According to the NIM for LLM docs here, in order to run on a non-optimized configuration, you need a GPU with >=24GB VRAM.
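A quick way to check the GPU and its VRAM from a terminal (plain nvidia-smi, nothing project-specific):

```bash
# Reports the GPU name and total memory; a non-optimized NIM config generally wants >=24GB
nvidia-smi --query-gpu=name,memory.total --format=csv
```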
You can typically confirm any issues in a NIM container by checking the container logs.
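For example, assuming Docker is the container runtime and the NIM container at least attempted to start (the container name here is a placeholder):

```bash
# List all containers, including ones that exited immediately
docker ps -a

# Inspect the NIM container's logs to see why it failed or stalled
docker logs <nim-container-id>
```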
Thank you for your thoughtful response. There are no NIM containers created, as seen in the screenshot. Although the image has been pulled, there’s no sign of any container being created.
The documentation for the local microservice does not mention prerequisites or specify which NIM container a user should choose based on their requirements. Is it possible to optimize the configuration for systems with limited VRAM?
Mine is in the single digits: currently I have a GPU with 8GB of VRAM.
I would appreciate any suggestions for a NIM model that works with 8GB of VRAM, along with whether it is possible to optimize or apply memory-saving techniques to run these models efficiently within that limit.
Unfortunately, on only 8GB of VRAM you are likely not going to be able to use a NIM running locally.
Here are a few options:
If you would like to run inference locally, you can try the Hugging Face TGI local inference option, select a smaller model like Phi-3 Mini, and enable 4-bit quantization to minimize the VRAM footprint (see the rough sketch after this list).
Continue to use the cloud endpoints.
Set up a NIM on a system with a larger GPU that can handle a NIM container for inference, and use that NIM as a remote NIM endpoint.
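As a rough sketch of why the first option can fit in 8GB: Phi-3 Mini has about 3.8B parameters, so 4-bit weights come to roughly 3.8B × 0.5 bytes ≈ 1.9GB, leaving headroom for the KV cache and activations. Outside of the Workbench UI, a standalone TGI run with 4-bit quantization looks roughly like this (image tag, port, and model ID are illustrative, not the project's exact setup):

```bash
# Illustrative only: Hugging Face TGI serving a small model with 4-bit (NF4) quantization
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id microsoft/Phi-3-mini-4k-instruct \
  --quantize bitsandbytes-nf4
```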