[SUPPORT] Workbench Example Project: Local RAG

edwli · January 9, 2024, 5:45pm

Hi! This is the support thread for the Local RAG Example Project on GitHub. Any major updates we push to the project will be announced here. Further, feel free to discuss, raise issues, and ask for assistance in this thread.

Please keep discussion in this thread project-related. Any issues with the Workbench application should be raised as a standalone thread. Thanks!

brian_d_scott · February 3, 2024, 4:27pm

I’ve noticed that the chat window will truncate the output randomly. Is there a setting to increase the output rows?

nima.nilchian · February 4, 2024, 9:54am

I’m trying to use this example with a Nemotron model, but it gives a json.decode error from the HuggingFaceTextGenInference section of get_llm(). Can anyone guide me on how to use this with Nemotron models?

edwli · February 6, 2024, 6:04pm

Hey Brian, the defaults are set to generate roughly a paragraph-long response (e.g., 4-5 sentences), so longer answers are cut off once they reach the token limit, which is likely the truncation you're seeing. Feel free to experiment with the code and set the number of new tokens generated (or any other hyperparameter) to a suitable value. For example, max_new_tokens is set to 100 by default in chains.py:

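from functools import lru_cache

# Import paths as of early-2024 LangChain / LlamaIndex releases; newer versions may differ.
from langchain.llms import HuggingFaceTextGenInference
from llama_index.llms import LangChainLLM

@lru_cache
def get_llm() -> LangChainLLM:
    """Create the LLM connection."""
    # Local inference endpoint that the HuggingFaceTextGenInference client talks to.
    inference_server_url_local = "http://127.0.0.1:9090/"

    llm_local = HuggingFaceTextGenInference(
        inference_server_url=inference_server_url_local,
        max_new_tokens=100,  # upper bound on generated tokens; raise it for longer answers
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        temperature=0.7,
        repetition_penalty=1.03,
        streaming=True
    )

    # Wrap the LangChain LLM so it can be used through LlamaIndex.
    return LangChainLLM(llm=llm_local)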
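One thing to keep in mind: get_llm() is decorated with @lru_cache, so within a running Python process the first LLM object it builds is reused. Editing chains.py requires restarting the app anyway, but if you make the token limit adjustable at runtime (for example from a UI setting), the cached object keeps the old value until get_llm.cache_clear() is called. The stdlib-only sketch below shows that behavior using a plain dictionary as a stand-in for the real client; `SETTINGS` and `get_llm_config` are hypothetical names for illustration, not part of the project.

```python
from functools import lru_cache

# Hypothetical runtime-adjustable settings; the example project hard-codes these values.
SETTINGS = {"max_new_tokens": 100}


@lru_cache
def get_llm_config() -> dict:
    """Stand-in for get_llm(): builds the 'client' once, then returns the cached copy."""
    return dict(SETTINGS)


print(get_llm_config()["max_new_tokens"])  # 100
SETTINGS["max_new_tokens"] = 400
print(get_llm_config()["max_new_tokens"])  # still 100: the cached object is reused
get_llm_config.cache_clear()
print(get_llm_config()["max_new_tokens"])  # 400: rebuilt with the new setting
```

Clearing the cache is cheap here, but with a real client it means a new connection object is created on the next call.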

brian_d_scott · February 6, 2024, 7:52pm

Thanks - I’ll change that today and give it a test.

brian_d_scott · February 9, 2024, 3:16pm

I’m now getting an error where chat does not run after rebuilding the environment from scratch. I followed the same steps that worked previously. Here is the error message that appears on screen. I did notice that there is a new Hugging Face model, a chat variant of meta-llama/Llama-2-7b-hf. Has there been a change? I’ve also attached the error log.

chat error log.txt (5.5 MB)

brian_d_scott · February 11, 2024, 4:23pm

A clean rebuild of the project this morning now allows chat to run successfully. However, it doesn’t seem to be able to use the provided knowledge base. I will continue to test.

twhitehouse · February 12, 2024, 9:17pm

Hi Brian,

We’re investigating and triaging. We will get back to you shortly.

Tyler

brian_d_scott · February 12, 2024, 9:21pm

Thanks - appreciate the feedback.

yashpareek.workmail · February 21, 2024, 6:19pm

Hi there!
Is it possible to split the Llama 2 7b-chat-hf model across two GPUs? I’m currently using 2x RTX 4090. The text-generation-webui project lets you split a model equally across two devices. I only came across this project today.

greg87 · March 22, 2024, 4:29pm

When I try to upload a document, I get an error. It is a text document, but the message just says "error". Furthermore, my chatbot works fine before I try the upload; after the upload attempt, any text I submit to the chatbot produces an error, and I need to restart the environment.

Any ideas why uploading text documents is causing it to go into an error state?

greg87 · March 22, 2024, 7:19pm

edwli · April 4, 2024, 6:58pm

Do you mind providing logs and screenshots of the issue? The vector database takes a while to spin up, so you may be uploading documents before the database is ready to receive them. But I wouldn’t know for sure without logs/screenshots.
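If the upload is racing the vector database start-up, one workaround is to wait until the database reports ready before sending documents. Below is a minimal sketch of that idea, assuming a hypothetical `is_ready()` probe (the project does not necessarily expose one; in practice it might be a health-check request to the database). It polls with a capped exponential backoff and gives up after a deadline; `wait_until_ready` is likewise an illustrative name.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 120.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the service became ready, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)


# Usage with a fake probe that reports ready on its third call.
calls = {"n": 0}


def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


if wait_until_ready(fake_probe, timeout_s=10, sleep=lambda s: None):
    print("database ready, safe to upload")
```

Backing off rather than polling at a fixed short interval avoids hammering a service that is still starting up.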

We have tried to mitigate this by building progress bars and warnings into the updated Hybrid RAG project here: NVIDIA/workbench-example-hybrid-rag on GitHub, an NVIDIA AI Workbench example project for Retrieval Augmented Generation (RAG).

edwli · April 4, 2024, 7:02pm

All,

This project has been refreshed to become the Hybrid RAG example project. This local RAG project will likely be discontinued.

GitHub repo here

New DevZone Thread here

This refreshed project includes, but is not limited to:

Local RAG, which is now one part of this project, alongside new features for cloud and microservice/NIM-based RAG.
An updated UI that allows for expanded user settings.
Updates and bug fixes.
