Hi! This is the support thread for the Local RAG Example Project on GitHub. Any major updates we push to the project will be announced here. Further, feel free to discuss, raise issues, and ask for assistance in this thread.
Please keep discussion in this thread project-related. Any issues with the Workbench application should be raised as a standalone thread. Thanks!
I’m trying to use this example with a Nemotron model, but it gives a json.decode error from the HuggingFaceTextGenInference section of get_llm(). Can anyone guide me on how to use this with Nemotron models?
Hey Brian, the defaults are set to generate roughly a paragraph-length response (e.g. 4-5 sentences), but feel free to play around with the code and set the number of new tokens generated (or any other hyperparameter) to an appropriate amount. For example, max_new_tokens is set to 100 by default in chains.py.
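As a rough illustration of overriding that kind of setting, here is a minimal sketch. It is not the project's actual chains.py: only max_new_tokens and its default of 100 come from this thread, while `DEFAULT_GENERATION_KWARGS`, `build_generation_kwargs`, and the other values are hypothetical placeholders.

```python
# Hypothetical sketch: only max_new_tokens=100 is taken from the thread.
# The other keys and values are assumed placeholders, not project defaults.
DEFAULT_GENERATION_KWARGS = {
    "max_new_tokens": 100,  # default mentioned above
    "temperature": 0.7,     # assumed placeholder
    "top_p": 0.95,          # assumed placeholder
}


def build_generation_kwargs(**overrides):
    """Return a copy of the defaults with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULT_GENERATION_KWARGS)
    if unknown:
        raise ValueError(f"Unknown generation setting(s): {sorted(unknown)}")
    kwargs = {**DEFAULT_GENERATION_KWARGS, **overrides}
    if kwargs["max_new_tokens"] <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    return kwargs


# Ask for longer answers while leaving the other settings untouched.
longer = build_generation_kwargs(max_new_tokens=512)
print(longer["max_new_tokens"])
```

A larger max_new_tokens allows longer responses at the cost of slower generation, so it is worth raising it gradually rather than jumping straight to a very large value.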
I’m now getting an error where chat does not run after rebuilding the environment from scratch. I followed the same steps as before, when this worked. Here is the error message that appears on screen. I did notice that there is a new Hugging Face meta-llama/Llama-2-7b-hf model that is suffixed with the word chat. Has there been a change? I’ve also attached the error log.
A clean rebuild of the project this morning now allows chat to run successfully. However, it doesn’t seem to be able to use the provided knowledge base. I will continue to test.
Hi there!
Is it possible to split the Llama 7b-chat-hf model across two GPUs? I’m currently using 2x RTX 4090. In the text-generation-webui project, you can split the model equally across two devices. I only found this project today.
When I try to upload a document, I get an error. It is a text document, but it just says "error". Furthermore, before I try to upload, my chatbot works fine. After I try to upload, submitting any text to the chatbot gives an error, and I need to restart the environment.
Any ideas why uploading text documents is causing it to go into an error state?
Do you mind providing logs and screenshots of the issue? The vector database takes a while to spin up, so you may be uploading documents before the database is ready to receive them. But I wouldn’t know for sure without logs/screenshots.
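If timing does turn out to be the cause, one workaround is to poll until the database responds before uploading anything. Here is a minimal sketch, assuming a hypothetical `is_ready` check that you would supply yourself (for example, a request to whatever health endpoint your vector database exposes). None of these names come from the project.

```python
import time


def wait_until_ready(is_ready, timeout_s=120.0, interval_s=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout_s elapses.

    is_ready is a caller-supplied, zero-argument function (hypothetical here).
    Exceptions it raises, such as a refused connection, count as "not ready yet".
    Returns True if the service became ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if is_ready():
                return True
        except Exception:
            pass  # service not reachable yet; keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example with a stand-in check that succeeds on the third attempt.
attempts = {"count": 0}


def fake_ready():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("database still starting")
    return True


if wait_until_ready(fake_ready, timeout_s=10, interval_s=0.01):
    print("Database is ready; safe to upload documents.")
else:
    print("Timed out waiting for the database.")
```

In practice you would swap `fake_ready` for a real check against your setup and only start the upload once it returns True.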