[SUPPORT] Workbench Example Project: Agentic RAG

Hi, thanks for reaching out.

A 401 error typically means an incorrect, expired, malformed, or missing API key (as opposed to a 403 error, which indicates a valid API key that lacks the required permissions).

Do you know if your key has expired, or if you rotated it recently? You may need to regenerate your API key and try again. If you regenerated your key for something else, all previous keys typically become invalid.

Thanks for the follow up, I can check that out. Does each model require a separate key? I tried using a key I had generated for the basic Hybrid RAG demo, which is working fine.

But after some googling, it seems this might be an NVCF RUN Key when an NGC API Key is needed?

Yeah, the naming is an artifact that needs to be updated. It should be the same as the key often referred to as the NVIDIA_API_KEY or NGC Personal Key.

There is a single, universal key, but only the most recently generated key will work (provided it has not expired).

You can generate a personal key on NGC, set the scope for the key (to be safe, I usually enable all services), and set the expiration date.

Does it expect a certain Key name?

The personal keys I am generating start with __autogenerated_playgrounds followed by a set of numbers. I can only save the secret in AI Workbench if I enter it that way.

The log file from when I start the ChatUI app is lengthy but ends with the 403 error below

File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I have Secrets Manager and NGC enabled as services. I am not able to add Public API Endpoints as a service

I have gone back and started from scratch: recreated the project, did all of the configuration, and still get the same result.

For clarification, I am using an NVIDIA API key (not a personal key) and a Tavily API key I generated for this project.

The screenshot shows what happens in the chat window when uploading a PDF document; that is when I get the 403 error.

The service log is also full of these errors about the Chroma DB:

{"level":"warn","projectPath":"/home/workbench/nvidia-workbench/edtwarner-workbench-example-agentic-rag","file":"data/chroma.sqlite3","time":"2025-02-05T14:10:17-05:00","message":"cannot return change content for binary files"}

Not sure if that means anything, but the PDF file does get uploaded, as you can see in the screenshot, and you can click on it in the UI and it opens.

It goes to 50% immediately and then fails later, so is the error happening while getting the document into the vector database?
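For intuition on where the upload might be failing, here is a hedged plain-Python sketch of the kind of ingest pipeline such demos typically run (extract text, split into chunks, embed each chunk via a remote API call, write vectors to Chroma). The `split_text` helper is purely illustrative, not the project's actual code; if progress stalls after the splitting step, the failure would point at the remote embedding call rather than the local database write.

```python
def split_text(text: str, chunk_size: int = 250, overlap: int = 0) -> list[str]:
    """Naive fixed-size splitter, standing in for the project's real splitter."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 600-character document with chunk size 250 yields 3 chunks (250 + 250 + 100).
chunks = split_text("A" * 600, chunk_size=250)
print(len(chunks))  # 3
```

Embedding those chunks is the step that calls out over the network, which is where an auth error like a 403 would surface.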

I have continued to work on resolving this without much luck.

It seems like something is missing in the requests.py file that prevents successful retrieval of the URLs.

Researching the 403 errors, it seems that many sites reject requests from API-based applications; the common solution is to add headers so the website believes the request is coming from a browser.
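As an illustration of that workaround, here is a minimal sketch using the standard library's `urllib.request` (which the traceback above shows the demo uses). The `BROWSER_UA` string and `fetch` helper are assumptions for illustration, not the project's code:

```python
import urllib.request

# A browser-like User-Agent; some sites return 403 to the default Python UA.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"

def fetch(url: str) -> bytes:
    """Fetch a URL while presenting a browser-like User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()

# Usage (network call): fetch("https://example.com")
```

Whether this helps depends on the site; some use stricter bot detection that a header alone will not satisfy.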

I can get two different errors from the UI:

  • If I request an http:// site, I get a 401 error.
  • If I request an https:// site, I get a 403 error.

I wanted to try editing that requests.py file, but the demo seems to have file permissions locked down.

Any suggestions appreciated, and if this is beyond the scope of this support thread, please just let me know and I will only reply if I am able to resolve it.

Hi Edward,

Apologies for the delay. Just did a clean clone of the project and unfortunately I’m unable to reproduce your issue.

A 403 error typically means your key is recognized but does not have access to the resource (Build API Endpoints). Some troubleshooting tips:

  • Make sure you are using a key that starts with nvapi-....
  • If using a personal key generated from NGC, make sure you selected API Endpoints under the scope of the key.
  • Make sure your key is current and not expired/rotated.

If you are still seeing an error, please provide the error stacktrace from the Chat app logs (in the bottom-left corner, go to Outputs > Chat from the dropdown).
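Before digging deeper, a quick local sanity check on the key format can rule out the simplest failure mode. This sketch only checks the documented `nvapi-` prefix; `looks_like_nvidia_key` is a hypothetical helper, and any format detail beyond the prefix is an assumption:

```python
def looks_like_nvidia_key(key: str) -> bool:
    """Cheap local check: NVIDIA API keys are documented to start with 'nvapi-'."""
    key = key.strip()  # stray whitespace from copy/paste is a common culprit
    return key.startswith("nvapi-") and len(key) > len("nvapi-")

print(looks_like_nvidia_key("nvapi-abc123"))  # True
print(looks_like_nvidia_key("nvcf-xyz"))      # False: wrong prefix
```

A passing check does not prove the key is valid or unexpired, only that it is at least the right kind of key.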


I was having a similar issue uploading PDFs; using a personal NVIDIA API key and setting the scope to Public API Endpoints seemed to do the trick (after a clean clone). For safety, I also named mine NVIDIA_API_KEY to match the AI Workbench environment.

Follow-up: How do you store the uploaded documents (PDFs) so that when you open AI Workbench the next time they are still there (and you do not need to upload them again)?

I uploaded a PDF and modified the Router prompt to best fit the information now in the RAG, but when I asked a question whose answer I knew was in the PDF, it was clear that the chat pulled the answer from the web. How do I broaden the Router prompt so that it first looks for keywords from the question in the vectorstore, rather than only when specific topic areas are mentioned? (Example: my chat is focused on dementia caregiving; when a question is asked about how a specific technique works, without specifically mentioning dementia, Alzheimer's, or caregiving, how do I still get it to pull from the vectorstore primarily/first?)


Hi Evan,

Thanks for reaching out.

How do I broaden the Router prompt so that it first looks for keywords from the question in the vectorstore rather than only if specific topic areas are mentioned first?

The Router prompt is fully customizable; we just provide a "search RAG if these topics are mentioned" clause as an example. You can customize it however best fits your use case, e.g. something like "search RAG if the following keywords are present", etc.

In your example, the Router LLM is evaluating the user query against what is in the prompt, so if you don't specifically mention certain domain-specific keywords about what you're looking for, the query and prompt may not match up enough to route to RAG.

Once you are happy with your custom prompt, feel free to solidify it in code/chatui/prompts so that it becomes the default prompt every time you open the app.
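As a hedged illustration for the dementia-caregiving example above (the variable name and exact wording are assumptions, not the project's shipped prompt), a broadened router prompt might look like:

```python
# Hypothetical broadened router prompt: instead of routing to RAG only when
# specific topics are named, it defaults to the vectorstore whenever the
# uploaded documents could plausibly answer the question.
ROUTER_PROMPT = """You are an expert at routing a user question to a vectorstore
or a web search. Default to the vectorstore whenever the question could plausibly
be answered by the uploaded documents -- for example, questions about caregiving
techniques, dementia, or Alzheimer's -- even if those keywords are not mentioned
explicitly. Use web search only for clearly unrelated or time-sensitive
questions. Return a JSON object with a single key 'datasource' set to
'vectorstore' or 'web_search'."""

print("vectorstore" in ROUTER_PROMPT)  # True
```

The key change is inverting the default: route to the vectorstore unless the question is clearly out of scope, rather than only when trigger topics appear.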

How do you store the uploaded documents (PDFs) so that when you open AI Workbench the next time they are still there (and you do not need to upload them again)?

Uploaded documents should be persistent (check the data directory). When you re-open the app, previously uploaded documents may not show up in a fresh UI, since the UI is rendered per session, but they are still in the vector store until you delete them from the data directory.

You can always empty the database by clearing out the data directory.
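If you want to verify what survived between sessions, a small sketch like this can list the persisted files. The `data/` path comes from the log lines earlier in the thread (`data/chroma.sqlite3`); the helper itself is illustrative, not part of the project:

```python
from pathlib import Path

def list_persisted(data_dir: str = "data") -> list[str]:
    """Return the files currently persisted under the project's data directory."""
    root = Path(data_dir)
    if not root.exists():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())

# Usage: run from the project root; expect to see chroma.sqlite3 plus
# per-upload subdirectories if documents have been ingested.
print(list_persisted())
```

Deleting files under that directory is what actually empties the store; the UI's fresh state on reopen is cosmetic.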

Thanks. I have been extremely busy too. I do believe I had the wrong type of certificate error.

I am very close to having it working.

I am doing a simple request about world series winners

I edited my router prompt to say

You are an expert at routing a user question to a vectorstore or web search. Use the vectorstore for questions about world series winners

and I have uploaded a PDF called world series winners

When I ask who won the world series in 2018 I get this in the monitor tab

who won the world series in 2018
{'datasource': 'vectorstore'}
—ROUTE QUESTION TO RAG—
—RETRIEVE—
—CHECK DOCUMENT RELEVANCE TO QUESTION—
—ASSESS GRADED DOCUMENTS—
—DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH—
—WEB SEARCH—
—GENERATE—
—CHECK HALLUCINATIONS—
—DECISION: GENERATION IS GROUNDED IN DOCUMENTS—
—GRADE GENERATION vs QUESTION—

I would think it would use the vector store, given I have a PDF called world series winners, but it doesn't seem to use it.

Any advice?

It does look like the PDF upload may be failing. I saw something in the thread about an nltk error? Where would I find that in the app code?

Lastly, when it does use the web search, I get the following error in response to the question that gets routed to the web search:

*** ERR: Unable to process query. Check the Monitor tab for details. ***

Exception: [429] Too Many Requests
{'status': 429, 'title': 'Too Many Requests'}

The NVIDIA API Catalog recently moved from a limited-credit system to an unlimited-credit system. However, there are rate-limiting controls implemented.

Are you using the Llama or the Mixtral model? I would recommend the Llama, since that model appears to be rate limited to 7 calls/sec, while the Mixtral one is limited to 1 call/sec.

When I tested it myself on the public endpoints, the Mixtral was throwing the rate-limiting error, while the Llama model worked fine.

Just something to keep in mind as you are navigating the model endpoints to hit moving forward, apologies for the inconvenience.

Thanks for that information. I was using Mixtral.

Any idea on how to solve PDF uploads failing?

5/14/2025

Project overhaul – QoL improvements:

  • Added a new quickstart
  • Additional docs
  • Sample queries added
  • New sample dataset
  • NVIDIA-internal endpoint support
  • General bug fixes

We just pushed an overhaul of the project; the PDF and webpage uploads appear to be working on our side.

To create a public link, set share=True in launch().
[split_documents] Splitting 1 docs with chunk size 250, overlap 0
[embed_documents] Embedding 9 chunks using model: NV-Embed-QA
—ROUTE QUESTION—
How do I install NVIDIA AI Workbench?
{'datasource': 'vectorstore'}
—ROUTE QUESTION TO RAG—
—RETRIEVE—
—CHECK DOCUMENT RELEVANCE TO QUESTION—
—GRADE: DOCUMENT RELEVANT—
—GRADE: DOCUMENT RELEVANT—
—GRADE: DOCUMENT RELEVANT—
—GRADE: DOCUMENT RELEVANT—
—ASSESS GRADED DOCUMENTS—
—DECISION: GENERATE—
—GENERATE—
—CHECK HALLUCINATIONS—
—DECISION: GENERATION IS GROUNDED IN DOCUMENTS—
—GRADE GENERATION vs QUESTION—
—DECISION: GENERATION ADDRESSES QUESTION—
[clear] Collection 'rag-chroma' cleared.
[clear] Removed directory: 34270bfe-682e-4bbe-9a18-2b140767c7c4
[clear] Removed directory: readme-images
[clear] Removed directory: 0324ea8b-1c7b-48db-bac0-2ba4a56ebd78

Do you mind testing this version of the project out?

Let me know if you’re still facing issues.

Not at all. I will create a fork of the new version and give it a go.

Thanks for the quick reply

With the new demo code I got it up and running in less than 30 minutes using the llama3 model. Thanks for all the help.

Will begin to try some more complex queries and data sources.

Thanks again

Ed Warner