[SUPPORT] Workbench Example Project: Agentic RAG

Hi, thanks for reaching out.

A 401 error typically means an incorrect, expired, malformed, or missing API key (as opposed to a 403 error, which indicates a valid API key that lacks the required permissions).

Do you know if your key has expired, or if you rotated it recently? You may need to regenerate your API key and try again. If you regenerated your key for something else, all previous keys typically become invalid.

Thanks for the follow up, I can check that out. Does each model require a separate key? I tried using a key I had generated for the basic Hybrid RAG demo, which is working fine.

But after some googling, it seems this might be an NVCF RUN Key when an NGC API Key is needed?

Yeah, the naming is an artifact that needs to be updated. It should be the same as the key often referred to as the NVIDIA_API_KEY or NGC Personal Key.

There is a single, universal key, but only the most recently generated key will work (provided it has not expired).

You can generate a personal key on NGC, set the scope for the key (to be safe, I usually enable all services), and set the expiration date.

Does it expect a certain Key name?

The personal keys I am generating start with __autogenerated_playgrounds followed by a set of numbers. I can only save the secret in AI Workbench if I enter it that way.

The log file from when I start the ChatUI app is lengthy but ends with the 403 error below

File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I have Secrets Manager and NGC enabled as services. I am not able to add Public API Endpoints as a service

I have gone back and started from scratch: recreated the project, did all of the configuration, and still get the same result.

For clarification, I am using an NVIDIA API key (not a personal key) and a Tavily API key I generated for this project.

The screenshot shows what happens in the chat window when uploading a PDF document; that is when I get the 403 error.

The service log is also full of these errors about the Chroma DB:

{"level":"warn","projectPath":"/home/workbench/nvidia-workbench/edtwarner-workbench-example-agentic-rag","file":"data/chroma.sqlite3","time":"2025-02-05T14:10:17-05:00","message":"cannot return change content for binary files"}

Not sure if that means anything, but the PDF file does get uploaded, as you can see in the screenshot, and you can click on it in the UI and it opens.

It goes to 50% immediately and then fails later, so is the error happening while getting the document into the vector database?
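For intuition on where the upload might be failing, here is a hedged plain-Python sketch of the kind of ingest pipeline such demos typically run (extract text, split into chunks, embed each chunk via a remote API call, write vectors to Chroma). The `split_text` helper is purely illustrative, not the project's actual code; if progress stalls after the splitting step, the failure would point at the remote embedding call rather than the local database write.

```python
def split_text(text: str, chunk_size: int = 250, overlap: int = 0) -> list[str]:
    """Naive fixed-size splitter, standing in for the project's real splitter."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 600-character document with chunk size 250 yields 3 chunks (250 + 250 + 100).
chunks = split_text("A" * 600, chunk_size=250)
print(len(chunks))  # 3
```

Embedding those chunks is the step that calls out over the network, which is where an auth error like a 403 would surface.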

I have continued to work on resolving this without much luck.

It seems like something is missing in the requests.py file that prevents successful retrieval of the URLs.

Researching the 403 errors, it seems that many sites reject requests from API-based applications; the common solution is to add headers so the website believes the request is coming from a browser.
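As an illustration of that workaround, here is a minimal sketch using the standard library's `urllib.request` (which the traceback above shows the demo uses). The `BROWSER_UA` string and `fetch` helper are assumptions for illustration, not the project's code:

```python
import urllib.request

# A browser-like User-Agent; some sites return 403 to the default Python UA.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"

def fetch(url: str) -> bytes:
    """Fetch a URL while presenting a browser-like User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()

# Usage (network call): fetch("https://example.com")
```

Whether this helps depends on the site; some use stricter bot detection that a header alone will not satisfy.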

I can get two different errors from the UI:

  • If I request an http:// site, I get a 401 error.
  • If I request an https:// site, I get a 403 error.

I wanted to try editing that requests.py file, but the demo seems to have file permissions locked down.

Any suggestions appreciated, and if this is beyond the scope of this support thread, please just let me know and I will only reply if I am able to resolve it.

Hi Edward,

Apologies for the delay. Just did a clean clone of the project and unfortunately I’m unable to reproduce your issue.

A 403 error typically means your key is recognized but does not have access to the resource (Build API Endpoints). Some troubleshooting tips:

  • Make sure you are using a key that starts with nvapi-....
  • If using a personal key generated from NGC, make sure you selected API Endpoints under the scope of the key.
  • Make sure your key is current and not expired/rotated.

If you are still seeing an error, please provide the error stacktrace from the Chat app logs (in the bottom-left corner, go to Outputs > Chat from the dropdown).
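Before digging deeper, a quick local sanity check on the key format can rule out the simplest failure mode. This sketch only checks the documented `nvapi-` prefix; `looks_like_nvidia_key` is a hypothetical helper, and any format detail beyond the prefix is an assumption:

```python
def looks_like_nvidia_key(key: str) -> bool:
    """Cheap local check: NVIDIA API keys are documented to start with 'nvapi-'."""
    key = key.strip()  # stray whitespace from copy/paste is a common culprit
    return key.startswith("nvapi-") and len(key) > len("nvapi-")

print(looks_like_nvidia_key("nvapi-abc123"))  # True
print(looks_like_nvidia_key("nvcf-xyz"))      # False: wrong prefix
```

A passing check does not prove the key is valid or unexpired, only that it is at least the right kind of key.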


I was having a similar issue uploading PDFs; using a personal NVIDIA API key and setting the scope to Public API Endpoints seemed to do the trick (after a clean clone). For safety, I also named mine NVIDIA_API_KEY to match the AI Workbench environment.

Follow-up: How do you store the uploaded documents (PDFs) so that when you open AI Workbench the next time they are still there (and you do not need to upload them again)?

I uploaded a PDF and modified the Router prompt to best fit the information now in the RAG, but when I asked a question whose answer I knew was in the PDF, it was clear that the chat pulled the answer from the web. How do I broaden the Router prompt so that it first looks for keywords from the question in the vectorstore, rather than only when specific topic areas are mentioned? (Example: my chat is focused on dementia caregiving; when a question is asked about how a specific technique works, without specifically mentioning dementia, Alzheimer's, or caregiving, how do I still get it to pull from the vectorstore primarily/first?)


Hi Evan,

Thanks for reaching out.

How do I broaden the Router prompt so that it first looks for keywords from the question in the vectorstore rather than only if specific topic areas are mentioned first?

The Router prompt is fully customizable; we just provide a "search RAG if these topics are mentioned" clause as an example. You can customize it however best fits your use case, e.g. something like "search RAG if the following keywords are present", etc.

In your example, the Router LLM is evaluating the user query against what is in the prompt, so if you don't specifically mention certain domain-specific keywords about what you're looking for, the query and prompt may not match up enough to route to RAG.

Once you are happy with your custom prompt, feel free to solidify it in code/chatui/prompts so that it becomes the default prompt every time you open the app.
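As a hedged illustration for the dementia-caregiving example above (the variable name and exact wording are assumptions, not the project's shipped prompt), a broadened router prompt might look like:

```python
# Hypothetical broadened router prompt: instead of routing to RAG only when
# specific topics are named, it defaults to the vectorstore whenever the
# uploaded documents could plausibly answer the question.
ROUTER_PROMPT = """You are an expert at routing a user question to a vectorstore
or a web search. Default to the vectorstore whenever the question could plausibly
be answered by the uploaded documents -- for example, questions about caregiving
techniques, dementia, or Alzheimer's -- even if those keywords are not mentioned
explicitly. Use web search only for clearly unrelated or time-sensitive
questions. Return a JSON object with a single key 'datasource' set to
'vectorstore' or 'web_search'."""

print("vectorstore" in ROUTER_PROMPT)  # True
```

The key change is inverting the default: route to the vectorstore unless the question is clearly out of scope, rather than only when trigger topics appear.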

How do you store the uploaded documents (PDFs) so that when you open AI Workbench the next time they are still there (and you do not need to upload them again)?

Uploaded documents should be persistent (check the data directory). When you re-open the app, previously uploaded documents may not show up in a fresh UI, since the UI is rendered per session, but they are still in the vector store until you delete them from the data directory.

You can always empty the database by clearing out the data directory.
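If you want to verify what survived between sessions, a small sketch like this can list the persisted files. The `data/` path comes from the log lines earlier in the thread (`data/chroma.sqlite3`); the helper itself is illustrative, not part of the project:

```python
from pathlib import Path

def list_persisted(data_dir: str = "data") -> list[str]:
    """Return the files currently persisted under the project's data directory."""
    root = Path(data_dir)
    if not root.exists():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())

# Usage: run from the project root; expect to see chroma.sqlite3 plus
# per-upload subdirectories if documents have been ingested.
print(list_persisted())
```

Deleting files under that directory is what actually empties the store; the UI's fresh state on reopen is cosmetic.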

Thanks. I have been extremely busy too. I do believe I had the wrong type of certificate error.

I am very close to having it working.

I am doing a simple request about world series winners

I edited my router prompt to say

You are an expert at routing a user question to a vectorstore or web search. Use the vectorstore for questions about world series winners

and I have uploaded a PDF called world series winners

When I ask who won the world series in 2018 I get this in the monitor tab

who won the world series in 2018
{'datasource': 'vectorstore'}
—ROUTE QUESTION TO RAG—
—RETRIEVE—
—CHECK DOCUMENT RELEVANCE TO QUESTION—
—ASSESS GRADED DOCUMENTS—
—DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH—
—WEB SEARCH—
—GENERATE—
—CHECK HALLUCINATIONS—
—DECISION: GENERATION IS GROUNDED IN DOCUMENTS—
—GRADE GENERATION vs QUESTION—

I would think it would use the vector store, given I have a PDF called world series winners, but it doesn't seem to use it.

Any advice?

It does look like the PDF upload may be failing. I saw something in the thread about an nltk error? Where would I find that in the app code?

Lastly, when it does use the web search, I get the following error in response to the question that gets routed to the web search:

*** ERR: Unable to process query. Check the Monitor tab for details. ***

Exception: [429] Too Many Requests
{'status': 429, 'title': 'Too Many Requests'}

The NVIDIA API Catalog recently moved from a limited-credit system to an unlimited-credit system. However, there are rate-limiting controls implemented.

Are you using the Llama or the Mixtral model? I would recommend the Llama, since that model appears to be rate limited to 7 calls/sec, while the Mixtral one is limited to 1 call/sec.

When I tested it myself on the public endpoints, the Mixtral was throwing the rate-limiting error, while the Llama model worked fine.

Just something to keep in mind as you are navigating the model endpoints to hit moving forward, apologies for the inconvenience.

Thanks for that information. I was using Mixtral.

Any idea on how to solve PDF uploads failing?

5/14/2025

Project overhaul – QoL improvements:

  • Added a new quickstart
  • Additional docs
  • Sample queries added
  • New sample dataset
  • NVIDIA-internal endpoint support
  • General bug fixes

We just pushed an overhaul of the project; the PDF and webpage uploads appear to be working on our side.

To create a public link, set share=True in launch().
[split_documents] Splitting 1 docs with chunk size 250, overlap 0
[embed_documents] Embedding 9 chunks using model: NV-Embed-QA
—ROUTE QUESTION—
How do I install NVIDIA AI Workbench?
{'datasource': 'vectorstore'}
—ROUTE QUESTION TO RAG—
—RETRIEVE—
—CHECK DOCUMENT RELEVANCE TO QUESTION—
—GRADE: DOCUMENT RELEVANT—
—GRADE: DOCUMENT RELEVANT—
—GRADE: DOCUMENT RELEVANT—
—GRADE: DOCUMENT RELEVANT—
—ASSESS GRADED DOCUMENTS—
—DECISION: GENERATE—
—GENERATE—
—CHECK HALLUCINATIONS—
—DECISION: GENERATION IS GROUNDED IN DOCUMENTS—
—GRADE GENERATION vs QUESTION—
—DECISION: GENERATION ADDRESSES QUESTION—
[clear] Collection 'rag-chroma' cleared.
[clear] Removed directory: 34270bfe-682e-4bbe-9a18-2b140767c7c4
[clear] Removed directory: readme-images
[clear] Removed directory: 0324ea8b-1c7b-48db-bac0-2ba4a56ebd78

Do you mind testing this version of the project out?

Let me know if you’re still facing issues.

Not at all. I will create a fork of the new version and give it a go.

Thanks for the quick reply

With the new demo code I got it up and running in less than 30 minutes using the llama3 model. Thanks for all the help.

Will begin to try some more complex queries and data sources.

Thanks again

Ed Warner