I am working on the NVIDIA Hybrid-RAG project and have configured the NVCF_RUN_KEY, using the model mistral-7b-instruct-v0.2. However, when I upload a PDF and ask questions, I encounter an ASGI error. I have also tried other models, but the same issue persists. Below are the details of the problem:
Configuration: Generated the NVCF_RUN_KEY as per the instructions (see the sanity check below).
NVIDIA Account: I have active credits on my NVIDIA account.
Error Message:
*** ERR: Unable to process query. ***
Message: Response ended prematurely
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
...
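In case it's relevant, here is the quick sanity check I would use to confirm the key is actually visible to the app process (a minimal sketch; NVCF_RUN_KEY is just the variable name from the setup instructions):

```python
import os

# Minimal sanity check: confirm the key is set and non-empty in the same
# environment the app process inherits (NVCF_RUN_KEY per the project setup).
key = os.environ.get("NVCF_RUN_KEY", "")
print("NVCF_RUN_KEY set:", bool(key), "| length:", len(key))
```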
Screenshots of the error and configuration are attached for additional context.
I would appreciate any guidance on resolving this issue. Is there a specific configuration or troubleshooting step I might have missed?
Thank you in advance for your help!
Category: Bug or Error
Hi, thanks for reaching out. In that final screenshot, can you scroll down in the logs until you see the error message? That will help us pinpoint the source of the issue. Thanks!
I am getting the same error. The lines above match my log; the last lines show the error (an OpenAI error, I believe):
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 55, in __get_proxied__
return self.__load__()
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/openai/_module_client.py", line 12, in __load__
return _load_client().chat
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/openai/__init__.py", line 327, in _load_client
_client = _ModuleClient(
File "/home/workbench/.conda/envs/api-env/lib/python3.10/site-packages/openai/_client.py", line 105, in __init__
raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
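For anyone else who lands here: the traceback means the OpenAI-compatible client was constructed without any API key. A minimal sketch of the two fixes the error message names (the NVIDIA base URL, the model id format, and reusing NVCF_RUN_KEY as the key are my assumptions, not taken from the project source):

```python
import os
from openai import OpenAI

# The client raises the OpenAIError above if no key is available, so check first.
# Fix 1: export OPENAI_API_KEY in the shell. Fix 2: pass api_key explicitly.
api_key = os.environ.get("OPENAI_API_KEY") or os.environ.get("NVCF_RUN_KEY")
if not api_key:
    raise RuntimeError("No API key found: set OPENAI_API_KEY (or NVCF_RUN_KEY)")

client = OpenAI(
    api_key=api_key,
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA's OpenAI-compatible endpoint (assumption)
)

# Quick end-to-end check that the key is accepted.
resp = client.chat.completions.create(
    model="mistralai/mistral-7b-instruct-v0.2",  # model id format is an assumption
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```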
Thanks for the follow-up. I was trying this using a cloud endpoint. Do you recommend using a local instance? I have a development desktop with an RTX 4060 GPU. If running locally, do I also need an inference server?
I was able to get this running. In the logs I noticed that I needed a Hugging Face token with write capabilities. After correcting that, I was able to load the model and start the inference server, and it is now working great. Thanks for the assistance!
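In case it helps others hitting the token issue: a quick way to confirm the token is valid before starting the server (a minimal sketch; the HUGGING_FACE_HUB_TOKEN variable name is an assumption, and the write scope itself is chosen when you create the token in your Hugging Face account settings):

```python
import os
from huggingface_hub import login, whoami

# Assumption: the project reads the token from HUGGING_FACE_HUB_TOKEN.
token = os.environ.get("HUGGING_FACE_HUB_TOKEN")
if not token:
    raise SystemExit("Set HUGGING_FACE_HUB_TOKEN to a token created with write access")

login(token=token)  # validates the token against the Hub, errors if it is bad
print(whoami())     # shows which account the token resolves to
```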