VIA Summarization Workflow ERROR

While running the VIA Summarization model I am getting the below mentioned error:
ERROR We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.
ERROR Failed to load VIA pipeline - CA-RAG setup failed. Check if NVIDIA_API_KEY set correctly and/or LLM configuration in CA-RAG config is valid.

Please note:
I have downloaded the VITA 2.0 model from the web, placed the files inside the folder VIA/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/, and exported NGC_MODEL_CACHE to point to the same location.

If I download the Hugging Face model offline, where should I keep these files?

You can refer to our guide Using Locally Deployed LLM NIM instead of NVIDIA Hosted LLM NIM to deploy it locally.

My model is not able to find the Hugging Face config file. I have downloaded the Hugging Face model offline; where can I place these files?

By default, the model will be placed in the path that you configure below:

export NGC_MODEL_CACHE=</SOME/DIR/ON/HOST>
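
For example (the directory below is only an illustration; any writable directory on the host works):

export NGC_MODEL_CACHE=/home/ubuntu/via-ngc-cache   # illustrative path, not a required location
mkdir -p "$NGC_MODEL_CACHE"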

We have manually downloaded both the VITA 2.0 model and the Hugging Face model (our network blocks the automatic download). The Hugging Face model is kept at /home/VIA/all-MiniLM-L6-v2, the VITA model is at /home/VIA/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/, and NGC_MODEL_CACHE is set to /home/VIA/.

What changes should we make? It is giving the error below:

2024-10-21 12:52:51,150 ERROR We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.
2024-10-21 12:52:51,150 ERROR Failed to load VIA pipeline - CA-RAG setup failed. Check if NVIDIA_API_KEY set correctly and/or LLM configuration in CA-RAG config is valid.
Killed process with PID 56


Could you describe your procedure step by step from the beginning?

I tried running the VIA Summarization Warehouse use case with the VITA 2.0 model as the VLM, but our network was not allowing us to download the model, so we downloaded it manually and kept it inside the folder /home/VIA/nvidia_tao_vita_2.0.1_vila-llama-3-8b-lita/.
We then tried running again, and it failed with the error below:
2024-10-21 12:52:51,150 ERROR We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.
2024-10-21 12:52:51,150 ERROR Failed to load VIA pipeline - CA-RAG setup failed. Check if NVIDIA_API_KEY set correctly and/or LLM configuration in CA-RAG config is valid.
Killed process with PID 56

From the above error, we suspect that our network is also blocking the Hugging Face model, so we downloaded that model as well and kept it inside /home/VIA/all-MiniLM-L6-v2, but it still gives the same error.

My NGC_MODEL_CACHE is /home/VIA.

Hi @rawnak.kumar,

When VIA launches, it expects to find the Hugging Face model at the following path inside the container:

/tmp/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2

If you already have the model downloaded, then you can do the following:

  1. Find the location of the downloaded model and make a via-hf-cache folder to place it in, for example:
     /home/ubuntu/via-hf-cache/hub/models--sentence-transformers--all-MiniLM-L6-v2
  2. In the docker run command, mount this path to /tmp/huggingface by adding an additional volume:
     -v /home/ubuntu/via-hf-cache:/tmp/huggingface

When VIA is launched, it should now check the mounted folder for the model and skip the download.
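
For reference, here is a minimal sketch of what the launch looks like with the extra mount. $VIA_IMAGE is just a placeholder for the VIA container image/tag you already launch, and all of your existing -e and -v options should be kept as they are; the only addition is the /tmp/huggingface mount:

# Sketch only - substitute your actual VIA image and keep your other flags/mounts.
export VIA_HF_CACHE=/home/ubuntu/via-hf-cache
docker run -it --rm \
  --gpus all \
  -v "$VIA_HF_CACHE:/tmp/huggingface" \
  "$VIA_IMAGE"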

We tried the above folder structure but are still getting the same error:
2024-10-23 09:55:18,771 INFO Stopping VIA pipeline
2024-10-23 09:55:18,771 ERROR We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
2024-10-23 09:55:18,771 ERROR Failed to load VIA pipeline - CA-RAG setup failed. Check if NVIDIA_API_KEY set correctly and/or LLM configuration in CA-RAG config is valid.
Killed process with PID 56

Do we need to download any other models? Can you share which Hugging Face model it is actually searching for?

Hi @rawnak.kumar,

It is trying to pull the following model:

sentence-transformers/all-MiniLM-L6-v2 · Hugging Face

Here is the tree output of the via-hf-cache folder. Your mounted via-hf-cache folder will need to look the same as this.

.
└── hub
    ├── models--sentence-transformers--all-MiniLM-L6-v2
    │   ├── blobs
    │   │   ├── 53aa51172d142c89d9012cce15ae4d6cc0ca6895895114379cacb4fab128d9db
    │   │   ├── 59d594003bf59880a884c574bf88ef7555bb0202
    │   │   ├── 72b987fd805cfa2b58c4c8c952b274a11bfd5a00
    │   │   ├── 8cfec92309f5626a223304af2423e332f6d31887
    │   │   ├── 952a9b81c0bfd99800fabf352f69c7ccd46c5e43
    │   │   ├── c79f2b6a0cea6f4b564fed1938984bace9d30ff0
    │   │   ├── cb202bfe2e3c98645018a6d12f182a434c9d3e02
    │   │   ├── d1514c3162bbe87b343f565fadc62e6c06f04f03
    │   │   ├── e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
    │   │   ├── fb140275c155a9c7c5a3b3e0e77a9e839594a938
    │   │   └── fd1b291129c607e5d49799f87cb219b27f98acdf
    │   ├── refs
    │   │   └── main
    │   └── snapshots
    │       └── ea78891063587eb050ed4166b20062eaf978037c
    │           ├── 1_Pooling
    │           │   └── config.json -> ../../../blobs/d1514c3162bbe87b343f565fadc62e6c06f04f03
    │           ├── config.json -> ../../blobs/72b987fd805cfa2b58c4c8c952b274a11bfd5a00
    │           ├── config_sentence_transformers.json -> ../../blobs/fd1b291129c607e5d49799f87cb219b27f98acdf
    │           ├── model.safetensors -> ../../blobs/53aa51172d142c89d9012cce15ae4d6cc0ca6895895114379cacb4fab128d9db
    │           ├── modules.json -> ../../blobs/952a9b81c0bfd99800fabf352f69c7ccd46c5e43
    │           ├── README.md -> ../../blobs/8cfec92309f5626a223304af2423e332f6d31887
    │           ├── sentence_bert_config.json -> ../../blobs/59d594003bf59880a884c574bf88ef7555bb0202
    │           ├── special_tokens_map.json -> ../../blobs/e7b0375001f109a6b8873d756ad4f7bbb15fbaa5
    │           ├── tokenizer_config.json -> ../../blobs/c79f2b6a0cea6f4b564fed1938984bace9d30ff0
    │           ├── tokenizer.json -> ../../blobs/cb202bfe2e3c98645018a6d12f182a434c9d3e02
    │           └── vocab.txt -> ../../blobs/fb140275c155a9c7c5a3b3e0e77a9e839594a938
    └── version.txt

7 directories, 24 files
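
If it helps, one way to reproduce exactly this layout on a machine that does have internet access is to let the Hugging Face CLI populate the cache and then copy the folder to the offline host (this assumes the huggingface_hub package is installed; HF_HOME simply redirects the cache root):

# Run on a machine with internet access (pip install -U huggingface_hub)
export HF_HOME=/home/ubuntu/via-hf-cache    # the hub/ subfolder is created automatically
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2
# Then copy /home/ubuntu/via-hf-cache to the offline host and mount it as described above.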

Could you share how you have your via-hf-cache folder structured?

Our file structure looks like this: we downloaded the files from Hugging Face and have kept them inside /home/VIA/via-hf-cache/hub/models--sentence-transformers--all-MiniLM-L6-v2.

Do we need any other files?

Hi @rawnak.kumar,

Can you run the tree command on your via-hf-cache folder and share the output? It needs to look the same as what I pasted in the previous comment.
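
For example, using the path from your earlier message:

tree /home/VIA/via-hf-cache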

We were able to solve the Hugging Face error. Now we are getting the error below; we also tried adding --privileged=true to the docker run command:

2024-10-25 09:48:53,051 INFO Stopping VIA pipeline
2024-10-25 09:48:53,052 ERROR Expecting value: line 1 column 1 (char 0)
2024-10-25 09:48:53,052 ERROR Failed to load VIA pipeline - CA-RAG setup failed. Check if NVIDIA_API_KEY set correctly and/or LLM configuration in CA-RAG config is valid.

As per our understanding, this is due to a network restriction at our organization, because of which the pipeline is not able to call the NVIDIA NIM API.

Is there any possibility of downloading the model offline and keeping it at the desired path? If so, please share the steps, the path, and the link to download the model.

You can refer to the link I attached before: Using Locally Deployed LLM NIM instead of NVIDIA Hosted LLM NIM. At the end of page 44, we have instructions on how to deploy locally.
All you need to do is deploy the NIM locally. The documentation explains how to configure the model before you start the docker command.
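
Once the local NIM container is up and has finished loading, a quick way to confirm it is serving before pointing VIA at it is to hit its OpenAI-compatible endpoints on port 8000 (the model name below assumes the llama3-8b-instruct NIM):

# List the models the local NIM is serving
curl -s http://0.0.0.0:8000/v1/models
# Send a minimal chat completion request
curl -s http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama3-8b-instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'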

We tried running the llama3-8b-instruct NIM container locally, but it is giving us the error below:

Exception: error sending request for url (https://authn.nvidia.com/token?scope=group/ngc:nvidia)

Can you please help us set up the llama3-8b-instruct NIM locally?

Did you follow our Guide launch-nvidia-nim-for-llms step by step?

Yes, we followed the procedure from the documentation step by step, but it gives us this error:
Exception: error sending request for url (https://authn.nvidia.com/token?scope=group/ngc:nvidia)
We are also not able to access the URL https://authn.nvidia.com/token?scope=group/ngc:nvidia from a personal network.

Can you please look into this on priority?

At which step specifically did the error occur?

When I am trying to load the LLM model locally using the docker command below:

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
docker run -it --rm \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:1.0.3

This is most likely a problem with your NGC_API_KEY. Did you get this key by referring to the Guide in the link?
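
One quick way to check the key independently of the NIM container is to authenticate against nvcr.io with it (the username is literally $oauthtoken, and the key is passed as the password):

# If this login fails, the key itself (or the network path to NGC) is the problem.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin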