Jarvis Quick Start: Installation fails

Sorry for the difficulties here, everyone. We have released Jarvis 1.2.1 with a workaround for the NGC connection issues, and the NGC problem has also been resolved. Please reply here if you are still having difficulties.

Hi All,

It seems to be an issue with the script not updating the nameservers in the container.

docker login nvcr.io

Login Succeeded

ngc registry resource download-version "nvidia/jarvis/jarvis_quickstart:1.2.1-beta"

Looks good.

I updated jarvis_init.sh to include the -x flag for debugging:

./jarvis_init.sh

+ '[' true = true ']'
+ gmr_speech_models=
+ '[' true = true ']'
+ for model in ${models_asr[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_punctuation:1.2.0-beta'
+ for model in ${models_asr[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:1.2.0-beta'
+ for model in ${models_asr[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline:1.2.0-beta'
+ '[' true = true ']'
+ for model in ${models_nlp[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_punctuation:1.2.0-beta'
+ for model in ${models_nlp[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_named_entity_recognition:1.2.0-beta'
+ for model in ${models_nlp[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_intent_slot:1.2.0-beta'
+ for model in ${models_nlp[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_question_answering:1.2.0-beta'
+ for model in ${models_nlp[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_text_classification:1.2.0-beta'
+ '[' true = true ']'
+ for model in ${models_tts[@]}
+ gmr_speech_models+=' nvidia/jarvis/jmir_jarvis_tts_ljspeech:1.2.0-beta'
+ docker run --init -it --rm --gpus '"device=0"' -v jarvis-model-repo:/data -e NGC_CLI_API_KEY=OTFmaWJvbjhpYQyODEtMzY3MC00YjEyLTgwNzYtZmU2NTIyMTZiYjAy -e NGC_CLI_ORG=nvidia --name jarvis-service-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.2.1-beta-servicemaker download_ngc_models nvidia/jarvis/jmir_punctuation:1.2.0-beta nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:1.2.0-beta nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline:1.2.0-beta nvidia/jarvis/jmir_punctuation:1.2.0-beta nvidia/jarvis/jmir_named_entity_recognition:1.2.0-beta nvidia/jarvis/jmir_intent_slot:1.2.0-beta nvidia/jarvis/jmir_question_answering:1.2.0-beta nvidia/jarvis/jmir_text_classification:1.2.0-beta nvidia/jarvis/jmir_jarvis_tts_ljspeech:1.2.0-beta

Downloading nvidia/jarvis/jmir_punctuation:1.2.0-beta...
Url: 'https://authn.nvidia.com/token?service=ngc&scope=group/ngc:nvidia&scope=group/ngc:nvidia/jarvis' is not reachable.
Attempt 1 out of 3 failed
Trying again...
Url: 'https://authn.nvidia.com/token?service=ngc&scope=group/ngc:nvidia&scope=group/ngc:nvidia/jarvis' is not reachable.

The init script fails on the model pull. It runs the jarvis-service-maker container, which in turn takes a download_ngc_models command that, I guess, downloads the models for the listed URIs.

Let's try a straight ngc pull from the host:

$ ngc registry model download-version nvidia/jarvis/jmir_punctuation:1.2.0-beta

{
    "download_end": "2021-06-13 18:14:10.291947",
    "download_start": "2021-06-13 18:12:46.208310",
    "download_time": "1m 24s",
    "files_downloaded": 1,
    "local_path": "/home//code/jarvis_quickstart_v1.2.1-beta/jmir_punctuation_v1.2.0-beta",
    "size_downloaded": "418.11 MB",
    "status": "Completed",
    "transfer_id": "jmir_punctuation_v1.2.0-beta"
}

OK, that works: we can pull manually via ngc on the host.

However, if we get a command prompt in the container, it looks like the container image is broken:

~/code/jarvis_quickstart_v1.2.1-beta$ docker run --init -it --rm --gpus '"device=0"' -v jarvis-model-repo:/data -e NGC_CLI_API_KEY=OTFmaWJvbjhpY2lmYjVtc3FqYjUzamwwbIyMTZiYjAy -e NGC_CLI_ORG=nvidia --name jarvis-service-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.2.1-beta-servicemaker /bin/bash

==========================
== Jarvis Speech Skills ==

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be
insufficient for the inference server. NVIDIA recommends the use of the following flags:
nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

root@69e8cd412c3a:/opt/jarvis# download_ngc_models
/data/artifacts /opt/jarvis
/opt/jarvis
root@69e8cd412c3a:/opt/jarvis# download_ngc_models nvidia/jarvis/jmir_punctuation:1.2.0-beta nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:1.2.0-beta nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline:1.2.0-beta nvidia/jarvis/jmir_punctuation:1.2.0-beta nvidia/jarvis/jmir_named_entity_recognition:1.2.0-beta nvidia/jarvis/jmir_intent_slot:1.2.0-beta nvidia/jarvis/jmir_question_answering:1.2.0-beta nvidia/jarvis/jmir_text_classification:1.2.0-beta nvidia/jarvis/jmir_jarvis_tts_ljspeech:1.2.0-beta
/data/artifacts /opt/jarvis

Downloading nvidia/jarvis/jmir_punctuation:1.2.0-beta...
Url: 'https://authn.nvidia.com/token?service=ngc&scope=group/ngc:nvidia&scope=group/ngc:nvidia/jarvis' is not reachable.
Attempt 1 out of 3 failed
Trying again...

We know the models are in the registry. From the host, run:

ngc registry model list | grep citrinet

From the bash shell in the container, the same kind of pull fails:
root@69e8cd412c3a:/opt/jarvis# ngc registry model download-version nvidia/jarvis/jmir_punctuation:1.2.0-beta
Url: 'https://authn.nvidia.com/token?service=ngc&scope=group/ngc:nvidia&scope=group/ngc:nvidia/jarvis' is not reachable.
root@69e8cd412c3a:/opt/jarvis#

I checked the environment variables in the container:

NGC_CLI_API_KEY=OTFmaWJvbjhpY2…Blah…

Looking at the download_ngc_models script, it doesn't use or pick up the NGC_CLI_API_KEY environment variable, nor is there a ~/.ngc/config file in the container.
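A quick way to see which credential source is visible inside the container is sketched below. The function name check_ngc_credentials is made up for illustration, and the default config path is simply the ~/.ngc/config location mentioned above. It only reports what is present; it does not validate the key.

```shell
# Report which NGC credential source is visible in this environment.
# check_ngc_credentials is a hypothetical helper, not part of the ngc CLI.
# $1 optionally overrides the config file path (default: ~/.ngc/config).
check_ngc_credentials() {
    config_path="${1:-$HOME/.ngc/config}"
    if [ -n "${NGC_CLI_API_KEY:-}" ]; then
        echo "env"        # key supplied through the environment (docker run -e)
    elif [ -f "$config_path" ]; then
        echo "config"     # key stored by 'ngc config set'
    else
        echo "none"       # no credentials found in either place
    fi
}
```

Inside the jarvis-service-maker container started with -e NGC_CLI_API_KEY=..., calling check_ngc_credentials would be expected to print "env", which suggests the missing config file is not the real problem.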

root@69e8cd412c3a:/opt/jarvis# cat /usr/local/bin/download_ngc_models
#!/bin/bash -x

# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.


force_mode=false

ARTIFACT_DIR=${ARTIFACT_DIR:-/data/artifacts}
JMIR_DIR=${JMIR_DIR:-/data/jmir}

[ -d $ARTIFACT_DIR ] || mkdir $ARTIFACT_DIR
[ -d $JMIR_DIR ] || mkdir $JMIR_DIR

pushd $ARTIFACT_DIR

for model in "$@"
do
if [[ "$model" == '--force' ]]; then
    force_mode=true
    continue
fi
model_loc=`echo $model | rev | cut -d "/" -f 1 | rev`
model_name=`echo $model_loc | cut -d ":" -f 1`
model_version=`echo $model_loc | cut -d ":" -f 2`

# trying to retrieve version if user skipped it
if [[ -z "${model_version}" || "${model_version}" == "${model_name}" ]]; then
    model_version="$(ngc registry model list --format_type csv --column version $model | awk -F ',' 'FNR>1 {print $NF}')"
fi

dir_name="${model_name}_v${model_version}"
if [[ "$force_mode" == 'false' && -e ${dir_name} ]]; then
    echo "Directory ${dir_name} already exists, skipping. Use '--force' option to override."
    continue
elif [[ "$force_mode" == 'true' && -e ${dir_name} ]]; then
    rm -Rf ${dir_name}
fi

attempts=3
echo "  > Downloading $model..."
for ((i = 1 ; i <= $attempts ; i++)); do
    ngc registry model download-version $model
    if [ $? -ne 0 ]; then
        echo "  > Attempt $i out of $attempts failed"
        if [ $i -eq $attempts ]; then
            echo "Error occurred downloading '$model'. Exiting."
            exit 1
        else
            echo "  > Trying again..."
            if [[ -e ${dir_name} ]]; then
                echo " > Cleaning up partial download  ${dir_name}"
                rm -Rf ${dir_name}
            fi
            continue
        fi
    else
        break
    fi
done

cp ${model_name}_v${model_version}/*.jmir $JMIR_DIR

done

popd
root@69e8cd412c3a:/opt/jarvis#
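To make the naming logic in the listing easier to follow, here is a minimal sketch of how a model URI turns into the download directory name (for example jmir_punctuation_v1.2.0-beta). It uses shell parameter expansion instead of the script's rev/cut pipeline, and model_dir_name is a hypothetical name chosen for this example. Unlike the real script, it does not query the registry when the version is missing; it just reports the problem.

```shell
# Derive the download directory name from "org/team/name:version".
# model_dir_name is an illustrative helper, not part of the container image.
model_dir_name() {
    model="$1"
    model_loc="${model##*/}"          # strip everything up to the last "/"
    model_name="${model_loc%%:*}"     # text before the first ":"
    model_version="${model_loc#*:}"   # text after the first ":"
    # Without a ":" the expansion leaves the string untouched, so name and
    # version come out equal; the original script tests for the same case.
    if [ -z "$model_version" ] || [ "$model_version" = "$model_name" ]; then
        echo "no version given for $model_name" >&2
        return 1
    fi
    echo "${model_name}_v${model_version}"
}
```

For nvidia/jarvis/jmir_punctuation:1.2.0-beta this prints jmir_punctuation_v1.2.0-beta, matching the transfer id shown in the successful host download.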

I guessed the fix would be to modify the /usr/local/bin/download_ngc_models script to create the ~/.ngc/config file. However, creating the config by hand does not work either:

root@69e8cd412c3a:/opt/jarvis# ngc config set
Enter API key [********************************************************************************YjAy]. Choices: [<VALID_APIKEY>, 'no-apikey']: OTFmaWJvbjhpY2lmYjVtc3YtZmU2NTIyMTZiYjAy
Url: 'https://authn.nvidia.com/token?service=ngc&' is not reachable.

Not reachable, so maybe not resolvable. Now we are getting somewhere; it's gotta be a Docker problem, right?

root@69e8cd412c3a:/opt/jarvis# wget authn.nvidia.com
--2021-06-13 09:35:22-- http://authn.nvidia.com/
Resolving authn.nvidia.com (authn.nvidia.com)... failed: Temporary failure in name resolution.
wget: unable to resolve host address 'authn.nvidia.com'
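If wget is not available in a container, the same check can be done with getent, which asks the system resolver directly. This is only a sketch: can_resolve is a made-up helper name, and it assumes getent is present in the image (it normally ships with glibc-based images).

```shell
# Return 0 if the name resolves through the system resolver, non-zero otherwise.
# can_resolve is a hypothetical helper; it relies on getent being installed.
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}
```

For example, running `can_resolve authn.nvidia.com || echo "name resolution failed"` inside the container separates a DNS failure from an HTTP or authentication failure.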

On the container
root@b03debefa9eb:/opt/jarvis# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local
nameserver 10.233.0.3
nameserver 127.0.0.53
options ndots:2 timeout:2 attempts:2

On the host
$ cat /etc/resolv.conf

nameserver 127.0.0.53
options edns0 trust-ad

$ docker network ls
$ docker network inspect


"Containers": {
    "b03debefa9eb7b0c0c1a568ba667b4a26854fb0d64ba70ecf9c38d75809a8894": {
        "Name": "jarvis-service-maker",
        "EndpointID": "b3672eb901c43b1ae3635599156ca5b2bbba16181d94940b684895200b0c0053",
        "MacAddress": "02:42:ac:11:00:02",
        "IPv4Address": "172.17.0.2/16",
        "IPv6Address": ""
    }
},

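Clearly the container has the wrong DNS servers in its /etc/resolv.conf. The 127.0.0.53 entry is the host's systemd-resolved stub listener on a loopback address, so inside the container it points back at the container itself. The 10.233.0.3 entry looks like a Kubernetes cluster DNS address (note the svc.cluster.local search domains) and apparently cannot be reached from this bridge network. The container needs a nameserver it can actually reach in order to resolve hosts. I can set this manually, but the Dockerfile for the container should be updated to use the bridge IP (we don't have access to the Dockerfile because we don't have source access).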
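To spot this kind of problem quickly, the nameserver entries can be listed with loopback addresses flagged. The snippet below is a rough sketch; list_nameservers is a hypothetical helper, and the "non-loopback" label only means the address is not 127.x.x.x, not that the server is reachable.

```shell
# Print every nameserver from a resolv.conf-style file and flag loopback
# addresses (127.x.x.x), which inside a container refer to the container itself.
# list_nameservers is an illustrative helper; $1 defaults to /etc/resolv.conf.
list_nameservers() {
    awk '$1 == "nameserver" {
        note = ($2 ~ /^127\./) ? "loopback" : "non-loopback"
        print $2, note
    }' "${1:-/etc/resolv.conf}"
}
```

Run against the container's file shown above, this would print "10.233.0.3 non-loopback" and "127.0.0.53 loopback".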

The workaround: add a --dns=x.x.x.x option with your gateway address to the docker run command in the jarvis_init.sh script:


docker run --init -it --rm --gpus '"device=0"' --dns=192.168.1.1 -v jarvis-model-repo:/data -e NGC_CLI_API_KEY=OTFmaWJvbjhpY2lmYjVtc3Fq<apikey>2NTIyMTZiYjAy -e NGC_CLI_ORG=nvidia --name jarvis-service-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.2.1-beta-servicemaker download_ngc_models nvidia/jarvis/jmir_punctuation:1.2.0-beta nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:1.2.0-beta nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_offline:1.2.0-beta nvidia/jarvis/jmir_punctuation:1.2.0-beta nvidia/jarvis/jmir_named_entity_recognition:1.2.0-beta nvidia/jarvis/jmir_intent_slot:1.2.0-beta nvidia/jarvis/jmir_question_answering:1.2.0-beta nvidia/jarvis/jmir_text_classification:1.2.0-beta nvidia/jarvis/jmir_jarvis_tts_ljspeech:1.2.0-beta


Downloading nvidia/jarvis/jmir_punctuation:1.2.0-beta...
Downloaded 418.11 MB in 1m 19s, Download speed: 5.29 MB/s

Transfer id: jmir_punctuation_v1.2.0-beta Download status: Completed.
Downloaded local path: /data/artifacts/jmir_punctuation_v1.2.0-beta
Total files downloaded: 1 
Total downloaded size: 418.11 MB
Started at: 2021-06-13 12:27:08.492560
Completed at: 2021-06-13 12:28:27.601529
Duration taken: 1m 19s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_jarvis_asr_citrinet_1024_asrset1p7_streaming:1.2.0-beta...
Downloaded 579.01 MB in 1m 31s, Download speed: 6.35 MB/s               
----------------------------------------------------

It works!
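If you are not sure what your gateway address is, something like the following can pull it out of the host's routing table. Treat it as a sketch: default_gateway is a made-up helper name, the interface name in the comment is only an example, and using the gateway as a DNS server assumes your router actually answers DNS queries (most home routers do, but not every network is set up that way).

```shell
# Read `ip route show default` output on stdin and print the gateway address.
# Expected line shape: "default via 192.168.1.1 dev eth0 ..." (eth0 is an example).
# default_gateway is a hypothetical helper for illustration.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage on the host, printing a flag to paste into the docker run line:
#   echo "--dns=$(ip route show default | default_gateway)"
```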
