Jarvis looks like a very interesting and useful project. However, I am having issues running it. I have an RTX 3090, and when I run jarvis_start.sh it times out waiting for Triton to start.
$ ./jarvis_start.sh
...
+ '[' 2 -ne 0 ']'
+ echo 'Waiting for Jarvis server to load all models...retrying in 10 seconds'
Waiting for Jarvis server to load all models...retrying in 10 seconds
+ sleep 10
+ echo 'Health ready check failed.'
Health ready check failed.
+ echo 'Check Jarvis logs with: docker logs jarvis-speech'
Check Jarvis logs with: docker logs jarvis-speech
+ exit 1
From the docker logs, it looks like Triton suddenly unloads all the models. The only things that look off are that tacotron2_ensemble is not found for some reason, and that there were some problems with the tacotron2_decoder_postnet header.
I manually deleted tacotron_* from /data/models and re-ran deploy_all_models /data/jmir /data/models, but it didn't help. I also tried running tritonserver with --strict-model-config=false; again, it didn't help.
(I didn't touch config.sh, so everything is default.)
E0418 17:27:42.601750 415 logging.cc:43] coreReadArchive.cpp (32) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0418 17:27:42.601784 415 logging.cc:43] INVALID_STATE: std::exception
E0418 17:27:42.601790 415 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
W0418 17:27:42.623235 415 autofill.cc:225] Autofiller failed to detect the platform for jarvis-trt-waveglow (verify contents of model directory or use --log-verbose=1 for more details)
W0418 17:27:42.623242 415 autofill.cc:248] Proceeding with simple config for now
E0418 17:27:42.623247 415 model_repository_manager.cc:1682] unexpected platform type for jarvis-trt-waveglow
E0418 17:27:42.757081 415 logging.cc:43] coreReadArchive.cpp (32) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0418 17:27:42.757116 415 logging.cc:43] INVALID_STATE: std::exception
E0418 17:27:42.757122 415 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
E0418 17:27:42.872995 415 logging.cc:43] coreReadArchive.cpp (32) - Serialization Error in verifyHeader: 0 (Magic tag does not match)
E0418 17:27:42.873026 415 logging.cc:43] INVALID_STATE: std::exception
E0418 17:27:42.873032 415 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
W0418 17:27:42.880388 415 autofill.cc:225] Autofiller failed to detect the platform for tacotron2_decoder_postnet (verify contents of model directory or use --log-verbose=1 for more details)
...
...
| Model | Version | Status |
+-------------------------------+---------+----------------------------------------+
...
| tacotron2_decoder_postnet | 1 | READY |
| tacotron2_ensemble | - | Not loaded: No model version was found |
| tts_preprocessor | 1 | READY |
...
...
I0418 17:30:15.524788 56 server.cc:235] Timeout 30: Found 17 live models and 0 in-flight non-inference requests
> Jarvis waiting for Triton server to load all models...retrying in 1 second
I0418 17:30:16.525058 56 server.cc:235] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
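A log like the one above is easier to read once the repeated errors are counted and the affected models are listed. The following is a minimal sketch of that, assuming the line layout seen in this post (level letter plus date, time, process id, source file and line, then the message). The function name summarize is made up for illustration and is not part of Jarvis or Triton.

```python
import re
from collections import Counter

# Line shape assumed from the log in this post, for example:
#   E0418 17:27:42.623247 415 model_repository_manager.cc:1682] message
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+(?P<pid>\d+) "
    r"(?P<source>[\w.]+):(?P<line>\d+)\] (?P<message>.*)$"
)
AUTOFILL = re.compile(r"Autofiller failed to detect the platform for (\S+)")


def summarize(log_text):
    """Return (error-message counts, models whose platform was not detected)."""
    errors = Counter()
    undetected = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # status tables, shell trace lines and "..." are skipped
        message = match.group("message")
        if match.group("level") == "E":
            errors[message] += 1
        found = AUTOFILL.search(message)
        if found and found.group(1) not in undetected:
            undetected.append(found.group(1))
    return errors, undetected
```

To use it, save the output of docker logs jarvis-speech to a file, read the file into a string, and pass that string to summarize. The models it lists are the ones whose platform the autofiller could not detect, which here lines up with the engine deserialization errors.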
Hi @artem5,
Could you please share the complete error log (console output) and system details (GPU Type, Windows/Linux Version, docker version etc.) so we can help better?
This error /data/models/jarvis-trt-jarvis_intent_weather-nn-bert-base-uncased/config.pbtxt: No such file or directory
I was getting that same error. It happened because jarvis_init failed on some models. To fix that, I ran this command: docker run --init -it --rm --gpus "device=0" -v jarvis-model-repo:/data --name jarvis-speech-maker nvcr.io/nvidia/jarvis/jarvis-speech:1.0.0-b.3-server /bin/bash
That gave me a shell in the container. I went to /data/models and removed all the models that gave the config.pbtxt error.
(this is probably because it failed to generate the model when running the jarvis_init)
Then I ran the jarvis_init again.
I kept doing this until most of the models gave no error.
But I kept getting an error with jarvis-trt-waveglow, which said: Exception: build_waveglow failed to generate waveglow.eng.
Otherwise, none of the others gave the config.pbtxt error.
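Finding the broken model directories by hand gets tedious, so here is a minimal sketch that lists them. It assumes each model is a direct subdirectory of /data/models with its config.pbtxt at the top level, which matches the path in the error above. The function name models_missing_config is made up for illustration. It only reports directories and does not delete anything.

```python
from pathlib import Path


def models_missing_config(repo_dir):
    """Return names of model directories that lack a config.pbtxt file."""
    repo = Path(repo_dir)
    return sorted(
        entry.name
        for entry in repo.iterdir()
        # Assumption: every subdirectory of the repository is one model.
        if entry.is_dir() and not (entry / "config.pbtxt").is_file()
    )
```

Inside the container shell, calling models_missing_config("/data/models") should give the directories to remove before running jarvis_init again.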
I also get a cuda error in copytodevice: 2 (out of memory) error when trying to start the Jarvis server, probably because I don't have enough GPU memory.
I have done that, but it fails to deploy because of waveglow, and I still get the out-of-memory error. I am pretty sure it is because the GPU has only 8 GB of VRAM while the minimum is 16 GB.
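To confirm whether GPU memory is the limit before retrying, the total VRAM can be read from nvidia-smi. This is a rough sketch: the 16 GB figure is the minimum quoted in this thread, not something checked against the Jarvis documentation, and the threshold is set a little lower because cards sold as 16 GB usually report somewhat less than 16384 MiB. All function names here are made up for illustration.

```python
import subprocess

# Assumption: "16 GB minimum" as quoted in this thread, with slack because
# nominal 16 GB cards usually report a bit less than 16384 MiB.
MIN_VRAM_MIB = 15000


def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' CSV rows (MiB, no header, no units)."""
    gpus = []
    for row in csv_text.strip().splitlines():
        name, total = row.rsplit(",", 1)
        gpus.append((name.strip(), int(total)))
    return gpus


def query_gpu_memory():
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)


def below_minimum(gpus, minimum_mib=MIN_VRAM_MIB):
    """Return the names of GPUs whose total memory is under the threshold."""
    return [name for name, total in gpus if total < minimum_mib]
```

Calling below_minimum(query_gpu_memory()) on the host returns the GPUs that fall under the threshold. An empty list means memory size alone is probably not the cause of the out-of-memory error.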