Quick start jarvis_init cuda out of memory

Hello, I am trying to run the Jarvis quick start scripts on a GTX 1660 Ti, and jarvis_init.sh fails with this error:

[INFO] Building TRT engine from PyTorch Checkpoint
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Engine generation failed! Please check input params.

Watching nvidia-smi in another window, GPU memory usage climbed to around 3000 MiB of the 5941 MiB available before the error appeared, then dropped back down to about 200 MiB.
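For anyone wanting to reproduce this kind of monitoring, either of the following works (both use standard nvidia-smi options; run them in a second terminal while jarvis_init.sh is building the engine):

```shell
# Refresh the full nvidia-smi view every second
watch -n 1 nvidia-smi

# Or log just the memory figures once per second, which makes the peak easier to spot
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1
```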

nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  On   | 00000000:01:00.0  On |                  N/A |
| 45%   31C    P8    10W / 120W |    282MiB /  5941MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       886      G   /usr/lib/xorg/Xorg                 35MiB |
|    0   N/A  N/A      1514      G   /usr/lib/xorg/Xorg                122MiB |
|    0   N/A  N/A      1641      G   /usr/bin/gnome-shell               37MiB |
|    0   N/A  N/A      8243      G   ...AAAAAAAAA= --shared-files       75MiB |
+-----------------------------------------------------------------------------+

uname -a:

Linux mike-MS-7978 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Can you please help me resolve this issue?

Hi @mikedov
Could you please share the complete log/console log?
Also, please refer to the link below for the models' memory requirements and the support matrix:
https://docs.nvidia.com/deeplearning/jarvis/user-guide/docs/support-matrix.html

Thanks

Hello @SunilJB
In the config I have ASR and TTS disabled so they won't take up memory.

service_enabled_asr=false
service_enabled_nlp=true
service_enabled_tts=false

Here is the console log:

bash jarvis_init.sh 
Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Jarvis Speech Server images.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.0.0-b.2-server exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech-client:1.0.0-b.2 exists. Skipping.
  > Image nvcr.io/nvidia/jarvis/jarvis-speech:1.0.0-b.2-servicemaker exists. Skipping.

Downloading models (JMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing JMIRs set the location and corresponding flag in config.sh.

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release  (build 17345815)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; ‘/usr/bin/X11’ is part of the same file system loop as ‘/usr/bin’.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

/data/artifacts /opt/jarvis
  > Downloading nvidia/jarvis/jmir_punctuation:1.0.0-b.1...
Downloaded 418.11 MB in 51s, Download speed: 8.19 MB/s               
----------------------------------------------------
Transfer id: jmir_punctuation_v1.0.0-b.1 Download status: Completed.
Downloaded local path: /data/artifacts/jmir_punctuation_v1.0.0-b.1
Total files downloaded: 1 
Total downloaded size: 418.11 MB
Started at: 2021-03-26 19:01:20.054704
Completed at: 2021-03-26 19:02:11.131709
Duration taken: 51s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_named_entity_recognition:1.0.0-b.1...
Downloaded 420.38 MB in 21s, Download speed: 19.99 MB/s               
----------------------------------------------------
Transfer id: jmir_named_entity_recognition_v1.0.0-b.1 Download status: Completed.
Downloaded local path: /data/artifacts/jmir_named_entity_recognition_v1.0.0-b.1
Total files downloaded: 1 
Total downloaded size: 420.38 MB
Started at: 2021-03-26 19:02:15.946996
Completed at: 2021-03-26 19:02:36.979057
Duration taken: 21s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_intent_slot:1.0.0-b.1...
Downloaded 422.71 MB in 34s, Download speed: 12.42 MB/s               
----------------------------------------------------
Transfer id: jmir_intent_slot_v1.0.0-b.1 Download status: Completed.
Downloaded local path: /data/artifacts/jmir_intent_slot_v1.0.0-b.1
Total files downloaded: 1 
Total downloaded size: 422.71 MB
Started at: 2021-03-26 19:02:41.869351
Completed at: 2021-03-26 19:03:15.914562
Duration taken: 34s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_question_answering:1.0.0-b.1...
Downloaded 418.06 MB in 18s, Download speed: 23.2 MB/s                
----------------------------------------------------
Transfer id: jmir_question_answering_v1.0.0-b.1 Download status: Completed.
Downloaded local path: /data/artifacts/jmir_question_answering_v1.0.0-b.1
Total files downloaded: 1 
Total downloaded size: 418.06 MB
Started at: 2021-03-26 19:03:20.610969
Completed at: 2021-03-26 19:03:38.636962
Duration taken: 18s
----------------------------------------------------
  > Downloading nvidia/jarvis/jmir_text_classification:1.0.0-b.1...
Downloaded 420.27 MB in 20s, Download speed: 20.98 MB/s               
----------------------------------------------------
Transfer id: jmir_text_classification_v1.0.0-b.1 Download status: Completed.
Downloaded local path: /data/artifacts/jmir_text_classification_v1.0.0-b.1
Total files downloaded: 1 
Total downloaded size: 420.27 MB
Started at: 2021-03-26 19:03:43.328678
Completed at: 2021-03-26 19:04:03.358189
Duration taken: 20s
----------------------------------------------------
/opt/jarvis

Converting JMIRs at /home/mikedov/nvidia/jarvis_quickstart_v1.0.0-b.2/jarvis_models/jmir to Jarvis Model repository.
+ docker run --init -it --rm --gpus '"device=0"' -v /home/mikedov/nvidia/jarvis_quickstart_v1.0.0-b.2/jarvis_models:/data -e MODEL_DEPLOY_KEY=tlt_encode nvcr.io/nvidia/jarvis/jarvis-speech:1.0.0-b.2-servicemaker deploy_all_models /data/jmir /data/models

==========================
== Jarvis Speech Skills ==
==========================

NVIDIA Release  (build 17345815)

Copyright (c) 2018-2020, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.
find: File system loop detected; ‘/usr/bin/X11’ is part of the same file system loop as ‘/usr/bin’.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2021-03-26 19:04:10,342 [INFO] Writing Jarvis model repository to '/data/models'...
2021-03-26 19:04:10,342 [INFO] The jarvis model repo target directory is /data/models
2021-03-26 19:04:11,967 [INFO] Extract_binaries for tokenizer -> /data/models/jarvis_tokenizer/1
2021-03-26 19:04:13,164 [INFO] Extract_binaries for language_model -> /data/models/jarvis-trt-jarvis_ner-nn-bert-base-uncased/1
2021-03-26 19:04:18,119 [INFO] Building TRT engine from PyTorch Checkpoint
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
[TensorRT] ERROR: ../rtSafe/safeRuntime.cpp (25) - Cuda Error in allocate: 2 (out of memory)
Engine generation failed! Please check input params.
2021-03-26 19:04:32,281 [ERROR] Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/servicemaker/cli/deploy.py", line 88, in deploy_from_jmir
    args.target, jmir, config_only=args.config_only, verbose=args.verbose, overwrite=args.force
  File "/opt/conda/lib/python3.6/site-packages/servicemaker/triton/triton.py", line 323, in serialize_to_disk
    module.serialize_to_disk(repo_dir, jmir, config_only, verbose, overwrite)
  File "/opt/conda/lib/python3.6/site-packages/servicemaker/triton/triton.py", line 217, in serialize_to_disk
    self.generate_config(version_dir, jmir)
  File "/opt/conda/lib/python3.6/site-packages/servicemaker/triton/triton.py", line 249, in generate_config
    input=self._inputs,
AttributeError: 'JarvisBertEncoder' object has no attribute '_inputs'

+ echo

+ echo 'Jarvis initialization complete. Run ./jarvis_start.sh to launch services.'
Jarvis initialization complete. Run ./jarvis_start.sh to launch services.

Thanks

Hi @mikedov,

Could you please try commenting out all the NLP models except one and see if that deploys successfully on your setup?
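For reference, commenting out models would look something like the sketch below. The model names are taken from the JMIRs downloaded in the log above, but the `models_nlp` variable name and exact entry format are illustrative and may differ between quick start releases, so check your own config.sh:

```shell
# config.sh (illustrative sketch -- check your release for the exact variable names)
# Keep only one NLP model active; comment out the rest with '#'.
models_nlp=(
    "nvidia/jarvis/jmir_punctuation:1.0.0-b.1"
#    "nvidia/jarvis/jmir_named_entity_recognition:1.0.0-b.1"
#    "nvidia/jarvis/jmir_intent_slot:1.0.0-b.1"
#    "nvidia/jarvis/jmir_question_answering:1.0.0-b.1"
#    "nvidia/jarvis/jmir_text_classification:1.0.0-b.1"
)
```

After editing, rerun jarvis_init.sh so only the remaining model is converted.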

Thanks
