Hi,
With only one language activated, everything worked fine. I then initialized the Riva Docker deployment with riva_init.sh after activating all the languages in config.sh (I only need ASR):
service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false
language_code=(en-US en-GB de-DE es-US ru-RU zh-CN hi-IN fr-FR ko-KR pt-BR)
The init finished successfully after some hours of processing, but when I start the server with riva_start.sh, I get this error after about one minute:
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, bytes)' in fileriva/utils/matrix/cu_vector.cc line 179'
cudaError_t 700 : "an illegal memory access was encountered" returned from 'cudaMalloc(&data, row_bytes * rows)' in fileexternal/cu-feat-extr/src/cudamatrix/cu-matrix.cu line 204'
/opt/riva/bin/start-riva: line 4: 103 Segmentation fault (core dumped) ${CUSTOM_TRITON_ENV} tritonserver --log-verbose=0 --strict-model-config=true $model_repos --cuda-memory-pool-byte-size=0:1000000000
It had time to load some of the models, but there was not enough memory to load all of them. I monitored the GPU memory with nvidia-smi and can confirm that memory usage climbed to nearly the card's full 16 GB just before the crash.
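For anyone who wants to reproduce the measurement, here is a minimal sketch of the polling, assuming nvidia-smi is on the PATH. The function names `watch_gpu_mem` and `flag_high_mem` and the 15000 MiB threshold are only illustrative and are not part of Riva:

```shell
#!/usr/bin/env bash

# Print one CSV sample per second: "timestamp, used MiB, total MiB".
# --query-gpu, --format and -l are standard nvidia-smi options.
watch_gpu_mem() {
  nvidia-smi --query-gpu=timestamp,memory.used,memory.total \
             --format=csv,noheader,nounits -l 1
}

# Read those CSV samples on stdin and echo only the ones whose used
# memory (second field) is at or above the given limit in MiB.
# The limit is an arbitrary, illustrative value chosen by the caller.
flag_high_mem() {
  local limit_mib="$1"
  awk -F', ' -v limit="$limit_mib" '$2 + 0 >= limit { print "HIGH " $0 }'
}
```

Run `watch_gpu_mem | flag_high_mem 15000` in a second terminal while riva_start.sh is loading the models; the last lines printed before the segmentation fault show how close the card was to its limit.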
Is there a way to limit the memory used by tritonserver? (That looks like the job of --cuda-memory-pool-byte-size, but it does not seem to have any effect.)
Is there a way to load only a subset of the models? (I don't need all the languages in each Docker container.) I only see a parameter for the model repos, and I don't know which ones are needed for each language.
Best regards,
Hardware - GPU: A2
Hardware - CPU: Intel(R) Xeon(R) E-2388G CPU @ 3.20GHz
Operating System: Ubuntu 22.04
Riva Version: 2.7
TLT Version (if relevant):
How to reproduce the issue: described above