Triton server crash during NLP intent inference

  • Hardware: Jetson AGX Orin Developer Kit 64GB
  • Operating System: Ubuntu 22.04 w/ JetPack 6.0
  • Riva Version: v2.16.0

Hey team, I’m trying to run a text classification NLP task, but when I download a sample model from NGC, the Triton server crashes.

This is how to reproduce the issue:

  1. I’m able to run riva_init.sh and riva_start.sh with the default config.sh successfully.

  2. Download the Riva Intent Slot model from NGC. Apparently, the quick start for embedded does not include NLP models, so I have to download this one separately; I pull it with the NGC CLI as shown below. (If there is an easier way to try text classification, please advise.)
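For reference, this is roughly the command I use. The model path and version here are placeholders, not the real NGC entry, so substitute the exact Intent Slot model from the catalog:

# <intent_slot_model>:<version> is a placeholder; look up the real entry on NGC
ngc registry model download-version "nvidia/riva/<intent_slot_model>:<version>" \
    --dest /ssd/code/riva_models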

  3. Then I follow these instructions to deploy the .riva model. First, I launch the ServiceMaker image:

docker run --gpus all -it --rm -v /ssd/code/riva_models:/servicemaker-dev \
	-v /ssd/code/riva_quickstart_arm64_v2.16.0/model_repository:/data \
	--entrypoint="/bin/bash" nvcr.io/nvidia/riva/riva-speech:2.16.0-servicemaker-l4t-aarch64
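Inside the container, I first sanity-check that both mounts are visible (plain directory listings, nothing Riva-specific):

# the .riva model should be under the servicemaker-dev mount
ls /servicemaker-dev
# /data/models is where riva-deploy will write the Triton model repository
ls /data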
  4. Then I run riva-build (note the argument order: the output .rmir comes first, the input .riva second):
riva-build intent_slot \
    /servicemaker-dev/domain_model_misty.rmir:tlt_encode \
    /servicemaker-dev/domain_model_misty.riva:tlt_encode
  5. Then I run riva-deploy to write the Triton model repository (a quick sanity check follows below):
riva-deploy /servicemaker-dev/domain_model_misty.rmir:tlt_encode /data/models
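The deploy populates /data/models. The exact directory names depend on the pipeline, so the listing below is only what I expect to see, not a verified reference:

# still inside the ServiceMaker container; expect several directories for the
# intent model (ensemble, tokenizer, the BERT network, ...)
ls /data/models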
  6. At this point, I can exit the ServiceMaker container and relaunch Riva. I can see the model being loaded successfully:
I0829 17:16:16.099950 20 http_server.cc:282] Started Metrics Service at 0.0.0.0:8002
I0829 17:16:23.944136    22 model_registry.cc:143] Successfully registered: conformer-en-US-asr-streaming-asr-bls-ensemble for ASR Triton URI: localhost:8001
I0829 17:16:23.996660    22 model_registry.cc:143] Successfully registered: riva-punctuation-en-US for NLP Triton URI: localhost:8001
I0829 17:16:24.008618    22 model_registry.cc:143] Successfully registered: riva_intent_default for NLP Triton URI: localhost:8001
I0829 17:16:24.038719    22 model_registry.cc:143] Successfully registered: riva-punctuation-en-US for NLP Triton URI: localhost:8001
I0829 17:16:24.050558    22 model_registry.cc:143] Successfully registered: riva_intent_default for NLP Triton URI: localhost:8001
I0829 17:16:24.068651    22 model_registry.cc:143] Successfully registered: fastpitch_hifigan_ensemble-English-US for TTS Triton URI: localhost:8001
I0829 17:16:24.138864    22 riva_server.cc:171] Riva Conversational AI Server listening on 0.0.0.0:50051
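Before sending any request, I also double-check the registration in the server log (the container name assumes the quick start default from config.sh):

# "riva-speech" is the default container name set in config.sh
docker logs riva-speech 2>&1 | grep riva_intent_default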
  7. Then I use the Python client (the nvidia-riva-client package) to send the following text classification request:
import riva.client

auth = riva.client.Auth(uri="localhost:50051")
nlp_service = riva.client.NLPService(auth)

result = nlp_service.classify_text(
    input_strings=["Do I need an umbrella today?", "Tell me a joke."],
    model_name="riva_intent_default",
    language_code="en-US",
)
  8. The server crashes with the following logs:
I0829 17:16:44.314632   187 grpc_riva_nlp.cc:52] NLPService.ClassifyText called for riva_intent_default.
Signal (11) received.
 0# 0x0000AAAAE986BCCC in tritonserver
 1# __kernel_rt_sigreturn in linux-vdso.so.1

E0829 17:16:44.801805   187 client_object.cc:116] error: failed to do inference: Socket closed
/opt/riva/bin/start-riva: line 59:    20 Segmentation fault      (core dumped) ${CUSTOM_TRITON_ENV} tritonserver --log-verbose=0 --disable-auto-complete-config $model_repos --cuda-memory-pool-byte-size=0:1000000000
One of the processes has exited unexpectedly. Stopping container.
W0829 17:16:49.018988    22 riva_server.cc:195] Signal: 15

How do I fix this crash? Alternatively, is there another intent classification model I can use to test basic functionality? Thanks, team, for your support.

Hi @zugaldia,
Let me try a repro for this and share an update.

Thank you @AakankshaS, let me know if you need any additional information.

Any luck reproducing this, @AakankshaS? Or do you have any other hints on how to test text classification with Riva?