Riva Quick Start 2.18 NMT commands giving error: Failed to open the cudaIpcHandle

Please provide the following information when requesting support.

Hardware - GPU: GeForce RTX 4060 Ti 16GB
Hardware - CPU: i7-13700F
Operating System: WSL2 on Windows 11 Pro
Riva Version: 2.18
TLT Version (if relevant)
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)

  1. Download the Riva Quick Start 2.18.
  2. Edit config.sh, uncommenting the following line in models_nmt:
    "${riva_ngc_org}/${riva_ngc_team}/rmir_megatronnmt_en_any_500m:${riva_ngc_model_version}"
  3. Run riva_init.sh, riva_start.sh, and riva_start_client.sh.
  4. Run the following NMT command:
riva_nmt_streaming_s2s_client --audio_file=/opt/riva/wav/en-US_sample.wav --source_language_code="en-US" --target_language_code="es-US"
  5. Observe the following error in 'docker logs riva-speech':
I0127 16:49:39.259224   524 grpc_riva_asr.cc:1406] ASRService.StreamingRecognize called.
I0127 16:49:39.260064   524 grpc_riva_asr.cc:1450] Using model conformer-en-US-asr-streaming-asr-bls-ensemble from Triton localhost:8001 for inference
I0127 16:49:39.867183   524 stats_builder.h:100] {"specversion":"1.0","type":"riva.asr.streamingrecognize.v1","source":"","subject":"","id":"02ee2783-2133-4d7b-8bec-b68587fc5910","datacontenttype":"application/json","time":"2025-01-27T16:49:39.259210526+00:00","data":{"release_version":"2.18.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","language_code":"en-US","request_count":1,"audio_duration":4.000000476837158,"speech_duration":0.0,"status":0,"err_msg":""}}
I0127 16:49:39.867390   725 grpc_riva_nmt.cc:220] NMT->TranslateText Requested: megatronnmt_en_any_500m with en -> es
I0127 16:49:40.146175 148 pb_stub.cc:1307] "Failed to initialize CUDA shared memory pool in Python stub: Failed to open the cudaIpcHandle. error: invalid resource handle"
W0127 16:49:40.147103 148 python_be.cc:1389] "Failed to share CUDA memory pool with stub process: Failed to open the cudaIpcHandle. error: invalid resource handle. Will use CUDA IPC."
E0127 16:49:40.147318 148 pb_stub.cc:721] "Failed to process the request(s) for model 'megatronnmt_en_any_500m_0_0', message: TritonModelException: Failed to open the cudaIpcHandle. error: invalid resource handle\n\nAt:\n  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(178): run\n  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(433): run\n  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(629): process_batch\n  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(670): execute\n"
error: NMT model inference failure: Failed to process the request(s) for model 'megatronnmt_en_any_500m_0_0', message: TritonModelException: Failed to open the cudaIpcHandle. error: invalid resource handle

At:
  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(178): run
  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(433): run
  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(629): process_batch
  /data/models/megatronnmt_en_any_500m/1/megatron_model.py(670): execute

I0127 16:49:40.148056   725 stats_builder.h:242] {"specversion":"1.0","type":"riva.nmt.translatetext.v1","source":"","subject":"","id":"5321a541-16c3-4597-a655-04202ea4ec87","datacontenttype":"application/json","time":"2025-01-27T16:49:39.867378205+00:00","data":{"release_version":"2.18.0","customer_uuid":"","ngc_org":"","ngc_team":"","ngc_org_team":"","container_uuid":"","source_language":"en","target_language":"es","request_count":1,"total_characters":37,"audio_duration":0,"status":2,"err_msg":"Error: Triton model failed during inference."}}
[885963e1dfc1:455  :0:726] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)

Please help!
Thanks,
Tom

PS - This only happens for the NMT commands. The ASR, TTS, etc. commands (e.g., riva_asr_client) work fine.

PPS - When this command is executed, an audio file is generated, but it is only 44 bytes and seems to contain no audio data. Also, the server crashes as a result of this error.
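For what it's worth, 44 bytes is exactly the size of a standard PCM RIFF/WAVE header with zero audio frames, which would be consistent with the client writing the header and then the server crashing before any audio was streamed back. A quick sketch confirming this (the specific format parameters here — mono, 16-bit, 16 kHz — are assumptions and don't change the 44-byte header size for PCM):

```python
import io
import wave

# Write a WAV file containing a valid header but zero audio frames.
# For PCM, the header alone is 12 bytes (RIFF) + 24 bytes (fmt chunk)
# + 8 bytes (data chunk header) = 44 bytes.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # mono (assumed)
    w.setsampwidth(2)       # 16-bit PCM (assumed)
    w.setframerate(16000)   # 16 kHz (assumed)
    w.writeframes(b"")      # no audio data at all

print(len(buf.getvalue()))  # 44
```

So the 44-byte output file is almost certainly an empty WAV container, not corrupted audio.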