RIVA ASR not working in ESXi8 environment

Please provide the following information when requesting support.

Hardware - GPU (A100/A30/T4/V100) - grid_a100-20c
Hardware - CPU - AMD EPYC 7352 24-Core Processor
Operating System - VMware ESXi 8.0.0, 21203435 - Virtual Machine (Ubuntu 20.04)
Riva Version - riva_quickstart:2.10.0
TLT Version (if relevant)
How to reproduce the issue ? (This is for errors. Please share the command and the detailed log here)

Hi all.

I’m trying to test ASR, TTS, and other features by following the NVIDIA Riva quick start guide.

I downloaded the data center riva_quickstart:2.10.0 version and set only the asr and nmt services to true in config.sh, as the guide says.

For NMT I enabled only the German translation model; everything else is left at the defaults.
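To double-check which services a config.sh actually enables, a small script like the one below can help. This is only a sketch: it assumes the quickstart config.sh uses shell assignments of the form service_enabled_asr=true, and the helper name enabled_services is made up for illustration.

```python
import re

# Matches uncommented lines such as: service_enabled_asr=true
# (variable naming is an assumption about the quickstart config.sh)
ASSIGNMENT = re.compile(r"^\s*service_enabled_(\w+)\s*=\s*(\w+)")


def enabled_services(config_text):
    """Return a dict mapping service name -> True/False from config.sh text."""
    flags = {}
    for line in config_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            flags[match.group(1)] = match.group(2) == "true"
    return flags


def read_enabled_services(path="config.sh"):
    """Read a config.sh file from disk and report its service flags."""
    with open(path, encoding="utf-8") as fh:
        return enabled_services(fh.read())
```

Calling read_enabled_services() from the quickstart directory should, under these assumptions, report asr and nmt as True and the other services as False for the setup described here.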

Running riva_start.sh works fine.

Then I wrote and ran the ASR test code from the guide, but I get the following error.

------------------------------------ Error ----------------------------------------------

name: _InactiveRpcError

message:
<_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "in ensemble 'conformer-en-US-asr-offline', audio_signal: failed to perform CUDA copy: an illegal memory access was encountered"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"in ensemble \'conformer-en-US-asr-offline\', audio_signal: failed to perform CUDA copy: an illegal memory access was encountered", grpc_status:2, created_time:"2023-04-21T01:03:01.91677218+00:00"}"
>

stack:
---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
Cell In[13], line 1
----> 1 response = riva_asr.offline_recognize(content, config)
      2 asr_best_transcript = response.results[0].alternatives[0].transcript
      3 print("ASR Transcript:", asr_best_transcript)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\riva\client\asr.py:362, in ASRService.offline_recognize(self, audio_bytes, config, future)
    360 request = rasr.RecognizeRequest(config=config, audio=audio_bytes)
    361 func = self.stub.Recognize.future if future else self.stub.Recognize
--> 362 return func(request, metadata=self.auth.get_auth_metadata())

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\grpc\_channel.py:1030, in _UnaryUnaryMultiCallable.__call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
   1021 def __call__(self,
   1022              request: Any,
   1023              timeout: Optional[float] = None,
   (…)
   1026              wait_for_ready: Optional[bool] = None,
   1027              compression: Optional[grpc.Compression] = None) -> Any:
   1028     state, call, = self._blocking(request, timeout, metadata, credentials,
   1029                                   wait_for_ready, compression)
-> 1030     return _end_unary_response_blocking(state, call, False, None)

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\grpc\_channel.py:910, in _end_unary_response_blocking(state, call, with_call, deadline)
    908         return state.response
    909     else:
--> 910         raise _InactiveRpcError(state)

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNKNOWN
    details = "in ensemble 'conformer-en-US-asr-offline', audio_signal: failed to perform CUDA copy: an illegal memory access was encountered"
    debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"in ensemble \'conformer-en-US-asr-offline\', audio_signal: failed to perform CUDA copy: an illegal memory access was encountered", grpc_status:2, created_time:"2023-04-21T01:03:01.91677218+00:00"}"
>
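For context, the failing call in the traceback corresponds to client code roughly like the following. This is a minimal sketch, not the exact notebook: the audio path sample.wav, the server address localhost:50051, and the helper names transcribe_offline and main are assumptions, and the recognition settings are illustrative. Catching grpc.RpcError makes it easier to print just the status code and the server-side details instead of the full colored traceback.

```python
def transcribe_offline(asr_service, audio_bytes, config):
    """Send one offline recognition request and return the top transcript."""
    response = asr_service.offline_recognize(audio_bytes, config)
    return response.results[0].alternatives[0].transcript


def main(audio_path="sample.wav", server="localhost:50051"):
    # Imported here so the helper above does not require the Riva client.
    import grpc
    import riva.client

    auth = riva.client.Auth(uri=server)
    riva_asr = riva.client.ASRService(auth)
    config = riva.client.RecognitionConfig(
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )

    with open(audio_path, "rb") as fh:
        content = fh.read()

    try:
        print("ASR Transcript:", transcribe_offline(riva_asr, content, config))
    except grpc.RpcError as err:
        # For the failure in this thread, code() would be StatusCode.UNKNOWN
        # and details() would carry the "failed to perform CUDA copy" text.
        print("status :", err.code())
        print("details:", err.details())
```

Calling main() against a running riva-speech container reproduces the request; the error itself is raised on the server side, so the client code only relays it.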

Can you tell me if this is a CUDA version related issue?

If it is a CUDA version issue, how can I resolve it?

After running riva_start_client.sh, the NMT guide code works normally:

python3 /opt/riva/examples/nmt.py --model-name=en_de_24x6 --src-language=en --tgt-language=de --text="I love you." → Ich liebe dich.

Thanks for reading and have a great day.

Hi @mklee1

Thanks for your interest in Riva.

Could you share the complete output of docker logs riva-speech in this thread?

Quick question: are there multiple GPUs in your machine? If so, can you try running with only a single GPU?
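One way to check how many GPUs the VM sees is to query nvidia-smi from a script. The sketch below assumes nvidia-smi is on the PATH inside the VM; the function names parse_gpu_list and list_gpus are made up for illustration.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, without a header row.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_list(csv_text):
    """Turn 'index, name, driver_version' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, name, driver = [field.strip() for field in line.split(",", 2)]
        gpus.append({"index": int(index), "name": name, "driver_version": driver})
    return gpus


def list_gpus():
    """Run nvidia-smi and return the GPUs it reports."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_list(result.stdout)
```

If len(list_gpus()) is greater than 1, the machine exposes multiple GPUs and it is worth retrying with a single one.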

Thanks

Hi @rvinobha

First of all, thank you for your reply.

I am attaching the output of docker logs riva-speech as a txt file.

riva-speech-logs.txt (2.2 KB)

I am using a single GPU for the VM.

I am attaching the nvidia-smi screenshot.

[Image: nvidia-smi screenshot]

Thanks a lot.

Have a nice day.

I saw a similar inquiry on the forum.

I downgraded Riva to version 2.7.0, and ASR seems to be working fine.

As for the log file, now that I look at it, it seems to be a miscommunication.

Thanks.

Hi @mklee1

Thanks for proactively trying 2.7 and finding out that it works. We will try to find out why it didn’t work in 2.10.

Thanks for sharing the logs.

Apologies, but from the captured logs I can see that riva_start.sh failed.

Can you share the complete log output of riva_init.sh so we can look for clues about the failure? The docker logs riva-speech output shared so far does not contain any details.

Also, while riva_start.sh is running, can you run docker logs riva-speech at the same time in another window and share the output again?
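If a second terminal is inconvenient, the log capture can also be scripted. The following is a rough sketch: capture_logs is a hypothetical helper and the output filename is arbitrary. It merges stdout and stderr because docker logs forwards the container's stderr stream separately.

```python
import subprocess
import sys


def capture_logs(command, out_path):
    """Run command, echo its stdout and stderr, and save both to out_path.

    Returns the exit code of the command.
    """
    with open(out_path, "w", encoding="utf-8") as out, subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            out.write(line)
    return proc.returncode


# Example (start this once the riva-speech container exists):
# capture_logs(["docker", "logs", "--follow", "riva-speech"], "riva-speech-full.log")
```

Note that docker logs fails if the container has not been created yet, so the capture has to start shortly after riva_start.sh launches the container.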

Thanks
