Nvidia Riva empty inference on Punctuation and Capitalization model

Hardware - GPU V100
Riva Version: riva-speech:2.4.0-server
How to reproduce the issue?

  1. Fine-tune a NeMo punctuation and capitalization model.
  2. Convert it to Riva format using the NVIDIA guide.
  3. Deploy it using riva-speech:2.4.0-server; inference returns an empty response.

Deploy command
NV_GPU=1 nvidia-docker run -ti --name riva_source -p 8008:8008 -p 50051:50051 -p 5800:8000 -p 58001:8001 -v <location_models>:/data/models -e MODEL_REPOSITORY="--model-repository /data/models" -e INTERIM_RESULTS="--interim-results" -e CUSTOM_ITN="--custom-itn" --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 riva-speech:2.4.0-server tritonserver --log-verbose=0 --strict-model-config=true --model-repository /data/models --cuda-memory-pool-byte-size=0:1000000000 --exit-on-error=false
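Assuming the port mappings in the run command above (50051 for Riva gRPC, 58001 for Triton gRPC on the host), a quick stdlib reachability check can rule out basic connectivity problems before debugging empty responses; a minimal sketch, with hostnames/ports as assumptions:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical host; ports taken from the docker run mapping above.
# port_open("localhost", 50051)  # Riva gRPC
# port_open("localhost", 58001)  # Triton gRPC
```

If both ports answer but responses are still empty, the problem is inside the model pipeline rather than the deployment networking.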

When I run inference using NeMo, it works perfectly, but when I run inference through the Riva server's gRPC endpoint, the output is empty.

The output looks like this:

raw_input_contents: "\372\000\000\000in a new password i can take it from here alrighty is there anything else i can help you with them thank you very much ms rostow it was my pleasure to assist you thank you for using optum r x and if you chose our survey it will be next okay thank you"

model_name: "riva-punctuation-en-US"
model_version: "1"
id: "1"
outputs {
  datatype: "BYTES"
  shape: 1
  shape: 1
}
raw_output_contents: "\000\000\000\000"
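The `\372\000\000\000` prefix on the input and the `\000\000\000\000` output are Triton's length-prefixed BYTES encoding: each string is preceded by a 4-byte little-endian length. Octal `\372` is 250, the byte length of the string that follows, while the output prefix is zero, so the server really is returning a zero-length string rather than the client mis-decoding it. A minimal decoding sketch (stdlib only, function name hypothetical):

```python
import struct

def decode_triton_bytes(raw: bytes) -> list:
    """Decode Triton's length-prefixed BYTES tensor encoding:
    [4-byte little-endian length][payload] repeated."""
    out, offset = [], 0
    while offset < len(raw):
        (length,) = struct.unpack_from("<I", raw, offset)
        offset += 4
        out.append(raw[offset:offset + length])
        offset += length
    return out

print(decode_triton_bytes(b"\x00\x00\x00\x00"))  # -> [b''], i.e. a truly empty model output
```

This confirms the empty result is produced server-side, which points at the deployed pipeline rather than the client script's response handling.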

My inference script:

import math
import numpy as np
import multiprocessing as mp
import tritonclient.grpc as grpcclient
from tritonclient.utils import *

mp.set_start_method('fork', force=True)

url = '<GRPC_endpoint>:50051'

triton_client = grpcclient.InferenceServerClient(url=url, verbose=True)

# triton_model = 'torch'
model_name = 'riva-punctuation-en-US'
# model_name = 'riva_punctuation'
model_version = '1'

print(triton_client.is_model_ready(model_name, model_version))

# input_strings = 'why am i calling my date of birth is 12/25/86 and my appointment is on 05/06 thank you 3 with 873 thanks my number is 435-634-5234'
input_strings = "hello my name is sharon may i have your name please sure jacqueline rosal please spell your last name is roscore r o s like sam t like tom o w 8306206877 i am locked out of my account i i guess i didn't write this 1 down my my user and my password to be honest i haven't used it a lot so i need to get back in i want to get it up and running and write it down so i can use it let me try that i dont know how i screwed up but i did okay let me get your account unlocked one moment okay all right that accounts unlocked now if you want to attempt to sign in again let's try okay all right i don't know my password that's the problem i think and i don't know what it so if you do not know your password when you get ready to log in your logging in from the landing page with a man and a woman making a salad no okay that's the page you want to sign in okay well go ahead ma'am i am in optum r x okay i i want you please in your computer are you on a computer yes i would you type in your u r l or your address bar optum r x dot com and just press so you can get to the correct sign in screen please okay welcome to optum r x so just click on sign in okay right okay yeah this is the same page i was on before that's okay okay so then just say forgot password okay i think i did this whole thing before and that's why i got locked up okay so going to say forgot user name again or continue oh no no no i am sorry no just continue i forgot my password not my user name so then i say call me or text me a message when i did this before i don't know it didnt work okay so now here is my thing 3285882858 the code let me see if i can get back into 0 okay 659659 okay i am good it just wants to put in a new password i can take it from here alrighty is there anything else i can help you with them thank you very much ms rostow it was my pleasure to assist you thank you for using optum r x and if you chose our survey it will be next okay thank you"

string_toks = input_strings.split(" ")
max_len = 128 - 2

num_splits = math.ceil(len(string_toks) / max_len)

chunks = [string_toks[i:i+max_len] for i in range(0, len(string_toks), max_len)]

out_list = []
final_string = ""
for idx, chunk in enumerate(chunks):
    input_data = [[" ".join(chunk)]]

    inputs = [grpcclient.InferInput("PIPELINE_INPUT", [1, 1], "BYTES")]

    to_array = np.array(input_data, dtype=np.object_)
    inputs[0].set_data_from_numpy(to_array)  # attach the data; without this the request carries no input

    outputs = [grpcclient.InferRequestedOutput("PIPELINE_OUTPUT")]

    response = triton_client.infer(model_name, inputs, request_id=str(1), outputs=outputs)
    result = response.get_response()

    out_lists = response.as_numpy("PIPELINE_OUTPUT")
    out_string = out_lists[0][0].decode('utf-8')
    out_list.append(out_string)

# Join per-chunk results with a space so words at chunk boundaries are not glued together.
final_string = " ".join(out_list)
# final_string = final_string + "."
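The chunking step in the script above (reserving 2 of the model's 128 positions for the tokenizer's special tokens) can be isolated and checked on its own; a minimal sketch, with the function name as an assumption:

```python
def chunk_words(text: str, max_len: int = 128 - 2) -> list:
    """Split a transcript into word chunks of at most max_len words,
    leaving room for the tokenizer's special tokens ([CLS]/[SEP])."""
    words = text.split(" ")
    return [words[i:i + max_len] for i in range(0, len(words), max_len)]

chunks = chunk_words(" ".join(["w"] * 300))
print([len(c) for c in chunks])  # -> [126, 126, 48]
```

Note that this counts whitespace-separated words, not subword tokens: a tokenizer may expand one word into several subwords, so 126 is a heuristic rather than a hard guarantee against exceeding the model's 128-token limit.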

Riva conversion was done as follows:
riva-build punctuation --nn.use_trt_fp32 nemo_punct.rmir nemo_punct.riva
riva-deploy -f nemo_punct.rmir /data/atc_tenant/asr/lsfadmin/nvidia_flag_builds/new_configs/

Hi @manchanda.sahil7

Thanks for your interest in Riva.

I will review the information and get back to you soon.
Quick question: after fine-tuning/training your model, did you try inference with NeMo itself? That is, does NeMo produce outputs while Riva returns empty results?

If you can share a doc/steps for the Punctuation and Capitalization training you referenced, that would be helpful. I believe you tried/used the link below; can you check and let me know if something different was used?


Hey rvinobha, thank you for the reply.

Yes, the NeMo model works perfectly when run through NeMo inference.

It was trained following the steps you mentioned; we followed this example: NeMo/Punctuation_and_Capitalization.ipynb at main · NVIDIA/NeMo · GitHub

Hi @manchanda.sahil7

I have a request from my end:

Could you please share your fine-tuned NeMo punctuation and capitalization model along with its configs?

I have sent you an email regarding this request; kindly reply to that email with a Google Drive/OneDrive link.