Hardware - GPU V100
Riva Version: riva-speech:2.4.0-server
How to reproduce the issue?
- Fine-tune a NeMo punctuation and capitalization model.
- Convert it to a Riva model using the NVIDIA guide.
- Deploy it using riva-speech:2.7.0; the server returns an empty response.
Deploy command:
NV_GPU=1 nvidia-docker run -ti --name riva_source -p 8008:8008 -p 50051:50051 -p 5800:8000 -p 58001:8001 -v <location_models>:/data/models -e MODEL_REPOSITORY="--model-repository /data/models" -e INTERIM_RESULTS="--interim-results" -e CUSTOM_ITN="--custom-itn" --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 riva-speech:2.4.0-server tritonserver --log-verbose=0 --strict-model-config=true --model-repository /data/models --cuda-memory-pool-byte-size=0:1000000000 --exit-on-error=false
When I run inference with NeMo, it works perfectly, but when I run inference against the Riva server's gRPC endpoint, the output is empty.
The output looks like this:
raw_input_contents: "\372\000\000\000in a new password i can take it from here alrighty is there anything else i can help you with them thank you very much ms rostow it was my pleasure to assist you thank you for using optum r x and if you chose our survey it will be next okay thank you"
model_name: "riva-punctuation-en-US"
model_version: "1"
id: "1"
outputs {
name: "PIPELINE_OUTPUT"
datatype: "BYTES"
shape: 1
shape: 1
}
raw_output_contents: "\000\000\000\000"
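For what it's worth, Triton serializes BYTES tensors on the wire as a 4-byte little-endian length prefix followed by the payload, so the `raw_output_contents` of `\000\000\000\000` is literally one zero-length string (and the `\372\000\000\000` prefix on the input is a 250-byte length). A minimal stdlib decoder (my own helper, not a Triton API) makes this easy to check:

```python
import struct

def decode_triton_bytes(raw: bytes):
    """Decode a Triton raw BYTES tensor: each element is a 4-byte
    little-endian length prefix followed by that many payload bytes."""
    items, offset = [], 0
    while offset < len(raw):
        (length,) = struct.unpack_from("<I", raw, offset)
        offset += 4
        items.append(raw[offset:offset + length].decode("utf-8"))
        offset += length
    return items

# The response from the question: a zero length prefix, i.e. an empty string.
print(decode_triton_bytes(b"\x00\x00\x00\x00"))  # ['']
```

So the model is answering, but with an empty string, which points at the pipeline rather than the transport.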
My inference script:
import math
import numpy as np
import multiprocessing as mp
import tritonclient.grpc as grpcclient
from tritonclient.utils import *
mp.set_start_method('fork', force=True)
url = '<GRPC_endpoint>:50051'
triton_client = grpcclient.InferenceServerClient(url=url, verbose=True)
print(triton_client.is_server_live())
# triton_model = 'torch'
model_name = 'riva-punctuation-en-US'
# model_name = 'riva_punctuation'
model_version = '1'
print(triton_client.is_model_ready(model_name, model_version))
# input_strings = 'why am i calling my date of birth is 12/25/86 and my appointment is on 05/06 thank you 3 with 873 thanks my number is 435-634-5234'
input_strings = "hello my name is sharon may i have your name please sure jacqueline rosal please spell your last name is roscore r o s like sam t like tom o w 8306206877 i am locked out of my account i i guess i didn't write this 1 down my my user and my password to be honest i haven't used it a lot so i need to get back in i want to get it up and running and write it down so i can use it let me try that i dont know how i screwed up but i did okay let me get your account unlocked one moment okay all right that accounts unlocked now if you want to attempt to sign in again let's try okay all right i don't know my password that's the problem i think and i don't know what it so if you do not know your password when you get ready to log in your logging in from the landing page with a man and a woman making a salad no okay that's the page you want to sign in okay well go ahead ma'am i am in optum r x okay i i want you please in your computer are you on a computer yes i would you type in your u r l or your address bar optum r x dot com and just press so you can get to the correct sign in screen please okay welcome to optum r x so just click on sign in okay right okay yeah this is the same page i was on before that's okay okay so then just say forgot password okay i think i did this whole thing before and that's why i got locked up okay so going to say forgot user name again or continue oh no no no i am sorry no just continue i forgot my password not my user name so then i say call me or text me a message when i did this before i don't know it didnt work okay so now here is my thing 3285882858 the code let me see if i can get back into 0 okay 659659 okay i am good it just wants to put in a new password i can take it from here alrighty is there anything else i can help you with them thank you very much ms rostow it was my pleasure to assist you thank you for using optum r x and if you chose our survey it will be next okay thank you"
# Split the transcript into chunks of at most 126 tokens
# (128 minus the [CLS]/[SEP] special tokens).
string_toks = input_strings.split(" ")
max_len = 128 - 2
num_splits = math.ceil(len(string_toks) / max_len)
chunks = [string_toks[i:i + max_len] for i in range(0, len(string_toks), max_len)]
out_list = []
final_string = ""
for idx, chunk in enumerate(chunks):
    # Triton BYTES input: a [1, 1] array holding one string per request.
    input_data = [[" ".join(chunk)]]
    inputs = [grpcclient.InferInput("PIPELINE_INPUT", [1, 1], "BYTES")]
    to_array = np.array(input_data, dtype=np.object_)
    inputs[0].set_data_from_numpy(to_array)
    outputs = [grpcclient.InferRequestedOutput("PIPELINE_OUTPUT")]
    response = triton_client.infer(model_name, inputs, request_id=str(1), outputs=outputs)
    result = response.get_response()
    out_lists = response.as_numpy("PIPELINE_OUTPUT")
    print(out_lists)
    out_string = out_lists[0][0].decode('utf-8')
    print(out_string)
    out_list.append(out_string)
    # Add a space so words at chunk boundaries are not fused together.
    final_string += out_string + " "
# final_string = final_string + "."
print(final_string.strip())
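In case it helps isolate whether the problem is in the chunk/recombine plumbing or in the server, the logic above can be exercised offline by stubbing out the gRPC call (`fake_punctuate` is a stand-in I made up, not a Riva API):

```python
def chunk_tokens(text, max_len=126):
    """Split a transcript into lists of at most max_len whitespace tokens."""
    toks = text.split(" ")
    return [toks[i:i + max_len] for i in range(0, len(toks), max_len)]

def punctuate_in_chunks(text, punctuate, max_len=126):
    """Run punctuate() per chunk and rejoin with a space so words at
    chunk boundaries are not fused together."""
    return " ".join(punctuate(" ".join(c)) for c in chunk_tokens(text, max_len))

# Stand-in for the gRPC call, just to exercise the plumbing:
fake_punctuate = lambda s: s.capitalize() + "."
print(punctuate_in_chunks("hello there how are you", fake_punctuate, max_len=3))
# → Hello there how. Are you.
```

If this behaves as expected, the empty output is coming from the deployed pipeline itself (e.g. a version mismatch between the 2.7.0 build steps and the 2.4.0 server image above), not from the client-side chunking.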