I have a Triton server deployed using v1.7.0b0 of the Nvidia Riva service.
I am successfully able to perform streaming STT requests using gRPC with Node.js v16.
However, 3-4 times every 2-3 hours, I get the following error from gRPC: 2 UNKNOWN: TRTIS response timeout. This causes my server to fail since it’s an unhandled exception.
I cannot handle this specific error using grpc-js because it seems to be caused after a client has shutdown/closed, however, I am able to handle other errors.
I can see in the logs the following when this happens:
Hi @NVES_R, I wasn’t able to reproduce the same error, instead, I now get the following instead of the TRTIS error:
2 UNKNOWN: in ensemble <MODEL_NAME>, inference request for sequence 324231651 to model '<MODEL_NAME>-feature-extractor-streaming' must specify the START flag on the first request of the sequence
We’re processing hundreds of requests successfully, any reason why this happens intermittently?
I filed a ticket with the RIVA team to help take a look at this as it doesn’t seem like a Triton-specific issue at the moment. We’ll get back to you on this, I don’t have any immediate short term workarounds right now, please stay tuned.
After doing some more debugging, here’s a full TypeScript example of how we were able to handle the responses from the server properly.
Not this is not tested and was copy-pasted from our implementation, removing internal usages and code.
Some more documentation/examples in the RIVA docs would help here :)
We are still seeing the errors, but now, we can at least handle them properly.
As shown in the code below, we restart the connection during certain cases.
import { credentials, StatusObject } from '@grpc/grpc-js';
import { RivaSpeechRecognitionClient } from './lib/riva_asr_grpc_pb';
import {
RecognitionConfig,
SpeechRecognitionAlternative,
StreamingRecognitionConfig,
StreamingRecognitionResult,
StreamingRecognizeRequest,
StreamingRecognizeResponse,
} from './lib/riva_asr_pb';
import { AudioEncoding } from './lib/riva_audio_pb';
let isServerReady = false;
let isChannelClosed = true;
let stream;
let SttClientImpl;
let restartCount = 0;
let config = {}; // Set this up accordingly
const SERVER_URL = "localhost:50051";
const SERVER_READY_TIMEOUT = 2000; // ms
// Deadline is always UNIX epoch time + milliseconds in the future when you want the deadline to expire.
const getDeadline = (timeout: number): Date => {
// TS doesn't like adding `number` to `Date` directly, so create a new instance of date.
return new Date(new Date().getTime() + timeout);
};
function onError(error: Error): void {
isServerReady = false;
stream.removeAllListeners();
stream = null;
SttClientImpl.close();
SttClientImpl = null;
// Optional logic to restart connection
if (isChannelClosed || restartCount >= 3) {
// Handle Error
console.log(error);
} else {
restart();
}
}
function onEnd(): void {
stream.removeAllListeners();
stream = null;
SttClientImpl.close();
SttClientImpl = null;
if (isChannelClosed) {
console.log('STT client closed');
} else {
// Connection was closed by Riva
console.log('Lost connection with Riva');
restart();
}
}
function onStatus(status: StatusObject): void {
// Status usually sends a code 2 or 14, meaning an error or connection lost
isServerReady = false;
console.log(`Status - Code: ${status.code}, Detail: ${status.details}, Meta: ${JSON.stringify(status.metadata.toJSON())}`);
}
function onData(data: StreamingRecognizeResponse): void {
const message = data.toObject();
// No data, do nothing
if (message.resultsList.length === 0) {
return;
}
// Do stuff with response
}
function processAudio(data: ArrayBuffer): void {
if (isChannelClosed) {
console.log('Client closed channel. Dropping incoming audio.');
return;
}
if (!isServerReady) {
console.log('Attempting to send audio data when server is not ready.');
// You can store the audio somewhere if you don't want to drop it
// audioBuffer = Buffer.concat([audioBuffer, Buffer.from(data)]);
return;
}
const streamingRequest = new StreamingRecognizeRequest();
// Depending on your use case, you might need to create a copy of the data so in the
// case that the underlying memory is freed by your HTTP/WS/gRPC client.
// Using `slice()` or `from()` does not return a copy of the data, only a reference.
streamingRequest.setAudioContent(Buffer.concat([Buffer.from(data)]));
if (stream.writable) {
stream.write(streamingRequest);
} else {
console.log('Attempting to send audio data when stream is not writable.');
// audioBuffer = Buffer.concat([audioBuffer, Buffer.from(data)]);
}
}
function restart(): void {
console.log('Restarting STT client');
restartCount++;
_setupRecognitionClient();
return;
}
function close(): void {
isChannelClosed = true;
isServerReady = false;
if (stream) {
stream.end();
}
console.log('STT client closing');
}
function _createRecognitionConfig(): RecognitionConfig {
const recognitionConfig = new RecognitionConfig(); recognitionConfig.setEncoding(AudioEncoding.LINEAR_PCM);
recognitionConfig.setSampleRateHertz(SUPPORTED_SAMPLE_RATE);
recognitionConfig.setAudioChannelCount(SUPPORTED_CHANNELS);
recognitionConfig.setMaxAlternatives(config.maxAlternatives ?? 1);
recognitionConfig.setEnableWordTimeOffsets(config.enableWordTimeOffsets ?? true);
recognitionConfig.setLanguageCode(config.languageCode ?? DEFAULT_LANGUAGE_CODE);
recognitionConfig.setModel(config.modelName ?? '');
return recognitionConfig;
}
function _createStreamingRecognitionConfig(recognitionConfig: RecognitionConfig): StreamingRecognitionConfig {
const streamingRecognitionConfig = new StreamingRecognitionConfig();
streamingRecognitionConfig.setConfig(recognitionConfig);
streamingRecognitionConfig.setInterimResults(config.enableInterimResults ?? true);
return streamingRecognitionConfig;
}
function _setupRecognitionClient(): void {
SttClientImpl = new RivaSpeechRecognitionClient(SERVER_URL, credentials.createInsecure());
SttClientImpl.waitForReady(getDeadline(SERVER_READY_TIMEOUT), (e: Error) => {
if (e) {
onError(e);
return;
}
stream = SttClientImpl.streamingRecognize();
// If you are using a class, you need to do `this.<methodName>.bind(this)` instead of just passing `<functionName>`.
stream.on('data', onData);
stream.on('error', onError);
stream.on('close', close);
stream.on('status', onStatus);
stream.on('end', onEnd);
const recognitionConfig = _createRecognitionConfig();
const streamingRecognitionConfig = _createStreamingRecognitionConfig(recognitionConfig);
const streamingRecognitionRequest = new StreamingRecognizeRequest();
streamingRecognitionRequest.setStreamingConfig(streamingRecognitionConfig);
// The first call to the client must be the request with the config
stream.write(streamingRecognitionRequest, () => {
isServerReady = true;
restartCount = 0;
});
});
}
function main() {
_setupRecognitionClient();
// Have some loop/callback which calls `processAudio`
}
I don’t know if we should mark this as resolved though since despite the async nature of NodeJS and us sending the data in order, we are still seeing the error.
We have only seen this occur when the Riva server’s capacity has been exceeded and it cannot keep up with the rate of requests. Is this occurring when the server is under full load? We’re investigating further…
This is interesting… Yes we do have peak usage times when the server is under heavy load.
It could be that we are not scaling fast enough to handle the requests.
What is the recommended approach here (if you know)? Deploy more instances of the model on the same GPU or scale to more instances/GPUs?