I have converted my BERT model from PyTorch to ONNX, and from ONNX to TensorRT. The challenge I am having now is getting the TensorRT engine to return the probability values for a given input text. The PyTorch and ONNX versions take the input_ids and attention_mask as inputs and yield the predictions (input_text_prediction -- see below). Since the TensorRT engine is the final conversion of the original PyTorch model, my intuition tells me it needs the same inputs. I have been reading/following this guideline, but their approach seems overly complex -- or am I oversimplifying the inference process with TensorRT?
Yes, I have done my Google/Stack Overflow research and have not found an answer or guideline besides the link above. Any help, guidance, or resources would be greatly appreciated.
# ONNX Runtime inference
if ONNX:
    ort_inputs = {
        "input_ids": encoding["input_ids"].cpu().reshape(1, 512).numpy(),
        "input_mask": encoding["attention_mask"].cpu().reshape(1, 512).numpy(),
    }
    # session_name is an onnxruntime.InferenceSession, defined elsewhere
    ort_outputs = session_name.run(None, ort_inputs)
    input_text_prediction = list(ort_outputs[0][0])

# PyTorch inference
if pytorch_model:
    input_text_prediction = model_name(encoding["input_ids"], encoding["attention_mask"])
    input_text_prediction = input_text_prediction.detach().numpy()[0]