Inference with a TensorRT model: PyTorch BERT model > ONNX > TensorRT > inference?

I have converted my trained BERT model from PyTorch to ONNX, and from ONNX to TensorRT. The challenge now is getting the TensorRT model to return probability values for a given input text. The PyTorch and ONNX versions take the input_ids and attention mask as input and yield the predictions (input_text_prediction, see below). Since the TensorRT model is the final conversion of the original PyTorch model, my intuition is that it needs the same inputs. I have been following this guideline, but its approach seems too complex. Or am I oversimplifying the inference process with TensorRT?

Yes, I have done my Google/StackOverflow research and have not found an answer or guideline besides the above link. Any help or pointers to resources would be greatly appreciated.

if ONNX:
    ort_inputs = {'input_ids': encoding["input_ids"].cpu().reshape(1, 512).numpy(),
                  'input_mask': encoding["attention_mask"].cpu().reshape(1, 512).numpy()}
    # session_name is the ONNX Runtime InferenceSession, defined elsewhere
    ort_outputs = session_name.run(None, ort_inputs)
    input_text_prediction = list(ort_outputs[0][0])
if pytorch_model:
    input_text_prediction = model_name(encoding["input_ids"], encoding["attention_mask"])
    input_text_prediction = input_text_prediction.detach().numpy()[0]
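
Since every version of the model should yield roughly the same input_text_prediction for the same input, a small check along these lines could help confirm that a converted model still agrees with the original. This is only a sketch: `predictions_match` and its tolerance values are hypothetical (not part of PyTorch, ONNX Runtime, or TensorRT), and the sample numbers are placeholders rather than real model outputs.

```python
import math


def predictions_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Return True if two prediction vectors agree element-wise within tolerance.

    Accepts any sequence of numbers (a plain list or a 1-D NumPy array).
    The tolerances are assumptions; converted models can differ slightly.
    """
    reference = [float(x) for x in reference]
    candidate = [float(x) for x in candidate]
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Placeholder values standing in for input_text_prediction from two backends.
pytorch_prediction = [0.1234, 0.8766]
onnx_prediction = [0.12341, 0.87659]
print(predictions_match(pytorch_prediction, onnx_prediction))
```

The same function could be called with the TensorRT output once it is available, using the PyTorch prediction as the reference.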