Thanks @AakankshaS for your answer. I used the solution in this link and it is working. I just needed to store the GPU or CPU pointers in the 2-element array buf, and pass the array to executeV2.
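For anyone landing here later, a minimal sketch of that step follows. It is not the exact code from the link. The helper name `runTwoBindingInference` and the `Context` template parameter are made up for illustration, and it assumes binding index 0 is the input and index 1 is the output, which should be checked against the engine. In a real program `Context` would be `nvinfer1::IExecutionContext`, whose `executeV2` takes the bindings as a `void* const*`.

```cpp
// Sketch only: Context stands in for nvinfer1::IExecutionContext so the
// snippet compiles without the TensorRT headers.
template <typename Context>
bool runTwoBindingInference(Context& context, void* inputPtr, void* outputPtr)
{
    // One slot per engine binding.
    // Assumption: binding 0 is the input and binding 1 is the output.
    void* buf[2] = {inputPtr, outputPtr};

    // executeV2 runs synchronously and returns true on success.
    return context.executeV2(buf);
}
```

The two pointers would typically come from `cudaMalloc` (or from mapped/unified memory on platforms that support it), with the input copied in before the call and the output copied back afterwards.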
If the input image has a higher resolution than the ONNX model's input resolution, is there any helper code for splitting the image into patches, similar to tf.image.extract_patches or torch.nn.functional.unfold/torch.nn.functional.fold?
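To make the question concrete, this is roughly the kind of splitting meant, written as a plain C++ sketch with no library support. Everything in it is an assumption for illustration: the `Patch` struct and `splitIntoPatches` name are hypothetical, the image is taken to be 8-bit interleaved HWC, the tiles do not overlap, and edge tiles are zero-padded to the full patch size. It does not reproduce the stride/rate options of tf.image.extract_patches, and a model expecting NCHW float input would still need a layout and type conversion per patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical container for one tile and its position in the source image.
struct Patch {
    std::size_t x0;                  // left column of the tile in the source
    std::size_t y0;                  // top row of the tile in the source
    std::vector<std::uint8_t> data;  // patchH * patchW * channels, HWC order
};

// Splits an interleaved HWC image into non-overlapping patchH x patchW tiles,
// scanning left to right, top to bottom. Tiles that cross the right or bottom
// border are zero-padded. Returns an empty vector on inconsistent arguments.
std::vector<Patch> splitIntoPatches(const std::vector<std::uint8_t>& image,
                                    std::size_t height, std::size_t width,
                                    std::size_t channels,
                                    std::size_t patchH, std::size_t patchW)
{
    std::vector<Patch> patches;
    if (patchH == 0 || patchW == 0 ||
        image.size() != height * width * channels) {
        return patches;
    }

    for (std::size_t y0 = 0; y0 < height; y0 += patchH) {
        for (std::size_t x0 = 0; x0 < width; x0 += patchW) {
            Patch p{x0, y0,
                    std::vector<std::uint8_t>(patchH * patchW * channels, 0)};

            // Number of valid rows/columns inside the image for this tile.
            const std::size_t rows = std::min(patchH, height - y0);
            const std::size_t cols = std::min(patchW, width - x0);

            // Copy one row of the tile at a time.
            for (std::size_t r = 0; r < rows; ++r) {
                const std::uint8_t* src =
                    image.data() + ((y0 + r) * width + x0) * channels;
                std::uint8_t* dst = p.data.data() + r * patchW * channels;
                std::copy(src, src + cols * channels, dst);
            }
            patches.push_back(std::move(p));
        }
    }
    return patches;
}
```

Each patch keeps its `x0`/`y0` offset so the per-patch outputs can be written back to the right place in a full-size result, which is the part torch's fold would otherwise handle.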