Please use the approach below to trim the ONNX model. For example, to trim https://forums.developer.nvidia.com/uploads/short-url/tc4tmv15H16NiFPOqOa1QNswBnM.onnx:
import onnx_graphsurgeon as gs
import numpy as np
import onnx

# Load the original model and import it into GraphSurgeon
model = onnx.load("cardbox_detection_with_yolov4_tiny.onnx")
graph = gs.import_onnx(model)
tensors = graph.tensors()

# Redefine the graph boundary: keep "Input" (with a dynamic batch dimension "N")
# and expose the raw "box" and "cls" tensors as the new outputs
graph.inputs = [tensors["Input"].to_variable(dtype=np.float32, shape=("N", 3, 512, 768))]
graph.outputs = [tensors["box"].to_variable(dtype=np.float32),
                 tensors["cls"].to_variable(dtype=np.float32)]

# Drop every node that no longer contributes to the new outputs
graph.cleanup()
onnx.save(gs.export_onnx(graph), "cardbox_detection_with_yolov4_tiny_cut.onnx")
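
To double-check the cut, you can run the saved file through the ONNX checker and confirm the new graph boundary (a minimal verification sketch; the file name matches the save call above):

import onnx

trimmed = onnx.load("cardbox_detection_with_yolov4_tiny_cut.onnx")
onnx.checker.check_model(trimmed)
print([i.name for i in trimmed.graph.input])   # expect: ['Input']
print([o.name for o in trimmed.graph.output])  # expect: ['box', 'cls']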
Then there is no issue when running it with onnxruntime:
import onnxruntime

#model_path = "./cardbox_detection_with_yolov4_tiny.onnx"
model_path = "./cardbox_detection_with_yolov4_tiny_cut.onnx"

# Session creation fails if the trimmed graph is malformed
ort_session = onnxruntime.InferenceSession(model_path, None, providers=['CPUExecutionProvider'])
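
As a quick smoke test you can also run one inference (the random input and batch size of 1 here are my own assumptions; the names "Input", "box", "cls" and the 3x512x768 shape come from the trimming step above):

import numpy as np

# Dummy NCHW tensor matching the trimmed graph's dynamic-batch "Input"
dummy = np.random.rand(1, 3, 512, 768).astype(np.float32)
box, cls = ort_session.run(["box", "cls"], {"Input": dummy})
print(box.shape, cls.shape)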
For a BatchedNMS implementation, you can refer to https://github.com/NVIDIA/TensorRT/tree/23.08/plugin/batchedNMSPlugin.
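
If you want to append that plugin node to the trimmed graph so TensorRT can run NMS, a minimal onnx-graphsurgeon sketch could look like the one below. Note this is illustrative, not a drop-in recipe: the attribute values (thresholds, topK, keepTopK) and numClasses=1 are placeholders you must adapt, the "box"/"cls" tensors may need reshaping to the layouts the plugin expects (see its README), and BatchedNMSDynamic_TRT only runs inside TensorRT with the plugin library loaded, not in onnxruntime's CPU provider.

import onnx_graphsurgeon as gs
import numpy as np
import onnx

graph = gs.import_onnx(onnx.load("cardbox_detection_with_yolov4_tiny_cut.onnx"))
tensors = graph.tensors()

keep_top_k = 100  # placeholder
# Output tensors of the plugin node (names and shapes are illustrative)
num_detections = gs.Variable("num_detections", dtype=np.int32, shape=("N", 1))
nmsed_boxes   = gs.Variable("nmsed_boxes",   dtype=np.float32, shape=("N", keep_top_k, 4))
nmsed_scores  = gs.Variable("nmsed_scores",  dtype=np.float32, shape=("N", keep_top_k))
nmsed_classes = gs.Variable("nmsed_classes", dtype=np.float32, shape=("N", keep_top_k))

# Attach the TensorRT plugin op after the trimmed outputs
nms = gs.Node(
    op="BatchedNMSDynamic_TRT",
    attrs={
        "shareLocation": True,
        "backgroundLabelId": -1,
        "numClasses": 1,         # placeholder: single-class detector assumed
        "topK": 1024,            # placeholder
        "keepTopK": keep_top_k,  # placeholder
        "scoreThreshold": 0.3,   # placeholder
        "iouThreshold": 0.5,     # placeholder
        "isNormalized": True,
        "clipBoxes": True,
    },
    inputs=[tensors["box"], tensors["cls"]],
    outputs=[num_detections, nmsed_boxes, nmsed_scores, nmsed_classes],
)
graph.nodes.append(nms)
graph.outputs = nms.outputs
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "cardbox_detection_with_yolov4_tiny_nms.onnx")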