How Can I Execute In FP16 Mode?


I’m using TensorRT in a custom GStreamer plugin.

I converted the ONNX model to a TRT engine using onnx-tensorrt.

I got an FP32 engine file and an FP16 engine file.

Then I ran both engines, but the execution times are the same.

Code below:

// Initiate Engine Function
    infer = createInferRuntime(m_Logger);

    // Open the serialized engine in binary mode
    ifstream gieModelFile(model_file, ios::binary);
    if (!gieModelFile.good())
        cout << "file does not exist" << endl;

    gieModelFile.seekg(0, ios::end);
    unsigned int modelSize = gieModelFile.tellg();
    gieModelFile.seekg(0, ios::beg);

    // Read the whole file in one call instead of a get() loop
    vector<char> buff(modelSize);
    gieModelFile.read(buff.data(), modelSize);

    // Pass the buffer pointer (it was missing after the (void*) cast)
    engine = infer->deserializeCudaEngine((void*)buff.data(), modelSize, nvonnxparser::createPluginFactory(m_Logger));

    context = engine->createExecutionContext();
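For reference, the file-loading part of the snippet above can be written as a self-contained helper using only the standard library (the `loadEngineFile` name and the error handling are my own, for illustration; the returned buffer is what would be passed to `deserializeCudaEngine`):

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Load a serialized engine file fully into memory with a single binary read.
// Illustrative helper; "model_file" is the path to the engine file
// produced by onnx-tensorrt.
std::vector<char> loadEngineFile(const std::string& model_file) {
    // ios::ate opens the stream positioned at the end, so tellg()
    // immediately gives the file size.
    std::ifstream f(model_file, std::ios::binary | std::ios::ate);
    if (!f.good()) {
        std::cerr << "file does not exist: " << model_file << std::endl;
        return {};
    }
    std::streamsize size = f.tellg();
    f.seekg(0, std::ios::beg);

    std::vector<char> buff(size);
    f.read(buff.data(), size);  // one read instead of a per-byte get() loop
    return buff;
}
```

Reading the whole file in one `read()` call avoids both the off-by-one risk of the `get(buff[i++])` loop and the newline translation that a non-binary stream could perform on the serialized engine bytes.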


Again, when I execute the engine in FP32 and in FP16, the execution times are identical.

How can I run properly in FP16 mode?



When I execute the two engines in the nvinfer plugin, they work well.

Is there anything else I need to do?