How Can I Execute In FP16 Mode?

Hi,

I’m using TensorRT in a custom GStreamer plugin.

I converted an ONNX model to a TRT engine using onnx-tensorrt.

I generated an FP32 engine file and an FP16 engine file.
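For reference, the FP16 engine is built roughly like this. This is only a sketch assuming the TensorRT 5/6-era builder/parser API (the same era as the createPluginFactory call in my code below); buildFp16Engine is just an illustrative name:

    // Sketch: building an FP16 engine from an ONNX file.
    // FP16 must be requested at build time via the builder; if the GPU has
    // no fast FP16 support, the request is skipped here and the resulting
    // engine runs FP32 kernels.
    #include "NvInfer.h"
    #include "NvOnnxParser.h"

    nvinfer1::ICudaEngine* buildFp16Engine(const char* onnxPath, nvinfer1::ILogger& logger)
    {
        nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
        nvinfer1::INetworkDefinition* network = builder->createNetwork();
        nvonnxparser::IParser* parser = nvonnxparser::createParser(*network, logger);

        parser->parseFromFile(onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

        builder->setMaxBatchSize(1);
        builder->setMaxWorkspaceSize(1 << 28);  // 256 MB workspace

        if (builder->platformHasFastFp16())
            builder->setFp16Mode(true);         // request FP16 kernels

        nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

        parser->destroy();
        network->destroy();
        builder->destroy();
        return engine;
    }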

Then I executed both engines, but the execution times are the same.

Code below:

// Initialize engine from a serialized file
    infer = createInferRuntime(m_Logger);

    // Read the serialized engine file into memory.
    ifstream gieModelFile(model_file, ios::binary);
    if (!gieModelFile.good())
        cout << "file does not exist" << endl;

    gieModelFile.seekg(0, ios::end);
    size_t modelSize = gieModelFile.tellg();
    gieModelFile.seekg(0, ios::beg);

    vector<char> buff(modelSize);
    gieModelFile.read(buff.data(), modelSize);
    gieModelFile.close();

    // Deserialize the engine; its precision (FP32 or FP16) was fixed when
    // it was built and serialized.
    engine = infer->deserializeCudaEngine((void*)buff.data(), modelSize, nvonnxparser::createPluginFactory(m_Logger));

    context = engine->createExecutionContext();

    memory_allocate();

To be clear: running this code with the FP32 engine and with the FP16 engine gives the same execution time.
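For what it’s worth, here is roughly how I time the engines. This is a sketch; timeExecution is just an illustrative name, and the bindings array and stream are assumed to come from my plugin’s memory_allocate step:

    // Sketch: timing inference with CUDA events, with warm-up iterations
    // so one-time initialization cost doesn't distort the comparison.
    #include "NvInfer.h"
    #include <cuda_runtime_api.h>

    float timeExecution(nvinfer1::IExecutionContext* context, void** bindings,
                        cudaStream_t stream, int iterations = 100)
    {
        for (int i = 0; i < 10; ++i)                       // warm-up
            context->enqueue(1, bindings, stream, nullptr);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, stream);
        for (int i = 0; i < iterations; ++i)
            context->enqueue(1, bindings, stream, nullptr);
        cudaEventRecord(stop, stream);
        cudaEventSynchronize(stop);

        float totalMs = 0.0f;
        cudaEventElapsedTime(&totalMs, start, stop);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return totalMs / iterations;                       // avg ms per run
    }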

How can I execute in FP16 mode properly?

Thanks.

Also, when I run the same two engines through the nvinfer plugin, they work well.
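For reference, in the nvinfer case I just point the config at the engine file. This assumes DeepStream’s nvinfer config-file format, and the file name is a placeholder:

    [property]
    model-engine-file=model_fp16.engine
    # 0 = FP32, 1 = INT8, 2 = FP16
    network-mode=2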

Is there anything else I need to do?