Saving and loading a serialized engine on Windows 10

Description

I am trying to serialize an engine, save it to file, and then later load it from file and deserialize it.

I saw the documentation on this, which suggests:

IHostMemory *serializedModel = engine->serialize();
// store model to disk
// <…>
serializedModel->destroy();

And for loading:

IRuntime* runtime = createInferRuntime(gLogger);
ICudaEngine* engine = runtime->deserializeCudaEngine(modelData, modelSize, nullptr);

I also looked at this blog post, which discusses serializing and deserializing the engine.

Finally, I found this GitHub issue, which shows someone doing this in a Windows environment.

I then attempted to do this with the MNIST ONNX sample.

I added the following to the top of my code:

std::string engineFile = "SerializedEngineTest.engine";
bool serializeEngine = false;

And in the main routine:

if (serializeEngine) {
    if (!sample.build())
    {
        return gLogger.reportFail(sampleTest);
    }
}
else {
    if (!sample.derserializeEngineFromFile())
    {
        return gLogger.reportFail(sampleTest);
    }
}

I defined two functions:

bool SampleOnnxMNIST::build()
{
    auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));
    if (!builder)
    {
        return false;
    }

    const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
    if (!network)
    {
        return false;
    }

    auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
    if (!config)
    {
        return false;
    }

    auto parser = SampleUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger.getTRTLogger()));
    if (!parser)
    {
        return false;
    }

    auto constructed = constructNetwork(builder, network, config, parser);
    if (!constructed)
    {
        return false;
    }

    mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
        builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());

    if (!mEngine)
    {
        gLogInfo << "false engine..?" << std::endl;
        return false;
    }

    // Serialize engine and save to file
    std::ofstream p(engineFile.c_str(), std::ios::binary);
    IHostMemory* m_ModelStream = mEngine->serialize();
    p.write(reinterpret_cast<const char*>(m_ModelStream->data()), m_ModelStream->size());
    p.close();

    gLogInfo << "network->getNbInputs() = " << network->getNbInputs() << std::endl;
    assert(network->getNbInputs() == 1);

    mInputDims = network->getInput(0)->getDimensions();
    gLogInfo << "mInputDims = " << mInputDims << std::endl;
    gLogInfo << "mInputDims.nbDims = " << mInputDims.nbDims << std::endl;
    assert(mInputDims.nbDims == 4);

    gLogInfo << "network->getNbOutputs() = " << network->getNbOutputs() << std::endl;
    gLogInfo << "network->getOutput(0)->getDimensions() = " << network->getOutput(0)->getDimensions() << std::endl;
    assert(network->getNbOutputs() == 1);
    mOutputDims = network->getOutput(0)->getDimensions();
    gLogInfo << "mOutputDims = " << mOutputDims << std::endl;
    gLogInfo << "mOutputDims.nbDims = " << mOutputDims.nbDims << std::endl;
    assert(mOutputDims.nbDims == 2);

    return true;
}

And:

bool SampleOnnxMNIST::derserializeEngineFromFile()
{
    // Load engine from file and de-serialize
    std::vector<char> trtModelStreamfromFile;
    size_t size{ 0 };
    std::ifstream file(engineFile, std::ios::binary);
    if (file.good())
    {
        file.seekg(0, file.end);
        size = file.tellg();
        file.seekg(0, file.beg);
        trtModelStreamfromFile.resize(size);
        file.read(trtModelStreamfromFile.data(), size);
        file.close();
        IRuntime* runtime = createInferRuntime(gLogger.getTRTLogger());
        ICudaEngine* mEngine = runtime->deserializeCudaEngine(trtModelStreamfromFile.data(), size, nullptr);
    }

    if (!mEngine)
    {
        gLogInfo << "false engine..?" << std::endl;
        return false;
    }

    return true;
}

When I set

bool serializeEngine = true;

and run the program, the engine is serialized and saved to file as expected.

When I set:

bool serializeEngine = false;

and run the program, in the derserializeEngineFromFile() method, on the line:

if (!mEngine)

I find that mEngine is null.

What am I doing incorrectly?

Environment

TensorRT Version: 7.0
CUDA Version: 10.2
CUDNN Version: 7.6.5
Operating System + Version: Windows 10 64-bit
Visual Studio Version: 2019

Please refer to this sample in case it helps:

Thanks

@SunilJB Thanks for the reply. As a test, I am trying to run that sample, but I get this error:

Refer to the GitHub: sampleUffFasterRCNN/README.md file for detailed information about how this sample works, sample code, and step-by-step instructions on how to run and verify its output.

Thanks