Can we write an IHostMemory into a file, and then read the file back to deserializeCudaEngine?

How I serialize:

trtModelStream = engine->serialize();
fstream ifile;
ifile.open(filename, fstream::in | fstream::out | fstream::trunc);
std::cout << (int)trtModelStream->type() << "\n"; // why is the type kINT8? our net handles float
ifile.write((char*)trtModelStream->data(), trtModelStream->size()); // write into a file

How I deserialize:

fstream ifile;
ifile.open(engine_txt_path, fstream::in | fstream::out | std::ios::ate);
ifstream::pos_type pos = ifile.tellg();
size_t fsize = (size_t)pos;
std::cout << pos << " " << fsize << "\n";
char* tmp_data = new char[pos];
ifile.seekg(0, ios_base::beg);
ifile.read(tmp_data, pos);
nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine(tmp_data, fsize, nullptr);

But I get an invalid memory access in deserializeCudaEngine.

What should I do?

Hello,

I don’t see obvious issues with your serialize/deserialize code. Can you share a small repro that demonstrates how you are getting the memory access error during deserialization?

Can you also post the full error logs?

Hello,
I also have the same problem. Have you solved this issue?

Hi 328703810, 1043140993,

Especially if you’re on Windows, be careful to read and write in binary mode by adding std::ios::binary to your stream flags (combined in through the | operator).
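For instance, here’s a minimal sketch of both directions. The file name is just a placeholder, and I’m assuming engine and runtime already exist; this is a sketch of the pattern, not drop-in code.

#include <fstream>
#include <vector>
// NvInfer.h assumed for IHostMemory / IRuntime / ICudaEngine

// Write the serialized engine to disk in binary mode
nvinfer1::IHostMemory* trtModelStream = engine->serialize();
std::ofstream ofs("model.engine", std::ios::out | std::ios::binary);
ofs.write(static_cast<const char*>(trtModelStream->data()), trtModelStream->size());
ofs.close();

// Read it back in binary mode and deserialize
std::ifstream ifs("model.engine", std::ios::in | std::ios::binary);
ifs.seekg(0, std::ios::end);
size_t fsize = static_cast<size_t>(ifs.tellg());
ifs.seekg(0, std::ios::beg);
std::vector<char> engineData(fsize);
ifs.read(engineData.data(), fsize);
nvinfer1::ICudaEngine* deserialized = runtime->deserializeCudaEngine(engineData.data(), fsize, nullptr);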

You can see examples of this in the trtexec source on GitHub. Reading an engine: https://github.com/NVIDIA/TensorRT/blob/07ed9b57b1ff7c24664388e5564b17f7ce2873e5/samples/common/sampleEngines.cpp#L450

And writing an engine: https://github.com/NVIDIA/TensorRT/blob/07ed9b57b1ff7c24664388e5564b17f7ce2873e5/samples/common/sampleEngines.cpp#L480

Cheers,
Tom

Hi Tom,

I tried your binary-read code for the model on Linux, but it still crashes (core dumped) when I call deserializeCudaEngine. I’m very confused.

errors:
[I] build engine 1…
[I] read gie model 1…
[I] read gie model 9…
Segmentation fault (core dumped)

Here is my code:

IRuntime* runtime = createInferRuntime(gSSHLogger.getTRTLogger());
assert(runtime != nullptr);
PluginFactory pluginFactory;
gLogInfo << "build engine 1..." << std::endl;
ICudaEngine* engine = nullptr;
if (trtModelStream != nullptr)
{
    engine = runtime->deserializeCudaEngine(
        trtModelStream->data(), trtModelStream->size(), &pluginFactory);
    gLogInfo << "build engine 3..." << std::endl;
}
else
{
    gLogInfo << "read gie model 1..." << std::endl;
    std::ifstream engineFile(cache_path, std::ios::binary);

    engineFile.seekg(0, engineFile.end);
    long int fsize = engineFile.tellg();
    engineFile.seekg(0, engineFile.beg);

    std::vector<char> engineData(fsize);
    engineFile.read(engineData.data(), fsize);

    gLogInfo << "read gie model 9..." << std::endl;

    engine = runtime->(engineData.data(), fsize, &pluginFactory);

    gLogInfo << "read gie model 2..." << std::endl;
}

Hi Amy_21,

[note: you can format code in this forum for easier readability with the </> tab]

Did you write in binary mode too?
Can you get a backtrace (e.g. in gdb) to see where TensorRT is segfaulting?
I see you have plugins; are you sure your plugins properly implement serialization? (See the sketch below.)
You can also run under valgrind to see if you have any wild reads or writes (perhaps in your serialization code) or other memory issues. This takes some time, but is often worth it; sanitizers, on the other hand, don’t seem compatible with CUDA in my experience.
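As a rough illustration of what I mean by "properly implement serialization" (the class and member names here are hypothetical, and the rest of the plugin interface is omitted): getSerializationSize() must return exactly the number of bytes that serialize() writes, and the deserializing constructor that your IPluginFactory calls must read the fields back in the same order.

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical plugin; only the (de)serialization pieces are sketched.
class MyPlugin /* : public nvinfer1::IPluginExt */
{
public:
    // Constructor your IPluginFactory::createPlugin() would call with (serialData, serialLength)
    MyPlugin(const void* serialData, size_t serialLength)
    {
        const char* p = static_cast<const char*>(serialData);
        std::memcpy(&mChannels, p, sizeof(mChannels));
        p += sizeof(mChannels);
        std::memcpy(&mScale, p, sizeof(mScale));
        p += sizeof(mScale);
        // If this doesn't hold, serialize() and getSerializationSize() are inconsistent.
        assert(p == static_cast<const char*>(serialData) + serialLength);
    }

    size_t getSerializationSize() /* override */
    {
        return sizeof(mChannels) + sizeof(mScale);
    }

    void serialize(void* buffer) /* override */
    {
        char* p = static_cast<char*>(buffer);
        std::memcpy(p, &mChannels, sizeof(mChannels));
        p += sizeof(mChannels);
        std::memcpy(p, &mScale, sizeof(mScale));
        p += sizeof(mScale);
    }

private:
    int mChannels{0};
    float mScale{1.0f};
};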

Also, towards the end of your code, you write

engine = runtime->(engineData.data(), fsize, &pluginFactory);

which I’m assuming is a typo.

Good luck,
Tom

Hi Tom,

Thank you for your answer; the typo was a pasting error.
Yes, I write in binary mode too.
My model has a custom layer, and I can see that the NVIDIA sample uses a PluginFactory.
When I call deserializeCudaEngine on the in-memory stream, the code works, but from a disk file it core dumps.
From the gdb trace, I can’t see where the crash comes from.

// deserializeCudaEngine 
    IRuntime* runtime = createInferRuntime(gSSHLogger.getTRTLogger());
    assert(runtime != nullptr);
    PluginFactory pluginFactory;
    ICudaEngine* engine = nullptr;
    if (trtModelStream != nullptr)
    {
        engine = runtime->deserializeCudaEngine(
            trtModelStream->data(), trtModelStream->size(), &pluginFactory);
    }
    else
    {
        gLogInfo << "read gie model 1..." << std::endl;
        std::ifstream engineFile(cache_path, std::ifstream::binary);

        engineFile.seekg(0, engineFile.end);
        long int fsize = engineFile.tellg();
        engineFile.seekg(0, engineFile.beg);
    
        std::vector<char> engineData(fsize);
        engineFile.read(reinterpret_cast<char*>(engineData.data()), fsize);

        gLogInfo << "read gie model 9..." << std::endl;
 
        engine = runtime->deserializeCudaEngine(engineData.data(), fsize, &pluginFactory);

        gLogInfo << "read gie model 2..." << std::endl;
    }


        // serialize
        PluginFactory parserPluginFactory;

        caffeToTRTModel(
                        "test_ssh.prototxt",
                        "SSH.caffemodel",
                        std::vector<std::string> {OUTPUT_BLOB_NAME0, OUTPUT_BLOB_NAME1},
                        N, &parserPluginFactory, trtModelStream);
        parserPluginFactory.destroyPlugin();
        assert(trtModelStream != nullptr);
        saveGIEModel(trtModelStream, &cache_path);

void caffeToTRTModel(const std::string& deployFile,           // Name for caffe prototxt
                     const std::string& modelFile,            // Name for model
                     const std::vector<std::string>& outputs, // Network outputs
                     unsigned int maxBatchSize,               // Batch size - NB must be at least as large as the batch we want to run with
                     nvcaffeparser1::IPluginFactoryExt* pluginFactory, // factory for plugin layers
                     IHostMemory*& trtModelStream)            // Output stream for the TensorRT model
{
    // Create the builder
    IBuilder* builder = createInferBuilder(gSSHLogger.getTRTLogger());
    assert(builder != nullptr);

    // Parse the caffe model to populate the network, then set the outputs
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    parser->setPluginFactoryExt(pluginFactory);

    bool fp16 = builder->platformHasFastFp16();
    const IBlobNameToTensor* blobNameToTensor = parser->parse(locateMyFile(deployFile).c_str(),
                                                              locateMyFile(modelFile).c_str(),
                                                              *network, fp16 ? DataType::kHALF : DataType::kFLOAT);
    gLogInfo << "support fp16: " << fp16 << std::endl;

    // Specify which tensors are outputs
    for (auto& s : outputs)
        network->markOutput(*blobNameToTensor->find(s.c_str()));

    // Build the engine
    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(10 << 20); // We need about 6MB of scratch space for the plugin layer for batch size 5
    builder->setFp16Mode(fp16);

    gLogInfo << "Begin building engine..." << std::endl;
    ICudaEngine* engine = builder->buildCudaEngine(*network);
    assert(engine);
    gLogInfo << "End building engine..." << std::endl;

    // We don't need the network any more, and we can destroy the parser
    network->destroy();
    parser->destroy();

    // Serialize the engine, then close everything down
    trtModelStream = engine->serialize();

    engine->destroy();
    builder->destroy();
    shutdownProtobufLibrary();
}

void saveGIEModel(IHostMemory*& trtModelStream, std::string* cache_path)
{
    std::ofstream ofs(*cache_path, std::ofstream::binary);
    ofs.write(reinterpret_cast<char*>(trtModelStream->data()), trtModelStream->size());
    ofs.close();
}

the gdb trace:

[New Thread 0x7fffcf75e700 (LWP 32186)]
[New Thread 0x7fffcef5d700 (LWP 32187)]
[New Thread 0x7fffce6db700 (LWP 32195)]
[I] build engine 1...
[I] read gie model 1...
[I] read gie model 9...

Thread 1 "sample_SSH" received signal SIGSEGV, Segmentation fault.
0x0000555555563998 in std::vector<float, std::allocator<float> >::size() const ()
(gdb)

Best regards,
Amy

Hi Amy,

When you catch the SIGSEGV in gdb, you should run bt to get a backtrace. Also, when using gdb or valgrind, it’s important to build with -g so you have debug information and get sensible backtraces. You may also require a debug build.

As I said, even though it will be slow, I’d still encourage you to run through valgrind to see if you have any memory issues.

Also, make sure your file read succeeded: after engineFile.read, you could add a block like this, from trtexec:

if (!engineFile)
{
    err << "Error loading engine file: " << engine << std::endl;
    return nullptr;
}

Cheers,
Tom

Hi Tom,

I ran bt in gdb and found that the cause of the core dump was in my custom layer code.
My serialize / read / write functions used std::vector; I changed them to write the raw float data instead, and that solved my problem.
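For anyone hitting the same thing, here is roughly what the change looked like (the function and member names are made up, not my actual layer code): serialize the element count plus the raw float data, never the std::vector object itself.

#include <cstddef>
#include <cstring>
#include <vector>

// WRONG: copying the vector object copies its internal pointers, not the floats,
// so the deserialized plugin later crashes somewhere like std::vector<float>::size().
//     std::memcpy(buffer, &weights, sizeof(weights));

// Write: element count first, then the raw float data.
void serializeWeights(void* buffer, const std::vector<float>& weights)
{
    char* p = static_cast<char*>(buffer);
    const size_t count = weights.size();
    std::memcpy(p, &count, sizeof(count));
    p += sizeof(count);
    std::memcpy(p, weights.data(), count * sizeof(float));
}

// The matching byte count to return from getSerializationSize():
size_t serializedWeightsSize(const std::vector<float>& weights)
{
    return sizeof(size_t) + weights.size() * sizeof(float);
}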

Thanks for your help.

Best regards,
Amy