Error in "enqueue" when running inference with a "deserializeCudaEngine" engine after "setWeights" on a Conv layer

Description

With TensorRT 7.1.3, I encounter an error in the "enqueue" function when running inference with an engine loaded via "deserializeCudaEngine" and refitted with "setWeights" on a Conv layer, but the same code works fine with TensorRT 7.0.0.

Environment

TensorRT Version: 7.1.3
GPU Type: RTX 2080
Nvidia Driver Version: 451.82
CUDA Version: 10.2
CUDNN Version: 8.0
Operating System + Version: Windows 10
Python Version (if applicable): 3.7.6
PyTorch Version (if applicable): 1.5

Has this issue been reported before? How can I solve it? Thanks!

Hi @higher127,
Can you please share your verbose error logs, script, and the model, so that I can assist you better?
Thanks!

Thanks for your reply. My working project is based on the official TensorRT sample – "TensorRT-7.1.3.4 -> samples -> sampleMNIST" – and I just modified the build() function as below (you should be able to reproduce the error easily):

bool SampleMNIST::build()
{
std::string serializeFile = "serializeFile.bin";
if (_access(serializeFile.c_str(), 0) == -1)
{
	auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));
	if (!builder)
	{
		return false;
	}

	auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetwork());
	if (!network)
	{
		return false;
	}

	auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
	if (!config)
	{
		return false;
	}

	auto parser = SampleUniquePtr<nvcaffeparser1::ICaffeParser>(nvcaffeparser1::createCaffeParser());
	if (!parser)
	{
		return false;
	}

	if (!constructNetwork(parser, network))
	{
		return false;
	}

	builder->setMaxBatchSize(mParams.batchSize);
	config->setMaxWorkspaceSize(16_MiB);
	config->setFlag(BuilderFlag::kREFIT);
	config->setFlag(BuilderFlag::kGPU_FALLBACK);
	config->setFlag(BuilderFlag::kSTRICT_TYPES);
	if (mParams.fp16)
	{
		config->setFlag(BuilderFlag::kFP16);
	}
	if (mParams.int8)
	{
		config->setFlag(BuilderFlag::kINT8);
	}

	samplesCommon::enableDLA(builder.get(), config.get(), mParams.dlaCore);

	mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
		builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());
	if (!mEngine)
		return false;

	assert(network->getNbInputs() == 1);
	mInputDims = network->getInput(0)->getDimensions();
	assert(mInputDims.nbDims == 3);

	// serialize the engine, then save it into disk.
	IHostMemory* gieModelStream{ nullptr };
	gieModelStream = mEngine->serialize();
	std::ofstream out(serializeFile.c_str(), std::ios::binary);
	if (out.is_open()) {
		out.write(reinterpret_cast<const char*>(gieModelStream->data()), gieModelStream->size());
		out.close();
	}
	else {
		std::cout << "Saving serializeFile error !" << std::endl;
		return false;
	}
	if (gieModelStream) {
		gieModelStream->destroy();
		gieModelStream = nullptr;
	}
}
else
{
	std::ifstream file(serializeFile, std::ios::binary);
	if (!file.is_open())
	{
		std::cout << "Open serializeFile error !" << std::endl;
		return false;
	}
	IRuntime* runtime = createInferRuntime(sample::gLogger.getTRTLogger());
	std::vector<char> trtModelStream;
	size_t size{ 0 };
	if (file.good())
	{
		file.seekg(0, file.end);
		size = file.tellg();
		file.seekg(0, file.beg);
		trtModelStream.resize(size);
		file.read(trtModelStream.data(), size);
		file.close();
	}
	mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(trtModelStream.data(), trtModelStream.size(), nullptr),
		samplesCommon::InferDeleter());
	if (!mEngine) { return false; }
	runtime->destroy();
	mInputDims = mEngine->getBindingDimensions(0);	
}
//
auto refitter = SampleUniquePtr<nvinfer1::IRefitter>(createInferRefitter(*mEngine, sample::gLogger));
if (refitter == nullptr) {
	std::cout << "engine is unrefittable !" << std::endl;
	return false;
}
const size_t kernel_size_ = 20 * 5 * 5;
float kernel[kernel_size_] = {0.0f};
Weights newWeights;
newWeights.count = kernel_size_;
newWeights.type = nvinfer1::DataType::kFLOAT;
newWeights.values = kernel;
if (!refitter->setWeights("conv1", WeightsRole::kKERNEL, newWeights)) {
	std::cout << "Set new weights error !" << std::endl;
	return false;
}
float bias[20] = { 0.0f };
Weights newBias;
newBias.count = 20;
newBias.type = nvinfer1::DataType::kFLOAT;
newBias.values = bias;
if (!refitter->setWeights("conv1", WeightsRole::kBIAS, newBias)) {
	std::cout << "Set new weights error !" << std::endl;
	return false;
}
// Find all weight parameters that need updating after the above modification
int n = refitter->getMissing(0, nullptr, nullptr);
if (n > 0)
{
	std::vector<const char*> layerNames(n);
	std::vector<WeightsRole> weightsRoles(n);
	if (refitter->getMissing(n, layerNames.data(), weightsRoles.data()) != n)
	{
		return false;
	}
	for (int i = 0; i < n; ++i) {
		refitter->setWeights(layerNames[i], weightsRoles[i], Weights{});
	}
}
if (refitter->refitCudaEngine() != true)
{
	return false;
}

return true;

}

Thanks again !

Hi @higher127,
Could you please share the --verbose logs?

Thanks!

Do you mean using "trtexec.exe" to get the verbose log? If so, which verbose logs of which operations do you need? Sorry, I am new to TensorRT. Thanks a lot.

Hi @higher127,
I tried compiling sampleMNIST with the suggested changes, but couldn't because of some missing variables.
Can you please upload the entire cpp file so as to avoid any mismatches?
Also, please share the error logs from the console.

Thanks!

Hi @AakankshaS,
The crash information is:
Exception thrown at 0x00007FFFE0B5A719 in sample_mnist.exe: Microsoft C++ exception: std::out_of_range at memory location 0x000000EE409BEDF0.
Unhandled exception at 0x00007FFF3E8D4001 (nvinfer.dll) in sample_mnist.exe: Fatal program exit requested.
And the entire cpp file you can read below:
sampleMNIST.cpp (17.4 KB)
Thanks a lot.

Hi @AakankshaS,
Were you able to reproduce the crash with the entire cpp file?
Thanks.

Hi @higher127,
Apologies for missing this.
Are you still facing the issue?

Thanks!

Hi @AakankshaS,
Thanks for your reply.
I am still facing the issue, but I do not know whether the latest version has fixed it or not, as I have not updated yet (still on 7.1.3.4).
Thanks again.


On 12/1/2020 14:34, AakankshaS via NVIDIA Developer Forums (nvidia@discoursemail.com) wrote:

Hi @higher127,
Can you please check if you have shared all the dependent files, as I could not compile your script successfully.
Thanks!