TensorRT cannot accelerate the ONNX model for inference

I modified sampleOnnxMNIST for my own ONNX model, and I am confused about how the input data in the buffer is reshaped to the model's input size. I just followed the sample and changed the input data to a one-dimensional vector. However, I found that TensorRT does not accelerate the ONNX model for inference: I get 20 ms for PyTorch inference but 30 ms for TensorRT. I don't know why; maybe there are some bugs in my code. Can anybody give me some advice to solve my problem? Thanks!
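For reference, my current understanding of the reshaping is that the host input buffer simply expects the tensor flattened in the binding's dimension order (channel, then row, then column for an image). Below is a minimal sketch of what I mean; the [1, 3, H, W] shape and the helper name flattenToNCHW are assumptions for illustration only, not my actual model:

// Sketch only: flatten a BGR cv::Mat into NCHW float order for a [1, 3, H, W] binding.
// Channel varies slowest (after batch), then rows, then columns.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<float> flattenToNCHW(const cv::Mat& img)
{
    const int H = img.rows;
    const int W = img.cols;
    std::vector<float> buffer(3 * H * W);
    for (int c = 0; c < 3; ++c)
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
                buffer[(c * H + y) * W + x] = static_cast<float>(img.at<cv::Vec3b>(y, x)[c]);
    return buffer; // copy this into the host buffer returned by BufferManager::getHostBuffer()
}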


Environment

TensorRT Version : 7.0.0.11
GPU Type : RTX 2080 Ti
Nvidia Driver Version :
CUDA Version : 10.0
CUDNN Version : 7.6.0
Operating System + Version : Vs2017 Community
Python Version (if applicable) : 3.6.7
TensorFlow Version (if applicable) : /
PyTorch Version (if applicable) : 1.3.0
Baremetal or Container (if container which image + tag) :

Relevant Files

Here is my code:

//!
//! sampleOnnxMNIST.cpp
//! This file contains the implementation of the ONNX MNIST sample. It creates the network using
//! the MNIST onnx model.
//! It can be run with the following command line:
//! Command: ./sample_onnx_mnist [-h or --help] [-d=/path/to/data/dir or --datadir=/path/to/data/dir]
//! [--useDLACore=<int>]
//!
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/highgui/highgui_c.h>
#include <opencv2/opencv.hpp>
#include "argsParser.h"
#include "buffers.h"
#include "common.h"
#include "logger.h"
#include "parserOnnxConfig.h"

#include "NvInfer.h"
#include <cuda_runtime_api.h>

#include <cassert>
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <memory>
#include <string>
#include <typeinfo>
#include <vector>
#include <numpy/arrayobject.h>

const std::string gSampleName = "TensorRT.sample_onnx_mnist";

using namespace std; // the code below uses unqualified cout, endl, vector, and string

//! \brief The SampleOnnxMNIST class implements the ONNX MNIST sample
//!
//! \details It creates the network using an ONNX model
//!
class SampleOnnxMNIST
{
template <typename T>
using SampleUniquePtr = std::unique_ptr<T, samplesCommon::InferDeleter>;

public:
SampleOnnxMNIST(const samplesCommon::OnnxSampleParams& params)
: mParams(params)
, mEngine(nullptr)
{
}

//!
//! \brief Function builds the network engine
//!
bool build();

//!
//! \brief Runs the TensorRT inference engine for this sample
//!
bool infer();

private:
samplesCommon::OnnxSampleParams mParams; //!< The parameters for the sample.

nvinfer1::Dims mInputDims;  //!< The dimensions of the input to the network.
nvinfer1::Dims mOutputDims; //!< The dimensions of the output to the network.
int mNumber{ 0 };             //!< The number to classify

std::shared_ptr<nvinfer1::ICudaEngine> mEngine; //!< The TensorRT engine used to run the network

//!
//! \brief Parses an ONNX model for MNIST and creates a TensorRT network
//!
bool constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
	SampleUniquePtr<nvinfer1::INetworkDefinition>& network, SampleUniquePtr<nvinfer1::IBuilderConfig>& config,
	SampleUniquePtr<nvonnxparser::IParser>& parser);

//!
//! \brief Reads the input  and stores the result in a managed buffer
//!
bool processInput(const samplesCommon::BufferManager& buffers);

//!
//! \brief Classifies digits and verify result
//!
bool verifyOutput(const samplesCommon::BufferManager& buffers);

};

//!
//! \brief Creates the network, configures the builder and creates the network engine
//!
//! \details This function creates the Onnx MNIST network by parsing the Onnx model and builds
//! the engine that will be used to run MNIST (mEngine)
//!
//! \return Returns true if the engine was created successfully and false otherwise
//!
bool SampleOnnxMNIST::build()
{
cout << "Waiting for the next step..." << endl;
auto builder = SampleUniquePtr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(gLogger.getTRTLogger()));
if (!builder)
{
return false;
}

const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = SampleUniquePtr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));
if (!network)
{
	return false;
}
cout << "Parsing the onnx model !" << endl;
auto config = SampleUniquePtr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
if (!config)
{
	return false;
}

auto parser = SampleUniquePtr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger.getTRTLogger()));
if (!parser)
{
	return false;
}
cout << "Start to build the network...." << endl;
auto constructed = constructNetwork(builder, network, config, parser);
if (!constructed)
{
	return false;
}
cout << "Start to build the mEngine......." << endl;
mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(
	builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());
if (!mEngine)
{
	return false;
}
cout << "mEngine is OK!!!!" << endl;
assert(network->getNbInputs() == 1);
mInputDims = network->getInput(0)->getDimensions();
assert(mInputDims.nbDims == 5);

assert(network->getNbOutputs() == 1);
mOutputDims = network->getOutput(0)->getDimensions();
assert(mOutputDims.nbDims == 4);
cout << "Finish build network!" << endl;
return true;

}

//!
//! \brief Uses a ONNX parser to create the Onnx MNIST Network and marks the
//! output layers
//!
//! \param network Pointer to the network that will be populated with the Onnx MNIST network
//!
//! \param builder Pointer to the engine builder
//!
bool SampleOnnxMNIST::constructNetwork(SampleUniquePtr<nvinfer1::IBuilder>& builder,
	SampleUniquePtr<nvinfer1::INetworkDefinition>& network, SampleUniquePtr<nvinfer1::IBuilderConfig>& config,
	SampleUniquePtr<nvonnxparser::IParser>& parser)
{
auto parsed = parser->parseFromFile(
	locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(), static_cast<int>(gLogger.getReportableSeverity()));
if (!parsed)
{
return false;
}

builder->setMaxBatchSize(mParams.batchSize);
config->setMaxWorkspaceSize(16_MiB);
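// Possible reasons (not verified against this model) why the engine is not faster than PyTorch:
// - 16 MiB is a fairly small workspace; TensorRT may skip faster tactics that need more scratch
//   memory. Increasing it, e.g. config->setMaxWorkspaceSize(1ULL << 30) for 1 GiB, is worth a try.
// - mParams.fp16 / mParams.int8 below are only set when the sample is launched with --fp16 / --int8,
//   so by default this builds an FP32 engine, which often shows little speedup over a framework baseline.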
if (mParams.fp16)
{
	config->setFlag(BuilderFlag::kFP16);
}
if (mParams.int8)
{
	config->setFlag(BuilderFlag::kINT8);
	samplesCommon::setAllTensorScales(network.get(), 127.0f, 127.0f);
}

samplesCommon::enableDLA(builder.get(), config.get(), mParams.dlaCore);

return true;

}

//!
//! \brief Runs the TensorRT inference engine for this sample
//!
//! \details This function is the main execution function of the sample. It allocates the buffer,
//! sets inputs and executes the engine.
//!
bool SampleOnnxMNIST::infer()
{
// Create RAII buffer manager object
samplesCommon::BufferManager buffers(mEngine, mParams.batchSize);

auto context = SampleUniquePtr<nvinfer1::IExecutionContext>(mEngine->createExecutionContext());
if (!context)
{
	return false;
}

// Read the input data into the managed buffers
assert(mParams.inputTensorNames.size() == 1);
if (!processInput(buffers))
{
	return false;
}

// Memcpy from host input buffers to device input buffers
buffers.copyInputToDevice();

auto startTime = std::chrono::high_resolution_clock::now();

bool status = context->executeV2(buffers.getDeviceBindings().data());

auto endTime = std::chrono::high_resolution_clock::now();
float totalTime = std::chrono::duration<float, std::milli>(endTime - startTime).count();
std::cout << "Time used one image (measured by chrono):" << totalTime << " ms" << std::endl;

if (!status)
{
	return false;
}

// Memcpy from device output buffers to host output buffers
buffers.copyOutputToHost();

// Verify results
if (!verifyOutput(buffers))
{
	return false;
}

return true;

}

//!
//! \brief Reads the input and stores the result in a managed buffer
//!
bool SampleOnnxMNIST::processInput(const samplesCommon::BufferManager& buffers)
{
using namespace cv;
string imgpath = "D:/02_DataFile/data_20191122_2D/01_PNG_data/03/lower/frame2d_1064.png";
string ascpath = "D:/02_DataFile/data_20191122_2D/01_PNG_data/03/lower_asc/frame3d_1064.png";
string zpath = "D:/02_DataFile/data_20191122_2D/01_PNG_data/03/lower_z/frame3d_1064.png";
Mat img = imread(imgpath, 1);
Mat asc = imread(ascpath, 1); // read the asc image from its own path
Mat z = imread(zpath, 0);     // read the z image from its own path

if (img.empty())
{
	cout << "Image loading failed!"
		<< endl;
	return false;
}
vector<float> input;
for (int i = 0; i < 3; i++)
{
	for (size_t nrow = 0; nrow < img.rows; nrow++)
	{
		for (size_t ncol = 0; ncol < img.cols; ncol++)
		{
			Vec3i bgr = img.at<Vec3b>(nrow, ncol);
			input.push_back(bgr.val[i]);
		}
	}
}
cout << "Pictureinput.size():" << input.size() << endl;
for (int i = 0; i < z.rows; i++)
{
	for (int j = 0; j < z.cols; j++)
	{
		input.push_back(z.at<uchar>(i, j));
	}
}
cout << "Pictureinput.size():" << input.size() << endl;
for (int i = 0; i < 3; i++)
{
	for (size_t nrow = 0; nrow < asc.rows; nrow++)
	{
		for (size_t ncol = 0; ncol < asc.cols; ncol++)
		{
			Vec3i bgr = asc.at<Vec3b>(nrow, ncol);  
			input.push_back(bgr.val[i]);
		}
	}
}
cout << "Pictureinput.size():" << input.size() << endl;
int vsize = input.size();
for (int i = 0; i < vsize; i++)
{
	input.push_back(input[i]);
}
cout << "Pictureinput.size():" << input.size() << endl;
float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
for (size_t i = 0; i < input.size(); i++) // fill the whole host buffer, assuming the input binding volume equals input.size() (i.e. the duplicated second half is intentional)
{
	//hostDataBuffer[i] = 1.0 - float(toothresult[i] / 255.0);
	hostDataBuffer[i] = input[i];
}
return true;

}

//!
//! \brief Classifies digits and verify result
//!
//! \return whether the classification output matches expectations
//!
bool SampleOnnxMNIST::verifyOutput(const samplesCommon::BufferManager& buffers)
{
const int Size0 = mOutputDims.d[0];
const int Size1 = mOutputDims.d[1];
const int Size2 = mOutputDims.d[2];
const int Size3 = mOutputDims.d[3];

cout << typeid(mOutputDims).name() << endl;
cout << "outresult.size():" << mOutputDims.d[0] << endl;
cout << "outresult.size():" << mOutputDims.d[1] << endl;
cout << "outresult.size():" << mOutputDims.d[2] << endl;
cout << "outresult.size():" << mOutputDims.d[3] << endl;

float* output = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));
vector<float> result;
int size = Size0 * Size1 * Size2 * Size3;
float *arr = new float[size];

for (int i = 0; i < mOutputDims.d[0] * mOutputDims.d[1] * mOutputDims.d[2] * mOutputDims.d[3]; i++)
{
	result.push_back(output[i]);
	arr[i] = output[i];
	//std::cout << result[i] << " " << std::endl;
}

float ***DD;
DD = new float **[Size1];
for (int x = 0; x < Size1; x++)
{
	DD[x] = new float *[Size2];
	for (int y = 0; y < Size2; y++)
	{
		DD[x][y] = new float[Size3];
	}
}

for (int i = 0; i < Size1; i++)
{
	for (int j = 0; j < Size2; j++)
	{
		for (int k = 0; k < Size3; k++)
		{
			DD[i][j][k] = arr[(i*Size2 + j)*Size3 + k];
		}
	}
}

cout << "Picture_result.size():" << result.size() << endl;


using namespace cv;
string imgpath = "D:/02_DataFile/data_20191122_2D/01_PNG_data/03/lower_label/frame2d_1064.png";
Mat img = imread(imgpath, 0);
int row = img.rows;
int col = img.cols;
if (img.empty())
{
	cout << "Image loading failed!"
		<< endl;
	return false;
}

int **label = new int *[row];
for (int i = 0; i < row; i++)
{
	label[i] = new int[col];
}
for (int i = 0; i < row; i++)
{
	for (int j = 0; j < col; j++)
	{
		label[i][j] = img.at<uchar>(i, j);
	}
}

int **out_result = new int *[row];
for (int i = 0; i < row; i++)
{
	out_result[i] = new int[col];
}
for (int i = 0; i < row; i++)
{
	for (int j = 0; j < col; j++)
	{
		int index_max = 0;     // index of the largest of the 3 class scores
		for (int k = 0; k < 3; k++)
		{
			if (DD[k][i][j] > DD[index_max][i][j]) // keep the channel with the largest score (argmax over the 3 classes)
			{
				index_max = k;
			}
		}
		out_result[i][j] = index_max;
	}
}
//calculate the accuracy
int sum = 0;
for (int i = 0; i < row; i++)
{
	for (int j = 0; j < col; j++)
	{
		if (label[i][j] == out_result[i][j])
		{
			sum = sum + 1;
			//cout << sum << endl;
		}
	}
}
cout << "The number of sum is : " << sum << endl;
float accuracy = (1.00000f*sum / row / col);
cout << "The accuracy of image is: " << accuracy << endl;

return true;
}

//!
//! \brief Initializes members of the params struct using the command line args
//!
samplesCommon::OnnxSampleParams initializeSampleParams(const samplesCommon::Args& args)
{
samplesCommon::OnnxSampleParams params;
if (args.dataDirs.empty()) //!< Use default directories if user hasn’t provided directory paths
{
params.dataDirs.push_back("data/mnist/");
params.dataDirs.push_back("data/samples/mnist/");
}
else //!< Use the data directory provided by the user
{
params.dataDirs = args.dataDirs;
}
params.onnxFileName = "test_simple.onnx";
params.inputTensorNames.push_back("0");
params.batchSize = 1;
params.outputTensorNames.push_back("1033");
params.dlaCore = args.useDLACore;
params.int8 = args.runInInt8;
params.fp16 = args.runInFp16;

return params;

}

//!
//! \brief Prints the help information for running this sample
//!
void printHelpInfo()
{
std::cout
	<< "Usage: ./sample_onnx_mnist [-h or --help] [-d or --datadir=<path to data directory>] [--useDLACore=<int>]"
	<< std::endl;
std::cout << "--help          Display help information" << std::endl;
std::cout << "--datadir       Specify path to a data directory, overriding the default. This option can be used "
	"multiple times to add multiple directories. If no data directories are given, the default is to use "
	"(data/samples/mnist/, data/mnist/)"
	<< std::endl;
std::cout << "--useDLACore=N  Specify a DLA engine for layers that support DLA. Value can range from 0 to n-1, "
	"where n is the number of DLA engines on the platform."
	<< std::endl;
std::cout << "--int8          Run in Int8 mode." << std::endl;
std::cout << "--fp16          Run in FP16 mode." << std::endl;
}

int main(int argc, char** argv)
{
samplesCommon::Args args;
bool argsOK = samplesCommon::parseArgs(args, argc, argv);
if (!argsOK)
{
gLogError << "Invalid arguments" << std::endl;
printHelpInfo();
return EXIT_FAILURE;
}
if (args.help)
{
printHelpInfo();
return EXIT_SUCCESS;
}

auto sampleTest = gLogger.defineTest(gSampleName, argc, argv);

gLogger.reportTestStart(sampleTest);

SampleOnnxMNIST sample(initializeSampleParams(args));

gLogInfo << "Building and running a GPU inference engine for Onnx FCHardNet" << std::endl;

if (!sample.build())
{
	return gLogger.reportFail(sampleTest);
}

if (!sample.infer())
{
	return gLogger.reportFail(sampleTest);
}


return gLogger.reportPass(sampleTest);

}
And here is my ONNX model: https://drive.google.com/open?id=11rmigXkzDIbhxzlwKQBjeDHicX9h9ZyL

Hi,

Can you try using the trtexec command to generate the TRT engine file?
I tried the below command and it seems to be working:
trtexec --onnx=test_simple.onnx --explicitBatch --saveEngine=test_simple.trt --verbose
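If your GPU supports it, you can also build an FP16 engine for comparison (the command above builds the default FP32 engine; the output file name here is just an example):
trtexec --onnx=test_simple.onnx --explicitBatch --fp16 --saveEngine=test_simple_fp16.trt --verbose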

Please refer to the below link for more options:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks

I'm confused about why I should generate the TRT engine file. Can't I run inference just through sampleOnnxMNIST?

Hi,

What I meant is that you can use the trtexec command-line tool to perform the benchmarking and to debug the issue, e.g.:
trtexec --loadEngine=test_simple.trt --batch=16
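To get a steady-state number rather than a single cold run, you can also add warm-up and averaging options; the counts below are only examples:
trtexec --loadEngine=test_simple.trt --warmUp=500 --iterations=100 --avgRuns=100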

Please refer to the below link and to trtexec --help for more info:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

Thanks