Always the same output values

Environment

TensorRT Version : TensorRT-7.2.3.4
GPU Type :
Nvidia Driver Version :
CUDA Version :
CUDNN Version :
Operating System + Version : Windows10
Python Version (if applicable) : 3.7
TensorFlow Version (if applicable) : 2.3.1
PyTorch Version (if applicable) :
Baremetal or Container (if container which image + tag) :

I tried to run inference with TensorRT, but I always receive the same output (1 0 0 in the for-loop), even though I feed different images to the network. I followed the SampleOnnxMNIST sample. I converted the model from .h5 to .pb to .onnx.

The code:
samplesCommon::OnnxSampleParams mParams;
mParams.dataDirs.push_back("path/to/model/folder/");
mParams.onnxFileName = "model.onnx";
mParams.inputTensorNames.push_back("input");
mParams.outputTensorNames.push_back("output");
mParams.int8 = false;
mParams.fp16 = false;

auto builder = std::unique_ptr<nvinfer1::IBuilder, samplesCommon::InferDeleter>(nvinfer1::createInferBuilder(sample::gLogger.getTRTLogger()));
if (!builder)
    return 0;

const auto explicitBatch = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = std::unique_ptr<nvinfer1::INetworkDefinition, samplesCommon::InferDeleter>(builder->createNetworkV2(explicitBatch));
if (!network)
    return 0;
auto config = std::unique_ptr<nvinfer1::IBuilderConfig, samplesCommon::InferDeleter>(builder->createBuilderConfig());
if (!config)
    return 0;

auto parser = std::unique_ptr<nvonnxparser::IParser, samplesCommon::InferDeleter>(nvonnxparser::createParser(*network, sample::gLogger.getTRTLogger()));
if (!parser)
    return 0;

auto parsed = parser->parseFromFile(locateFile(mParams.onnxFileName, mParams.dataDirs).c_str(), static_cast<int>(sample::gLogger.getReportableSeverity()));
if (!parsed)
    return 0;
config->setMaxWorkspaceSize(16_MiB);
if (mParams.fp16)
    config->setFlag(BuilderFlag::kFP16);
if (mParams.int8) {
    config->setFlag(BuilderFlag::kINT8);
    samplesCommon::setAllTensorScales(network.get(), 127.0f, 127.0f);
}

std::shared_ptr<nvinfer1::ICudaEngine> mEngine = std::shared_ptr<nvinfer1::ICudaEngine>(builder->buildEngineWithConfig(*network, *config), samplesCommon::InferDeleter());
if (!mEngine)
    return 0;

nvinfer1::Dims mInputDims = network->getInput(0)->getDimensions();
nvinfer1::Dims mOutputDims = network->getOutput(0)->getDimensions();

samplesCommon::BufferManager buffers(mEngine);

auto context = std::unique_ptr<nvinfer1::IExecutionContext, samplesCommon::InferDeleter>(mEngine->createExecutionContext());
if (!context)
    return 0;

string fileName = "image01.png";

unsigned char* hostDataBuffer = static_cast<unsigned char*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
// Read the image, resize it to the network input size and subtract the BGR mean
cv::Mat frame = cv::imread(fileName);
auto input_width = mInputDims.d[2];
auto input_height = mInputDims.d[1];
Size size = Size(input_height, input_width);
cv::resize(frame, frame, size, 0, 0, INTER_NEAREST);
Scalar mean = Scalar(103.939, 116.779, 123.68);
Mat inputBlob;
cv::subtract(frame, mean, inputBlob);
hostDataBuffer = inputBlob.data;

// Memcpy from host input buffers to device input buffers
buffers.copyInputToDevice();

bool status = context->executeV2(buffers.getDeviceBindings().data());
if (!status)
    return 0;

// Memcpy from device output buffers to host output buffers
buffers.copyOutputToHost();

const int outputSize = mOutputDims.d[1];
float* output = static_cast<float*>(buffers.getHostBuffer(mParams.outputTensorNames[0]));

for (int i = 0; i < outputSize; i++) {
    cout << output[i] << std::endl;
}

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md
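
For reference, the build-time change amounts to setting the INT8 flag and attaching a calibrator. A minimal sketch (MyCalibrator is a hypothetical nvinfer1::IInt8EntropyCalibrator2 implementation that feeds preprocessed calibration batches; the sample above shows a complete one):

// Minimal INT8 sketch; MyCalibrator is a hypothetical IInt8EntropyCalibrator2 implementation.
MyCalibrator calibrator(/* calibration images, batch size, cache file */);
config->setFlag(BuilderFlag::kINT8);
config->setInt8Calibrator(&calibrator);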

Thanks!

When I convert the Keras model to the ONNX model with the following command:

python -m tf2onnx.convert --keras model.h5 --inputs input_layer:0[1,224,224,3] --output model.onnx --inputs input_layer --outputs output_layer

I got the following log:
2021-07-06 14:58:09,852 - INFO - Using tensorflow=2.4.0, onnx=1.8.1, tf2onnx=1.9.0/cd64e4
2021-07-06 14:58:09,852 - INFO - Using opset <onnx, 9>
2021-07-06 14:58:11,790 - INFO - Computed 0 values for constant folding
2021-07-06 14:58:17,708 - INFO - Optimizing ONNX model
2021-07-06 14:58:20,669 - INFO - After optimization: Add -1 (18->17), BatchNormalization -53 (53->0), Const -162 (270->108), GlobalAveragePool +1 (0->1), Identity -54 (54->0), ReduceMean -1 (1->0), Squeeze +1 (0->1), Transpose -213 (214->1)
2021-07-06 14:58:20,861 - INFO -
2021-07-06 14:58:20,862 - INFO - Successfully converted TensorFlow model model.h5 to ONNX
2021-07-06 14:58:20,864 - INFO - Model inputs: ['input_layer:0']
2021-07-06 14:58:20,865 - INFO - Model outputs: ['Identity:0']
2021-07-06 14:58:20,865 - INFO - ONNX model is saved at model.onnx

The model output is Identity:0, but I do not have this layer in my Keras model. Then it is clear why my output is always 1 0 0 (it's the identity).
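
To double-check what the TensorRT parser ends up with, the I/O tensor names can be printed from the parsed network (a minimal sketch, reusing the network object from the build code above):

// Minimal sketch: print the input/output tensor names produced by the ONNX parser.
for (int i = 0; i < network->getNbInputs(); ++i)
    std::cout << "input  " << i << ": " << network->getInput(i)->getName() << std::endl;
for (int i = 0; i < network->getNbOutputs(); ++i)
    std::cout << "output " << i << ": " << network->getOutput(i)->getName() << std::endl;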

What is wrong with my transformation?

Hi @OpDaSo_B,

Please refer to the following:

https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/

I checked my model.onnx with ONNX Runtime.

Here is the code in Python:

import onnxruntime as ort
import numpy as np
import cv2

img_path = "image.png"
input_img = cv2.imread(img_path)  # BGR, uint8
input_img = input_img.astype(np.float32)
image_resize = cv2.resize(src=input_img, dsize=(224, 224), interpolation=cv2.INTER_NEAREST)
x = image_resize
mean = np.array([103.939, 116.779, 123.68])  # per-channel BGR mean
x[:, :, 0] -= mean[0]
x[:, :, 1] -= mean[1]
x[:, :, 2] -= mean[2]
preproc_img = x
batch = np.expand_dims(preproc_img, axis=0)  # add batch dimension -> (1, 224, 224, 3)
sess = ort.InferenceSession("model.onnx")
results_ort = sess.run(None, {"input_layer:0": batch})

The values in results_ort make sense, so the model itself is correct. I think my problem is how I write the image into the input buffer.

I changed the part that reads the image and writes it into the buffer to the following:

float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
cv::Mat frame = cv::imread("image.png");
auto input_batch_size = mInputDims.d[0];
auto input_height = mInputDims.d[1];
auto input_width = mInputDims.d[2];
auto input_channels = mInputDims.d[3];

Size size = Size(input_height, input_width);
cv::resize(frame, frame, size, 0, 0, INTER_NEAREST);
Scalar mean = Scalar(103.939, 116.779, 123.68); // BGR mean
Mat inputBlob;
float scale = 1.0f;
inputBlob = cv::dnn::blobFromImage(frame, scale, size, mean); // 4D float blob in NCHW order, BGR channels

// Copy the NCHW blob into the host buffer plane by plane (B, G, R)
int volChl = input_height * input_width;
int volImg = input_channels * input_height * input_width;
for (int i = 0; i < input_batch_size; ++i)
{
    for (int j = 0; j < input_height; ++j)
    {
        for (int k = 0; k < input_width; ++k)
        {
            Vec<int, 4> idxB = Vec<int, 4>(i, 0, j, k);
            Vec<int, 4> idxG = Vec<int, 4>(i, 1, j, k);
            Vec<int, 4> idxR = Vec<int, 4>(i, 2, j, k);
            hostDataBuffer[i * volImg + 0 * volChl + j * input_width + k] = inputBlob.at<float>(idxB);
            hostDataBuffer[i * volImg + 1 * volChl + j * input_width + k] = inputBlob.at<float>(idxG);
            hostDataBuffer[i * volImg + 2 * volChl + j * input_width + k] = inputBlob.at<float>(idxR);
        }
    }
}

But the results are still wrong. How should I write the image into the buffer? Which order of channels, width, and height should be used?

Hi @OpDaSo_B,

Please follow the NCHW or NHWC format; TRT internally tries all kinds of tensor layouts.
Please refer to the TRT samples for reference:

https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource
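
Whether the engine expects NHWC or NCHW can also be read off the input binding dimensions (a minimal sketch, using the engine and params objects from the code above):

// Minimal sketch: inspect the input binding dimensions.
int inputIndex = mEngine->getBindingIndex(mParams.inputTensorNames[0].c_str());
nvinfer1::Dims d = mEngine->getBindingDimensions(inputIndex);
// For a 3-channel image model: d.d[3] == 3 suggests NHWC, d.d[1] == 3 suggests NCHW.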

Thank you.

The input format of the model is [1, 224, 224, 3] (NHWC). First the blue channel, then the green channel, and last the red channel is written to the buffer, because the model expects the input in BGR format.
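
Since the binding is NHWC and OpenCV already stores a CV_32FC3 image in interleaved height x width x channel (BGR) order, the host buffer can in principle be filled with one contiguous copy instead of the per-channel loop. A minimal sketch, assuming an FP32 input binding and a frame already resized to input_height x input_width:

// Minimal NHWC sketch: convert to float, subtract the BGR mean, copy as-is.
cv::Mat floatImg;
frame.convertTo(floatImg, CV_32FC3);                                     // uint8 BGR -> float BGR
cv::subtract(floatImg, cv::Scalar(103.939, 116.779, 123.68), floatImg);  // per-channel mean
float* hostDataBuffer = static_cast<float*>(buffers.getHostBuffer(mParams.inputTensorNames[0]));
std::memcpy(hostDataBuffer, floatImg.ptr<float>(0),                      // needs <cstring>
            input_height * input_width * input_channels * sizeof(float)); // H*W*C floats, NHWC layout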

Below is the log:

[07/09/2021-10:33:03] [I] [TRT] [MemUsageChange] Init CUDA: CPU +460, GPU +0, now: CPU 17674, GPU 4257 (MiB)
[07/09/2021-10:33:03] [I] [TRT] ----------------------------------------------------------------
[07/09/2021-10:33:03] [I] [TRT] Input filename: model.onnx
[07/09/2021-10:33:03] [I] [TRT] ONNX IR version: 0.0.4
[07/09/2021-10:33:03] [I] [TRT] Opset version: 9
[07/09/2021-10:33:03] [I] [TRT] Producer name: tf2onnx
[07/09/2021-10:33:03] [I] [TRT] Producer version: 1.9.0
[07/09/2021-10:33:03] [I] [TRT] Domain:
[07/09/2021-10:33:03] [I] [TRT] Model version: 0
[07/09/2021-10:33:03] [I] [TRT] Doc string:
[07/09/2021-10:33:03] [I] [TRT] ----------------------------------------------------------------
[07/09/2021-10:33:03] [W] [TRT] ShapedWeights.cpp:173: Weights predictions/MatMul/ReadVariableOp:0 has been transposed with permutation of (1, 0)! If you plan on overwriting the weights with the Refitter API, the new weights must be pre-transposed.
[07/09/2021-10:33:03] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 17794 MiB, GPU 4257 MiB
[07/09/2021-10:33:04] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[07/09/2021-10:33:04] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +451, GPU +166, now: CPU 18247, GPU 4423 (MiB)
[07/09/2021-10:33:04] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +230, GPU +170, now: CPU 18477, GPU 4593 (MiB)
[07/09/2021-10:33:04] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[07/09/2021-10:33:10] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[07/09/2021-10:33:46] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[07/09/2021-10:33:46] [I] [TRT] Total Host Persistent Memory: 85072
[07/09/2021-10:33:46] [I] [TRT] Total Device Persistent Memory: 1805824
[07/09/2021-10:33:46] [I] [TRT] Total Scratch Memory: 512
[07/09/2021-10:33:46] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 9 MiB, GPU 4 MiB
[07/09/2021-10:33:46] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 18731, GPU 4851 (MiB)
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 18731, GPU 4859 (MiB)
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 18731, GPU 4843 (MiB)
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 18730, GPU 4825 (MiB)
[07/09/2021-10:33:46] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 18730 MiB, GPU 4825 MiB
[07/09/2021-10:33:46] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 18730 MiB, GPU 4827 MiB
[07/09/2021-10:33:46] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 18730, GPU 4835 (MiB)
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 18730, GPU 4843 (MiB)
[07/09/2021-10:33:46] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 18730 MiB, GPU 4847 MiB
[07/09/2021-10:33:46] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 18737, GPU 4829 (MiB)

Is there something wrong in the log output?

@OpDaSo_B,

Could you please share the ONNX model and a complete repro script, along with the steps to try from our end, for better assistance?

Thank you.

Here is the code:

code.cpp (3.8 KB)

The images:
cat
dog

And the model:
https://ufile.io/igg08k7b

If I run the code with the attached model, first on cat.jpg and afterwards on dog.jpg, the output for the first class is higher than the second output value both times.
Output values for cat.jpg:
0.777636
0.222364

Output values for dog.jpg:
0.983091
0.0169093

This topic can be closed. I solved my problem.