DenseNet121 transplanting using TensorRT


I need to pre-process the input image before the TensorRT flow and post-process the network output after it, but I don’t know how to modify the data interface in TensorRT to do this. Please share your solution. Thanks.

Hi, hi-bigcat

  1. Using the Multimedia API + TensorRT + CUDA.
  2. Using GStreamer + TensorRT + CUDA.
  3. Using the DeepStream SDK, which includes the pre-processing and post-processing.

Hi Alanz,

The thing is that I cannot find the actual data interface in the samples for the Nano Developer Kit. There seems to be no direct interface for pre-processing and post-processing the data; everything is packed into a class, such as SampleMNIST.

Hi bigcat, the pre- and post-processing depends on the requirements of the network, so there is not one set of specific functions for it.

For example, image recognition networks like AlexNet, GoogleNet, and ResNet typically need pre-processing to get the data into NCHW layout and planar BGR color format, with mean pixel subtraction applied. See here for a code example of pre-processing for an image recognition network with CUDA:

To post-process the output of an image recognition network, you need to select the argmax, that is, the output class with the highest confidence/probability. See here for an example:


Sorry for the late reply; I have been busy with other things.

In our application, we pre-process pictures in .jpg format and pass the processed images to the inference flow. The inference flow passes its results to post-processing, which produces the final results. We don’t use the ‘buffers’.

In your MNIST demo, the pictures in .pgm format are read and loaded into input buffers, then TensorRT reads those buffers and runs inference, then stores the output in output buffers.

So my questions are:

  1. Can the buffers accept .jpg files directly? Or do you have a function to convert a .jpg file to a .pgm file?

  2. Can I use the processed images directly in TensorRT inference without copying them into the input buffers, and if so, how?

  3. Likewise, can I use the inference output directly without reading it from the output buffers, and if so, how?

In fact, I am confused about the buffer workflow in TensorRT and its advantages; it makes porting our algorithms somewhat complex.


MNIST and PGM files are grayscale. First you need to decode/decompress the JPG file into RGB or grayscale.

Generally some layout conversion is required, as DNNs typically use NCHW format and BGR colorspace, with mean pixel subtraction applied.

If the network produces a result that is directly usable, then you can use the output buffer directly. You would need to train your network to produce the desired output. In general, image classification networks will output a confidence value (or score) for each output neuron (each output neuron corresponds to an object class). Then you need to pick the output with the highest score as the answer.
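As a rough illustration of the "pick the highest score" step, here is a minimal sketch in plain C++. It assumes the scores have already been copied from the output buffer into a std::vector<float>; the function name argmaxClass is made up for this example and is not part of TensorRT.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Return the index of the highest score, i.e. the predicted class.
// The scores are assumed to be one float per output class, already
// copied from the network's output buffer to host memory.
int argmaxClass(const std::vector<float>& scores)
{
    assert(!scores.empty() && "scores must contain at least one class");
    auto best = std::max_element(scores.begin(), scores.end());
    return static_cast<int>(std::distance(scores.begin(), best));
}
```

The returned index still has to be mapped to a label using whatever class list the network was trained with.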

It seems there are some gaps in our communication. Could we talk about this directly on a phone call? Could you send me your email first?

I have followed your guidance for inference, pre-processing and post-processing. Now I have hit an issue while building my network with auto builder = SampleUniquePtr:
It reports the error ‘dl-error-skeleton.c: no such file or directory’.

Using the forum to solve issues is too slow. Please contact me ASAP; we need to get inference working within the next two days, and we may need over 1K Nano boards to deploy our application.

I found a new issue when building the CudaEngine, but I cannot attach pictures to show the specific problem.

Could you answer me ASAP?


Have you marked the output tensors?

Please check the sample shared above:

You will need to mark the tensor as an output before accessing it.
In general, the output is the name of the final layer of your model, e.g. prob, bbox, …


Yes, of course I have marked the output tensors as your demo samples show. Please see the attachment.


I have tested the DL model above, namely DenseNet121, in the GoogleNet demo and found the same problem.

I now think that TensorRT may not support DenseNet121.

Please help to solve this problem ASAP.


Could you run your model with trtexec directly and share the log with us?

./trtexec --deploy=[your/prototxt] --output=[output name]



I have tested a DenseNet-121 model with trtexec and it works fine.
My source is from

vyu@server:~/tensorrt/bin$ ./trtexec --deploy=./DenseNet_121.prototxt --output=fc6
deploy: ./DenseNet_121.prototxt
output: fc6
Input "data": 3x224x224
Output "fc6": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=fc6, bindingIndex=1, buffers.size()=2
Average over 10 runs is 6.19329 ms (host walltime is 6.38577 ms, 99% percentile time is 6.37216).
Average over 10 runs is 6.35511 ms (host walltime is 6.54085 ms, 99% percentile time is 7.54675).
Average over 10 runs is 6.18524 ms (host walltime is 6.40593 ms, 99% percentile time is 6.78074).
Average over 10 runs is 5.83584 ms (host walltime is 6.03252 ms, 99% percentile time is 6.23603).
Average over 10 runs is 5.83387 ms (host walltime is 6.03153 ms, 99% percentile time is 6.22694).
Average over 10 runs is 5.63986 ms (host walltime is 5.84728 ms, 99% percentile time is 5.90234).
Average over 10 runs is 5.41692 ms (host walltime is 5.61464 ms, 99% percentile time is 5.85626).
Average over 10 runs is 5.45298 ms (host walltime is 5.64838 ms, 99% percentile time is 5.81222).
Average over 10 runs is 5.34745 ms (host walltime is 5.72118 ms, 99% percentile time is 5.69549).
Average over 10 runs is 5.25973 ms (host walltime is 5.47212 ms, 99% percentile time is 5.38998).


I have tried trtexec and modified the output layer name, and now our DenseNet121 runs.

  1. But deploying the model is quite slow, taking several minutes. How can we speed it up?

  2. Our pre-processing takes a .jpg image and outputs a cv::Mat for inference. How do we store the cv::Mat in the input buffer?

On the first deployment, some extra optimization work has to be done before running. You can save the already optimized .plan file to speed up subsequent deployments.

For .jpg images, I recommend decoding with the JPEG decoder. You can find sample code in Tegra_Multimedia_API_R32.1.0_aarch64.tbz2 (download it from the JetPack download center), located at /tegra_multimedia_api/samples/06_jpeg_decode. After decoding, you can convert directly to BGR through the provided API, without an extra copy or conversion and without any OpenCV libraries. This is more efficient and faster than going through cv::Mat.


  1. Please show how to store and use the .plan file.

  2. Do you mean the decoder module is already on the Nano board and we can use it directly on .jpg files?
    The thing is that we already use cv::imread to read the .jpg file, process the intermediate cv::Mat and output a cv::Mat in our pre-processing steps. We may try your method later if it is really faster, but for now we want to get the cv::Mat into the buffers so we can run inference first. Please show how to do that.

Attached is what I have tried; please help me check it.

  1. modelcache is the path of the saved engine file, which TensorRT can load directly.
IRuntime* runtime;
ICudaEngine* engine;
IExecutionContext* context;
pLogger = new Logger;

// Open in binary mode: the serialized engine is not text.
ifstream gieModelFile(modelcache.c_str(), ios::binary);
if (gieModelFile.good())
{
    cout << "Using cached GIE model" << endl;

    // Get the length
    gieModelFile.seekg(0, ios::end);
    size_t size = gieModelFile.tellg();
    gieModelFile.seekg(0, ios::beg);

    // Read the whole cached engine into host memory
    char* buff = new char[size];
    gieModelFile.read(buff, size);
    runtime = createInferRuntime(*pLogger);
    engine = runtime->deserializeCudaEngine((void*)buff, size, nullptr);
    delete[] buff;  // no longer needed once the engine is deserialized
}
else
{
    // No cache yet: build the engine from the Caffe model, which presumably
    // fills gieModelStream. The other arguments of this call are missing
    // from the original post.
    caffeToGieModel(/* ... */
                    std::vector<std::string>{ OUTPUT_BLOB_NAME },
                    /* ... */);
    cout << "Create GIE model cache" << endl;
    ofstream cacheFile(modelcache.c_str(), ios::binary);
    cacheFile.write((char*)gieModelStream->data(), gieModelStream->size());
    runtime = createInferRuntime(*pLogger);
    engine = runtime->deserializeCudaEngine(gieModelStream->data(), gieModelStream->size(), nullptr);
}

context = engine->createExecutionContext();
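The file handling in the snippet above can also be isolated into two small helpers. This is only a sketch using the C++ standard library: the names writePlanFile and readPlanFile are made up for this example, and nothing here calls TensorRT. It just shows one way to write a serialized blob to disk and read it back byte for byte.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write a serialized engine (or any byte blob) to disk in binary mode.
// Returns false if the file could not be opened or written.
bool writePlanFile(const std::string& path, const void* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    return out.good();
}

// Read the whole file back into memory.
// Returns an empty vector if the file is missing, empty or unreadable.
std::vector<char> readPlanFile(const std::string& path)
{
    // std::ios::ate opens the file positioned at the end, so tellg() is its size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamsize size = in.tellg();
    if (size <= 0) return {};
    std::vector<char> buffer(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    in.read(buffer.data(), size);
    if (!in) return {};
    return buffer;
}
```

With helpers like these, the cached branch would pass buffer.data() and buffer.size() to runtime->deserializeCudaEngine, and an empty result would mean the engine has to be built first.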
  2. Yes. There is a hardware JPEG decoder that can be used directly. For image pre-processing, different applications and networks require different processing; I suggest checking your network’s paper for the exact format it requires. For cv::Mat buffer access, please refer to the code below.
// Row pointers give direct access to the pixel bytes of a cv::Mat.
for(int j = 1; j < myImage.rows - 1; ++j)
{
    const uchar* previous = myImage.ptr<uchar>(j - 1);
    const uchar* current  = myImage.ptr<uchar>(j    );
    const uchar* next     = myImage.ptr<uchar>(j + 1);
    uchar* output = Result.ptr<uchar>(j);
    for(int i = nChannels; i < nChannels * (myImage.cols - 1); ++i)
    {
        // Index with i so the result lands in the same column it was read from.
        output[i] = saturate_cast<uchar>(5 * current[i]
                     -current[i - nChannels] - current[i + nChannels] - previous[i] - next[i]);
    }
}
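For getting image data into the layout described earlier in this thread (planar NCHW with mean subtraction), here is a minimal sketch in plain C++. It is not taken from the TensorRT samples: the function name interleavedToPlanar is made up, and the mean values are an assumption that depends on how the network was trained. It works on a raw interleaved 8-bit buffer, which is how a cv::Mat from cv::imread stores pixels, and walks it row by row like the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert an interleaved 8-bit image (HWC, e.g. B,G,R,B,G,R,...) into a
// planar float buffer (CHW), subtracting a per-channel mean.
// `step` is the number of bytes per image row, which may include padding.
std::vector<float> interleavedToPlanar(const unsigned char* data,
                                       int height, int width, int channels,
                                       std::size_t step,
                                       const std::vector<float>& mean)
{
    assert(data != nullptr);
    assert(height > 0 && width > 0 && channels > 0);
    assert(step >= static_cast<std::size_t>(width) * channels);
    assert(mean.size() == static_cast<std::size_t>(channels));

    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<float> out(plane * channels);
    for (int y = 0; y < height; ++y)
    {
        const unsigned char* row = data + y * step;  // start of row y
        for (int x = 0; x < width; ++x)
        {
            for (int c = 0; c < channels; ++c)
            {
                // Channel c of pixel (y, x) goes into plane c.
                out[c * plane + static_cast<std::size_t>(y) * width + x] =
                    static_cast<float>(row[x * channels + c]) - mean[c];
            }
        }
    }
    return out;
}
```

For a cv::Mat img of type CV_8UC3, the call would be interleavedToPlanar(img.data, img.rows, img.cols, img.channels(), img.step, mean). The returned vector is the host-side data that then gets copied into the input buffer.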
  1. What are caffeToGieModel and gieModelStream? I haven’t found these two names in the sample demo or the documents.

Which file declares them, and what are their types?

  2. I will try it.