DenseNet121 transplanting using TensorRT

hi-bigcat · April 11, 2019, 9:19am

Hi,

I need to pre-process the input image before the TensorRT flow and post-process the network output after it, but I don't know how to modify the data interface in TensorRT for the pre-processing and post-processing. Please share your solution, thanks.

alanz · April 11, 2019, 9:35am

Hi, hi-bigcat

Use the Multimedia API + TensorRT + CUDA.
Use GStreamer + TensorRT + CUDA.
Use the DeepStream SDK, which includes the pre-processing and post-processing.

hi-bigcat · April 11, 2019, 9:37am

hi-bigcat · April 11, 2019, 9:41am

Hi Alanz,

The thing is that I cannot find the actual data interface in the samples for the Nano Developer Kit. It seems the data has no direct interface for pre-processing and post-processing; all the data is packed inside a class, such as SampleMNIST.

dusty_nv · April 11, 2019, 12:02pm

Hi bigcat, the pre- and post-processing depends on the requirements of the network, so there is not one set of specific functions for it.

For example, image recognition networks like AlexNet, GoogleNet, and ResNet typically need pre-processing to get the data in NCHW layout, planar BGR color format, with mean pixel subtraction applied. See here for a code example of pre-processing for image recognition network with CUDA:

https://github.com/dusty-nv/jetson-inference/blob/bda3d60a6967d5c081162a940f0bbca399081a55/imageNet.cu#L98

For post-processing image recognition network, you need to select the argmax - the output class with the highest confidence/probability. See here for example:

https://github.com/dusty-nv/jetson-inference/blob/bda3d60a6967d5c081162a940f0bbca399081a55/imageNet.cpp#L402
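
As a rough illustration of the two steps described above, here is a CPU-only sketch in plain C++. It is not the CUDA code from jetson-inference. It assumes a tightly packed, 8-bit interleaved RGB input; the function names toPlanarBGR and argmax, and whatever mean values you pass in, are hypothetical and only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Convert a tightly packed, interleaved 8-bit RGB image (HWC) into a planar
// float BGR buffer (CHW), subtracting a per-channel mean.
// Hypothetical helper; the mean values depend on how the network was trained.
std::vector<float> toPlanarBGR(const std::vector<std::uint8_t>& rgb,
                               std::size_t width, std::size_t height,
                               const float meanBGR[3])
{
    assert(rgb.size() == width * height * 3);
    const std::size_t plane = width * height;
    std::vector<float> chw(plane * 3);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t px = y * width + x;
            for (std::size_t c = 0; c < 3; ++c) {
                // Output plane c (B, G, R) reads input channel 2 - c (stored as R, G, B).
                chw[c * plane + px] =
                    static_cast<float>(rgb[px * 3 + (2 - c)]) - meanBGR[c];
            }
        }
    }
    return chw;
}

// Post-processing for classification: index of the highest score.
std::size_t argmax(const std::vector<float>& scores)
{
    assert(!scores.empty());
    return static_cast<std::size_t>(std::distance(
        scores.begin(), std::max_element(scores.begin(), scores.end())));
}
```

The planar buffer returned by toPlanarBGR is what would be copied into the network's input buffer, and argmax would be applied to the scores read back from the output buffer.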

hi-bigcat · April 18, 2019, 2:27pm

Hi,

Sorry for the late reply; I have been busy with other things.

In our application, we pre-process pictures in JPG format, pass the processed images to the inference flow, and then pass the inference results to post-processing, which produces the final results. We don't use the "buffers".

In your MNIST demo, the pictures in PGM format are read and loaded into input buffers, then TensorRT reads those buffers and runs inference, then stores the output in output buffers.

So my questions are:

Can the buffers recognize JPG files? Or do you have a function to convert a JPG file to a PGM file?
Can I directly use the processed images in TensorRT inference without storing them in the input buffers? If so, how?
Can I directly use the inference output without reading it from the output buffers? If so, how?

In fact, I am confused about the buffer workflow in TensorRT and its advantages. It makes porting our algorithms a little complex.

Thanks

dusty_nv · April 18, 2019, 5:35pm

MNIST and PGM files are grayscale. First you need to decode/decompress the JPG file into RGB or grayscale.

Generally some layout conversion is required, as DNNs typically use NCHW format, BGR colorspace, with mean pixel subtraction applied.

If the network produces a result that is directly usable, then you can use the output buffer directly. You would need to train your network to produce the desired output. In general, image classification networks will output a confidence value (or score) for each output neuron (each output neuron corresponds to an object class). Then you need to pick the output with the highest score as the answer.

hi-bigcat · April 19, 2019, 3:18am

dusty_nv:

Can the buffers recognize JPG files? Or do you have a function to convert a JPG file to a PGM file?

MNIST and PGM files are grayscale. First you need to decode/decompress the JPG file into RGB or grayscale.

So what is the specific function for reading a JPG file into the buffers?

Can I directly use the processed images in TensorRT inference without storing them in the input buffers? If so, how?

Generally some layout conversion is required, as DNNs typically use NCHW format, BGR colorspace, with mean pixel subtraction applied.

I mean, how do I pass the pre-processed image to the inference step without using the buffer interface?

Can I directly use the inference output without reading it from the output buffers? If so, how?

If the network produces a result that is directly usable, then you can use the output buffer directly. You would need to train your network to produce the desired output. In general, image classification networks will output a confidence value (or score) for each output neuron (each output neuron corresponds to an object class). Then you need to pick the output with the highest score as the answer.

I mean directly using the inference outputs without reading them from the specific output buffers. Using these buffers makes our porting complex.

It seems there are some gaps in our communication. We could have a phone call to talk about this directly; could you send me your email first?

hi-bigcat · April 22, 2019, 12:06pm

I have followed your guidance for inference, pre-processing and post-processing. Now I encounter an issue when building my network, at the line auto builder = SampleUniquePtr:
It reports the error 'dl-error-skeleton.c: no such file or directory'.

Using the forum to solve issues is too slow. Please contact me ASAP. We need to solve the inference problem within the next two days, and we may need over 1K Nano boards to deploy our applications.

hi-bigcat · April 23, 2019, 2:48am

I found a new issue when building the CudaEngine, but I cannot attach pictures to show the specific problem.

Could you answer me ASAP?

AastaLLL · April 23, 2019, 5:29am

Hi,

Have you marked the output tensors?

Please check the sample shared above:
https://github.com/dusty-nv/jetson-inference/blob/master/tensorNet.cpp#L417

You will need to mark the tensor as output before accessing it.
In general, the output is the final layer name of your model, e.g. prob, bbox, …

Thanks.

hi-bigcat · April 23, 2019, 7:40am

Yes, of course I have marked the output tensors as your demo samples show; please see the attachment.

hi-bigcat · April 23, 2019, 1:46pm

I have checked the above DL model, namely DenseNet121, in the GoogleNet demo, and found the same problem.

Now I think DenseNet121 is not supported by TensorRT.

Please help to solve this problem ASAP

AastaLLL · April 24, 2019, 2:56am

Hi,

Could you help to run your model with trtexec directly and share the log with us?

./trtexec --deploy=[your/prototxt] --output=[output name]

Thanks.

AastaLLL · April 24, 2019, 3:11am

Hi,

I have tested a DenseNet-121 model with trtexec and it works fine.
My source is from https://github.com/shicai/DenseNet-Caffe/blob/master/DenseNet_121.prototxt

vyu@server:~/tensorrt/bin$ ./trtexec --deploy=./DenseNet_121.prototxt --output=fc6
deploy: ./DenseNet_121.prototxt
output: fc6
Input "data": 3x224x224
Output "fc6": 1000x1x1
name=data, bindingIndex=0, buffers.size()=2
name=fc6, bindingIndex=1, buffers.size()=2
Average over 10 runs is 6.19329 ms (host walltime is 6.38577 ms, 99% percentile time is 6.37216).
Average over 10 runs is 6.35511 ms (host walltime is 6.54085 ms, 99% percentile time is 7.54675).
Average over 10 runs is 6.18524 ms (host walltime is 6.40593 ms, 99% percentile time is 6.78074).
Average over 10 runs is 5.83584 ms (host walltime is 6.03252 ms, 99% percentile time is 6.23603).
Average over 10 runs is 5.83387 ms (host walltime is 6.03153 ms, 99% percentile time is 6.22694).
Average over 10 runs is 5.63986 ms (host walltime is 5.84728 ms, 99% percentile time is 5.90234).
Average over 10 runs is 5.41692 ms (host walltime is 5.61464 ms, 99% percentile time is 5.85626).
Average over 10 runs is 5.45298 ms (host walltime is 5.64838 ms, 99% percentile time is 5.81222).
Average over 10 runs is 5.34745 ms (host walltime is 5.72118 ms, 99% percentile time is 5.69549).
Average over 10 runs is 5.25973 ms (host walltime is 5.47212 ms, 99% percentile time is 5.38998).

Thanks.

hi-bigcat · April 24, 2019, 6:46am

Hi
I have tried trtexec and modified the output layer name; now our DenseNet121 can run.

1) But deploying the model is quite slow, taking several minutes. How can we accelerate it?

2) In our pre-processing, we read a .jpg image and output a cv::Mat variable for inference. How do we store the cv::Mat variable into the buffer?

alanz · April 25, 2019, 1:57am

The first deployment takes extra time because some optimization work has to be done before running. You can save the already optimized .plan file and load it on the next deployment, which should be faster.

For .jpg images, I recommend decoding with the JPEG decoder. You can find sample code in Tegra_Multimedia_API_R32.1.0_aarch64.tbz2 (download it from the JetPack download center). The sample is located at /tegra_multimedia_api/samples/06_jpeg_decode. After decoding, you can convert directly to BGR through the provided API, without an extra copy and without any OpenCV libraries. It is more efficient and faster than cv::Mat.
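
The plan caching idea can be sketched with plain C++ file I/O. This covers only the load/save half; the TensorRT calls that build or deserialize the engine are left out, and the helper names loadPlan and savePlan are hypothetical. The details that matter are opening the file in binary mode and sizing the buffer to the whole file.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Read a whole serialized engine (.plan) file into memory.
// Returns an empty vector if the file is missing, empty, or unreadable.
std::vector<char> loadPlan(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in.good()) return {};
    const std::streamsize size = in.tellg();
    if (size <= 0) return {};
    std::vector<char> data(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (!in.read(data.data(), size)) return {};
    return data;
}

// Write serialized engine bytes to disk; returns true on success.
bool savePlan(const std::string& path, const void* bytes, std::size_t size)
{
    assert(bytes != nullptr || size == 0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out.good()) return false;
    out.write(static_cast<const char*>(bytes),
              static_cast<std::streamsize>(size));
    return out.good();
}
```

On a later run, a non-empty result from loadPlan would be handed to the runtime's deserialize call instead of rebuilding the engine from the Caffe model.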

hi-bigcat · April 25, 2019, 7:12am

Hi

Please show how to store and use the .plan file.
Do you mean the decoder module is already on the Nano board and we can use it directly to handle JPG files?
The thing is that we already use cv::imread to read the JPG file, process the intermediate cv::Mat, and output a cv::Mat in our pre-processing steps. We may try your method later if it is really faster, but for now we want to get the cv::Mat into the buffers to run inference first. Please provide your method.

The attachment is what I have tried; please help me check it.

alanz · April 25, 2019, 8:24am

modelcache is the path of the saved file, which can be used directly by the TensorRT engine.

// Note: modelcache, deploy, caffemodel, OUTPUT_BLOB_NAME, BATCH_SIZE, pLogger,
// caffeToGIEModel and gieModelStream are assumed to be defined elsewhere
// in the application.
IRuntime* runtime = nullptr;
ICudaEngine* engine = nullptr;
IExecutionContext* context = nullptr;
pLogger = new Logger;

// Open the cache in binary mode; the serialized engine is not text.
ifstream gieModelFile(modelcache.c_str(), ios::binary);
if (gieModelFile.good())
{
    cout << "Using cached GIE model" << endl;

    // Get the length of the cached engine
    gieModelFile.seekg(0, ios::end);
    size_t size = gieModelFile.tellg();
    gieModelFile.seekg(0, ios::beg);

    // Read the whole file into a buffer of that size
    char* buff = new char[size];
    gieModelFile.read(buff, size);
    gieModelFile.close();

    runtime = createInferRuntime(*pLogger);
    engine = runtime->deserializeCudaEngine((void*)buff, size, nullptr);
    delete[] buff;  // released after deserialization, as the else branch does with gieModelStream
}
else
{
    // No cache yet: build the engine from the Caffe model, then save it
    caffeToGIEModel(deploy,
                    caffemodel,
                    std::vector<std::string>{ OUTPUT_BLOB_NAME },
                    BATCH_SIZE);
    cout << "Create GIE model cache" << endl;
    ofstream cacheFile(modelcache.c_str(), ios::binary);
    cacheFile.write((char*)gieModelStream->data(), gieModelStream->size());
    cacheFile.close();
    runtime = createInferRuntime(*pLogger);
    engine = runtime->deserializeCudaEngine(gieModelStream->data(), gieModelStream->size(), nullptr);
    gieModelStream->destroy();
}

context = engine->createExecutionContext();

Yes. There is a hardware JPEG decoder that can be used directly. For image pre-processing, different applications and networks require different processing. I think you can refer to your network's paper to check the exact format required. For cv::Mat buffer access, please refer to the code below.

for(int j = 1; j < myImage.rows - 1; ++j)
{
    const uchar* previous = myImage.ptr<uchar>(j - 1);
    const uchar* current  = myImage.ptr<uchar>(j    );
    const uchar* next     = myImage.ptr<uchar>(j + 1);
 
    uchar* output = Result.ptr<uchar>(j);
 
    for(int i = nChannels; i < nChannels * (myImage.cols - 1); ++i)
    {
        *output++ = saturate_cast<uchar>(5 * current[i]
                     -current[i - nChannels] - current[i + nChannels] - previous[i] - next[i]);
    }
}
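
For the specific question of getting a cv::Mat into the input buffer, a sketch along the same lines follows. To stay self-contained it does not include OpenCV: it takes the raw pointer, row count, column count and row stride in bytes, which for an 8-bit, 3-channel cv::Mat would be mat.data, mat.rows, mat.cols and mat.step. The function name bgrRowsToCHW is hypothetical, and mean subtraction or scaling is omitted, so adjust it to what your network expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy an interleaved 8-bit BGR image, stored row by row with a byte stride
// (rows may be padded), into a planar float CHW buffer.
// Hypothetical helper; cv::imread returns BGR by default, so no channel swap.
void bgrRowsToCHW(const std::uint8_t* data, int rows, int cols,
                  std::size_t strideBytes, std::vector<float>& chw)
{
    assert(data != nullptr && rows > 0 && cols > 0);
    assert(strideBytes >= static_cast<std::size_t>(cols) * 3);
    const std::size_t plane = static_cast<std::size_t>(rows) * cols;
    chw.resize(plane * 3);
    for (int y = 0; y < rows; ++y) {
        // Start of row y, equivalent to mat.ptr<uchar>(y) on a cv::Mat.
        const std::uint8_t* row = data + y * strideBytes;
        for (int x = 0; x < cols; ++x) {
            for (int c = 0; c < 3; ++c) {
                chw[c * plane + static_cast<std::size_t>(y) * cols + x] =
                    static_cast<float>(row[x * 3 + c]);
            }
        }
    }
}
```

The resulting host vector would then be copied to the device-side input buffer (for example with cudaMemcpy) before running inference.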

hi-bigcat · April 25, 2019, 9:41am

What are caffeToGIEModel and gieModelStream? I haven't found these two names in the sample demo or the documents.

Which file are they from, and what are their types?

I will try it
