The inference result of an fcn8s model converted with TensorRT 4 is very poor

2547997322 · September 7, 2018, 6:34am

Hi,
First, thank you for releasing TensorRT 4, which means we have to write fewer custom layers, such as the “Crop” layer. Thanks also to the author of GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

I have used segnet and fcn8s for image segmentation with Caffe. Judging by the results, fcn8s is better, so I want to run fcn8s with TensorRT.

I ran inference on the same image with Caffe and with TensorRT 4, using the same caffemodel and the same prototxt, but the Caffe result is very good and the TensorRT result is very poor. The result images are attached. My prototxt follows; I used it unchanged for both Caffe and TensorRT. I have 16+1 classes, and my command is:

./segnet-console 1_5_3_90.jpg 1_5_3_90_out.png --prototxt=$NET/jetsoninference/data/networks/fcn8s/deploy_traffic.prototxt --model=$NET/jetson-inference/data/networks/fcn8s/train_iter_400000.caffemodel --labels=$NET/jetson-inference/data/networks/fcn8s/class.txt --colors=$NET/jetson-inference/data/networks/fcn8s/color.txt --input_blob=data --output_blob=score

Prototxt:
layer {
name: "input"
type: "Input"
top: "data"
input_param {
shape { dim: 1 dim: 3 dim: 810 dim: 1080 }
}
}
layer {
name: "conv1_1"
type: "Convolution"
bottom: "data"
top: "conv1_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 100
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_1"
type: "ReLU"
bottom: "conv1_1"
top: "conv1_1"
}
layer {
name: "conv1_2"
type: "Convolution"
bottom: "conv1_1"
top: "conv1_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 64
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu1_2"
type: "ReLU"
bottom: "conv1_2"
top: "conv1_2"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_2"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2_1"
type: "Convolution"
bottom: "pool1"
top: "conv2_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_1"
type: "ReLU"
bottom: "conv2_1"
top: "conv2_1"
}
layer {
name: "conv2_2"
type: "Convolution"
bottom: "conv2_1"
top: "conv2_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 128
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu2_2"
type: "ReLU"
bottom: "conv2_2"
top: "conv2_2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2_2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv3_1"
type: "Convolution"
bottom: "pool2"
top: "conv3_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_1"
type: "ReLU"
bottom: "conv3_1"
top: "conv3_1"
}
layer {
name: "conv3_2"
type: "Convolution"
bottom: "conv3_1"
top: "conv3_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_2"
type: "ReLU"
bottom: "conv3_2"
top: "conv3_2"
}
layer {
name: "conv3_3"
type: "Convolution"
bottom: "conv3_2"
top: "conv3_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu3_3"
type: "ReLU"
bottom: "conv3_3"
top: "conv3_3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3_3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv4_1"
type: "Convolution"
bottom: "pool3"
top: "conv4_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_1"
type: "ReLU"
bottom: "conv4_1"
top: "conv4_1"
}
layer {
name: "conv4_2"
type: "Convolution"
bottom: "conv4_1"
top: "conv4_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_2"
type: "ReLU"
bottom: "conv4_2"
top: "conv4_2"
}
layer {
name: "conv4_3"
type: "Convolution"
bottom: "conv4_2"
top: "conv4_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu4_3"
type: "ReLU"
bottom: "conv4_3"
top: "conv4_3"
}
layer {
name: "pool4"
type: "Pooling"
bottom: "conv4_3"
top: "pool4"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv5_1"
type: "Convolution"
bottom: "pool4"
top: "conv5_1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_1"
type: "ReLU"
bottom: "conv5_1"
top: "conv5_1"
}
layer {
name: "conv5_2"
type: "Convolution"
bottom: "conv5_1"
top: "conv5_2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_2"
type: "ReLU"
bottom: "conv5_2"
top: "conv5_2"
}
layer {
name: "conv5_3"
type: "Convolution"
bottom: "conv5_2"
top: "conv5_3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 512
pad: 1
kernel_size: 3
stride: 1
}
}
layer {
name: "relu5_3"
type: "ReLU"
bottom: "conv5_3"
top: "conv5_3"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5_3"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "fc6"
type: "Convolution"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 7
stride: 1
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "fc7"
type: "Convolution"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 4096
pad: 0
kernel_size: 1
stride: 1
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "score_fr_traffic"
type: "Convolution"
bottom: "fc7"
top: "score_fr"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 17
pad: 0
kernel_size: 1
}
}
layer {
name: "upscore2_traffic"
type: "Deconvolution"
bottom: "score_fr"
top: "upscore2"
param {
lr_mult: 0
}
convolution_param {
num_output: 17
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool4_traffic"
type: "Convolution"
bottom: "pool4"
top: "score_pool4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 17
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool4c"
type: "Crop"
bottom: "score_pool4"
bottom: "upscore2"
top: "score_pool4c"
crop_param {
axis: 2
offset: 5
}
}
layer {
name: "fuse_pool4"
type: "Eltwise"
bottom: "upscore2"
bottom: "score_pool4c"
top: "fuse_pool4"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore_pool4_traffic"
type: "Deconvolution"
bottom: "fuse_pool4"
top: "upscore_pool4"
param {
lr_mult: 0
}
convolution_param {
num_output: 17
bias_term: false
kernel_size: 4
stride: 2
}
}
layer {
name: "score_pool3_traffic"
type: "Convolution"
bottom: "pool3"
top: "score_pool3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 17
pad: 0
kernel_size: 1
}
}
layer {
name: "score_pool3c"
type: "Crop"
bottom: "score_pool3"
bottom: "upscore_pool4"
top: "score_pool3c"
crop_param {
axis: 2
offset: 9
}
}
layer {
name: "fuse_pool3"
type: "Eltwise"
bottom: "upscore_pool4"
bottom: "score_pool3c"
top: "fuse_pool3"
eltwise_param {
operation: SUM
}
}
layer {
name: "upscore8_traffic"
type: "Deconvolution"
bottom: "fuse_pool3"
top: "upscore8"
param {
lr_mult: 0
}
convolution_param {
num_output: 17
bias_term: false
kernel_size: 16
stride: 8
}
}
layer {
name: "score"
type: "Crop"
bottom: "upscore8"
bottom: "data"
top: "score"
crop_param {
axis: 2
offset: 31
}
}
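
For anyone who wants to sanity-check the Crop offsets (5, 9, 31) in the prototxt above against the 810 x 1080 input, here is a minimal Python 3 sketch. It assumes Caffe's usual output-size rules (round down for convolution, round up for pooling, stride * (size - 1) + kernel for unpadded deconvolution). The helper names are made up for illustration and are not part of Caffe or TensorRT.

```python
import math

def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Caffe pooling without padding rounds up: ceil((size - kernel) / stride) + 1
    return math.ceil((size - kernel) / stride) + 1

def deconv_out(size, kernel, stride):
    # Caffe deconvolution without padding: stride * (size - 1) + kernel
    return stride * (size - 1) + kernel

def fcn8s_sizes(size):
    """Trace one spatial dimension (height or width) through the prototxt."""
    s = conv_out(size, 3, pad=100)   # conv1_1; conv1_2 keeps the size
    pools = []
    for _ in range(5):               # pool1..pool5; the 3x3 pad-1 convs keep the size
        s = pool_out(s)
        pools.append(s)
    s = conv_out(s, 7)               # fc6; fc7 and score_fr are 1x1
    up2 = deconv_out(s, 4, 2)        # upscore2
    up4 = deconv_out(up2, 4, 2)      # upscore_pool4 (fuse_pool4 has the upscore2 size)
    up8 = deconv_out(up4, 16, 8)     # upscore8 (fuse_pool3 has the upscore_pool4 size)
    return {"pool3": pools[2], "pool4": pools[3],
            "upscore2": up2, "upscore_pool4": up4, "upscore8": up8}

def crops_fit(size):
    """True if every Crop window (offset + target size) stays inside its source blob."""
    d = fcn8s_sizes(size)
    return (5 + d["upscore2"] <= d["pool4"]
            and 9 + d["upscore_pool4"] <= d["pool3"]
            and 31 + size <= d["upscore8"])

for name, size in (("height", 810), ("width", 1080)):
    print(name, fcn8s_sizes(size), crops_fit(size))
```

Under these assumptions all three crops fit for both dimensions, so the offsets themselves are consistent with the input shape.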

Can anyone give me some suggestions?

Thanks

NVES · September 13, 2018, 10:53pm

Hello, can you provide details on the platforms you are using?

Linux distro and version
GPU type
nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version

Also, are you using an NVIDIA container? DIGITS? Can you share the version used?

2547997322 · September 14, 2018, 7:05am

Thank you for your response!

Linux distro and version: Ubuntu 14.04 LTS
GPU type: Titan Xp 12GB
nvidia driver version: 384.111
CUDA version: CUDA 8
CUDNN version: cuDNN 7.1
Python version [if using python]: Python 2.7.12
Tensorflow version: 1.15
TensorRT version: TensorRT 4.0.1.6
The fcn8s training code is GitHub - shelhamer/fcn.berkeleyvision.org: Fully Convolutional Networks for Semantic Segmentation by Jonathan Long*, Evan Shelhamer*, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Thanks

NVES · September 14, 2018, 6:22pm

Tensorflow version: 1.15?

You mean 1.5?

https://www.tensorflow.org/versions/

2547997322 · September 17, 2018, 1:42am

Sorry, the Tensorflow version is 1.5.
In fact, my program does not use Tensorflow.
I used Caffe, obtained with the command: git clone https://github.com/NVIDIA/caffe.git -b 'caffe-0.15'

Thanks

NVES · October 2, 2018, 5:16pm

Hello, to help us debug, can you provide a simple reproduction? The prototxt, model, labels, and colors, for both the Caffe and TensorRT cases.

2547997322 · October 8, 2018, 1:54am

Hi, sure. I have uploaded these resources: https://drive.google.com/open?id=1rwUCHP9SgxPSKq-PLmt6wZDtuu6VTWKL
The Readme.txt explains the role of each file.

Thanks

NVES · October 10, 2018, 11:03pm

Thanks. I’m trying to repro your issue, and I’m getting the following error while running on a DGX1. Were you executing this on a Jetson?

loaded image  source.jpg  (1080 x 810)  13996800 bytes
[cuda]  cudaAllocMapped 13996800 bytes, CPU 0x7f9e42c00000 GPU 0x7f9e42c00000
[cuda]  cudaAllocMapped 13996800 bytes, CPU 0x7f9ebe000000 GPU 0x7f9ebe000000
segnet-console:  beginning processing overlay (1539212545874)
[cuda]   cudaGetLastError()
[cuda]      no kernel image is available for execution on the device (error 48) (hex 0x30)
[cuda]      /mnt/jetson-inference/imageNet.cu:68
[cuda]   cudaPreImageNet((float4*)rgba, width, height, mInputCUDA, mWidth, mHeight)
[cuda]      no kernel image is available for execution on the device (error 48) (hex 0x30)
[cuda]      /mnt/jetson-inference/segNet.cpp:352
segNet::Overlay() -- cudaPreImageNet failed

2547997322 · October 11, 2018, 3:01am

Hello, I ran it on a PC with 8 Titan Xp GPUs (not a Jetson).
I ran into this problem too.
I changed CMakeLists.txt as follows, and then it ran successfully.
V100:
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-O3
-gencode arch=compute_53,code=sm_53
-gencode arch=compute_62,code=sm_62
-gencode arch=compute_70,code=sm_70
)

Titan Xp:
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-O3
-gencode arch=compute_20,code=sm_20
-gencode arch=compute_20,code=sm_21
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_50,code=compute_50
-gencode arch=compute_53,code=sm_53
)
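
For readers hitting the same "no kernel image is available" error: it usually means the binary contains neither machine code for the GPU's compute capability nor PTX that the driver can JIT-compile for it. The Python 3 sketch below is only an illustration and is not part of jetson-inference. The `gencode_flags` helper is hypothetical, and its table lists just the GPUs named in this thread (Titan Xp is 6.1, P100 is 6.0, V100 is 7.0, Jetson TX1 is 5.3, Jetson TX2 is 6.2). Note that sm_70 needs CUDA 9 or later, and that the Titan Xp list above has no sm_61 entry, so it presumably works through the JIT-compiled `code=compute_50` PTX.

```python
# Compute capabilities of the GPUs mentioned in this thread.
COMPUTE_CAPABILITY = {
    "Jetson TX1": "53",
    "Tesla P100": "60",
    "Titan Xp": "61",
    "Jetson TX2": "62",
    "Tesla V100": "70",
}

def gencode_flags(gpus, ptx_fallback=True):
    """Return nvcc -gencode flags covering the given GPUs (hypothetical helper)."""
    archs = sorted({COMPUTE_CAPABILITY[g] for g in gpus})
    # One native (SASS) target per distinct compute capability.
    flags = ["-gencode arch=compute_{0},code=sm_{0}".format(a) for a in archs]
    if ptx_fallback and archs:
        # Also embed PTX for the newest listed arch so newer GPUs can JIT-compile it.
        flags.append("-gencode arch=compute_{0},code=compute_{0}".format(archs[-1]))
    return flags

print("\n".join(gencode_flags(["Titan Xp", "Tesla P100"])))
```

The printed lines can be pasted into the CUDA_NVCC_FLAGS list in CMakeLists.txt in place of the architectures that do not match the installed GPU.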

NVES · October 11, 2018, 5:02pm

Yep, that was the problem. I’m on a DGX with P100. Now I’m getting

root@5412c28d4743:/mnt/fcn8s/fcn8s# /mnt/jetson-inference/build/x86_64/bin/segnet-console source.jpg tensorrt4.png --prototxt=deploy_traffic.prototxt \
> --model=train_iter_400000.caffemodel --labels=class.txt \
> --colors=color.txt --input_blob=data --output_blob=score
segnet-console
  args (9):  0 [/mnt/jetson-inference/build/x86_64/bin/segnet-console]  1 [source.jpg]  2 [tensorrt4.png]  3 [--prototxt=deploy_traffic.prototxt]  4 [--model=train_iter_400000.caffemodel]  5 [--labels=class.txt]  6 [--colors=color.txt]  7 [--input_blob=data]  8 [--output_blob=score]


segNet -- loading segmentation network model from:
       -- prototxt:   deploy_traffic.prototxt
       -- model:      train_iter_400000.caffemodel
       -- labels:     class.txt
       -- colors:     color.txt
       -- input_blob  'data'
       -- output_blob 'score'
       -- batch_size  2

[TRT]  TensorRT version 4.0.1
[TRT]  attempting to open cache file train_iter_400000.caffemodel.2.tensorcache
[TRT]  loading network profile from cache... train_iter_400000.caffemodel.2.tensorcache
[TRT]  platform has FP16 support.
[TRT]  train_iter_400000.caffemodel loaded
segnet-console: caskConvolutionLayer.cpp:145: virtual void nvinfer1::task::caskConvolutionLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `configIsValid(context)' failed.
Aborted (core dumped)

I think this is because a TensorRT engine created on a specific GPU can only be used for inference on the same model of GPU. I don’t have a Titan Xp readily available, so I will ask around and try to repro. Will keep you updated.

NVES · October 11, 2018, 5:29pm

[s]Questions:

• are you using TF-TRT or a standalone TRT application?
• are you using FP16 or INT8?[/s]

Answered: standalone TRT, FP16 by default.

2547997322 · October 12, 2018, 1:33am

In your log I see:
[TRT]  attempting to open cache file train_iter_400000.caffemodel.2.tensorcache
[TRT]  loading network profile from cache... train_iter_400000.caffemodel.2.tensorcache
You should delete the tensorcache file first whenever you change the caffemodel or the GPU.
The TensorRT engine will then be rebuilt at runtime.
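
Clearing the cache can be scripted. This is a minimal Python 3 sketch, assuming the cache files sit next to the model and are named `<model>.<batch_size>.tensorcache`, which is inferred only from the log lines in this thread. `remove_tensorcache` is a hypothetical helper, not something jetson-inference provides.

```python
import glob
import os

def remove_tensorcache(model_path):
    """Delete cached engines stored next to a caffemodel and return the removed paths."""
    # Assumed naming, taken from the log: <model>.<batch_size>.tensorcache
    pattern = glob.escape(model_path) + ".*.tensorcache"
    removed = []
    for path in glob.glob(pattern):
        os.remove(path)
        removed.append(path)
    return sorted(removed)

if __name__ == "__main__":
    # Prints an empty list when no cache file exists for this model.
    print(remove_tensorcache("train_iter_400000.caffemodel"))
```

The model file itself is left untouched; only files matching the cache pattern are removed.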

NVES · November 20, 2018, 4:43pm

Hello,

Quick update: engineering is working on a fix for a future release of TRT.

2547997322 · November 22, 2018, 6:08am

Thanks! I am looking forward to it.

NVES · December 26, 2018, 5:12pm

Hello,

Apologies for the delay, but I think we have found the cause now.

This is the new result engineering got from running

./segnet-console source.jpg tensorrt5_new.png --prototxt=deploy_traffic.prototxt --model=train_iter_400000.caffemodel --labels=class.txt --colors=color.txt --input_blob=data --output_blob=score

output attached.

What changed:
I modified segNet.cpp in https://github.com/dusty-nv/jetson-inference/blob/master/segNet.cpp
< const int argmax = (c_max[0] == ignoreID) ? c_max[1] : c_max[0];
> const int argmax = c_max[0];

There is a void class in class.txt. I guess it just means “does not belong to any other category”.
In the previous code, if a pixel had its highest score in class ‘void’, the code did not report it as belonging to no meaningful class. Instead, it picked the class with the second-highest score.

So this is not an error inside TensorRT. The difference came from how the network output was post-processed outside TensorRT.
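
To make the difference concrete, here is a minimal Python 3 sketch of the two behaviours for a single pixel. It is not the jetson-inference code: `classify`, the scores, and the use of index 0 for ‘void’ are made up for illustration.

```python
def classify(scores, ignore_id=None):
    """Return the index of the winning class for one pixel.

    If ignore_id is given and that class wins, fall back to the runner-up,
    which mirrors the original line in segNet.cpp. With ignore_id=None this
    is a plain argmax, which mirrors the modified line.
    """
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    if ignore_id is not None and ranked[0] == ignore_id and len(ranked) > 1:
        return ranked[1]
    return ranked[0]

# Hypothetical scores for one pixel; class 0 stands in for 'void'.
pixel = [4.0, 1.5, 0.2]
print(classify(pixel, ignore_id=0))  # 1: 'void' wins but is replaced by the runner-up
print(classify(pixel))               # 0: plain argmax keeps 'void'
```

With the fallback, every pixel that the network considers ‘void’ is painted with some other class, which is consistent with the noisy output described at the start of the thread.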

2547997322 · December 28, 2018, 9:48am

Thank you very much! The inference result is better now!
