TensorRT 4 inference results with a converted fcn8s model are very poor

Hi,
First, thank you for the release of TensorRT 4, which means we can write fewer custom layers, such as the “Crop” layer. Thanks also to the author of https://github.com/dusty-nv/jetson-inference.

I have used both segnet and fcn8s for image segmentation with Caffe. Judging by the results, fcn8s works better, so I want to run fcn8s with TensorRT.

I ran inference on the same image with Caffe and with TensorRT 4, using the same caffemodel and the same prototxt. The Caffe result is very good, but the TensorRT result is very poor. The result images are attached. My prototxt follows; it is the one used for both Caffe and TensorRT. I have 16+1 classes, and my command is:

./segnet-console 1_5_3_90.jpg 1_5_3_90_out.png --prototxt=$NET/jetsoninference/data/networks/fcn8s/deploy_traffic.prototxt --model=$NET/jetson-inference/data/networks/fcn8s/train_iter_400000.caffemodel --labels=$NET/jetson-inference/data/networks/fcn8s/class.txt --colors=$NET/jetson-inference/data/networks/fcn8s/color.txt --input_blob=data --output_blob=score

Prototxt:
layer {
  name: "input"
  type: "Input"
  top: "data"
  input_param {
    shape { dim: 1 dim: 3 dim: 810 dim: 1080 }
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1_2"
  type: "ReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_2"
  type: "ReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_3"
  type: "ReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_3"
  type: "ReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4_3"
  top: "pool4"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv5_2"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv5_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_2"
  type: "ReLU"
  bottom: "conv5_2"
  top: "conv5_2"
}
layer {
  name: "conv5_3"
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_3"
  type: "ReLU"
  bottom: "conv5_3"
  top: "conv5_3"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5_3"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 7
    stride: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "fc7"
  type: "Convolution"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 1
    stride: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "score_fr_traffic"
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 17
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "upscore2_traffic"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore2"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 17
    bias_term: false
    kernel_size: 4
    stride: 2
  }
}
layer {
  name: "score_pool4_traffic"
  type: "Convolution"
  bottom: "pool4"
  top: "score_pool4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 17
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "score_pool4c"
  type: "Crop"
  bottom: "score_pool4"
  bottom: "upscore2"
  top: "score_pool4c"
  crop_param {
    axis: 2
    offset: 5
  }
}
layer {
  name: "fuse_pool4"
  type: "Eltwise"
  bottom: "upscore2"
  bottom: "score_pool4c"
  top: "fuse_pool4"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "upscore_pool4_traffic"
  type: "Deconvolution"
  bottom: "fuse_pool4"
  top: "upscore_pool4"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 17
    bias_term: false
    kernel_size: 4
    stride: 2
  }
}
layer {
  name: "score_pool3_traffic"
  type: "Convolution"
  bottom: "pool3"
  top: "score_pool3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 17
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "score_pool3c"
  type: "Crop"
  bottom: "score_pool3"
  bottom: "upscore_pool4"
  top: "score_pool3c"
  crop_param {
    axis: 2
    offset: 9
  }
}
layer {
  name: "fuse_pool3"
  type: "Eltwise"
  bottom: "upscore_pool4"
  bottom: "score_pool3c"
  top: "fuse_pool3"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "upscore8_traffic"
  type: "Deconvolution"
  bottom: "fuse_pool3"
  top: "upscore8"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 17
    bias_term: false
    kernel_size: 16
    stride: 8
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore8"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 31
  }
}
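
When two frameworks disagree on a network like this, one thing worth ruling out is the geometry: whether every Crop layer has enough room for its offset. The following is a minimal Python sketch, not part of the original post, that traces one spatial dimension through the prototxt above. It assumes Caffe's usual output-size rules (round down for convolution, round up for pooling) and that a single Crop offset applies to both spatial axes. The helper names are hypothetical, and the sketch says nothing about how TensorRT computes these sizes.

```python
import math

def conv_out(size, kernel, pad=0, stride=1):
    # Caffe convolution output size (rounds down).
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Caffe pooling output size without padding (rounds up).
    return math.ceil((size - kernel) / stride) + 1

def deconv_out(size, kernel, stride):
    # Caffe deconvolution output size without padding.
    return stride * (size - 1) + kernel

def trace(size):
    """Trace one spatial dimension (height or width) through the net."""
    s = {"data": size}
    s["conv1"] = conv_out(size, 3, pad=100)      # conv1_1; pad-1 3x3 convs keep the size
    s["pool1"] = pool_out(s["conv1"])
    s["pool2"] = pool_out(s["pool1"])
    s["pool3"] = pool_out(s["pool2"])
    s["pool4"] = pool_out(s["pool3"])
    s["pool5"] = pool_out(s["pool4"])
    s["score_fr"] = conv_out(s["pool5"], 7)      # fc6 (7x7); fc7 and score_fr are 1x1
    s["upscore2"] = deconv_out(s["score_fr"], 4, 2)
    s["upscore_pool4"] = deconv_out(s["upscore2"], 4, 2)   # fuse_pool4 keeps the size
    s["upscore8"] = deconv_out(s["upscore_pool4"], 16, 8)  # fuse_pool3 keeps the size
    return s

# (blob being cropped, blob giving the target size, offset) for the three Crop layers
CROPS = [("pool4", "upscore2", 5), ("pool3", "upscore_pool4", 9), ("upscore8", "data", 31)]

def crops_fit(size):
    s = trace(size)
    return all(offset + s[ref] <= s[src] for src, ref, offset in CROPS)

for name, size in (("height", 810), ("width", 1080)):
    print(name, trace(size), "crops fit:", crops_fit(size))
```

Under these assumptions upscore8 comes out at 888 x 1144 for the 810 x 1080 input, and all three crops fit with the offsets 5, 9 and 31.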

Can anyone give me some suggestions?

Thanks



Hello, can you provide details on the platforms you are using?

Linux distro and version
GPU type
nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version

Also, are you using an NVIDIA container or DIGITS? If so, can you share the version used?

Thank you for your response!

Linux distro and version: Ubuntu 14.04 LTS
GPU type: Titan Xp 12GB
nvidia driver version: 384.111
CUDA version: CUDA 8
CUDNN version: cuDNN 7.1
Python version [if using python]: Python 2.7.12
Tensorflow version: 1.15
TensorRT version: TensorRT 4.0.1.6
The fcn8s training code is https://github.com/shelhamer/fcn.berkeleyvision.org

Thanks

Tensorflow version:1.15 ?

You mean 1.5?

https://www.tensorflow.org/versions/

Sorry, the Tensorflow version is 1.5.
In fact, my program does not use Tensorflow.
I used Caffe, obtained with the command: git clone https://github.com/NVIDIA/caffe.git -b 'caffe-0.15'

Thanks

Hello, to help us debug, can you provide a simple reproduction: the prototxt, model, labels, and colors, for both the Caffe and TensorRT cases?

Hi, sure, I have uploaded these resources: https://drive.google.com/open?id=1rwUCHP9SgxPSKq-PLmt6wZDtuu6VTWKL
The Readme.txt explains the role of each file.

Thanks

Thanks. I’m trying to repro your issue, and I’m getting the following error while running on a DGX1. Were you executing this on a Jetson?

loaded image  source.jpg  (1080 x 810)  13996800 bytes
[cuda]  cudaAllocMapped 13996800 bytes, CPU 0x7f9e42c00000 GPU 0x7f9e42c00000
[cuda]  cudaAllocMapped 13996800 bytes, CPU 0x7f9ebe000000 GPU 0x7f9ebe000000
segnet-console:  beginning processing overlay (1539212545874)
[cuda]   cudaGetLastError()
[cuda]      no kernel image is available for execution on the device (error 48) (hex 0x30)
[cuda]      /mnt/jetson-inference/imageNet.cu:68
[cuda]   cudaPreImageNet((float4*)rgba, width, height, mInputCUDA, mWidth, mHeight)
[cuda]      no kernel image is available for execution on the device (error 48) (hex 0x30)
[cuda]      /mnt/jetson-inference/segNet.cpp:352
segNet::Overlay() -- cudaPreImageNet failed

Hello, I executed it on a PC with 8 Titan Xp GPUs (not a Jetson).
I encountered this problem as well.
After I changed CUDA_NVCC_FLAGS in CMakeLists.txt as follows, it ran successfully.
V100:
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-O3
-gencode arch=compute_53,code=sm_53
-gencode arch=compute_62,code=sm_62
-gencode arch=compute_70,code=sm_70
)

Titan Xp:
set(
CUDA_NVCC_FLAGS
${CUDA_NVCC_FLAGS};
-O3 -gencode arch=compute_20,code=sm_20
-gencode arch=compute_20,code=sm_21
-gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35
-gencode arch=compute_50,code=sm_50
-gencode arch=compute_50,code=compute_50
-gencode arch=compute_53,code=sm_53
)
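
Error 48 ("no kernel image is available") means the binary contains neither machine code for the GPU in use nor PTX that the driver can compile for it. The Titan Xp has compute capability 6.1, the P100 6.0 and the V100 7.0. The Titan Xp list above has no sm_61 entry, so it presumably runs because its compute_50 PTX entry can be JIT-compiled by the driver. As a hedged illustration, here is a small Python helper (hypothetical, not part of jetson-inference) that builds the `-gencode` flags for a given set of compute capabilities, plus a PTX fallback for the newest one:

```python
def gencode_flags(capabilities, ptx_fallback=True):
    """Build nvcc -gencode flags for compute capabilities such as "6.1"."""
    flags = []
    for cap in capabilities:
        arch = cap.replace(".", "")
        # Machine code (SASS) for exactly this architecture.
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    if ptx_fallback and capabilities:
        # Embed PTX for the newest architecture so later GPUs can JIT-compile it.
        newest = max(capabilities, key=lambda c: tuple(int(p) for p in c.split(".")))
        arch = newest.replace(".", "")
        flags.append(f"-gencode arch=compute_{arch},code=compute_{arch}")
    return flags

# Titan Xp (6.1) and P100 (6.0)
for flag in gencode_flags(["6.0", "6.1"]):
    print(flag)
```

The printed lines can be pasted into the `CUDA_NVCC_FLAGS` list. Note that the installed CUDA toolkit must know the architecture: sm_61 needs CUDA 8 or later, and sm_70 needs CUDA 9 or later.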

Yep, that was the problem. I’m on a DGX with P100. Now I’m getting:

root@5412c28d4743:/mnt/fcn8s/fcn8s# /mnt/jetson-inference/build/x86_64/bin/segnet-console source.jpg tensorrt4.png --prototxt=deploy_traffic.prototxt \
> --model=train_iter_400000.caffemodel --labels=class.txt \
> --colors=color.txt --input_blob=data --output_blob=score
segnet-console
  args (9):  0 [/mnt/jetson-inference/build/x86_64/bin/segnet-console]  1 [source.jpg]  2 [tensorrt4.png]  3 [--prototxt=deploy_traffic.prototxt]  4 [--model=train_iter_400000.caffemodel]  5 [--labels=class.txt]  6 [--colors=color.txt]  7 [--input_blob=data]  8 [--output_blob=score]


segNet -- loading segmentation network model from:
       -- prototxt:   deploy_traffic.prototxt
       -- model:      train_iter_400000.caffemodel
       -- labels:     class.txt
       -- colors:     color.txt
       -- input_blob  'data'
       -- output_blob 'score'
       -- batch_size  2

[TRT]  TensorRT version 4.0.1
[TRT]  attempting to open cache file train_iter_400000.caffemodel.2.tensorcache
[TRT]  loading network profile from cache... train_iter_400000.caffemodel.2.tensorcache
[TRT]  platform has FP16 support.
[TRT]  train_iter_400000.caffemodel loaded
segnet-console: caskConvolutionLayer.cpp:145: virtual void nvinfer1::task::caskConvolutionLayer::allocateResources(const nvinfer1::cudnn::CommonContext&): Assertion `configIsValid(context)' failed.
Aborted (core dumped)

I think this is because a TensorRT engine created on a specific GPU can only be used for inference on the same model of GPU. I don’t have a Titan Xp readily available, so I will ask around and try to repro. Will keep you updated.

Questions (since answered):

• Are you using TF-TRT or a standalone TRT application? Standalone TRT.
• Are you using FP16 or INT8? FP16, the default.

According to your log, the engine was loaded from a cache file:

[TRT] attempting to open cache file train_iter_400000.caffemodel.2.tensorcache
[TRT] loading network profile from cache... train_iter_400000.caffemodel.2.tensorcache

You should delete the tensorcache file whenever you change the caffemodel or the GPU.
The TensorRT engine will then be rebuilt at runtime.
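
A minimal Python sketch of that cleanup step, assuming the cache files sit next to the caffemodel and follow the naming seen in the log (`<model file name>.<batch size>.tensorcache`). The function name is hypothetical and is not part of jetson-inference:

```python
import glob
from pathlib import Path

def remove_engine_caches(model_path):
    """Delete cached engines stored next to the given caffemodel.

    Returns the names of the files that were removed.
    """
    model = Path(model_path)
    # Escape the model name so characters such as '[' are matched literally.
    pattern = glob.escape(model.name) + ".*.tensorcache"
    removed = []
    for cache in sorted(model.parent.glob(pattern)):
        cache.unlink()
        removed.append(cache.name)
    return removed

if __name__ == "__main__":
    print(remove_engine_caches("train_iter_400000.caffemodel"))
```

Running it before segnet-console forces the engine to be rebuilt for the current model and GPU.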

Hello,

Quick update: engineering is working on a fix for a future release of TRT.

Thanks! I am looking forward to it.

Hello,

Apologies for the delay, but I think we have arrived at a cause now.

Per engineering, this is the new result they got from running:

./segnet-console source.jpg tensorrt5_new.png --prototxt=deploy_traffic.prototxt --model=train_iter_400000.caffemodel --labels=class.txt --colors=color.txt --input_blob=data --output_blob=score

Output attached.

What changed:
I modified segNet.cpp in https://github.com/dusty-nv/jetson-inference/blob/master/segNet.cpp
< const int argmax = (c_max[0] == ignoreID) ? c_max[1] : c_max[0];
---
> const int argmax = c_max[0];

class.txt contains a “void” class, which I guess just means “does not belong to any other category”.
In the previous code, when a pixel had its highest score in the “void” class, the code did not report that the pixel belongs to no meaningful class. Instead, it picked the class with the second-highest score.

This is not an error inside TensorRT; the difference was caused by how the output is post-processed outside TensorRT.
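
To make the difference concrete, here is a small Python sketch of the two per-pixel rules. It only mirrors the logic of the one-line change and is not the segNet.cpp code. The function names are hypothetical, and treating index 0 as the “void” class is an assumption made for the example only:

```python
def top_two(scores):
    """Return the indices of the highest and second-highest scores."""
    order = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return order[0], order[1]

def classify_with_ignore(scores, ignore_id):
    # Old behavior: if the ignored class wins, fall back to the runner-up.
    best, second = top_two(scores)
    return second if best == ignore_id else best

def classify_plain(scores):
    # New behavior: always report the highest-scoring class.
    return top_two(scores)[0]

VOID = 0  # assumed index of the "void" class, for illustration only
pixel_scores = [0.7, 0.2, 0.1]

print(classify_with_ignore(pixel_scores, VOID))  # 1: void wins, runner-up is reported
print(classify_plain(pixel_scores))              # 0: pixel is reported as void
```

With the old rule, every pixel whose best class is “void” gets painted with some other class, which is one way a segmentation can look much worse than the Caffe output.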
tensorrt5_new.png

Thank you very much! The inference result is better now!