A strange error when using TensorRT 3 to accelerate my SSD model (Caffe)

Hi,

I am trying to accelerate my two-class SSD model with TensorRT 3, but there is an odd error in the first few layers, such as Convolution, Pooling, and ReLU: the data (blob) appear to be transposed within each channel. The data are shown below. By the way, the input is an all-zeros matrix.

pool3 blob:
Data in Caffe (obtained with Python and Caffe, shape: 63*63):

pool3 (1, 256, 63, 63)
[[[[ 73.57597351 45.40362549 45.41010284 …, 45.39312744 48.37147522
81.98123169]
[ 65.84190369 31.6844101 31.48287964 …, 31.46398163 35.48117065
75.63328552]
[ 65.82327271 31.52442169 31.29610443 …, 31.27752686 35.30277252
75.50411987]
…,
[ 65.82125854 31.51956177 31.29107666 …, 31.27249908 35.29859924
75.50247955]
[ 71.91233063 37.50626373 37.51599884 …, 37.50302887 40.03573608
76.09189606]
[ 88.11582947 62.16601944 62.17658997 …, 62.17053223 62.95288849
87.95324707]]

Data in TensorRT (printed by my own printPlugin layer, shape: 10*10):

73.576 65.8419 65.8233 65.823 65.823 65.823 65.823 65.823 65.823 65.823
65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823
65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823
65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823
65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823
65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823 65.823
65.8213 71.9123 88.1158 45.4036 31.6844 31.5244 31.522 31.522 31.522 31.522
31.522 31.522 31.522 31.522 31.522 31.522 31.522 31.522 31.522 31.522
31.522 31.522 31.522 31.522 31.522 31.522 31.522 31.522 31.522 31.522

In the TensorRT output, the values starting at 73.576 are the first column of the Caffe blob, and the values starting at 45.4036 are the second column of the Caffe blob. The second column starts at index 63, so it looks like W and H have been transposed. I checked some other blobs in the first few layers, and they show the same behavior.
What could cause this problem? Could there be an error in my weights model?
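To make the observed pattern concrete, here is a minimal sketch of what an H/W transpose does to a flat, row-major buffer. The helper names (`transposePlane`, `makeLabeledPlane`) and the labeled test values are hypothetical and only for illustration; this is not TensorRT or Caffe code.

```cpp
#include <cstddef>
#include <vector>

// Returns a copy of a row-major h x w plane with H and W swapped.
// The result is a row-major w x h plane: dst[x * h + y] == src[y * w + x].
std::vector<float> transposePlane(const std::vector<float>& src, int h, int w)
{
    std::vector<float> dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[static_cast<std::size_t>(x) * h + y] =
                src[static_cast<std::size_t>(y) * w + x];
    return dst;
}

// Builds a hypothetical plane whose values encode their own position
// (row * 100 + column), so the layout is easy to read back.
std::vector<float> makeLabeledPlane(int h, int w)
{
    std::vector<float> plane(static_cast<std::size_t>(h) * w);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            plane[static_cast<std::size_t>(y) * w + x] =
                static_cast<float>(y * 100 + x);
    return plane;
}
```

With a 63 x 63 plane, the first 63 entries of the transposed buffer are the first column of the original, and entry 63 is the first element of the second column. That is the same layout seen in the TensorRT printout above.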

Hi,

Could you share the print code in printPlugin?
Thanks.

Hi,

Code in printPlugin:
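int enqueue(int batchSize, const void* const* inputs, void** outputs, void* workspace, cudaStream_t stream) override
{
    std::cout << "print enqueue" << std::endl;
    std::cout << "size= " << cpySize << std::endl;
    float* data = new float[cpySize];
    newCHECK(cudaMemcpyAsync(data, inputs[0], cpySize * sizeof(float), cudaMemcpyDeviceToHost, stream));
    // Wait for the device-to-host copy to finish before reading the host buffer.
    newCHECK(cudaStreamSynchronize(stream));
    // Print 100 values starting at offset 63*63 (the second channel).
    for (int i = 0; i < 100; i++)
    {
        std::cout << data[i + 63 * 63] << " ";
        if ((i + 1) % 10 == 0) std::cout << std::endl;
    }
    // Pass the input through unchanged.
    newCHECK(cudaMemcpyAsync(outputs[0], inputs[0], cpySize * sizeof(float), cudaMemcpyDeviceToDevice, stream));
    delete[] data;  // allocated with new[], so release with delete[]
    return 0;
}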

prototxt:
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3_P"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: 'printDims'
  type: 'IPlugin'
  bottom: 'pool3'
  top: 'pool3'
}

The start index is 63*63, which is the start of the second channel. I start there because the data in the first channel are all the same.
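As a small sketch of the index arithmetic behind that offset, assuming the blob for one image is stored contiguously in channel, height, width (CHW) order. The helper name `chwOffset` is hypothetical.

```cpp
#include <cstddef>

// Flat offset of element (c, h, w) in one image stored as contiguous CHW.
// The CHW layout is an assumption about the buffer passed to the plugin.
std::size_t chwOffset(int c, int h, int w, int height, int width)
{
    return (static_cast<std::size_t>(c) * height + h) * width + w;
}
```

For a 63 x 63 plane, `chwOffset(1, 0, 0, 63, 63)` is 63*63 = 3969, the first element of the second channel, and `chwOffset(0, 1, 0, 63, 63)` is 63, the first element of the second row.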

By the way, how can I save and load the gieModel? I have to wait a long time for the function caffeToGIEModel every time I run the program.

Hi,

Could you also check if the output of layer pool3 is different?
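One way to run such a check is to dump both outputs to host memory and look for the first element that differs. The following is a minimal sketch; the helper name `firstMismatch` and the tolerance are assumptions, not part of TensorRT or Caffe.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first element where the two buffers differ by more
// than tol, or -1 if they match. If the sizes differ but the common prefix
// matches, the length of the shorter buffer is returned.
long firstMismatch(const std::vector<float>& ref,
                   const std::vector<float>& test,
                   float tol)
{
    const std::size_t n = ref.size() < test.size() ? ref.size() : test.size();
    for (std::size_t i = 0; i < n; ++i)
        if (std::fabs(ref[i] - test[i]) > tol)
            return static_cast<long>(i);
    return ref.size() == test.size() ? -1 : static_cast<long>(n);
}
```

Applied to the first three values of the two dumps in the first post (73.576, 45.4036, 45.4101 from Caffe against 73.576, 65.8419, 65.8233 from TensorRT), this would report index 1 as the first mismatch.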

For reading and writing a TensorRT engine, please check our sample code here:
https://github.com/dusty-nv/jetson-inference/blob/master/tensorNet.cpp#L254
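The general idea is to serialize the built engine once, store the bytes on disk, and load them on later runs instead of rebuilding. Below is a minimal sketch of just the file-caching part in plain C++. The helper names `saveBlob` and `loadBlob` are hypothetical. In TensorRT, the bytes would typically come from the engine's serialized form (for example `ICudaEngine::serialize()`) and be passed back to `IRuntime::deserializeCudaEngine()`; please confirm the exact calls against the linked sample for your TensorRT version.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Writes a byte buffer (e.g. a serialized engine) to disk.
// Returns false if the file cannot be opened or written.
bool saveBlob(const std::string& path, const void* data, std::size_t size)
{
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    return out.good();
}

// Reads the whole file back into memory.
// Returns an empty vector if the file cannot be opened.
std::vector<char> loadBlob(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

At startup, a program could call `loadBlob` first and only fall back to the slow build step when the returned vector is empty. Note that a cached engine is generally only valid for the TensorRT version and GPU it was built with.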

Thanks

Hi,

This problem is probably because the weight model has been processed, possibly with low-rank decomposition, though I am not sure.
I tried the source model from before this processing, and it works.

Thanks

Hi,

Thanks for your feedback.

Could you also share the source code for reproducing this issue?
We would like to reproduce it to gather more information before making further suggestions.

Thanks.

Hi,

Yes, of course. How should I send the code to you? If email is convenient, please give me the address first.

Thanks.

Hi,

Sorry, we can’t disclose our email here.
Could you send it via private message?

Thanks.

Hi,

Thanks for sharing. We had a quick look at your code.
It is fairly complicated, and it is not easy for us to tell where the issue comes from.
Could you help simplify the source? Maybe a single transpose layer would be enough?

Thanks.

Hi,

We are checking this issue internally, and it may take some time to have an update.
We will reply with more information later.

Thanks and sorry for the inconvenience.

Hi, I am also trying to accelerate SSD with TensorRT. Do you have any guidelines on how I should do so? My platform is the Jetson TX2; should I use TensorRT 3.0 or 2.1 on it?

Hi, marvinreza

It’s recommended to use TensorRT 3.0 for better performance.
You can install it directly via JetPack 3.2:
https://developer.nvidia.com/embedded/downloads#?search=jetpack%203.2

For accelerating SSD with TensorRT, please check this topic for more information:

Thanks.

Hi Aasta, I have installed TensorRT on my host PC (for testing) with a GTX 1070 by downloading the tar file and adding the path to the library environment variable (I did not install the Python TensorRT package, though). When I try to build your Face-recognition example, I get the following error:

fatal error: NvCaffeParser.h: No such file or directory
 #include "NvCaffeParser.h"

I am able to run for instance sampleFasterRCNN from within the TensorRT-3.0.1 folder though. What should I do to fix this error?

Hi,

Please remember to add the tarball path to the global parameters.
Check this comment for more information:
Is it possible to install different version of TensorRT in same Ubuntu system? - DeepStream SDK - NVIDIA Developer Forums

Thanks.

Hi, I still get the same error although I followed the guidelines in the link. Is there anything else I can do?

I installed TensorRT 3.0.1, could it be that Face-recognition only works with TensorRT 2.1?
Edit: I was able to find a temporary fix by setting ${GIE_PATH} manually in the CMakeLists.txt. The build still fails, though, because it is unable to find -lncaffe_parser and -lnvinfer.

Hi,

Sorry, some information was missing from my earlier reply.

Face recognition is a TensorRT 2.1 sample for Jetson.
If you want to use it on a desktop GPU, some of the configuration needs to be updated for the x86 environment.

Thanks, and sorry for the inconvenience.