ResNet-50 gives wrong results on PX2 with TensorRT 2.1.2

Hi!
When I run ResNet-50 (downloaded from the internet, a standard classification network) with TensorRT on the PX2 and with Caffe on a PC, the PX2 results differ from the Caffe results on CPU/GPU (GTX 1080). The Caffe results on the PC are correct. I tested several different pictures.

  1. On the PX2: I based my test on the sampleFasterRCNN sample from TensorRT 2.1.2.
    I wrote testPlugin just to capture the input data of "res5c". The test code and part of the deploy prototxt are as follows:

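// Globals written by getOutputDimensions() so that enqueue() knows the tensor shape (C, H, W).
int g_c = 0; int g_h = 0; int g_w = 0;

// Forward declaration: the dump function is defined after the class.
void testPlugin_forward_cpu(const void* const* inputs, void** outputs, int c, int h, int w);

class testPlugin : public IPlugin
{
public:
testPlugin() {}
testPlugin(const void* buffer, size_t size)
{
assert(size == sizeof(mCopySize));
// Deserialize: read the size_t value stored in the buffer (not the pointer itself).
mCopySize = *reinterpret_cast<const size_t*>(buffer);
}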

int getNbOutputs() const override
{
	printf("testPlugin:getNbOutputs...\n");
	return 1;
}
Dims getOutputDimensions(int index, const Dims* inputs, int nbInputDims) override
{	
	printf("testPlugin:getOutputDimensions...\n");
	g_c = inputs[0].d[0];g_h = inputs[0].d[1];g_w = inputs[0].d[2];
	printf("CHW:%d %d %d\n",inputs[0].d[0],inputs[0].d[1],inputs[0].d[2]);
	return DimsCHW(inputs[0].d[0], inputs[0].d[1],inputs[0].d[2]);
}

int initialize() override
{
	printf("testPlugin:initialize...\n");
	return 0;
}

void terminate() override
{
	printf("testPlugin:terminate...\n");
}

size_t getWorkspaceSize(int) const override
{
	printf("testPlugin:getWorkspaceSize...\n");
	return 0;
}

// Currently a plugin cannot execute "in place", so we memcpy the data from the input to the output buffer
// (pass-through, so the following layers still receive valid data), then dump a host copy of the input.
int enqueue(int batchSize, const void*const *inputs, void** outputs, void*, cudaStream_t stream) override
{
	printf("testPlugin:enqueue...\n");
	CHECK(cudaMemcpyAsync(outputs[0], inputs[0], mCopySize * batchSize, cudaMemcpyDeviceToDevice, stream));
	testPlugin_forward_cpu(inputs, outputs, g_c, g_h, g_w);
	return 0;
}

size_t getSerializationSize() override
{
	printf("testPlugin:getSerializationSize...\n");
	return sizeof(mCopySize);
}

void serialize(void* buffer) override
{
	printf("testPlugin:serialize...\n");
	*reinterpret_cast<size_t*>(buffer) = mCopySize;
}

void configure(const Dims*inputs, int nbInputs, const Dims* outputs, int nbOutputs, int)	override
{
	printf("testPlugin:configure...\n");
	mCopySize = inputs[0].d[0] * inputs[0].d[1] * inputs[0].d[2] * sizeof(float);
}

protected:
size_t mCopySize;
//int c,h,w;
};

// Copies the plugin's input tensor (C x H x W floats, batch size 1) from the GPU to the host
// and appends it to a text file, one value per line.
void testPlugin_forward_cpu(const void* const* inputs, void** outputs, const int c, const int h, const int w)
{
printf("begin testPlugin_forward_cpu...\n"); fflush(stdout);

//////// GPU -> CPU ////////
printf("c:%d h:%d w:%d\n", c, h, w); fflush(stdout);
const size_t count = (size_t)c * h * w;
float* data_buf = (float*)malloc(count * sizeof(float));
cudaMemcpy(data_buf, (const float*)inputs[0], count * sizeof(float), cudaMemcpyDeviceToHost);

// debug: dump the values
FILE* fp_in = fopen("log_px2_testPlugin_0307.txt", "a+");
for (size_t i = 0; i < count; i++)
{
fprintf(fp_in, "%f\n", data_buf[i]);
}
fclose(fp_in);

free(data_buf);
printf("end testPlugin_forward_cpu...\n"); fflush(stdout);
}

name: "ResNet-50"
#input: "data"
#input_dim: 1
#input_dim: 3
#input_dim: 224
#input_dim: 224

layer {
name: "data"
type: "MemoryData"
top: "data"
top: "label"
memory_data_param {
batch_size: 1
channels: 3
height: 224
width: 224
}

}

layer {
bottom: "res5b"
bottom: "res5c_branch2c"
top: "res5c"
name: "res5c"
type: "Eltwise"
}

layer {
bottom: "res5c"
top: "res5c"
name: "res5c_relu"
type: "ReLU"
}

layer {
bottom: "res5c"
top: "pool5"
name: "pool5"
type: "Pooling"
pooling_param {
kernel_size: 7
stride: 1
pool: AVE
}
}

layer {
bottom: "pool5"
top: "fc1000"
name: "fc1000"
type: "InnerProduct"
inner_product_param {
num_output: 1000
}
}

layer {
bottom: "fc1000"
top: "prob"
name: "prob"
type: "Softmax"
}
2. On the PC (CPU/GPU, GTX 1080): I dumped the input of "res5c" in the same way as on the PX2,
but the values are different!

The PX2 values (bottom 0 of "res5c"):
0.000000
0.000000
3.217169
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.901252
0.332424
0.000000
0.000000
0.000000
0.000000
0.000000
1.156103
0.837259
1.147727
0.676404
0.000000
0.000000
0.000000
0.000000
0.000000
0.001276
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000

The Caffe values on the PC (bottom 0 of "res5c"):
3.310690
4.821366
4.264204
1.190538
1.798670
0.946362
1.013385
3.501086
1.795165
2.163740
0.860631
0.187800
0.000000
0.000000
1.204569
0.381836
1.398308
0.000000
0.000000
0.000000
0.353026
0.000000
0.000000
0.000000
0.488210
0.000000
0.000000
0.000000
0.000000
0.000000
0.000000
1.233509
0.000000
0.000000
0.000000
0.000000
0.779470
0.230680
1.254745
0.000000
0.000000
0.424548
0.000000
0.000000
0.000000
0.089821
0.000000
2.994178
3.356085
0.000000
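To find where the two runs diverge (the same approach works layer by layer), the dumps can be compared numerically instead of by eye. Below is a minimal C++ sketch, assuming both files contain one float per line as written by testPlugin_forward_cpu; the Caffe-side dump is assumed to use the same format and element order. The names loadDump, compareDumps and DiffReport, and the tolerance you pass in, are hypothetical choices for illustration. Note that the plugin opens its log with "a+", so delete the file between runs, or dumps from several runs will be concatenated.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Reads one float per line (the format written by the debug plugin's fprintf("%f\n")).
std::vector<float> loadDump(const std::string& path)
{
    std::vector<float> values;
    FILE* fp = std::fopen(path.c_str(), "r");
    if (!fp) return values;  // missing file -> empty vector
    float v = 0.0f;
    while (std::fscanf(fp, "%f", &v) == 1) values.push_back(v);
    std::fclose(fp);
    return values;
}

struct DiffReport {
    std::size_t compared = 0;       // number of element pairs compared
    std::size_t firstMismatch = 0;  // index of first pair with |a-b| > tol (== compared if none)
    float maxAbsDiff = 0.0f;        // largest |a-b| seen
};

// Compares the overlapping prefix of two dumps element by element.
DiffReport compareDumps(const std::vector<float>& a, const std::vector<float>& b, float tol)
{
    DiffReport r;
    r.compared = a.size() < b.size() ? a.size() : b.size();
    r.firstMismatch = r.compared;
    for (std::size_t i = 0; i < r.compared; ++i) {
        const float d = std::fabs(a[i] - b[i]);
        if (d > r.maxAbsDiff) r.maxAbsDiff = d;
        if (d > tol && r.firstMismatch == r.compared) r.firstMismatch = i;
    }
    return r;
}
```

For example, compareDumps(loadDump("log_px2_testPlugin_0307.txt"), loadDump("caffe_res5c.txt"), 1e-3f) (the Caffe filename is made up) reports the first index whose difference exceeds the tolerance; repeating this for successive layers pinpoints the first layer that goes wrong.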

So, I want to know what causes the different results.
Any help is greatly appreciated!
Thanks!

The input image is the same in both cases: 224x224, BGR order, with the mean value subtracted (float pixelMean[3]{ 102.9801f, 115.9465f, 122.7717f }).
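For reference, here is a minimal sketch of the mean subtraction described above. It assumes the image has already been converted to 32-bit floats and is stored interleaved (H x W x 3, B-G-R per pixel, as in a CV_32FC3 cv::Mat). The function name subtractMeanBGR is hypothetical. The point to check is that the same three means, in the same B, G, R order, are applied on both the PX2 and the PC.

```cpp
#include <cstddef>

// Subtracts a per-channel mean from an interleaved BGR float image (H x W x 3, HWC).
// mean[0] is applied to Blue, mean[1] to Green, mean[2] to Red.
void subtractMeanBGR(float* hwc, int height, int width, const float mean[3])
{
    const std::size_t pixels = static_cast<std::size_t>(height) * width;
    for (std::size_t p = 0; p < pixels; ++p) {
        for (int ch = 0; ch < 3; ++ch) {
            hwc[p * 3 + ch] -= mean[ch];
        }
    }
}
```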

Dear zhou-lw,
Please make sure the inputs and outputs are in NCHW format. Could you please upgrade to the latest TensorRT and check the issue?
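In NCHW layout, W varies fastest, then H, then C, then the batch index N. Below is a minimal sketch of the offset arithmetic and of packing an interleaved (HWC) image into that layout. The names nchwOffset and packHwcToChw are hypothetical helpers, not TensorRT API.

```cpp
#include <cstddef>

// Linear offset of element (n, c, h, w) in a contiguous NCHW tensor
// with C channels, H rows and W columns.
std::size_t nchwOffset(int n, int c, int h, int w, int C, int H, int W)
{
    return ((static_cast<std::size_t>(n) * C + c) * H + h) * W + w;
}

// Copies an interleaved H x W x C image (HWC) into planar C x H x W order
// for batch index 0.
void packHwcToChw(const float* hwc, float* chw, int C, int H, int W)
{
    for (int h = 0; h < H; ++h)
        for (int w = 0; w < W; ++w)
            for (int c = 0; c < C; ++c)
                chw[nchwOffset(0, c, h, w, C, H, W)] =
                    hwc[(static_cast<std::size_t>(h) * W + w) * C + c];
}
```

For a 3 x 224 x 224 input, element (n=0, c=2, h=1, w=0) sits at offset (2*224 + 1)*224 = 100576. The indexing (c * height + h) * width + w used later in this thread is exactly nchwOffset with n = 0.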

Thanks for your reply!

1. Input image
I use OpenCV to load the JPG image as shown below; the mean has already been subtracted before this code.
Only one image is used per test.


for (int h = 0; h < height; ++h) {
for (int w = 0; w < width; ++w) {
data[(0 * height + h) * width + w] = float(cv_resized.at<cv::Vec3f>(cv::Point(w, h))[0]);// Blue
data[(1 * height + h) * width + w] = float(cv_resized.at<cv::Vec3f>(cv::Point(w, h))[1]);// Green
data[(2 * height + h) * width + w] = float(cv_resized.at<cv::Vec3f>(cv::Point(w, h))[2]);// Red
}
}

Then I pass data to the network input like this:
doInference(*context, data, imInfo, bboxPreds, clsProbs, rois, 1);

2. On the PX2 I can't run other TensorRT versions, such as TensorRT 3.0.1, because the libraries (e.g. libnvinfer.so.3) in the TensorRT-3.0.1 .zip downloaded from www.nvidia.com are built for the x86 architecture.
Where can I get the ARM builds of libnvinfer.so.3 and the other libraries?
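One way to confirm which architecture a shared library was built for is to read the e_machine field of its ELF header; the `file` command reports the same information. Below is a minimal sketch that assumes little-endian ELF files. The function name elfMachine is hypothetical. The constants are the standard ELF values EM_X86_64 = 62, EM_AARCH64 = 183 and EM_ARM = 40. The PX2 runs 64-bit ARM (aarch64) Linux, so libraries built for it should report aarch64.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Returns a short architecture name from the ELF header of the file at path,
// or a description of why it could not be decoded.
std::string elfMachine(const std::string& path)
{
    unsigned char header[20] = {0};
    FILE* fp = std::fopen(path.c_str(), "rb");
    if (!fp) return "cannot open";
    const std::size_t n = std::fread(header, 1, sizeof(header), fp);
    std::fclose(fp);
    if (n < sizeof(header) || header[0] != 0x7f || header[1] != 'E' ||
        header[2] != 'L' || header[3] != 'F')
        return "not an ELF file";
    if (header[5] != 1) return "big-endian ELF (not decoded)";  // EI_DATA != ELFDATA2LSB
    // e_machine is the 16-bit field at byte offset 18 (after e_ident[16] and e_type).
    const uint16_t machine = static_cast<uint16_t>(header[18] | (header[19] << 8));
    switch (machine) {
        case 62:  return "x86-64";   // EM_X86_64
        case 183: return "aarch64";  // EM_AARCH64
        case 40:  return "arm";      // EM_ARM (32-bit)
        default:  return "other (" + std::to_string(machine) + ")";
    }
}
```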

Thanks!

Dear zhou-lw,
Could you please upgrade to the latest DRIVE version if possible? It includes the latest TensorRT.

Sorry, what is the "DRIVE version"?
How do I upgrade to the latest DRIVE version on the PX2?

But currently it is inconvenient for me to upgrade to the latest DRIVE version.

Dear zhou-lw,
You can download the latest DRIVE version from https://developer.nvidia.com/nvidia-drive-downloads. Please file a bug with all the information needed to reproduce the issue at https://developer.nvidia.com

Thanks!
I think upgrading to the latest DRIVE version is not the best solution for me; I will think about it and try later.
In the meantime, has anybody encountered the same problem? Any help is greatly appreciated!

Output of cls_prob:
This is part of the result of the ResNet-50 caffemodel.
On the PC (top 10):
label: score
904: 0.971739
552: 0.007146
651: 0.003886
794: 0.003475
40: 0.002886
556: 0.002132
46: 0.001222
722: 0.000822
108: 0.000659
733: 0.000588

On the PX2 (top 10):
label: score
600: 0.016811
189: 0.007705
898: 0.00704
700: 0.005913
499: 0.005899
223: 0.005651
868: 0.004859
929: 0.004547
457: 0.00445
791: 0.004443
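Top-10 lists like these can be produced from the 1000-element prob output with a partial sort. Below is a minimal sketch, assuming the softmax output has been copied to the host as a std::vector<float>. topK is a hypothetical helper name. Note that the PX2 top-1 score (0.0168) is far from the PC's 0.97 and the distribution is much flatter. This suggests the features reaching fc1000 are already wrong, rather than differing by small floating-point error.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the k (label, score) pairs with the highest scores, best first.
std::vector<std::pair<int, float>> topK(const std::vector<float>& probs, std::size_t k)
{
    std::vector<std::pair<int, float>> ranked;
    ranked.reserve(probs.size());
    for (std::size_t i = 0; i < probs.size(); ++i)
        ranked.emplace_back(static_cast<int>(i), probs[i]);
    k = std::min(k, ranked.size());
    std::partial_sort(ranked.begin(), ranked.begin() + static_cast<std::ptrdiff_t>(k), ranked.end(),
                      [](const std::pair<int, float>& a, const std::pair<int, float>& b) {
                          return a.second > b.second;  // descending by score
                      });
    ranked.resize(k);
    return ranked;
}
```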

Hi SivaRamaKrishna,
Are dilated convolutions and strided convolutions not supported by TensorRT 2.1.2?
For example, this layer:

layer {
bottom: "res5a_branch2a"
top: "res5a_branch2b"
name: "res5a_branch2b"
type: "Convolution"
convolution_param {
num_output: 512
kernel_size: 3
dilation: 2
pad: 2
stride: 1
bias_term: false
}
param {
lr_mult: 1.0
}
}

and this one:

layer {
name: "conv5_2_V"
type: "Convolution"
bottom: "conv5_1_H"
top: "conv5_2_V"
convolution_param {
num_output: 390
pad: 1
pad: 0
kernel_size: 3
kernel_size: 1
stride: 1
}
}
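Caffe computes the output size of each spatial axis as out = (in + 2*pad - (dilation*(kernel - 1) + 1)) / stride + 1. Repeated kernel_size and pad entries are read per axis (height first, then width), so conv5_2_V is a 3x1 kernel with a 1x0 pad. Below is a minimal sketch of that formula. convOutSize is a hypothetical helper name, and the 14x14 input size in the example is an assumption.

```cpp
// Spatial output size of a Caffe convolution along one axis.
// The dilated kernel covers dilation*(kernel-1)+1 input positions.
int convOutSize(int in, int kernel, int pad, int stride, int dilation)
{
    const int effectiveKernel = dilation * (kernel - 1) + 1;
    return (in + 2 * pad - effectiveKernel) / stride + 1;
}
```

With an assumed 14x14 input, res5a_branch2b (kernel 3, pad 2, dilation 2) gives convOutSize(14, 3, 2, 1, 2) = 14, because the effective kernel is 5. If the dilation were ignored (treated as 1), the same pad would give 16, and each output would see a different receptive field. For conv5_2_V, both axes stay at 14: convOutSize(14, 3, 1, 1, 1) for height and convOutSize(14, 1, 0, 1, 1) for width.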

Thanks!

Dear zhou-lw,
Dilated convolutions are supported from TensorRT 3.0.1 onward.
Could you please upgrade your DRIVE version?

Are strided convolutions also unsupported by TensorRT 2.1.2?

How do I upgrade the DRIVE version after downloading the .run file?
Should I follow the "DRIVE PX 2 PDK installation with DriveInstall" guide?
What problems might I encounter during the upgrade?

Dear zhou-lw,
You can follow the installation steps at https://docs.nvidia.com/drive/driveinstall_docs/#developertools/mobile/driveinstall/linux/5.0.5.0b/sdk/install.htm%3FTocPath%3D_____3.
The upgrade process is smooth; let us know if you face any issues.

Dear zhou-lw,

We've released NVIDIA DRIVE 5.0.5.0bL (April 2, 2018) at https://developer.nvidia.com/nvidia-drive-downloads
Please download the latest version,
and then follow the DRIVEInstall installation instructions to complete the install. Thanks.

I am seeing the same problem. Can you tell me how you finally resolved it?
I found that the outputs of the first eleven layers (the eleventh is an ordinary ReLU) match, but from the twelfth layer (an ordinary convolution) onward, the output of every layer is wrong.

Sorry for the late reply.
I finally found the problem: my prototxt file contains dilated convolutions and a convolution with a 1*3 kernel, and TensorRT 2.1.2 does not support them.
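Since the root cause was unsupported layer parameters, a quick scan of the prototxt can flag such layers before building the engine. Below is a rough sketch, assuming the prototxt formatting used in this thread (one field per line). findRiskyLayers is a hypothetical helper that does a line-based text scan rather than a real protobuf parse. Which features actually count as unsupported depends on the TensorRT version; per this thread, dilation is supported from 3.0.1. The sketch flags layers with dilation > 1, two kernel_size entries (a per-axis kernel), or explicit kernel_h/kernel_w.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the names of top-level layers that use dilation > 1, two kernel_size
// entries, or explicit kernel_h/kernel_w. Text scan only, not a protobuf parser.
std::vector<std::string> findRiskyLayers(const std::string& prototxt)
{
    auto unquote = [](const std::string& s) {
        const std::size_t a = s.find('"'), b = s.rfind('"');
        return (a != std::string::npos && b > a) ? s.substr(a + 1, b - a - 1) : s;
    };
    std::vector<std::string> risky;
    std::istringstream in(prototxt);
    std::string line, name;
    int depth = 0, kernelSizeCount = 0;
    bool flagged = false;
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t");
        const std::string t = (first == std::string::npos) ? "" : line.substr(first);
        if (depth == 1 && t.rfind("name:", 0) == 0) name = unquote(t.substr(5));
        if (t.rfind("dilation:", 0) == 0 && std::stoi(t.substr(9)) > 1) flagged = true;
        if (t.rfind("kernel_size:", 0) == 0 && ++kernelSizeCount > 1) flagged = true;
        if (t.rfind("kernel_h:", 0) == 0 || t.rfind("kernel_w:", 0) == 0) flagged = true;
        for (char ch : t) {
            if (ch == '{') ++depth;
            if (ch == '}' && --depth == 0) {  // end of a top-level layer block
                if (flagged) risky.push_back(name);
                name.clear();
                kernelSizeCount = 0;
                flagged = false;
            }
        }
    }
    return risky;
}
```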

Dear zhou-lw,

After getting correct TensorRT predictions on the PX2, how did you feed in the video stream, consume its frames, and finally produce the predictions?

I cannot find any clue in the DriveWorks API.

Sorry, I use images, not video, based on the FasterRCNN sample of TensorRT.