TensorRT 3.0 Deconvolution layer not working on Jetson TX2

According to the TensorRT 3.0 RC release notes, a number of restrictions on the deconvolution layer have been lifted for both the Jetson platforms and the Tesla-based platforms.

We tried to load a simple fcn_alexnet-based Caffe model with the deploy_fcnalex.prototxt file (its last two layers are shown further down) using the jetson_inference/segnet_console application. We changed the jetson_inference code slightly to accommodate different input parameters:

./caffe/test/cpp/build/x86_64/bin/segnet_console -model caffe/models/experimental/foo_fcnalex_nov16_54000.caffemodel -prototxt caffe/test/deploy_fcnalex.prototxt -iblob data -oblob softmax_probabilities -input data/194064.jpg

It works on the Tesla platform, but on the Jetson TX2 we get an error.

Error on Jetson TX2:
[GIE] building CUDA engine
[GIE] Internal error: could not find any implementation for node upscore_kitti, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[GIE] cudnnBuilder2.cpp (452) - OutOfMemory Error in buildSingleLayer
[GIE] failed to build CUDA engine

Further details on running on the Jetson TX2:
2) We have tried to increase the workspace size:
builder->setMaxWorkspaceSize(16 << 31);
We watched the memory usage on the Jetson while this was happening with
"watch free -m"
The memory usage never goes beyond 3 GB (the Jetson TX2 has 8 GB of memory).
We are also using a 10 GB swap file, which ended up not being used either.

4) We use a pad value of '0' for the first convolution layer in the deploy prototxt.

——8<—————8<——————last two layers of deploy_fcnalex.prototxt —8<——

layer {
  name: "upscore_kitti"
  type: "Deconvolution"
  bottom: "score_fr_kitti"
  top: "upscore_kitti"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 2
    bias_term: false
    kernel_size: 63
    stride: 32
  }
}

layer {
  name: "softmax_probabilities"
  type: "Softmax"
  bottom: "upscore_kitti"
  top: "softmax_probabilities"
  softmax_param {
  }
}

Perhaps upscore_kitti is not supported in TensorRT?
You may be able to replace it with another layer name, or you may be able to re-implement it using a custom/user layer (which I believe should now be possible)?

builder->setMaxWorkspaceSize(16 << 31);

You’re telling it you have 32 GB of RAM for the workspace. While this is probably not the problem in this case, I don’t think this is a good idea in general :-)
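To spell out where the 32 GB figure comes from, here is a minimal sketch that redoes the shift in 64-bit arithmetic. The helper name bytes_to_gib and the constant kRequestedBytes are made up for this illustration and are not part of TensorRT.

```cpp
#include <cstdint>

// Hypothetical helper (not a TensorRT function): whole GiB in a byte count.
constexpr std::uint64_t bytes_to_gib(std::uint64_t bytes) {
    return bytes >> 30;  // 1 GiB = 2^30 bytes
}

// 16 << 31 evaluated with a 64-bit left operand: 2^4 * 2^31 = 2^35 bytes.
constexpr std::uint64_t kRequestedBytes = std::uint64_t{16} << 31;

static_assert(kRequestedBytes == 34359738368ULL, "2^35 bytes");
static_assert(bytes_to_gib(kRequestedBytes) == 32, "32 GiB");
```

So the intended request is four times the 8 GB of memory the TX2 has, which is why it is not a sensible workspace size even when it is computed correctly.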

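Also, because of the rules of C, both operands are "int", and because "int" is 32-bit here, the shift overflows and the expression MAY end up evaluating to zero before it is converted to size_t (64-bit). Strictly speaking, overflowing a signed shift is undefined behavior, so zero is only the typical result.
The correct way to write it (should you want it) is (16l << 31) (note the "l" suffix), which makes the left operand a long. That only helps where long is 64-bit, as on 64-bit Linux; 16ULL << 31 is the portable spelling.
This program prints "0" even on a 64-bit machine, with 32-bit "int" but 64-bit "long":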

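#include <stdio.h>

void print_value(size_t s) {
    printf("%zu\n", s);  /* %zu is the printf format for size_t */
}

int main() {
    /* Both operands are int, so the shift overflows before the conversion to size_t. */
    print_value(16 << 31);
    return 0;
}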
Yes, indeed this was the problem.

printf("%lu", 16<<31) would print '0'. So, after calling builder->setMaxWorkspaceSize(16 << 31), the workspace size would effectively be set to 0.

I was able to get this working with builder->setMaxWorkspaceSize(16 << 24). Note that I am not using (16l << 24), because (16 << 24) is well within INT_MAX.
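For anyone who wants to check such a literal rather than eyeball it, here is a small compile-time sketch. shift_fits_in_int is a hypothetical name I made up for illustration; it only tests whether value << shift stays within the range of int.

```cpp
#include <limits>

// Hypothetical check: true if (value << shift) is representable as an int.
// The comparison is done in long long, so the check itself cannot overflow.
constexpr bool shift_fits_in_int(long long value, int shift) {
    return value >= 0 && shift >= 0 && shift < 62 &&
           value <= (static_cast<long long>(std::numeric_limits<int>::max()) >> shift);
}

// 16 << 24 = 268435456 bytes (256 MiB), below INT_MAX (2147483647 for 32-bit int).
static_assert(shift_fits_in_int(16, 24), "16 << 24 fits in int");
// 16 << 31 = 2^35, far beyond INT_MAX, so the plain int expression overflows.
static_assert(!shift_fits_in_int(16, 31), "16 << 31 does not fit in int");
```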

After this fix, I no longer get the CUDA engine build error when running the code on the Jetson TX2 with the above deploy_fcnalex.prototxt, including the layer named upscore_kitti of type Deconvolution.