Problem loading Torch model on TX1 - 'Failed to load function from bytecode:'

Hi,

So I’m currently trying to load a network model via Torch on an NVIDIA TX1. When I try to load the model

net = torch.load('modelfile.t7','ascii')

I get the following error:
https://cloud.githubusercontent.com/assets/5595336/22958374/dda6d76c-f2fc-11e6-8a7e-a639a9c7e294.png

The model loads fine on my Ubuntu 14.04 desktop, so I loaded the same model there, converted it to binary, and then tried to load the converted file on the TX1:

net = torch.load('modelfile.bin')

But I still get a similar error:

https://cloud.githubusercontent.com/assets/5595336/22958394/05b5aa1c-f2fd-11e6-8eec-daab7b120891.png
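For reference, the desktop-side conversion was roughly the following (just a sketch; the filenames are placeholders, and it assumes any custom layers the model needs are already on the Lua path):

require 'torch'
-- plus nn/cunn/cudnn and any custom layers, if the model uses them
net = torch.load('modelfile.t7', 'ascii')   -- load the ASCII-serialized model
torch.save('modelfile.bin', net)            -- re-save in Torch's binary format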

I’ve noticed that a few people have hit the same errors in the past, but most of them seem to have gotten past this by using an ‘ascii’ version of the model, since it’s supposed to be platform independent (?). I’ve had no luck with that. The other people who ran into this problem were on a 32-bit system, but my NVIDIA TX1 is running Ubuntu 16.04 (64-bit).

For anyone willing or interested in recreating these results:

I installed JetPack (JetPack-L4T-2.3.1-linux-x64.run) and verified that my installations of CUDA 8.0 and OpenCV are functional.

For Torch, I used dusty-nv’s installation script
https://github.com/dusty-nv/jetson-reinforcement
The installation script in particular is https://github.com/dusty-nv/jetson-reinforcement/blob/master/CMakePreBuild.sh
It all looks pretty straightforward.

The specific model file is https://s3.amazonaws.com/mc-cnn/net_kitti_fast_-a_train_all.t7

Any tips on how to fix this problem are gladly appreciated. If anyone has ideas on how I can tweak the model on my desktop machine to make it work here, I’d love to hear them.

Thanks in advance,

Shreyas

Hi,

I guess the reason for this issue is that the model was serialized on a 32-bit machine and cannot be loaded on a 64-bit machine.

I have verified Torch on the TX1 with CIFAR-10; everything works without error.

local torch = require 'torch'
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')

By the way, I also tried your model on our 14.04 desktop and still hit an error.
Could you help us check the sanity of this model?

vyu@vyu-server:~$ luajit
LuaJIT 2.1.0-beta1 -- Copyright (C) 2005-2015 Mike Pall. http://luajit.org/

 _____              _     
|_   _|            | |    
  | | ___  _ __ ___| |__  
  | |/ _ \| '__/ __| '_ \ 
  | | (_) | | | (__| | | |
  \_/\___/|_|  \___|_| |_|

JIT: ON SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse
th> require 'nn'
th> require 'torch'
th> require 'cutorch'
th> require 'cudnn'
th> require 'cunn'
th> net = torch.load('net.t7','ascii')
/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <nn.Normalize2>
stack traceback:
	[C]: in function 'error'
	/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
	/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/home/vyu/Arale/torch/install/share/lua/5.1/nn/Module.lua:184: in function 'read'
	/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
	/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/home/vyu/Arale/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	stdin:1: in main chunk
	[C]: at 0x004065d0
th>

Hi AastaLLL,

First, thank you for taking the time and effort to look into this. I appreciate it!

So I ran the following code and the models seem to load without error:

local torch = require 'torch'
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
    os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
    os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')

With respect to loading the model,

Normalize2.lua is part of the original repository - https://github.com/jzbontar/mc-cnn

If you clone that repo and try to load the net from the mc-cnn directory, you should be able to load it without the above error.

I noticed that I only get the bytecode errors when I try to load these models (the fast models):
https://s3.amazonaws.com/mc-cnn/net_kitti_fast_-a_train_all.t7
https://s3.amazonaws.com/mc-cnn/net_kitti2015_fast_-a_train_all.t7

But I can successfully load these models (their slow/accurate models):
https://s3.amazonaws.com/mc-cnn/net_kitti_slow_-a_train_all.t7
https://s3.amazonaws.com/mc-cnn/net_kitti2015_slow_-a_train_all.t7

So if you’re trying to recreate these results from scratch,

git clone https://github.com/jzbontar/mc-cnn.git
cd mc-cnn/
wget -P net/ https://s3.amazonaws.com/mc-cnn/net_kitti_fast_-a_train_all.t7

And from inside a luajit environment:

require 'torch'
require 'cunn'
require 'cutorch'
require 'image'
require 'libcv'
require 'cudnn'
cudnn.benchmark = true
include('Margin2.lua')
include('Normalize2.lua')
include('BCECriterion2.lua')

net = torch.load('net/net_kitti_fast_-a_train_all.t7','ascii')

And you should see the same ‘Failed to load function from bytecode’ error.
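Incidentally, here is a small standalone helper I put together (not part of mc-cnn) to check whether a .t7 file embeds a dumped Lua function: torch’s File.lua serializes functions with string.dump, and LuaJIT dumps start with the bytes "\27LJ", so scanning for that marker shows whether torch.load will have to reconstruct bytecode on the target machine.

-- check_bytecode.lua (standalone helper, not part of mc-cnn)
-- Usage: luajit check_bytecode.lua net/net_kitti_fast_-a_train_all.t7
local path = arg[1] or 'net/net_kitti_fast_-a_train_all.t7'
local f = assert(io.open(path, 'rb'))
local data = f:read('*a')
f:close()
if data:find('\27LJ', 1, true) then
    print('LuaJIT bytecode marker found in ' .. path)
else
    print('no LuaJIT bytecode marker found in ' .. path)
end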

Once again, thank you for taking the time to look into this. I will look into the 32-bit/64-bit incompatibility, but considering the slow models work, I doubt that the author saved the two models on separate machines.

Thanks.

I confirm I can reproduce the same result. I downloaded both the fast and slow models.

git clone https://github.com/jzbontar/mc-cnn.git
cd mc-cnn/
wget -P net/ https://s3.amazonaws.com/mc-cnn/net_kitti_fast_-a_train_all.t7
wget -P net/ https://s3.amazonaws.com/mc-cnn/net_kitti_slow_-a_train_all.t7

Then run (dofile) this script in th/luajit.

-- loadmodel.lua
require 'torch'
require 'cunn'
require 'cutorch'
require 'image'
require 'libcv'
require 'cudnn'
cudnn.benchmark = true
-- define mc-cnn's custom layers so torch.load can resolve their classes
include('Margin2.lua')
include('Normalize2.lua')
include('BCECriterion2.lua')
include('StereoJoin.lua')
include('SpatialConvolution1_fw.lua')

net_slow = torch.load('net/net_kitti_slow_-a_train_all.t7', 'ascii')
net_fast = torch.load('net/net_kitti_fast_-a_train_all.t7', 'ascii')

On an x64 PC with CUDA 8.0 running Ubuntu 14.04, I could load both models successfully. But on the Jetson TX1, I can load the slow model but not the fast one.

I think the error message (Failed to load function from bytecode: [string “5…”]:1: unexpected symbol near ‘5’) likely indicates that some data type has a different size on the Jetson TX1 than on x64. I will probably investigate more when I have time.
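For context on where that error string comes from: torch/File.lua serializes any Lua function stored in a model by dumping its bytecode (string.dump) and rebuilds it on load with loadstring(); when loadstring rejects the bytes, File.lua raises exactly this "Failed to load function from bytecode" error. A minimal round-trip that exercises the same code path, independent of mc-cnn (the filename is just a placeholder):

-- save/load a plain Lua function the same way torch/File.lua handles
-- functions embedded in a model
require 'torch'
local f = function(x) return x + 1 end
torch.save('fn_test.t7', f, 'ascii')        -- embeds string.dump(f) in the file
local g = torch.load('fn_test.t7', 'ascii') -- rebuilds it with loadstring()
print(g(41))                                -- 42 on the machine that saved it
-- If the same file is copied to a machine whose loadstring() rejects the
-- dumped bytecode, torch.load fails with "Failed to load function from bytecode".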

Hi Jkjung13,

Thank you for taking the time to do this. I appreciate the sanity check, haha. I’m looking deeper into the code now, and I’m going to try to retrain the model and generate my own ASCII file. I’m thinking that should do the trick.

Thanks,
Shreyas

Hi,

I checked the model ‘net_kitti_fast_-a_train_all.t7’, and found something strange:

268
^[LJ^B^@7@/home/jz1640/torch/install/share/lua/5.1/nn/Module.luaÌ^A^@^D ^@^D^@^S5:^F^R^E^@^@9^D^@^@B^D^B^B^O^@^D^@X^E^M<80>^R^E^@^@9^D^A^@B^D^B^A^R^E^@^@9^D^B^@^R^F^A^@^R^G^B^@)^H^A^@B^D^E^A^R^E^@^@9^     D^C^@^R^F^C^@B^D^C^AK^@^A^@^UupdateParameters^VaccGradParameters^WzeroGradParameters^Oparameters^A^A^A^A^A^B^B^B^C^C^C^C^C^C^D^D^D^D^Fself^@^@^Tinput^@^@^TgradOutput^@^@^Tlr^@^@^T^@^@
3
12

Is this normal for a weight file?

Hi singularity7,

Any update? Have you confirmed whether what we found in the weight file is normal?

Thanks

Hi Kayccc,

It seems as if the problem was indeed that strange line in the weight file. I have documented a suggested fix for the issue in this post - https://github.com/jzbontar/mc-cnn/issues/19#issuecomment-283538982
Thanks everyone for the help!