Hi,
I have tested the same model on another x86_64 machine: even with little memory, the execution keeps dipping into swap but does not segfault.
Today I noticed something strange: when I run the model more than once, the segfault appears at different lines of the script:
1- First execution
th train_telling.lua -gpuid 0 -mc_evaluation -verbose -finetune_cnn_after -1 -rnn_size 100 -input_encoding_size 100 -batch_size 4
QADatasetLoader loading dataset file: visual7w-toolkit/datasets/visual7w-telling/dataset.json
image size is 28653
QADatasetLoader loading json file: data/qa_data.json
vocab size is 3007
QADatasetLoader loading h5 file: data/qa_data.h5
max question sequence length in data is 15
max answer sequence length in data is 5
assigned 8609 images to split test
assigned 5678 images to split val
assigned 14366 images to split train
initializing RNN
RNN initialized
initializing convNet
convNet initialized
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded cnn_models/VGG_ILSVRC_16_layers.caffemodel
Segmentation fault
It appears to segfault at these lines:
local cnn_raw = loadcaffe.load(opt.vgg_proto, opt.vgg_model, cnn_backend)
modules.cnn = net_utils.build_cnn(cnn_raw, {encoding_size = opt.input_encoding_size, backend = cnn_backend})
modules.crit = nn.QACriterion()
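Since the x86_64 box survives by swapping, it may be worth confirming how much memory headroom the TX1 actually has right before `loadcaffe.load` runs. A minimal sketch using standard Linux `/proc/meminfo` (nothing Jetson-specific; the 2 GB figure is a rough estimate for loading and converting VGG-16):

```shell
# Report available RAM and free swap just before launching the training script;
# loading + converting VGG-16 needs on the order of 2 GB, so a low number here
# would make an allocation failure inside loadcaffe.load plausible.
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/SwapFree/ {print $2}' /proc/meminfo)
echo "available RAM: $((avail_kb / 1024)) MB, free swap: $((swap_free_kb / 1024)) MB"
```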
2- After some executions I get past this point, and it segfaults at a different line of code:
th train_telling.lua -gpuid 0 -mc_evaluation -verbose -finetune_cnn_after -1
QADatasetLoader loading dataset file: visual7w-toolkit/datasets/visual7w-telling/dataset.json
image size is 28653
QADatasetLoader loading json file: data/qa_data.json
vocab size is 3007
QADatasetLoader loading h5 file: data/qa_data.h5
max question sequence length in data is 15
max answer sequence length in data is 5
assigned 8609 images to split test
assigned 5678 images to split val
assigned 14366 images to split train
initializing RNN
RNN initialized
initializing convNet
convNet initialized
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded cnn_models/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
converting first layer conv filters from BGR to RGB...
cnn built
end initialization
shipping to GPU
Segmentation fault
So it segfaults at this line:
if gpu_mode then
for k,v in pairs(modules) do v:cuda() end
end
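To get more information out of these intermittent crashes, one option is to enable core dumps before running `th` and then inspect the core with gdb afterwards. A sketch (the core file path depends on your `kernel.core_pattern` setting; the gdb invocation in the comment is illustrative):

```shell
# Allow the crashing process to write a core file for post-mortem inspection
ulimit -c unlimited
echo "core dump size limit: $(ulimit -c)"
# after the next segfault, inspect it with something like:
#   gdb "$(which th)" /path/to/core    (then `bt` for a backtrace)
```

A backtrace would show whether the crash happens inside the CUDA driver, Torch's cutorch copy path, or somewhere else entirely.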
Below is the tegrastats output during the last execution; I rebooted the TX1 beforehand to free RAM:
ubuntu@tegra-ubuntu:~$ sudo ./tegrastats
RAM 293/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [0%,0%,0%,0%]@816 EMC 4%@665 APE 25 GR3D 0%@76
RAM 293/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,8%,2%,1%]@204 EMC 6%@408 APE 25 GR3D 0%@76
RAM 297/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,5%,2%,17%]@1734 EMC 1%@1600 APE 25 GR3D 0%@76
RAM 373/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [4%,2%,3%,66%]@102 EMC 9%@408 APE 25 GR3D 0%@76
RAM 392/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,17%,32%,2%]@102 EMC 9%@408 APE 25 GR3D 0%@76
RAM 401/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,5%,54%,1%]@1734 EMC 2%@1600 APE 25 GR3D 0%@76
RAM 452/3983MB (lfb 688x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,0%,100%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@76
RAM 524/3983MB (lfb 687x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,77%,12%,0%]@1734 EMC 2%@1600 APE 25 GR3D 0%@76
RAM 582/3983MB (lfb 668x4MB) SWAP 0/42000MB (cached 0MB) cpu [17%,36%,1%,0%]@102 EMC 10%@408 APE 25 GR3D 0%@76
RAM 610/3983MB (lfb 656x4MB) SWAP 0/42000MB (cached 0MB) cpu [22%,24%,1%,4%]@204 EMC 10%@408 APE 25 GR3D 0%@76
RAM 1011/3983MB (lfb 550x4MB) SWAP 0/42000MB (cached 0MB) cpu [46%,4%,5%,6%]@408 EMC 14%@408 APE 25 GR3D 0%@76
RAM 1012/3983MB (lfb 545x4MB) SWAP 0/42000MB (cached 0MB) cpu [11%,2%,4%,32%]@102 EMC 12%@408 APE 25 GR3D 0%@76
RAM 1012/3983MB (lfb 539x4MB) SWAP 0/42000MB (cached 0MB) cpu [8%,36%,2%,4%]@102 EMC 11%@408 APE 25 GR3D 0%@76
RAM 1012/3983MB (lfb 534x4MB) SWAP 0/42000MB (cached 0MB) cpu [35%,12%,6%,2%]@102 EMC 10%@408 APE 25 GR3D 0%@76
RAM 1012/3983MB (lfb 528x4MB) SWAP 0/42000MB (cached 0MB) cpu [25%,6%,3%,12%]@204 EMC 5%@665 APE 25 GR3D 0%@76
RAM 1012/3983MB (lfb 522x4MB) SWAP 0/42000MB (cached 0MB) cpu [20%,23%,2%,2%]@102 EMC 9%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 516x4MB) SWAP 0/42000MB (cached 0MB) cpu [8%,37%,1%,3%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1012/3983MB (lfb 510x4MB) SWAP 0/42000MB (cached 0MB) cpu [29%,9%,6%,4%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 504x4MB) SWAP 0/42000MB (cached 0MB) cpu [5%,5%,7%,21%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 498x4MB) SWAP 0/42000MB (cached 0MB) cpu [15%,4%,2%,32%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 494x4MB) SWAP 0/42000MB (cached 0MB) cpu [11%,13%,1%,19%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 487x4MB) SWAP 0/42000MB (cached 0MB) cpu [39%,3%,4%,0%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 482x4MB) SWAP 0/42000MB (cached 0MB) cpu [12%,8%,31%,1%]@102 EMC 5%@665 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 476x4MB) SWAP 0/42000MB (cached 0MB) cpu [17%,23%,4%,1%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1014/3983MB (lfb 470x4MB) SWAP 0/42000MB (cached 0MB) cpu [7%,21%,15%,1%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 465x4MB) SWAP 0/42000MB (cached 0MB) cpu [28%,4%,16%,3%]@510 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 459x4MB) SWAP 0/42000MB (cached 0MB) cpu [12%,2%,29%,2%]@204 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1013/3983MB (lfb 454x4MB) SWAP 0/42000MB (cached 0MB) cpu [11%,3%,28%,1%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1077/3983MB (lfb 432x4MB) SWAP 0/42000MB (cached 0MB) cpu [14%,6%,35%,1%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1077/3983MB (lfb 426x4MB) SWAP 0/42000MB (cached 0MB) cpu [41%,10%,5%,2%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1093/3983MB (lfb 416x4MB) SWAP 0/42000MB (cached 0MB) cpu [41%,12%,2%,1%]@204 EMC 8%@408 APE 25 GR3D 0%@76
RAM 1123/3983MB (lfb 402x4MB) SWAP 0/42000MB (cached 0MB) cpu [9%,4%,2%,69%]@1734 EMC 4%@1600 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,3%,0%,76%]@102 EMC 16%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,3%,2%,2%]@102 EMC 13%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,2%,3%,2%]@102 EMC 10%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [5%,4%,2%,1%]@102 EMC 9%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,2%,2%,2%]@102 EMC 8%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,3%,1%,1%]@102 EMC 7%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [2%,1%,2%,2%]@102 EMC 7%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [4%,4%,2%,1%]@102 EMC 7%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [2%,2%,1%,4%]@102 EMC 7%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [5%,3%,2%,1%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,3%,3%,1%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [4%,3%,3%,1%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [2%,2%,3%,1%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [2%,3%,2%,2%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [3%,1%,2%,1%]@306 EMC 6%@408 APE 25 GR3D 0%@76
RAM 295/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [5%,3%,2%,0%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 294/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [6%,3%,2%,2%]@408 EMC 6%@408 APE 25 GR3D 0%@76
RAM 294/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [5%,1%,1%,0%]@204 EMC 6%@408 APE 25 GR3D 0%@76
RAM 294/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [1%,0%,1%,2%]@102 EMC 6%@408 APE 25 GR3D 0%@76
RAM 294/3983MB (lfb 531x4MB) SWAP 0/42000MB (cached 0MB) cpu [4%,3%,0%,1%]@102 EMC 6%@408 APE 25 GR3D 0%@76
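The RAM drop near the end is the process dying. Since the board is memory-constrained, it may be worth ruling out the kernel OOM killer, which logs every kill; a quick check (may need sudo for dmesg on some systems):

```shell
# Look for OOM-killer activity around the time of the crash; a hit here would
# mean the process was killed for memory pressure rather than a true SIGSEGV.
dmesg 2>/dev/null | grep -iE 'oom-killer|out of memory|killed process' \
    || echo "no OOM-killer events found in the kernel log"
```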
P.S. I did a lot of testing with different gcc versions, Torch builds under different compilers, and Lua/LuaJIT versions. Below are the results.
Jetson TX1 is flashed with latest Jetpack 3.1 and ships with Ubuntu 16.04, CUDA 8.0.34, Cudnn 5.1, gcc version 5.4.0 20160609.
1- Installation of Torch with LUA51 and gcc/g++ 4.8 → Result of execution = Bus error
2- Installation of Torch with LUA52 and gcc/g++ 4.8 → Result of execution = Bus error
3- Installation of Torch with LUA52 and gcc/g++ 4.9 → Result of execution = Bus error
4- Installation of Torch with LUA52 and gcc/g++ 5 → Result of execution = Bus error
5- Installation of Torch with LUAJIT21 and gcc/g++ 5 → Result of execution = Segmentation fault
6- Installation of Torch with LUAJIT21 and gcc/g++ 4.8 → Result of execution = Error, undefined symbol _ZKNgoogle… (gcc version mismatch? It seems LUAJIT21 builds only with gcc-5)
7- Installation of Torch with LUA51 and gcc/g++ 5 → Result of execution = Error, ffi.lua 56: expected align (#) on line 579 (it seems LUA51 works only with gcc/g++ 4.8)
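For reference, each combination above was rebuilt the same way; a sketch of how the compiler pair is selected (torch/distro's CMake-based install.sh honours the CC/CXX environment variables):

```shell
# Select the toolchain for one of the combinations above before rebuilding
# Torch; substitute gcc-4.8/g++-4.8, gcc-4.9/g++-4.9, gcc-5/g++-5 as needed.
export CC=gcc CXX=g++
echo "building Torch with CC=$CC CXX=$CXX"
# then, from the torch/distro checkout:
#   ./clean.sh && ./install.sh
```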
Do you think some incompatibility between the default packages of Ubuntu 16.04 and the Torch/luarocks installation could be causing the segmentation fault?
Regards,
Enid