I’m running into a segfault while trying to run the TensorRT sample_uff_ssd app with the --int8 flag on a Jetson AGX Xavier board.
I’ve successfully run simpler examples such as the UFF MNIST sample; this is the first sample I’m trying to run with INT8, which requires calibration. Without the --int8 flag it ran fine in FP32 mode and correctly identified the objects in the sample PPM images. (As part of getting FP32 mode working, I downloaded the model, ran the script to convert the frozen graph to UFF, figured out which file was the working ssd.prototxt, and so on. That last step was non-obvious, by the way.)
For the calibration images, I downloaded the COCO 2017 val zip file and unzipped the images into a temporary directory. I then converted them from JPEG to PPM with ImageMagick (‘mogrify -format ppm *.jpg’) and moved all the resulting PPM files to /workspace/tensorrt/data/ssd. Finally, I created a list.txt file containing the names of all the PPM files, one per line, with the ‘.ppm’ extension removed.
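For reference, here is the list.txt step in shell form. A temp directory stands in for /workspace/tensorrt/data/ssd, and empty placeholder files stand in for the real converted PPMs, so this sketch runs anywhere:

```shell
#!/bin/sh
# The jpg -> ppm conversion itself was just ImageMagick:
#   mogrify -format ppm *.jpg
# Below, empty placeholder files stand in for the converted PPMs.
DST=$(mktemp -d)    # stand-in for /workspace/tensorrt/data/ssd
touch "$DST/000000000139.ppm" "$DST/000000000285.ppm" "$DST/000000000632.ppm"

cd "$DST"
for f in *.ppm; do
    basename "$f" .ppm    # name only, '.ppm' extension removed
done > list.txt

cat list.txt
```

With the real data, the same loop over the full set of converted COCO val images is what produced my list.txt.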
I loaded the board very recently (about a week ago) with a fresh JetPack install. Unfortunately I’m not 100% sure how to report the exact version running on the AGX board itself, so if there’s any other helpful info I can collect, let me know.
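I believe the following would report the relevant versions on the board (the file and package names are what I understand a JetPack install uses; everything is guarded, so the snippet is safe to run anywhere):

```shell
#!/bin/sh
# Collect L4T / TensorRT version info if present on this system.
report_versions() {
    if [ -f /etc/nv_tegra_release ]; then
        cat /etc/nv_tegra_release            # L4T (JetPack) release string
    else
        echo "no /etc/nv_tegra_release found"
    fi
    if command -v dpkg >/dev/null 2>&1; then
        # TensorRT-related packages, if any are installed
        dpkg -l 2>/dev/null | grep -E 'nvinfer|tensorrt' \
            || echo "no TensorRT packages listed"
    fi
}
report_versions
```

If there is a more canonical way to report the JetPack version, happy to run that instead.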
nvidia@jetson-0423418010368:~/tensorrt/bin$ ./sample_uff_ssd --int8
../data/ssd/sample_ssd_relu6.uff
Begin parsing model...
End parsing model...
Begin building engine...
Batch #0
Calibrating with file 000000000139.ppm
Calibrating with file 000000000285.ppm
Calibrating with file 000000000632.ppm
Calibrating with file 000000000724.ppm
Calibrating with file 000000000776.ppm
Calibrating with file 000000000785.ppm
Calibrating with file 000000000802.ppm
Calibrating with file 000000000872.ppm
Calibrating with file 000000000885.ppm
Calibrating with file 000000001000.ppm
Calibrating with file 000000001268.ppm
Calibrating with file 000000001296.ppm
Calibrating with file 000000001353.ppm
Calibrating with file 000000001425.ppm
Calibrating with file 000000001490.ppm
Calibrating with file 000000001503.ppm
Calibrating with file 000000001532.ppm
Calibrating with file 000000001584.ppm
Calibrating with file 000000001675.ppm
Calibrating with file 000000001761.ppm
Calibrating with file 000000001818.ppm
Calibrating with file 000000001993.ppm
Calibrating with file 000000002006.ppm
Calibrating with file 000000002149.ppm
Calibrating with file 000000002153.ppm
Calibrating with file 000000002157.ppm
Calibrating with file 000000002261.ppm
Calibrating with file 000000002299.ppm
Calibrating with file 000000002431.ppm
Calibrating with file 000000002473.ppm
Calibrating with file 000000002532.ppm
Calibrating with file 000000002587.ppm
Calibrating with file 000000002592.ppm
Calibrating with file 000000002685.ppm
Calibrating with file 000000002923.ppm
Calibrating with file 000000003156.ppm
Calibrating with file 000000003255.ppm
Calibrating with file 000000003501.ppm
Calibrating with file 000000003553.ppm
Calibrating with file 000000003661.ppm
Calibrating with file 000000003845.ppm
Calibrating with file 000000003934.ppm
Calibrating with file 000000004134.ppm
Calibrating with file 000000004395.ppm
Calibrating with file 000000004495.ppm
Calibrating with file 000000004765.ppm
Calibrating with file 000000004795.ppm
Calibrating with file 000000005001.ppm
Calibrating with file 000000005037.ppm
Calibrating with file 000000005060.ppm
Segmentation fault (core dumped)
Rerunning the debug version with gdb, I see the following stack trace:
Calibrating with file 000000005060.ppm

Thread 1 "sample_uff_ssd_" received signal SIGSEGV, Segmentation fault.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:108
108     ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) bt
#0  __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:108
#1  0x0000007fab726ae4 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#2  0x0000007fab726e3c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3  0x00000055555627ec in samplesCommon::readPPMFile<3, 300, 300> (filename="../data/ssd/000000000285.ppm", ppm=...) at ../common/common.h:447
#4  0x000000555555ec8c in BatchStream::update (this=0x7fffffe198) at BatchStreamPPM.h:110
#5  0x000000555555e6f4 in BatchStream::next (this=0x7fffffe198) at BatchStreamPPM.h:51
#6  0x000000555555f478 in Int8EntropyCalibrator::getBatch (this=0x7fffffe190, bindings=0x55b1b26300, names=0x55b1d76a40, nbBindings=1) at BatchStreamPPM.h:170
#7  0x0000007fb0974890 in nvinfer1::builder::calibrateEngine(nvinfer1::IInt8Calibrator&, nvinfer1::ICudaEngine&, std::unordered_map<std::string, float, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, float> > >&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#8  0x0000007fb0946250 in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::rt::HardwareContext const&, nvinfer1::Network const&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#9  0x0000007fb09b02ec in nvinfer1::builder::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#10 0x000000555555ac7c in loadModelAndCreateEngine (uffFile=0x55558422c0 "../data/ssd/sample_ssd_relu6.uff", maxBatchSize=2, parser=0x5555824730, calibrator=0x7fffffe190, trtModelStream=@0x7fffffdf50: 0x0) at sampleUffSSD.cpp:162
#11 0x000000555555b5dc in main (argc=2, argv=0x7fffffef48) at sampleUffSSD.cpp:539
(gdb)
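One more data point I can offer: frame #3 shows the crash inside readPPMFile<3, 300, 300>, so I wonder whether the sample assumes the calibration PPMs are already 300x300, whereas raw COCO images come in various larger sizes. Here is a small helper I wrote to check what dimensions my conversion actually produced (ppm_dims is my own, not part of the sample, and it assumes a comment-free binary P6 header, which is what mogrify emits as far as I know):

```shell
#!/bin/sh
# ppm_dims: print WIDTHxHEIGHT from a binary PPM (P6) header.
# Header layout: "P6\n<width> <height>\n<maxval>\n<pixel data>"
ppm_dims() {
    head -c 64 "$1" | tr '\n' ' ' | awk '{ print $2 "x" $3 }'
}

# demo on a synthetic 640x480 header
f=$(mktemp)
printf 'P6\n640 480\n255\n' > "$f"
ppm_dims "$f"    # -> 640x480
```

Running `for f in *.ppm; do echo "$f $(ppm_dims "$f")"; done` in the data directory would flag anything that isn’t 300x300, if that turns out to matter.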
Let me know if there’s anything else I can provide that might help. I did see one similar topic from someone who hit this inside a Docker container, but otherwise searching the forum didn’t turn up this issue. Apologies if I missed it!