Problem with running MNIST sample on DLA

Hello,

I have a problem running a sample on the NVIDIA DRIVE development platform (Xavier), using CUDA 10 and TensorRT 5.
I am trying to use the DLA on the platform by running the MNIST sample with the command below:

./trtexec --deploy=…/data/mnist/mnist.prototxt --output=prob --useDLACore=1 --fp16 --allowGPUFallback

And I am getting this after I run the command:

deploy: …/data/mnist/mnist.prototxt
output: prob
useDLACore: 1
fp16
allowGPUFallback
Input “data”: 1x28x28
Output “prob”: 10x1x1
Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.
Segmentation fault (core dumped)

Do you have any idea what the problem is?

Hello,
can you share the backtrace from the segfault? I can run this successfully on a DGX, so the issue may be specific to Xavier.

Hello,

I turned verbose mode on, and now I am getting:

output: prob
useDLACore: 1
fp16
allowGPUFallback
verbose
Plugin Creator registration succeeded - GridAnchor_TRT
Plugin Creator registration succeeded - NMS_TRT
Plugin Creator registration succeeded - Reorg_TRT
Plugin Creator registration succeeded - Region_TRT
Plugin Creator registration succeeded - Clip_TRT
Plugin Creator registration succeeded - LReLU_TRT
Plugin Creator registration succeeded - PriorBox_TRT
Plugin Creator registration succeeded - Normalize_TRT
Plugin Creator registration succeeded - RPROI_TRT
Input “data”: 1x28x28
Output “prob”: 10x1x1
Default DLA is enabled but layer prob is not running on DLA, falling back to GPU.


Layers running on DLA:
scale, conv1, pool1, conv2, pool2, ip1, relu1, ip2,



Layers running on GPU:
prob,


Original: 9 layers
After dead-layer removal: 9 layers
Segmentation fault (core dumped)

Is there anything more I can do to get more information?

Hello,

to help us debug, can you provide the prototxt and backtrace from the core dump?

Sorry for the delay,

I have sent all the information I can get for the core dump; that is all the terminal prints when I run it. Is there a log file or something similar? Could you explain how I can get more information than that?

The prototxt is here:

Best regards,
Filip

Thanks for the prototxt. To get more information from the core dump, can you try the instructions here: https://unix.stackexchange.com/questions/132192/running-application-ends-with-segmentation-fault

but basically:

  1. gdb your_program
  2. run
  3. you’ll hit your “Program received signal SIGSEGV, Segmentation fault.”
  4. bt

Copy and paste the bt (backtrace) output here.

Hello,

Thanks for your patience and help, here is the bt:

Thread 1 “trtexec” received signal SIGSEGV, Segmentation fault.
0x0000007f97625168 in ?? () from /usr/lib/libnvdla_compiler.so
(gdb) bt
#0 0x0000007f97625168 in ?? () from /usr/lib/libnvdla_compiler.so
#1 0x0000007fb09cb640 in nvinfer1::utility::dla::TmpWisdom::compile(int, int)
() from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#2 0x0000007fb09d7840 in nvinfer1::builder::dla::validateGraphNode(std::unique_ptr<nvinfer1::builder::Node, std::default_delete<nvinfer1::builder::Node> > const&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#3 0x0000007fb09472ac in nvinfer1::builder::createForeignNodes(nvinfer1::builder::Graph&, nvinfer1::builder::ForeignNode* (*)(nvinfer1::Backend, std::string const&), nvinfer1::CudaEngineBuildConfig const&) ()
from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#4 0x0000007fb0993504 in nvinfer1::builder::applyGenericOptimizations(nvinfer1::builder::Graph&, nvinfer1::CpuMemoryGroup&, nvinfer1::CudaEngineBuildConfig const&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#5 0x0000007fb095b42c in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::rt::HardwareContext const&, nvinfer1::Network const&) ()
from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#6 0x0000007fb09c62ec in nvinfer1::builder::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#7 0x0000000000403cb0 in caffeToTRTModel() ()
#8 0x0000000000406050 in createEngine() ()
#9 0x0000000000406534 in main ()

Best regards,
Filip Baba

Hello,

Engineering believes this is fixed in the upcoming TRT release. Please stay tuned for the release announcement.