Batch inference with TensorRT

Could anyone share their experience with running batch inference with TensorRT?

I'm running into a segmentation fault when the batch size is larger than 6, with an input size of 256 x 512 x 3 and an encoder-decoder segmentation network. The code is exactly the sampleOnnxMNIST code, with only the network and the input size changed. The images are stored in a 2D array `[batch_size][input_h*input_w*input_c]` and copied with memcpy to the GPU buffer `buffer[inputbIdx]`.
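For scale, here is a small sketch of how big that array gets. It assumes float (4-byte) elements, which the post does not state. One 256 x 512 x 3 image is 393,216 values, about 1.5 MiB, so the array grows by roughly 1.5 MiB per image in the batch. If it is declared as a local variable, it lives on the stack, whose soft limit on Linux is reported by `ulimit -s` (8 MiB is a common default). The helper names `inputBytes` and `exceedsStackLimit` are hypothetical, introduced only for illustration.

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK, RLIM_INFINITY (POSIX)
#include <cstddef>

// Input dimensions quoted in the post; float (4-byte) elements are assumed.
constexpr std::size_t kInputH = 256;
constexpr std::size_t kInputW = 512;
constexpr std::size_t kInputC = 3;
constexpr std::size_t kImageElems = kInputH * kInputW * kInputC;  // 393,216

// Bytes taken by a local array `float data[batchSize][kImageElems]`.
constexpr std::size_t inputBytes(std::size_t batchSize) {
    return batchSize * kImageElems * sizeof(float);
}

// Returns true when `bytes` is at or above the current soft stack limit.
// If the limit cannot be read, it conservatively returns true. A result of
// false is not a guarantee: other locals (e.g. an output array) and the
// callers' frames share the same stack.
bool exceedsStackLimit(std::size_t bytes) {
    rlimit limit{};
    if (getrlimit(RLIMIT_STACK, &limit) != 0) {
        return true;
    }
    if (limit.rlim_cur == RLIM_INFINITY) {
        return false;
    }
    return bytes >= limit.rlim_cur;
}
```

Under these assumptions a batch of 7 needs 11,010,048 bytes for the input alone, before counting the output array, so checking `exceedsStackLimit(inputBytes(batchSize))` gives a rough sense of whether a stack array of that shape is plausible at all.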

What is the best practice for batch inference, and for running larger batch sizes? Please advise.
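One common pattern for pushing many images through a TensorRT 4 engine, sketched here as an assumption rather than an official recipe: build the engine once with `IBuilder::setMaxBatchSize` set to the largest batch you intend to run, allocate the host and device buffers once for that size, then feed the data in chunks of at most that many images, calling `IExecutionContext::execute(batchSize, buffers)` per chunk (a smaller final batch is allowed). The helper below covers only the chunking; `runInBatches` and the `runBatch` callback are hypothetical names standing in for your own copy-and-execute code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback: copies `count` images starting at index `first` into
// the pre-allocated device input buffer and runs inference on them. In a
// TensorRT 4 program, this is where cudaMemcpy and execute(count, buffers) go.
using RunBatchFn = std::function<void(std::size_t first, std::size_t count)>;

// Splits `numImages` into consecutive batches of at most `maxBatchSize`
// (the value the engine was built with), invokes `runBatch` for each one,
// and returns the batch sizes that were used.
std::vector<std::size_t> runInBatches(std::size_t numImages,
                                      std::size_t maxBatchSize,
                                      const RunBatchFn& runBatch) {
    assert(maxBatchSize > 0 && "engine max batch size must be positive");
    std::vector<std::size_t> sizes;
    for (std::size_t first = 0; first < numImages; first += maxBatchSize) {
        const std::size_t count = std::min(maxBatchSize, numImages - first);
        runBatch(first, count);
        sizes.push_back(count);
    }
    return sizes;
}
```

For example, 20 images with an engine built for a maximum batch of 8 are processed as batches of 8, 8 and 4. Keeping the buffers allocated once for the maximum size avoids reallocating per batch; how large that maximum can usefully be depends on the network and available GPU memory.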

Environment: TensorRT 4.0.1.6 (C++ API), Ubuntu 16.04, GTX 1080 Ti.

Hello, can you provide the full traceback from the seg fault to help us debug?

Thanks for the follow-up!

cuda-gdb ../bin/sample_uff_seg_batch_debug 

NVIDIA (R) CUDA Debugger
9.0 release
Portions Copyright (C) 2007-2017 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../bin/sample_uff_seg_batch_debug...done.
(cuda-gdb) run
Starting program: /home/code/TensorRT/bin/sample_uff_seg_batch_debug 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffd1bff700 (LWP 26749)]
[New Thread 0x7fffd13fe700 (LWP 26750)]
[New Thread 0x7fffccbfd700 (LWP 26751)]
[New Thread 0x7fffcc3fc700 (LWP 26752)]
[New Thread 0x7fffc7bfb700 (LWP 26753)]
[New Thread 0x7fffc73fa700 (LWP 26754)]
[New Thread 0x7fffc2bf9700 (LWP 26755)]

Thread 1 "sample_uff_seg_" received signal SIGSEGV, Segmentation fault.
0x00000000004060f9 in main (argc=<error reading variable: Cannot access memory at address 0x7fffff6fda8c>, 
    argv=<error reading variable: Cannot access memory at address 0x7fffff6fda80>) at sampleUffSEG.cpp:144
144	{
(cuda-gdb) bt
#0  0x00000000004060f9 in main (argc=<error reading variable: Cannot access memory at address 0x7fffff6fda8c>, 
    argv=<error reading variable: Cannot access memory at address 0x7fffff6fda80>) at sampleUffSEG.cpp:144
(cuda-gdb) bt -10
#0  0x00000000004060f9 in main (argc=<error reading variable: Cannot access memory at address 0x7fffff6fda8c>, 
    argv=<error reading variable: Cannot access memory at address 0x7fffff6fda80>) at sampleUffSEG.cpp:144

I’m also attaching the cpp file for your reference.

sampleUffSEG.cpp (6.12 KB)

After changing the input and output buffers to 1D dynamically allocated arrays, the problem went away.
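That outcome is consistent with a stack overflow: the backtrace stops at the opening brace of `main` with `argc`/`argv` unreadable, which is what happens when `main`'s own stack frame (holding a large local `[batch_size][H*W*C]` array) is bigger than the stack allows. Below is a minimal sketch of the heap-based alternative, assuming each preprocessed image is already a float vector of `H*W*C` values; `packBatch` is a hypothetical helper, not part of the TensorRT samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Packs the images (each `imageElems` floats, already preprocessed) into one
// contiguous host buffer laid out as [batch][imageElems]. std::vector keeps
// its elements on the heap, so the stack frame stays small for any batch size.
std::vector<float> packBatch(const std::vector<std::vector<float>>& images,
                             std::size_t imageElems) {
    std::vector<float> host(images.size() * imageElems);
    for (std::size_t b = 0; b < images.size(); ++b) {
        assert(images[b].size() == imageElems && "every image must have H*W*C values");
        // Image b occupies host[b * imageElems, (b + 1) * imageElems).
        std::copy(images[b].begin(), images[b].end(), host.data() + b * imageElems);
    }
    return host;
}
```

The packed buffer can then be copied to the GPU buffer (`buffer[inputbIdx]` in the post) with `cudaMemcpy(dst, host.data(), host.size() * sizeof(float), cudaMemcpyHostToDevice)`. The output array benefits from the same treatment, for example a `std::vector<float>` sized to the batch times the per-image output size.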