Segmentation fault w/ libnvidia-encode.so.381.09

Vanista · April 26, 2017, 4:04pm

Hi

We’re seeing a strange crash in our NVENC programs running on a host with a TITAN Xp.
The stack trace points into low-level libc functions (see the attached Valgrind report), and the crash happens only when the program is linked with nvidia-encode.

Furthermore, simply linking with the library is enough to cause the crash; the program makes no calls to the NVENC API at all. It looks as if nvidia-encode has a global constructor that trashes or shadows some function addresses.

Finally, the same code and test setup don’t crash on another host with a TITAN X or an M6000.
We recently upgraded to beta driver 381.09 to work around another bug.

Valgrind report for an empty unit test

Process terminating with default action of signal 11 (SIGSEGV)
 Bad permissions for mapped region at address 
   at : ???
   by : printf (in /usr/lib64/libc-2.17.so)
   by : testing::internal::ColoredPrintf (gtest.cc:2660)
   by : testing::internal::PrettyUnitTestResultPrinter::OnTestIterationStart (gtest.cc:2747)
   by : testing::internal::TestEventRepeater::OnTestIterationStart (gtest.cc:2995)
   by : testing::internal::UnitTestImpl::RunAllTests (gtest.cc:4301)
   by : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2078)
   by : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2114)
   by : testing::UnitTest::Run (gtest.cc:3926)
   by : RUN_ALL_TESTS (gtest.h:2288)
   by : main (gtest_main.cc:37)
Jump to the invalid address stated on the next line
   at : ???
   by : __libc_freeres (in /usr/lib64/libc-2.17.so)
   by : _vgnU_freeres (in /opt/valgrind/lib/valgrind/vgpreload_core-amd64-linux.so)
   by : vfprintf (in /usr/lib64/libc-2.17.so)
   by : printf (in /usr/lib64/libc-2.17.so)
   by : testing::internal::ColoredPrintf (gtest.cc:2660)
   by : testing::internal::PrettyUnitTestResultPrinter::OnTestIterationStart (gtest.cc:2747)
   by : testing::internal::TestEventRepeater::OnTestIterationStart (gtest.cc:2995)
   by : testing::internal::UnitTestImpl::RunAllTests (gtest.cc:4301)
   by : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2078)
   by : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2114)
   by : testing::UnitTest::Run (gtest.cc:3926)
   by : RUN_ALL_TESTS (gtest.h:2288)
   by : main (gtest_main.cc:37)
 Address  is not stack'd, malloc'd or (recently) free'd

Vignesh_Ungrapalli · May 4, 2017, 7:21am

Hi

Can you please share with us the standalone application with which you are seeing the issue?

Thanks

Vanista · May 5, 2017, 12:21pm

The program is a simple unit test based on the Google Test framework. I just created an empty test and linked the binary with -lnvidia-encode, and it crashed every time it ran, as soon as printf was called (see the call stack). I don’t think gtest is the key factor here; probably any call into libc would crash it as well.

I’ve since installed the 375.66 stable-branch update and can no longer reproduce the bug. Sorry, the system is in heavy use, so I can’t switch back and forth between driver versions to provide more details.

I would expect something like this to trigger the crash:

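cat > test.cpp << 'EOF'
#include <cstdio>

// Empty program: no NVENC API calls, only a printf.
int main()
{
  std::printf("Hello World!\n");
  return 0;
}
EOF

# -Wl,--no-as-needed keeps libnvidia-encode as a runtime dependency even though
# no symbol from it is referenced (some toolchains pass --as-needed by default).
g++ -o test test.cpp -Wl,--no-as-needed -lnvidia-encode
./test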
