Segmentation fault w/ libnvidia-encode.so.381.09

Hi

We are seeing a strange crash in our NVENC programs running on a host with a TITAN Xp device.
The crash points to low-level libc functions (see the valgrind report below) and happens only when the program is linked with nvidia-encode.

Furthermore, merely linking with the library is enough to trigger the crash, even when the program makes no calls to the NVENC API. It seems that nvidia-encode has a global constructor that can corrupt or shadow some function addresses.
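
To illustrate what "global constructor" means here, this is a minimal sketch of code that runs before main() without the program ever calling it. All names (LoadTimeInit, g_init_calls, init_calls) are hypothetical and say nothing about what nvidia-encode actually does internally; the same mechanism applies to a shared library, whose static initializers run when the dynamic loader maps it.

```cpp
#include <cstdio>

namespace {

// Zero-initialized before any dynamic initialization takes place.
int g_init_calls = 0;

// Hypothetical stand-in for a library's load-time initializer.
struct LoadTimeInit {
    LoadTimeInit() { ++g_init_calls; }
};

// The constructor of this global object runs before main() is entered.
LoadTimeInit g_load_time_init;

}  // namespace

// Reports how many times the load-time initializer has run.
int init_calls() { return g_init_calls; }
```

Calling init_calls() first thing in main() would show that the initializer has already run, which is why a faulty initializer in a linked library can break a program that never uses that library's API.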

Finally, the same code and test setup does not crash on another host with a TITAN X or M6000 device.
We recently upgraded to beta driver 381.09 to work around another bug.

Valgrind report for an empty unit test:

Process terminating with default action of signal 11 (SIGSEGV)
 Bad permissions for mapped region at address 
   at : ???
   by : printf (in /usr/lib64/libc-2.17.so)
  gtest.cc:2660  testing::internal::ColoredPrintf
  gtest.cc:2747  testing::internal::PrettyUnitTestResultPrinter::OnTestIterationStart
  gtest.cc:2995  testing::internal::TestEventRepeater::OnTestIterationStart
  gtest.cc:4301  testing::internal::UnitTestImpl::RunAllTests
   by : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2078)
   by : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2114)
  gtest.cc:3926  testing::UnitTest::Run
   by : RUN_ALL_TESTS (gtest.h:2288)
   by : main (gtest_main.cc:37)
Jump to the invalid address stated on the next line
   at : ???
   by : __libc_freeres (in /usr/lib64/libc-2.17.so)
   by : _vgnU_freeres (in /opt/valgrind/lib/valgrind/vgpreload_core-amd64-linux.so)
   by : vfprintf (in /usr/lib64/libc-2.17.so)
   by : printf (in /usr/lib64/libc-2.17.so)
  gtest.cc:2660  testing::internal::ColoredPrintf
  gtest.cc:2747  testing::internal::PrettyUnitTestResultPrinter::OnTestIterationStart
  gtest.cc:2995  testing::internal::TestEventRepeater::OnTestIterationStart
  gtest.cc:4301  testing::internal::UnitTestImpl::RunAllTests
   by : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2078)
   by : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (gtest.cc:2114)
  gtest.cc:3926  testing::UnitTest::Run
   by : RUN_ALL_TESTS (gtest.h:2288)
   by : main (gtest_main.cc:37)
 Address  is not stack'd, malloc'd or (recently) free'd

Hi

Can you please share with us the standalone application with which you are seeing the issue?

Thanks

The program is a simple unit test based on the Google Test framework. I created an empty test and linked the binary with -lnvidia-encode; it crashes every time it runs, as soon as printf is called, as shown in the call stack. I don't think gtest is key in this case; probably any call into libc would lead to a crash.

I’ve since moved to the 375.66 stable branch update and can’t reproduce the bug there. Sorry, the system is in heavy use, so I can’t go back and forth between versions to provide more details.

I expect something like this to trigger the crash condition:

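cat > test.cpp << 'EOF'
#include <cstdio>

// Makes no NVENC calls; the library is only linked in.
int main (int argc, char** argv)
{
  std::printf ("Hello World!\n");
  return 0;
}
EOF

# If the toolchain defaults to --as-needed, add -Wl,--no-as-needed
# before -lnvidia-encode so the otherwise unused library is still linked.
g++ -o test test.cpp -lnvidia-encode
./test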
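
A variant of the test above might help show whether loading the library alone is enough to trigger the problem, independent of how the binary is linked. This is only a sketch: load_library is a hypothetical helper, and the soname libnvidia-encode.so.1 is an assumption about how the driver installs the library. The library's initializers run inside dlopen, and the printf afterwards is the same libc call that crashed in the report.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Loads a shared library at run time; its initializers run inside dlopen().
// Returns true if the library was loaded, false otherwise.
// The handle is deliberately not closed so the library stays mapped.
bool load_library(const char* soname) {
    void* handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", soname, dlerror());
        return false;
    }
    // Same libc call that crashed in the valgrind report.
    std::printf("loaded %s\n", soname);
    return true;
}
```

Calling load_library("libnvidia-encode.so.1") from main() and building without -lnvidia-encode (older glibc versions also need -ldl) would be the intended usage. If the crash only appears after the load, that points at the library's load-time code rather than at the test binary.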