Segfault in cuMemcpyHtoD after subprocess.run() in L4T 32.4.3

pycuda: 2019.1.2
nvidia-jetpack: 4.4-b144
nvidia-l4t-core: 32.4.3
python: 3.6

The following Python code segfaults:

import numpy as np
import subprocess

import pycuda.driver as cuda
from pycuda.tools import make_default_context
 
subprocess_count = 2000
 
# initialise cuda
cuda.init()

# get CUDA context
ctx = make_default_context()

# create some random UYVY-like buffers
size = (2160,3840,2)
np_src = np.random.randint(0, 255, size=size, dtype=np.uint8)
nbytes = np_src.nbytes
 
cmd = ["echo", "what do you care, CUDA?"]
for _ in range(subprocess_count):
  subprocess.run(cmd)
print(f"Called {cmd} {subprocess_count} times")
cuda_src = cuda.to_device(np_src)
print("Closing CUDA context, all good")
ctx.pop()

Output:

what do you care, CUDA?
...
Called ['echo', 'what do you care, CUDA?'] 2000 times
Segmentation fault (core dumped)

You might have to adjust the number of subprocess calls for this to break, especially if you change the command.
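
In case it helps with reproducing, here is a minimal sketch of how one could scan for a failing count, assuming the script above has been saved as repro.py and (hypothetically) modified to read subprocess_count from sys.argv[1]:

import signal
import subprocess
import sys

# Sketch: run the repro in a child interpreter with increasing counts and
# detect whether the child was killed by SIGSEGV. "repro.py" and its
# command-line argument are assumptions, not part of the original script.
for count in (500, 1000, 2000, 5000, 10000):
    result = subprocess.run([sys.executable, "repro.py", str(count)])
    if result.returncode == -signal.SIGSEGV:  # negative return code = killed by that signal
        print(f"Segfault reproduced with subprocess_count = {count}")
        break
    print(f"subprocess_count = {count}: exit code {result.returncode}")
else:
    print("No segfault observed up to 10000 subprocess calls")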

gdb shows this:

Thread 1 "python3.6" received signal SIGSEGV, Segmentation fault.
0x0000007f84ffab7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
(gdb) bt
#0  0x0000007f84ffab7c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libnvrm_gpu.so
#1  0x0000007f895644dc in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#2  0x0000007f894e22dc in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#3  0x0000007f893e3b0c in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#4  0x0000007f893e3b7c in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#5  0x0000007f894565ac in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#6  0x0000007f89547de4 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#7  0x0000007f89456f1c in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#8  0x0000007f893c8f28 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#9  0x0000007f893cc6e4 in ?? () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#10 0x0000007f894d5800 in cuMemcpyHtoD_v2 () from /usr/lib/aarch64-linux-gnu/libcuda.so.1
#11 0x0000007f8a307004 in (anonymous namespace)::py_memcpy_htod (dst=8648433664, src=...) at src/wrapper/wrap_cudadrv.cpp:199
#12 0x0000007f8a33b278 in pycudaboost::python::detail::invoke<int, void (*)(unsigned long long, pycudaboost::python::api::object), pycudaboost::python::arg_from_python<unsigned long long>, pycudaboost::python::arg_from_python<pycudaboost::python::api::object> > (
    ac1=<synthetic pointer>..., ac0=..., f=<optimized out>) at bpl-subset/bpl_subset/pycudaboost/python/detail/invoke.hpp:81
#13 pycudaboost::python::detail::caller_arity<2u>::impl<void (*)(unsigned long long, pycudaboost::python::api::object), pycudaboost::python::default_call_policies, pycudaboost::mpl::vector3<void, unsigned long long, pycudaboost::python::api::object> >::operator() (
    args_=<optimized out>, this=<optimized out>) at bpl-subset/bpl_subset/pycudaboost/python/detail/caller.hpp:218
#14 pycudaboost::python::objects::caller_py_function_impl<pycudaboost::python::detail::caller<void (*)(unsigned long long, pycudaboost::python::api::object), pycudaboost::python::default_call_policies, pycudaboost::mpl::vector3<void, unsigned long long, pycudaboost::python::api::object> > >::operator() (this=<optimized out>, args=<optimized out>, kw=<optimized out>)
    at bpl-subset/bpl_subset/boost/python/object/py_function.hpp:38
#15 0x0000007f8a37f128 in pycudaboost::python::objects::py_function::operator() (kw=0xfd2021, args=<optimized out>, this=0xb0b5550)
    at bpl-subset/bpl_subset/boost/python/object/py_function.hpp:143
#16 pycudaboost::python::objects::function::call (this=0xd055b081d22f0a00, args=0x7f8a7750c8, keywords=0xfd2021)
    at bpl-subset/bpl_subset/libs/python/src/object/function.cpp:226
#17 0x0000007f8a37f340 in pycudaboost::python::objects::(anonymous namespace)::bind_return::operator() (this=<optimized out>)
    at bpl-subset/bpl_subset/libs/python/src/object/function.cpp:585
#18 pycudaboost::detail::function::void_function_ref_invoker0<pycudaboost::python::objects::(anonymous namespace)::bind_return, void>::invoke (function_obj_ptr=...) at bpl-subset/bpl_subset/boost/function/function_template.hpp:188
#19 0x0000007f8a38d954 in pycudaboost::function0<void>::operator() (this=<optimized out>)
    at bpl-subset/bpl_subset/boost/function/function_template.hpp:766
#20 pycudaboost::python::detail::exception_handler::operator() (this=<optimized out>, f=...)
    at bpl-subset/bpl_subset/libs/python/src/errors.cpp:74
#21 0x0000007f8a3395a8 in pycudaboost::python::detail::translate_exception<pycuda::error, void (*)(pycuda::error const&)>::operator() (
    this=<optimized out>, translate=0x7f8a3032a8 <(anonymous namespace)::translate_cuda_error(pycuda::error const&)>, f=...,
    handler=...) at bpl-subset/bpl_subset/boost/python/detail/translate_exception.hpp:48
#22 pycudaboost::_bi::list3<pycudaboost::arg<1>, pycudaboost::arg<2>, pycudaboost::_bi::value<void (*)(pycuda::error const&)> >::operator()<bool, pycudaboost::python::detail::translate_exception<pycuda::error, void (*)(pycuda::error const&)>, pycudaboost::_bi::list2<pycudaboost::python::detail::exception_handler const&, pycudaboost::function0<void> const&> > (f=..., a=<synthetic pointer>...,
    this=<optimized out>) at bpl-subset/bpl_subset/boost/bind/bind.hpp:382
#23 pycudaboost::_bi::bind_t<bool, pycudaboost::python::detail::translate_exception<pycuda::error, void (*)(pycuda::error const&)>, pycudaboost::_bi::list3<pycudaboost::arg<1>, pycudaboost::arg<2>, pycudaboost::_bi::value<void (*)(pycuda::error const&)> > >::operator()<pycudaboost::python::detail::exception_handler, pycudaboost::function0<void> > (a2=..., a1=..., this=<optimized out>)
    at bpl-subset/bpl_subset/boost/bind/bind_template.hpp:102
#24 pycudaboost::detail::function::function_obj_invoker2<pycudaboost::_bi::bind_t<bool, pycudaboost::python::detail::translate_exception<pycuda::error, void (*)(pycuda::error const&)>, pycudaboost::_bi::list3<pycudaboost::arg<1>, pycudaboost::arg<2>, pycudaboost::_bi::value<void (*)(pycuda::error const&)> > >, bool, pycudaboost::python::detail::exception_handler const&, pycudaboost::function0<void> const&>::invoke (function_obj_ptr=..., a0=..., a1=...) at bpl-subset/bpl_subset/boost/function/function_template.hpp:132
#25 0x0000007f8a38d690 in pycudaboost::function2<bool, pycudaboost::python::detail::exception_handler const&, pycudaboost::function0<void> const&>::operator() (a1=..., a0=..., this=<optimized out>) at bpl-subset/bpl_subset/boost/function/function_template.hpp:767
#26 pycudaboost::python::detail::exception_handler::handle (f=..., this=<optimized out>)
    at bpl-subset/bpl_subset/boost/python/detail/exception_handler.hpp:41
#27 pycudaboost::python::handle_exception_impl (f=...) at bpl-subset/bpl_subset/libs/python/src/errors.cpp:24
#28 0x0000007f8a37c164 in pycudaboost::python::handle_exception<pycudaboost::python::objects::(anonymous namespace)::bind_return> (
    f=...) at bpl-subset/bpl_subset/boost/python/errors.hpp:29
#29 pycudaboost::python::objects::function_call (func=<optimized out>, args=<optimized out>, kw=<optimized out>)
    at bpl-subset/bpl_subset/libs/python/src/object/function.cpp:626
#30 0x0000000000607948 in _PyObject_FastCallDict ()
#31 0x000000000052b850 in ?? ()
#32 0x00000000005306c0 in _PyEval_EvalFrameDefault ()
#33 0x0000000000529978 in ?? ()
#34 0x000000000052b8f4 in ?? ()
#35 0x00000000005306c0 in _PyEval_EvalFrameDefault ()
#36 0x000000000052b108 in ?? ()
#37 0x0000000000631598 in PyRun_FileExFlags ()
#38 0x0000000000636c2c in PyRun_SimpleFileExFlags ()
#39 0x0000000000621428 in Py_Main ()
#40 0x0000000000420d3c in main ()

This seems to be the same issue as reported in “Segfault in cudaMemcpy after system("") in L4T 32.3.1”.

But the workaround proposed there does not work for us, since the command we need to run with subprocess.run() is v4l2-ctl, which we use to adjust camera settings at runtime.
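
A possible direction, sketched below (untested on this setup; the specific v4l2-ctl control is only a placeholder): fork a small helper process before cuda.init() and send the v4l2-ctl invocations to it over a queue, so that the process owning the CUDA context never spawns subprocesses after the context exists.

import multiprocessing as mp
import subprocess

def run_commands(cmd_queue):
    # Helper process: execute each command it receives until it gets None.
    for cmd in iter(cmd_queue.get, None):
        subprocess.run(cmd)

if __name__ == "__main__":
    cmd_queue = mp.Queue()
    helper = mp.Process(target=run_commands, args=(cmd_queue,))
    helper.start()  # forked before any CUDA state exists in this process

    import pycuda.driver as cuda
    from pycuda.tools import make_default_context
    cuda.init()
    ctx = make_default_context()

    # ... instead of calling subprocess.run() here, hand the command to the helper:
    cmd_queue.put(["v4l2-ctl", "--set-ctrl", "exposure_absolute=100"])  # placeholder control

    cmd_queue.put(None)  # tell the helper to exit
    helper.join()
    ctx.pop()

Whether this actually avoids the crash on L4T 32.4.3 is untested; it only avoids fork/exec in the process that owns the CUDA context.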

This doesn’t seem to happen with the l4t-ml container on JetPack 4.6, so it may already have been fixed there (tried with subprocess_count = 10000). It does still fail for me with l4t-ml on JetPack 4.4; however, from my testing, running jetson_clocks seems to “fix” it, and it appears to remain fixed even after resetting the power mode and clocks with (in my case) nvpmodel -m 3.
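
For completeness, a sketch of how that jetson_clocks workaround could be applied from Python before the CUDA context is created (untested here; requires root, and the nvpmodel mode is just the one mentioned above):

import subprocess

# Sketch (untested): apply the jetson_clocks workaround before creating the
# CUDA context. Requires root; mode 3 is the nvpmodel mode used in the post above.
subprocess.run(["sudo", "jetson_clocks"])
subprocess.run(["sudo", "nvpmodel", "-m", "3"])  # optionally reset the power mode afterwards

import pycuda.driver as cuda
from pycuda.tools import make_default_context
cuda.init()
ctx = make_default_context()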
