Hello
I am trying to run the PoseCNN algorithm on an RTX 4090 based system with NVIDIA driver 525.85 (installed from the run file), using a Docker container image from Docker Hub: CUDA 11.7.0 devel, Ubuntu 20.04.
Running “python3 setup.py install” in the Dockerfile during image creation fails:
[13/14] RUN cd /deps/PoseCNN/lib/layers && python3 setup.py install && cd /deps/PoseCNN/lib/utils && python3 setup.py build_ext --inplace && cd /deps/PoseCNN/ycb_render && python3 setup.py develop && cd …/ && ./build.sh:
#0 1.346 No CUDA runtime is found, using CUDA_HOME=‘/usr/local/cuda’
#0 1.351 running install
#0 1.389 running bdist_egg…
…
#0 1.414 Traceback (most recent call last):
#0 1.414 File “setup.py”, line 8, in <module>
#0 1.414 setup(
#0 1.414 File “/usr/lib/python3/dist-packages/setuptools/__init__.py”, line 144, in setup
#0 1.414 return distutils.core.setup(**attrs)
#0 1.414 File “/usr/lib/python3.8/distutils/core.py”, line 148, in setup
#0 1.414 dist.run_commands()
#0 1.414 File “/usr/lib/python3.8/distutils/dist.py”, line 966, in run_commands
#0 1.414 self.run_command(cmd)
#0 1.414 File “/usr/lib/python3.8/distutils/dist.py”, line 985, in run_command
#0 1.414 cmd_obj.run()
#0 1.414 File “/usr/lib/python3/dist-packages/setuptools/command/install.py”, line 67, in run
#0 1.414 self.do_egg_install()
#0 1.414 File “/usr/lib/python3/dist-packages/setuptools/command/install.py”, line 109, in do_egg_install
#0 1.414 self.run_command(‘bdist_egg’)
#0 1.414 File “/usr/lib/python3.8/distutils/cmd.py”, line 313, in run_command
#0 1.414 self.distribution.run_command(command)
#0 1.414 File “/usr/lib/python3.8/distutils/dist.py”, line 985, in run_command
#0 1.414 cmd_obj.run()
#0 1.414 File “/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py”, line 172, in run
#0 1.414 cmd = self.call_command(‘install_lib’, warn_dir=0)
#0 1.414 File “/usr/lib/python3/dist-packages/setuptools/command/bdist_egg.py”, line 158, in call_command
#0 1.414 self.run_command(cmdname)
#0 1.414 File “/usr/lib/python3.8/distutils/cmd.py”, line 313, in run_command
#0 1.414 self.distribution.run_command(command)
#0 1.414 File “/usr/lib/python3.8/distutils/dist.py”, line 985, in run_command
#0 1.414 cmd_obj.run()
#0 1.414 File “/usr/lib/python3/dist-packages/setuptools/command/install_lib.py”, line 23, in run
#0 1.414 self.build()
#0 1.414 File “/usr/lib/python3.8/distutils/command/install_lib.py”, line 109, in build
#0 1.414 self.run_command(‘build_ext’)
#0 1.414 File “/usr/lib/python3.8/distutils/cmd.py”, line 313, in run_command
#0 1.415 self.distribution.run_command(command)
#0 1.415 File “/usr/lib/python3.8/distutils/dist.py”, line 985, in run_command
#0 1.415 cmd_obj.run()
#0 1.415 File “/usr/lib/python3/dist-packages/setuptools/command/build_ext.py”, line 87, in run
#0 1.415 _build_ext.run(self)
#0 1.415 File “/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py”, line 186, in run
#0 1.415 _build_ext.build_ext.run(self)
#0 1.415 File “/usr/lib/python3.8/distutils/command/build_ext.py”, line 340, in run
#0 1.415 self.build_extensions()
#0 1.415 File “/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py”, line 843, in build_extensions
#0 1.415 build_ext.build_extensions(self)
#0 1.415 File “/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py”, line 195, in build_extensions
#0 1.415 _build_ext.build_ext.build_extensions(self)
#0 1.415 File “/usr/lib/python3.8/distutils/command/build_ext.py”, line 449, in build_extensions
#0 1.415 self._build_extensions_serial()
#0 1.415 File “/usr/lib/python3.8/distutils/command/build_ext.py”, line 474, in _build_extensions_serial
#0 1.415 self.build_extension(ext)
#0 1.415 File “/usr/lib/python3/dist-packages/setuptools/command/build_ext.py”, line 208, in build_extension
#0 1.415 _build_ext.build_extension(self, ext)
#0 1.415 File “/usr/lib/python3.8/distutils/command/build_ext.py”, line 528, in build_extension
#0 1.415 objects = self.compiler.compile(sources,
#0 1.415 File “/usr/lib/python3.8/distutils/ccompiler.py”, line 574, in compile
#0 1.415 self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
#0 1.415 File “/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py”, line 581, in unix_wrap_single_compile
#0 1.415 cflags = unix_cuda_flags(cflags)
#0 1.415 File “/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py”, line 548, in unix_cuda_flags
#0 1.415 cflags + _get_cuda_arch_flags(cflags))
#0 1.415 File “/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py”, line 1780, in _get_cuda_arch_flags
#0 1.416 arch_list[-1] += ‘+PTX’
#0 1.416 IndexError: list index out of range
However, running the same command inside a container, rather than during the image build, works, and I get no CUDA error.
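A minimal sketch of one possible explanation, assuming the build step sees no GPU (the log says "No CUDA runtime is found") and that the architecture list is then left empty before `arch_list[-1] += '+PTX'` runs. The function `choose_arch_list` is a hypothetical, simplified model and not PyTorch's actual code. `TORCH_CUDA_ARCH_LIST` is the environment variable PyTorch's extension builder reads, and the value `8.6+PTX` is only an illustrative assumption that would need checking against what CUDA 11.7 supports for this card.

```python
import os


def choose_arch_list(visible_capabilities, env=None):
    """Hypothetical, simplified model of picking CUDA arch flags for a build.

    visible_capabilities: list of (major, minor) tuples for the GPUs the
    process can see; assumed to be empty during `docker build`.
    """
    env = os.environ if env is None else env
    override = env.get("TORCH_CUDA_ARCH_LIST", "").strip()
    if override:
        # An explicit list removes the need to query a GPU at build time.
        return override.replace(" ", ";").split(";")
    arch_list = ["{}.{}".format(major, minor) for major, minor in visible_capabilities]
    if not arch_list:
        # Without this guard, indexing arch_list[-1] raises IndexError.
        raise RuntimeError("no visible GPU and TORCH_CUDA_ARCH_LIST is unset")
    arch_list[-1] += "+PTX"
    return arch_list


# Build-time situation: no GPU visible, but the variable is set explicitly.
print(choose_arch_list([], env={"TORCH_CUDA_ARCH_LIST": "8.6+PTX"}))
```

If this reading is right, setting the variable in the Dockerfile before the `RUN` step (for example with an `ENV` line) might let the build proceed without a visible GPU.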
But when the code then proceeds, I hit this error:
Let’s use 2 GPUs! # That means it is detecting 2 GPUs in the system
loading 3D models
libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
Unable to initialize EGL
Command ‘[’/deps/PoseCNN/tools/…/ycb_render/build/test_device’, ‘0’]’ returned non-zero exit status 1.
libEGL warning: DRI2: failed to create dri screen
libEGL warning: DRI2: failed to create dri screen
Unable to initialize EGL
Command ‘[’/deps/PoseCNN/tools/…/ycb_render/build/test_device’, ‘1’]’ returned non-zero exit status 1.
Traceback (most recent call last):
File “./tools/train_net.py”, line 141, in <module>
cfg.renderer = YCBRenderer(width=cfg.TRAIN.SYN_WIDTH, height=cfg.TRAIN.SYN_HEIGHT, render_marker=False)
File “/deps/PoseCNN/tools/…/ycb_render/ycb_renderer.py”, line 88, in __init__
self.r = CppYCBRenderer.CppYCBRenderer(width, height, get_available_devices()[gpu_id])
IndexError: list index out of range
This suggests that the devices are not identified during the build stage of the project, and that get_available_devices() is therefore unable to find the GPUs.
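To make this failure easier to read, here is a small sketch of a guard around the indexing step. It assumes the list returned by get_available_devices() is empty when every test_device call fails, which is what the log above suggests but which I have not confirmed in the renderer source. The helper `select_render_device` is hypothetical and not part of PoseCNN.

```python
def select_render_device(available_devices, gpu_id):
    """Hypothetical guard: return the device for gpu_id or fail with a clear message."""
    if not available_devices:
        # This is the situation in which a bare devices[gpu_id] raises IndexError.
        raise RuntimeError(
            "no EGL-capable device was detected; check that EGL can "
            "initialize inside the container"
        )
    if not 0 <= gpu_id < len(available_devices):
        raise ValueError(
            "gpu_id {} is out of range for {} detected device(s)".format(
                gpu_id, len(available_devices)
            )
        )
    return available_devices[gpu_id]


# Example with two detected devices, as in "Let's use 2 GPUs!".
print(select_render_device([0, 1], 1))
```

This does not fix the EGL initialization itself; it only replaces the bare IndexError with a message that points at the device detection step.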
- Inside the container, nvidia-smi and nvcc -V output:
NVIDIA-SMI 525.85.05 Driver Version: 525.85.05 CUDA Version: 12.0 …
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
- Running deviceQuery from the CUDA samples repository also passes for all SMs (“50 52 60 61 70 75 80 86”).
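Since the compute checks above pass while EGL does not initialize, one thing that may be worth checking, stated as an assumption rather than a diagnosis, is whether the container was started with the graphics driver capability. The NVIDIA container runtime reads `NVIDIA_DRIVER_CAPABILITIES`, and as far as I understand the CUDA images default to compute and utility only. The helper `has_graphics_capability` below is a hypothetical check, not part of any NVIDIA tool.

```python
import os


def has_graphics_capability(env=None):
    """Hypothetical check: is 'graphics' (or 'all') requested for the container?"""
    env = os.environ if env is None else env
    raw = env.get("NVIDIA_DRIVER_CAPABILITIES", "")
    # The variable is a comma-separated list such as "compute,utility".
    caps = {cap.strip() for cap in raw.split(",") if cap.strip()}
    return "all" in caps or "graphics" in caps


if __name__ == "__main__":
    print("graphics capability requested:", has_graphics_capability())
```

If this prints False inside the container, the OpenGL/EGL driver libraries may not have been mounted, which could be consistent with the libEGL warnings.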
Please share what the problem could be here. I have tried multiple images and installed multiple libraries, but there still seems to be a problem with CUDA or OpenGL.
Thank you