Running DeepStream 6.1 on a multi-GPU server fails

• Hardware Platform (Jetson / GPU)
RTX 2080 on x86
• DeepStream Version
6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
8.4.3.1-1+cuda11.6
• NVIDIA GPU Driver Version (valid for GPU only)
Driver Version: 510

When I run deepstream-test1 on the GPU server, I get the following error:
cuGraphicsGLRegisterBuffer failed with error(219) gst_eglglessink_cuda_init texture = 1

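I saw that someone on the forum had asked about this before and was told to set up a virtual display by following this link (https://elinux.org/Deepstream/FAQ). I did that, but the output is still not displayed. If I replace the display sink with fakesink, the pipeline runs without problems. Also, when I run sudo nvidia-xconfig --busid=PCI:4:0:0 --allow-empty-initial-configuration to select the first GPU, remote access works, but the output is still not displayed. When I change PCI:4:0:0 to the last GPU instead, remote access stops working altogether. How should I configure this so that the display works? Please point out where my configuration is wrong. Also, if nvidia-driver was installed with apt, how can I tell whether OpenGL is enabled?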

Hello @392415830, please help translate your question and description into English so that global users can also benefit from the discussion. Thank you.

Translating into English sometimes does not fully express what I mean, and judging by your username you seem to be Chinese.

What is the output of the following command?
sudo nvidia-xconfig --query-gpu-info

Number of GPUs: 6

GPU #0:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-b1368b21-a652-13a7-72c0-9bc4cbabe82a
PCI BusID : PCI:5:0:0

Number of Display Devices: 0

GPU #1:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-2bcd802e-ee9e-efe2-3b58-c5aa65d063e7
PCI BusID : PCI:7:0:0

Number of Display Devices: 0

GPU #2:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-af939bd7-2b83-bf4f-0da9-7aea17a08669
PCI BusID : PCI:8:0:0

Number of Display Devices: 0

GPU #3:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-67b48cc3-9e8e-90a6-f686-8bb43a5f85f1
PCI BusID : PCI:12:0:0

Number of Display Devices: 0

GPU #4:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-7874f438-1e7d-26a5-75e1-4ef9ddf10c86
PCI BusID : PCI:14:0:0

Number of Display Devices: 0

GPU #5:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-b1e2e542-bf4c-d186-da9c-300d6895a623
PCI BusID : PCI:15:0:0

Number of Display Devices: 0
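
When choosing a value for --busid, it can help to reduce the listing above to just the bus IDs. The following is a minimal sketch, not part of the original post; list_gpu_busids is a hypothetical helper name, and it assumes the output format shown above, where each GPU has one "PCI BusID : PCI:b:d:f" line.

```shell
# Hypothetical helper: print only the PCI bus IDs from
# `nvidia-xconfig --query-gpu-info` output read on stdin.
list_gpu_busids() {
    # The bus ID is the last whitespace-separated field on each "PCI BusID" line.
    awk '/PCI BusID/ { print $NF }'
}

# Usage on a machine with the NVIDIA driver installed:
#   sudo nvidia-xconfig --query-gpu-info | list_gpu_busids
```

Each printed value can be passed as-is to sudo nvidia-xconfig --busid=... because nvidia-xconfig reports the IDs in the same decimal form that xorg.conf uses.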

Here is my xorg.conf:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 510.85.02

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:5:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

I can now log in remotely using NoMachine.

I have since set up another server, and it has the same problem.

dixn@x86:~$ sudo nvidia-xconfig --query-gpu-info
Number of GPUs: 6

GPU #0:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-d6c52a8d-5894-26fc-64d1-57bdde559abe
PCI BusID : PCI:2:0:0

Number of Display Devices: 0

GPU #1:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-99007fc8-2ae1-fdf8-f73f-1240de9db57c
PCI BusID : PCI:5:0:0

Number of Display Devices: 0

GPU #2:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-359a075b-ebe4-80be-7839-1db7e449cafc
PCI BusID : PCI:6:0:0

Number of Display Devices: 0

GPU #3:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-3e9b3b20-ac2e-07ea-fa0a-588ea798b881
PCI BusID : PCI:129:0:0

Number of Display Devices: 0

GPU #4:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-289fbaef-c411-bb04-6a1d-dc1763883fb0
PCI BusID : PCI:132:0:0

Number of Display Devices: 0

GPU #5:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-9c597f09-abd9-5839-ed39-bf685fc65540
PCI BusID : PCI:133:0:0

Number of Display Devices: 0

dixn@x86:~$ sudo nvidia-xconfig --busid=PCI:5:0:0 --allow-empty-initial-configuration

Using X configuration file: "/etc/X11/xorg.conf".
Option "AllowEmptyInitialConfiguration" "True" added to Screen "Screen0".
Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.backup'
New X configuration file written to '/etc/X11/xorg.conf'

dixn@x86:~$ sudo reboot

Connection closed
Connecting to host...
Connected to host
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.15.0-50-generic x86_64)

0 updates can be applied immediately.

New release '22.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.

Your Hardware Enablement Stack (HWE) is supported until April 2025.
Last login: Mon Oct 10 10:20:31 2022 from 192.168.96.127
dixn@x86:~$
dixn@x86:~$
dixn@x86:~$ sudo nvidia-xconfig --query-gpu-info
[sudo] password for dixn:
Number of GPUs: 6

GPU #0:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-d6c52a8d-5894-26fc-64d1-57bdde559abe
PCI BusID : PCI:2:0:0

Number of Display Devices: 0

GPU #1:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-99007fc8-2ae1-fdf8-f73f-1240de9db57c
PCI BusID : PCI:5:0:0

Number of Display Devices: 0

GPU #2:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-359a075b-ebe4-80be-7839-1db7e449cafc
PCI BusID : PCI:6:0:0

Number of Display Devices: 0

GPU #3:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-3e9b3b20-ac2e-07ea-fa0a-588ea798b881
PCI BusID : PCI:129:0:0

Number of Display Devices: 0

GPU #4:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-289fbaef-c411-bb04-6a1d-dc1763883fb0
PCI BusID : PCI:132:0:0

Number of Display Devices: 0

GPU #5:
Name : NVIDIA GeForce RTX 2080
UUID : GPU-9c597f09-abd9-5839-ed39-bf685fc65540
PCI BusID : PCI:133:0:0

Number of Display Devices: 0

dixn@x86:~$ cat /etc/X11/xorg.conf

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 510.85.02

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:5:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
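
One detail worth checking when setting BusID on this machine: the IDs printed by nvidia-xconfig (for example PCI:129:0:0) are decimal, which is the form xorg.conf expects, while lspci prints the same addresses in hexadecimal (81:00.0). The sketch below converts one form to the other. It was not part of the original post; lspci_to_xorg_busid is a hypothetical helper name, and it assumes the PCI domain can be dropped.

```shell
# Hypothetical helper: convert an lspci-style address (hex) to an xorg.conf BusID (decimal).
# Accepts "bb:dd.f" or "dddd:bb:dd.f"; the PCI domain, if present, is dropped (an assumption).
lspci_to_xorg_busid() {
    addr=$1
    case $addr in
        *:*:*) addr=${addr#*:} ;;   # strip a leading PCI domain such as "0000:"
    esac
    bus=${addr%%:*}                 # "81" from "81:00.0"
    rest=${addr#*:}                 # "00.0"
    dev=${rest%%.*}                 # "00"
    fn=${rest#*.}                   # "0"
    # printf treats a 0x-prefixed argument as hexadecimal for %d
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_xorg_busid 81:00.0        # prints PCI:129:0:0
```

If a bus ID copied from lspci were used without this conversion, X could be pointed at a bus number that does not match any GPU in the listing.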

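I am also a bit unsure how to proceed with NoMachine remote access. With a single GPU there was nothing special to watch out for, but on multi-GPU machines the monitor used to be connected to the motherboard. Now, after running sudo nvidia-xconfig --busid=PCI:5:0:0 --allow-empty-initial-configuration, I found that a monitor connected to the corresponding GPU also shows output, which leaves me unsure what exactly I should do.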

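Also, when remote access was not working, I found that running the commands below made it work again. If I want to run DeepStream, is this the right thing to do?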
sudo /etc/init.d/gdm3 stop
sudo /usr/NX/bin/nxserver --restart

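After repeated experiments, I also found that once I run sudo nvidia-xconfig --busid=PCI:5:0:0 --allow-empty-initial-configuration and plug the HDMI cable directly into the corresponding GPU, the program's output is displayed. That does solve debugging with a directly connected monitor, but GPU servers are normally installed in a machine room, so all I need is remote debugging. I hope to get some help. Thank you very much for all your replies.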

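Regarding what you said in other posts about checking whether the NVIDIA driver supports OpenGL: I searched online but could not work out how to check. I installed the driver online with apt-get.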
sudo glxinfo | grep OpenGL
[sudo] password for dixn:
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 12.0.0, 256 bits)
OpenGL version string: 3.1 Mesa 21.2.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
Does this mean OpenGL is supported or not?

My single-GPU machine shows this:
glxinfo | grep OpenGL
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce GTX 1060 6GB/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 510.47.03
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6.0 NVIDIA 510.47.03
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 510.47.03
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

I think the configuration is still wrong, but I don't know how to change it.
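
A note on reading the two glxinfo outputs above: a vendor string of Mesa/X.org with the llvmpipe renderer means that session is rendering in software, while NVIDIA Corporation means the NVIDIA OpenGL driver is in use. In the CUDA driver API, error 219 is CUDA_ERROR_INVALID_GRAPHICS_CONTEXT, so a Mesa-backed session would be consistent with the cuGraphicsGLRegisterBuffer failure, although this thread does not confirm that as the cause. The helper below is a minimal sketch that was not part of the original post; gl_vendor_kind is a hypothetical name, and it only inspects glxinfo text supplied on stdin.

```shell
# Hypothetical helper: classify the "OpenGL vendor string" line of glxinfo output (stdin).
# Prints one of: nvidia, mesa, other, unknown.
gl_vendor_kind() {
    # Keep only the text after the label on the first vendor line; empty if none is found.
    vendor=$(awk '/^OpenGL vendor string:/ { sub(/^OpenGL vendor string: */, ""); print; exit }')
    case "$vendor" in
        "NVIDIA Corporation"*) echo nvidia ;;   # NVIDIA OpenGL driver is in use
        Mesa*)                 echo mesa ;;     # Mesa, for example llvmpipe software rendering
        "")                    echo unknown ;;  # no vendor line in the input
        *)                     echo other ;;
    esac
}

# Usage inside the desktop session that will run the pipeline
# (requires glxinfo, typically from the mesa-utils package):
#   glxinfo | gl_vendor_kind
```

Running the check inside the same NoMachine session that launches deepstream-test1 matters, because the result depends on which X display that session is attached to.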

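"but on multi-GPU machines the monitor used to be connected to the motherboard"

Was that using the integrated GPU?

"Now, after running sudo nvidia-xconfig --busid=PCI:5:0:0 --allow-empty-initial-configuration, I found that a monitor connected to the corresponding GPU also shows output, which leaves me unsure what exactly I should do."

I don't follow. What is it that you are unsure how to do?

"After repeated experiments, I also found that once I run sudo nvidia-xconfig --busid=PCI:5:0:0 --allow-empty-initial-configuration and plug the HDMI cable directly into the corresponding GPU, the program's output is displayed. But GPU servers are normally installed in a machine room, so all I need is remote debugging."

About accessing the desktop remotely through NoMachine: do you mean that accessing the desktop through NoMachine fails when no monitor is connected?

"Regarding what you said in other posts about checking whether the NVIDIA driver supports OpenGL"

Older installers asked whether to install NVIDIA OpenGL, or something similar. Current installers install it by default.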

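Also, could you describe the situation case by case? I am a little confused. For example:

1. Server 1: how many GPUs does it have, and after following the FAQ steps, is there any problem logging in to the desktop through NoMachine?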
2. Server 2: how many GPUs does it have, and after following the FAQ steps, is there any problem logging in to the desktop through NoMachine?

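I have just one problem: I want to run deepstream-test1 remotely on the GPU server and have its output displayed. Without a monitor connected, I can log in remotely through NoMachine without problems, but the program reports the error as soon as it starts. I have read the posts on the forum but do not know what the correct setup steps are. Also, which system configuration files should I provide to help you troubleshoot this?

Also, on a GPU server, when connecting a monitor, should the HDMI cable go into the GPU, or into the motherboard to use the integrated graphics?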

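The FAQ is aimed at Tesla-series GPUs and describes how to set up a virtual display. I have not tried whether that solution works on GPUs that support display output. I suggest you use RTSP streaming instead.
On a multi-GPU server, where you connect the monitor depends on whether you are using the integrated graphics or an NVIDIA GPU: plug it into whichever one you use. DeepStream needs the NVIDIA GPU to do the work, so the monitor has to be connected to the NVIDIA card.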

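OK, so it seems that on GPUs that support display output, it is not possible to run a DeepStream program over remote desktop and show the result with the display sink.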

Sorry for the late reply. Is this still an issue that needs support? Thanks.

Yes, I don’t know how to solve this problem

Please make sure your display is connected to the NVIDIA GPU.

I have replied to you in another post. Because this post was originally in Chinese, no one replied to me for a long time, so I created a new one.
