cuda-gdb forces execution on device 0, the GPU running X windows

Hi All,

I’m running two GeForce 9600s on Fedora Core 9. The display is in TwinView, with the monitors hooked up to the GeForce (device 0) in bus slot 6; see xorg.conf below.

Problem: if I call cudaSetDevice(1), the program runs normally on device 1. But if I do cuda-gdb ./program, set breakpoints, and then run the program, I get:

(cuda-gdb) run
Starting program: /home/rdemb/cudaprograms/device1_gdb
[Thread debugging using libthread_db enabled]
[New process 3466]
[New Thread 3585360 (LWP 3466)]
Warning: 1 GPUs were made unavailable to the application because they are used by X. This may change the application behaviour!
we are on device 0
CUDA-GDB: Cannot debug on this GPU as it is running a window system.

As you can see, cuda-gdb seems to force execution onto device 0, even though I’m setting the device to 1.

When I step past the line containing cudaSetDevice, I get the same warning:

(cuda-gdb) step
23 cudaSetDevice(set_dev);
(cuda-gdb) step
Warning: 1 GPUs were made unavailable to the application because they are used by X. This may change the application behaviour!
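
For reference, the device-selection part of the program boils down to the following (a trimmed sketch; only the cudaSetDevice(set_dev) line and the printed message appear in the listing above, and the error check is added here for illustration):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int set_dev = 1;    /* the GPU that is not driving the display */
    int dev = -1;

    /* If cuda-gdb hides the GPU used by X, the remaining GPU is
       renumbered as device 0, so this call can fail and the context
       silently falls back to device 0. The return value shows that. */
    cudaError_t err = cudaSetDevice(set_dev);
    if (err != cudaSuccess)
        printf("cudaSetDevice(%d) failed: %s\n", set_dev, cudaGetErrorString(err));

    cudaGetDevice(&dev);    /* ask the runtime which device we actually got */
    printf("we are on device %d\n", dev);

    /* ... kernel launches on the selected device ... */
    return 0;
}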

I’ve re-installed the drivers twice with the X server off (at runlevel 3) and run nvidia-xconfig --twinview, but I always get forced onto device 0 when I try to use cuda-gdb.

If I take out the BusID lines in xorg.conf, I get an xorg.conf error at boot time and the X server does not come up.

If I run Fedora Core 9 at runlevel 3 and execute the programs from the command line, no CUDA-enabled cards are recognized; i.e., if I run deviceQuery, I get a message that no CUDA cards were detected, and it runs in emulation mode.
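
A side note on that runlevel 3 symptom: as I understand it, the /dev/nvidia* device nodes are normally created by X, so without X they may simply not exist. The CUDA release notes for Linux include a boot-time script along these lines for running headless (run as root; double-check against your own copy of the release notes):

#!/bin/bash
# Load the kernel module and create the NVIDIA device nodes by hand,
# since no X server is around to do it.
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
    # One /dev/nvidiaN node per NVIDIA controller found on the PCI bus.
    N3D=`/sbin/lspci | grep -i NVIDIA | grep "3D controller" | wc -l`
    NVGA=`/sbin/lspci | grep -i NVIDIA | grep "VGA compatible controller" | wc -l`
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done
    # The control node is always minor 255.
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi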

I’m really wrapped around the axle on this one; any help is much appreciated. Thanks,

robullelk

xorg.conf below:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 1.0 (buildmeister@builder62)  Thu Apr 30 16:21:56 PDT 2009

# Xorg configuration created by pyxf86config

Section "ServerLayout"
    Identifier     "Default Layout"
    Screen      0  "Screen0" 0 0
    Screen      1  "Screen1" RightOf "Screen0"
    InputDevice    "Mouse0" "CorePointer"
    InputDevice    "Keyboard0" "CoreKeyboard"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # keyboard added by rhpxl
    Identifier     "Keyboard0"
    Driver         "kbd"
    Option         "XkbModel" "pc105"
    Option         "XkbLayout" "us"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Videocard0"
    Driver         "nvidia"
    BusID          "PCI:6:0:0"
    Screen          0
EndSection

Section "Device"
    Identifier     "Videocard1"
    Driver         "nvidia"
    BusID          "PCI:6:0:0"
    Screen          1
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Videocard0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "TwinView" "True"
    Option         "MetaModes" "nvidia-auto-select, nvidia-auto-select"
    SubSection     "Display"
        Viewport    0 0
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Videocard1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "TwinView" "True"
    Option         "MetaModes" "nvidia-auto-select, nvidia-auto-select"
    SubSection     "Display"
        Viewport    0 0
        Depth       24
    EndSubSection
EndSection

I don’t think your X configuration is doing what you think it’s doing. For TwinView (on a single GPU) you should have only one Screen defined, yet you have two. Have you looked at your X log to confirm that TwinView is working and that both GPUs aren’t being used by X?
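
Roughly, a single-GPU TwinView setup needs just one Screen in the ServerLayout and one Device section, along these lines (a sketch only, reusing the BusID from your config; the Monitor and InputDevice sections stay as they are):

Section "ServerLayout"
    Identifier     "Default Layout"
    Screen      0  "Screen0" 0 0
    InputDevice    "Mouse0" "CorePointer"
    InputDevice    "Keyboard0" "CoreKeyboard"
EndSection

Section "Device"
    Identifier     "Videocard0"
    Driver         "nvidia"
    BusID          "PCI:6:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Videocard0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "TwinView" "True"
    Option         "MetaModes" "nvidia-auto-select, nvidia-auto-select"
EndSection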

Hi,

I deleted the second screen and all references to it in xorg.conf (attached at bottom).

I get the same behaviour. TwinView is working, but for some reason X is still running on that second GeForce 9600.

I’ve excerpted some of the relevant parts of the Xorg.0.log file below. I’m particularly troubled by these lines for GPU-1:

(--) NVIDIA(GPU-1): Connected display device(s) on GeForce 9600 GT at
(--) NVIDIA(GPU-1): PCI:131:0:0:

I’m not sure how to configure xorg.conf to prevent X from using the second card.

Thank you for your help,

robullelk

(!!) More than one possible primary device found
(--) PCI: (0@6:0:0) nVidia Corporation Geforce 9600 GT 512mb rev 161, Mem @ 0x91000000/16777216, 0xa0000000/268435456, 0x92000000/33554432, I/O @ 0x00003000/128
(--) PCI: (0@131:0:0) nVidia Corporation Geforce 9600 GT 512mb rev 161, Mem @ 0xb1000000/16777216, 0xc0000000/268435456, 0xb2000000/33554432, I/O @ 0x00005000/128

(**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
(==) NVIDIA(0): RGB weight 888
(==) NVIDIA(0): Default visual is TrueColor
(==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
(**) NVIDIA(0): Option "TwinView" "True"
(**) NVIDIA(0): Option "MetaModes" "nvidia-auto-select, nvidia-auto-select"
(**) NVIDIA(0): Enabling RENDER acceleration
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
(II) NVIDIA(0): enabled.
(II) NVIDIA(0): NVIDIA GPU GeForce 9600 GT (G94) at PCI:6:0:0 (GPU-0)
(--) NVIDIA(0): Memory: 1048576 kBytes
(--) NVIDIA(0): VideoBIOS: 62.94.62.00.51
(II) NVIDIA(0): Detected PCI Express Link width: 16X
(--) NVIDIA(0): Interlaced video modes are supported on this GPU
(--) NVIDIA(0): Connected display device(s) on GeForce 9600 GT at PCI:6:0:0:
(--) NVIDIA(0): IBM L170p (CRT-0)
(--) NVIDIA(0): IBM L170p (CRT-1)
(--) NVIDIA(0): IBM L170p (CRT-0): 400.0 MHz maximum pixel clock
(--) NVIDIA(0): IBM L170p (CRT-1): 400.0 MHz maximum pixel clock
(**) NVIDIA(0): TwinView enabled
(II) NVIDIA(0): Assigned Display Devices: CRT-0, CRT-1
(II) NVIDIA(0): Validated modes:
(II) NVIDIA(0): "nvidia-auto-select,nvidia-auto-select"
(II) NVIDIA(0): Virtual screen size determined to be 2560 x 1024
(--) NVIDIA(0): DPI set to (95, 96); computed from "UseEdidDpi" X config
(--) NVIDIA(0): option
(==) NVIDIA(0): Enabling 32-bit ARGB GLX visuals.
(--) Depth 24 pixmap format is 32 bpp
(II) NVIDIA(GPU-1): NVIDIA GPU GeForce 9600 GT (G94) at PCI:131:0:0 (GPU-1)
(--) NVIDIA(GPU-1): Memory: 1048576 kBytes
(--) NVIDIA(GPU-1): VideoBIOS: 62.94.62.00.51
(II) NVIDIA(GPU-1): Detected PCI Express Link width: 16X
(--) NVIDIA(GPU-1): Interlaced video modes are supported on this GPU
(--) NVIDIA(GPU-1): Connected display device(s) on GeForce 9600 GT at
(--) NVIDIA(GPU-1): PCI:131:0:0:
(II) NVIDIA(0): Initialized GPU GART.
(II) NVIDIA(0): Setting mode "nvidia-auto-select,nvidia-auto-select"
(II) Loading extension NV-GLX
(II) NVIDIA(0): NVIDIA 3D Acceleration Architecture Initialized
(==) NVIDIA(0): Disabling shared memory pixmaps
(II) NVIDIA(0): Using the NVIDIA 2D acceleration architecture
(==) NVIDIA(0): Backing store disabled
(==) NVIDIA(0): Silken mouse enabled

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 1.0 (buildmeister@builder62)  Thu Apr 30 16:21:56 PDT 2009

# Xorg configuration created by pyxf86config

Section "ServerLayout"
    Identifier     "Default Layout"
    Screen      0  "Screen0" 0 0
    InputDevice    "Mouse0" "CorePointer"
    InputDevice    "Keyboard0" "CoreKeyboard"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/input/mice"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # keyboard added by rhpxl
    Identifier     "Keyboard0"
    Driver         "kbd"
    Option         "XkbModel" "pc105"
    Option         "XkbLayout" "us"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Monitor"
    Identifier     "Monitor1"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Videocard0"
    Driver         "nvidia"
    BusID          "PCI:6:0:0"
    Screen          0
EndSection

Section "Device"
    Identifier     "Videocard1"
    Driver         "nvidia"
    BusID          "PCI:6:0:0"
    Screen          1
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Videocard0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "TwinView" "True"
    Option         "MetaModes" "nvidia-auto-select, nvidia-auto-select"
    SubSection     "Display"
        Viewport    0 0
        Depth       24
    EndSubSection
EndSection

Please generate and attach an nvidia-bug-report.log.gz
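(Running nvidia-bug-report.sh as root will generate it.)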

Hi,

I did startx -- -logverbose 6 and generated the attached file.

Can you recommend a second board that might work while I wait for a resolution?

I tried a GTX 8800 and got as far as setting a breakpoint in the kernel, but when I stepped into the kernel I got a message to the effect that the GTX 8800 is not supported by cuda-gdb.

Perhaps this problem is invariant with respect to cards on Fedora Core 9, but if anyone reading this has been able to debug CUDA programs on a card that is not attached to the display on Fedora Core 9, please let me know.
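
In case it helps with comparing setups, here is a quick sketch (standard CUDA runtime API calls) that prints what the runtime actually sees on a given box:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%d CUDA device(s) visible\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        /* G9x parts report compute capability 1.1; G80 reports 1.0 */
        printf("device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}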

Thank you very much for your assistance,

Robert
nvidia_bug_report.log.gz (53.9 KB)

In your bug report, I see:

(!!) More than one possible primary device found
(--) PCI: (0@6:0:0) nVidia Corporation Geforce 9600 GT 512mb rev 161, Mem @ 0x91000000/16777216, 0xa0000000/268435456, 0x92000000/33554432, I/O @ 0x00003000/128
(--) PCI: (0@131:0:0) nVidia Corporation Geforce 9600 GT 512mb rev 161, Mem @ 0xb1000000/16777216, 0xc0000000/268435456, 0xb2000000/33554432, I/O @ 0x00005000/128

That’s a known X server bug, and I suspect that’s why both GPUs are getting ‘owned’ by X. I think the only workaround is to go to runlevel 5, then switch back to runlevel 3.
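(For example, as root: init 5, then once X is up, init 3 again.)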

G80 GPUs (GTX 8800) are not supported with cuda-gdb.

That workaround works. Do you know which X server revisions have the bug fix? I tried the cards on SLES10 SP1 and the log states "one primary device found," but I was unable to test there because cuda-gdb exits with a floating point exception. SLES10 SP1 has X server rev 6.9.0, with a release date of 12 Dec 2005.

Thank you for your assistance; I am up and debugging. Good, timely, and accurate info.

Regards,

robullelk

The X server bug isn’t present in RHEL5, which is actually the only supported environment for using cuda-gdb anyway.