Problems running CUDA on non-primary display

e.ping · March 2, 2007, 7:05pm

I have installed a GeForce 8800GTX board on an Intel branded ATi motherboard with ATi integrated graphics. When I installed CUDA, the ATi graphics were disabled, and the 8800GTX was the primary monitor. I read on other parts of the CUDA forums that it’s best to use another card for your Windows Desktop and reserve the CUDA card for CUDA calculations - otherwise your system may hang during long kernel executions. So I installed the ATi drivers, and then told Windows to use the ATi graphics as the primary monitor, and not to extend the Windows Desktop to my 8800GTX.
It seems that doing so messed up my system - the Windows Display control panel now reports that “The currently selected graphics driver cannot be used. It was written for a previous version of Windows, and is no longer compatible with this version of Windows. The system has been started using the default VGA driver.”
This is a mistaken error message, as far as I can figure out, since the driver versions are listed correctly in Control Panel->System->Hardware->Device Manager, and the device manager reports that both display adapters are working properly.
Whatever the cause of this error message, CUDA fails to find the 8800GTX at runtime and therefore won’t run anymore.
I’m going to disable the ATi graphics and attempt to use the 8800GTX as my primary display again, so that I can continue with my CUDA programming, but I was hoping someone on this forum would know if there is a solution to this problem, or whether this arises out of inherent incompatibilities between the ATi drivers and the nVidia drivers. It would be nice if I could reserve my 8800GTX for CUDA computations only.

Thanks!

houzet_dominique · March 2, 2007, 8:14pm

Hi,

I’ve experienced the same thing. I deactivated my ATI board, rebooted, then reactivated it and selected it as my primary monitor with the GTX as the secondary one; after that everything was OK. If I do not extend the desktop to the GTX, execution seems to be faster, but the data read back from the GTX memory are always 0!

Hope it will help…

D. Houzet

Archer · March 5, 2007, 1:51am

Hi,

I think my question can also be posted under this topic. :) I am using a GeForce 6800GT as the primary graphics card for display and an 8800 GTX for computation. The 97.73 driver is installed for both of them.

Although the “unspecified launch failure” error message does not appear again, the computation results are not always correct. It seems that when the device run time exceeds 5 sec, the results are all “0”; when it is within 5 sec, the results are correct.

Can someone help me fix this problem? Thanks

prkipfer · March 5, 2007, 9:47am

Quoting the CUDA release notes:

The watchdog timeout for GDI operations is 5 seconds.

Peter

Archer · March 5, 2007, 10:32am

Hi Peter,

I am using another NVIDIA card (6800 GT) as my primary display adapter and the 8800 for computing. Do you mean the 5 sec limit still applies in this case?

prkipfer · March 5, 2007, 10:44am

Haven’t tested that with XP. After all, CUDA talks to the card via the NV driver. If the watchdog hooks in there, the driver might not react correctly. (You do see the 8800 in the control panel after all, right? Even though it is deactivated.)

I can report that on Linux the 5 sec limit only applies if the 8800 is running the desktop. To run CUDA, you don’t actually need an X11 desktop on the machine at all, so the 8800 can be the only card. Just make sure you load the nvidia kernel driver on startup (modprobe in boot.local) and CUDA runs fine on text-only server systems without a timeout.

Peter

Archer · March 5, 2007, 12:16pm

Thanks Peter. I have done some tests under Windows XP and found that this problem does exist even when the G80 is not used as the primary display card. I have posted my problem on the “CUDA programming and development” branch.

Mark_Harris · March 6, 2007, 10:38am

To answer the ATI/NVIDIA questions: I don’t think it’s legal to have multiple display drivers installed simultaneously in Windows XP. What you are trying to do is install an ATI driver, then install an NVIDIA driver, which disables the ATI driver, then make the ATI GPU your primary display adapter, which won’t work with the NVIDIA driver.

You need to use a different NVIDIA GPU as your primary display adapter, like the guys with 6800s are doing.

Mark

jhanweck · March 25, 2007, 1:25am

I’ve had good success running a GeForce 8800 GTS as the Cuda GPU, and a GeForce FX 5200 as the (primary) video display under Windows XP Pro SP2 on a Dell Precision 360.

The GPU runs kernels > 5 seconds with no problems. Works like a champ!

Note: The GeForce 8800 GTS is PCIe 16x; the FX 5200 is straight PCI. That might have something to do with it.

Here’s what worked for me – though no guarantees:

1. Install the FX 5200 FIRST as the ONLY video adapter, and get it working with the Cuda drivers (currently 97.73). [This may require disabling any on-board video cards in the machine.] THIS STEP IS VERY IMPORTANT!
2. If your BIOS has a mechanism for selecting the PCI graphics card (not PCIe) as the default card, do so.
3. Power down the machine, and install the GeForce 8800 in the PCIe slot.
4. Grab your lucky charms and power up in Safe Mode… The very brave can try booting directly to regular Windows mode and skip to Step 8…
5. Under Settings->Control Panel->System->Hardware->Device Manager->Display Adapters, right-click on the GPU card and select “Disable.”
6. Power down the machine. Reboot in regular Windows mode.
7. If all goes well, go to Settings->Control Panel->System->Hardware->Device Manager->Display Adapters, right-click on the GPU card and select “Enable.” This might cause things to flicker a bit…
8. If all goes well, right click on the Desktop, select Properties->Settings. Make sure FX 5200 (or whatever card you want for your video card) has the “Extend my Windows desktop onto this monitor” box CHECKED, and the card you want as your GPU has this box UNCHECKED. (You select the proper cards using the Display dropdown.)

I was able to run Cuda kernels at this point, with no 5-second limitation.

Archer · March 26, 2007, 6:39am

Thanks jhanweck. I will try your method. Did you try running your program for much longer than 5 seconds, for example 1 minute, and check the results? I tried Linux and still got garbage results when the running time exceeded about 20 seconds. I think it’s a bug in the driver.

jhanweck · March 26, 2007, 3:24pm

Archer, after further testing, I’m afraid you may be right… :(

Two problems I’ve encountered with this:

After rebooting, the driver sometimes “forgets” which displays are active and inactive. For instance:

Before reboot: the FX 5200 is displays 1 and 2, with 1 as the primary monitor and the desktop extended to both; the 8800 GTS is display 3, with the desktop NOT extended to it. Things work OK.
After reboot: the displays are shuffled around! The driver assigned displays 1 and 3 to the FX 5200, display 2 to the 8800 GTS, and extended the desktop to all of these! At this point, CUDA will not work…

This can be fixed temporarily: go to Settings->Control Panel->System->Hardware->Device Manager->Display Adapters, right click the 8800 GTS, and disable it. Then reboot. The FX 5200 should be fine. Then go to Settings->…->Display Adapters, right click the 8800 GTS, and enable it. Back up and running… though this is certainly not ideal.

I haven’t run a program more than 20 seconds, but I am encountering some “instabilities” in the results on shorter kernel runs. The first run is fine, the second run is messed up, the third run is fine again, the 4th is messed up… I’m still investigating… will post when I have more.

All the above is quite a hack anyway. I’m hoping as CUDA and the drivers become more stable, all this will be unnecessary.

jhanweck · March 26, 2007, 8:46pm

Archer, are you using texture memory in your application?

jhanweck · March 27, 2007, 8:39pm

The reason I ask is I’m running into some memory allocation bugs, and they seem to manifest when texture memory is used.

jhanweck · March 27, 2007, 10:42pm

Archer, I’ve run into a strange bug that could be related to your problems:

I have a piece of code that gives different results on every other run. (It’s a fairly long program so I can’t include it here.)

There’s a __global__ (kernel) function in the code that I’m not currently using… it’s NEVER called.

If I comment out that function, the program works perfectly every run!

If I put it back in, the program works only on every other run.

Since the program works under emulator in either case, I suspect something amiss in the compiler. Until I can narrow it down to something simpler, it’s hard to say.

Archer · March 28, 2007, 1:52am

I did not use texture memory in my application.

Archer · March 28, 2007, 2:07am

Hi,

I am using CUDA under Windows XP. According to the following CUDA release notes:

I use a GeForce 6800GT as the primary graphics card for display and an 8800 GTX for computation. The Windows Display Driver version 97.73 for CUDA Toolkit Version 0.8 is installed for both of them.

Here is the host code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

#include <cutil.h>
#include <CUDA_test_kernel.cu>

#define BLOCKNUM 16
#define THREADNUM 32

void runTest( int argc, char** argv);

int
main( int argc, char** argv)
{
    runTest( argc, argv);

    CUT_EXIT(argc, argv);
}

void
runTest( int argc, char** argv)
{
    FILE* output = fopen("output", "w");

    int memsize_byte = sizeof(int) * BLOCKNUM * THREADNUM;
    int *d_output;
    CUDA_SAFE_CALL( cudaMalloc( (void**) &d_output, memsize_byte ) );

    dim3  grid(BLOCKNUM, 1, 1);
    dim3  threads(THREADNUM, 1, 1);

    //Set timer
    unsigned int timer = 0;
    CUT_SAFE_CALL( cutCreateTimer( &timer));
    CUT_SAFE_CALL( cutStartTimer( timer));

    printf("Begin testing...\n");

    testKernel<<<grid, threads>>>(d_output);
    CUT_CHECK_ERROR("Kernel execution failed");

    printf("Computation completed.\n");

    //Time
    CUT_SAFE_CALL( cutStopTimer( timer));
    printf( "Processing time: %f (ms)\n", cutGetTimerValue( timer));
    CUT_SAFE_CALL( cutDeleteTimer( timer));

    // allocate mem for the result on host side
    int* results = (int*) malloc(memsize_byte);

    // copy result from device to host
    CUDA_SAFE_CALL(cudaMemcpy(results, d_output, memsize_byte, cudaMemcpyDeviceToHost) );

    for (int i = 0; i < BLOCKNUM * THREADNUM; ++i)
    {
        fprintf(output, "%d, %d\n", i, results[i]);
    }

    fclose(output);
    free(results);
    CUDA_SAFE_CALL(cudaFree(d_output));
}
And the kernel:
__global__ void
testKernel(int* output)
{
    const int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int tempValue = 0;

    for (int i = 0; i < 10000; ++i)
    {
        for (int j = 0; j < 10000; ++j)
        {
            tempValue = max( (i + j) % 4, (i + j) % 3 ) + 2;
        }
    }

    output[tid] = tempValue;
    tempValue = 0;
}
No compile-time or run-time errors are reported. However, the computation results are not always correct. Changing the upper limit of i or j in the kernel changes the run time. With the current values it is about 6 sec. When the upper limit of j is changed to, say, 5000, the run time is within 3 sec. My problem is that when the kernel run time exceeds 5 sec, the results are all 0; when it is within 5 sec, the results are correct.

Any suggestions are appreciated.

Hi jhanweck, above is the test program I used. The correct results should be positive integers within [2, 10). You can try it.
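
For anyone trying the test program: the loop overwrites tempValue on every pass, so only the final (i, j) pair determines what each thread stores. A minimal host-side sketch in plain C for computing that reference value and checking a results array against it might look like this. The helper names (kernel_step, expected_value, count_mismatches) are illustrative and not part of the posted program.

```c
#include <stddef.h>

/* Same expression as the kernel body; max() is written out because
 * plain C has no built-in integer max. */
static int kernel_step(int i, int j)
{
    int a = (i + j) % 4;
    int b = (i + j) % 3;
    return (a > b ? a : b) + 2;
}

/* The kernel keeps only the value from the last iteration, so the
 * reference result depends only on the two loop bounds. */
static int expected_value(int bound_i, int bound_j)
{
    return kernel_step(bound_i - 1, bound_j - 1);
}

/* Count how many entries copied back from the device differ from the
 * reference value. */
static int count_mismatches(const int *results, size_t n, int expected)
{
    int mismatches = 0;
    for (size_t k = 0; k < n; ++k)
    {
        if (results[k] != expected)
        {
            ++mismatches;
        }
    }
    return mismatches;
}
```

With both loop bounds at 10000, expected_value(10000, 10000) is 4, so every entry in the output file should be 4. The all-zero output described above would then show up as a mismatch on every entry.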

jhanweck · March 28, 2007, 3:27pm

Archer,

I ran your example compiled with -D_DEBUG.

With the loop bounds at 1000 each, it ran without trouble.

Changing the loop bounds to 10000 each, it ran for 6.4 seconds and died with:

Cuda error: Kernel execution failed in file ‘cuTestArcher.cu’ in line 62 : unspecified launch failure. [Line 62 is the kernel call.]

That explains the zeros…

Changing the loop bounds to 20000 each, the kernel ran for 25.6 seconds, and terminated with the same error.
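
The two timings above (6.4 seconds at 10000 x 10000 and 25.6 seconds at 20000 x 20000) are consistent with run time growing in proportion to the total iteration count. Under that assumption, a rough sketch in plain C can estimate how large the outer loop bound may be before a kernel crosses a given time budget. The function name and the linear-scaling model are assumptions for illustration, not something measured beyond those two data points.

```c
/* Hypothetical helper: estimate the largest outer-loop bound whose run
 * time stays within budget_s seconds, assuming run time is proportional
 * to (outer bound * inner bound). */
static long max_outer_bound(double measured_s,
                            long measured_outer, long measured_inner,
                            long target_inner, double budget_s)
{
    /* Total iterations in the measured run. */
    double measured_iters = (double)measured_outer * (double)measured_inner;

    /* Iterations that fit in the budget under linear scaling. */
    double iters_in_budget = measured_iters * (budget_s / measured_s);

    /* Truncate toward zero so the estimate stays inside the budget. */
    return (long)(iters_in_budget / (double)target_inner);
}
```

Using the 6.4-second measurement, max_outer_bound(6.4, 10000, 10000, 10000, 5.0) gives 7812, which suggests the 10000 x 10000 version sits roughly 28% over a 5-second limit on this hardware.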

This is different behavior from running on the primary display card; in that scenario, if the kernel runs for more than 5 seconds or so, it typically hangs the machine.

So, I suspect something amiss in the driver, not the OS.

[edit] Linux has troubles, too: http://forums.nvidia.com/index.php?showtopic=30575

jhanweck · March 28, 2007, 8:05pm

Also, when running Archer’s test code, my CPU (not GPU!) usage hits 50% and sticks there until the app is finished.

Why is CPU usage so high when the GPU is doing all the work???

tachyon_john · March 29, 2007, 3:13am

I filed a bug on this for Linux already. I don’t know the cause, but I believe the CPU load is probably caused by the CUDA runtime spinning on a mutex or something like that. Presumably they’ll have this fixed in a subsequent beta version. Old OpenGL drivers used to do similar things some years back, so I’m sure this is easy to solve, they just need time to do the work most likely.

John

jornskaa · April 24, 2007, 1:53pm

It seems I’ve stumbled into a similar problem when trying to run CUDA on a secondary monitor while rendering with OpenGL. My machine has only one graphics card installed, an 8800GTX, with two monitors connected to it. When I run simpleGL or postProcessGL (the examples delivered with the SDK), they start up gracefully on my main display. However, while I drag the window from the primary monitor to the secondary monitor, the secondary monitor only shows a black area. With simpleGL, once the drag is complete (the whole window is within the secondary monitor), the computer locks up and has to be rebooted. With postProcessGL, the application simply exits when the drag is complete.

The software I’m developing needs to be able to render to a secondary monitor in order to be useful. Has anyone experienced something like this? Is there a solution?

I have made an application which when run with one monitor connected works great, but when connecting a second monitor the application only displays a black window. This is even without dragging the window to the secondary monitor. I will debug this further to see if it is my fault or is caused by the same issue.

Thanks,
Jørn

Using display driver version: 97.73
CUDA SDK version: 0.8.1
OS: Windows XP Professional (SP2)
Graphics card: 8800GTX

PS: Many thanks to NVIDIA for releasing CUDA. Just what I needed!