Issues with GTX280 and Mandelbrot

Everything on the GTX280 was tested with the newest driver and CUDA 2.0 beta 2. Mandelbrot in particular was tested on the GTX280 machines (Ubuntu 32-bit and 64-bit, and Windows); all of them crash after some time.

I will do that when I’m back in the office (Monday).

Is there a quick way to describe how to do this?

Regards,

Manuel

I already found a nice tutorial and will try it as soon as possible. Which logs and reports do you want to see? Or just a backtrace or something like that?

Regards,

Manuel

I’d like to see the complete serial console output starting from boot until the system crashes, along with an nvidia-bug-report.log.

I did some tests with the GTX280 and the Mandelbrot example (we have the same problems with our own programs…)

done

mwerlberger@tweety:~$ dmesg | grep NVRM

[   35.435507] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  177.13  Tue Jun 10 16:42:55 PDT 2008
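When comparing several machines, it can help to pull just the driver version out of that NVRM line. A minimal shell sketch, assuming the kernel log line has the same layout as the one above; the function name nvrm_driver_version is made up for illustration:

```shell
# Hypothetical helper: read kernel log text on stdin and print the NVIDIA
# driver version from each "NVRM: loading ... Kernel Module <version>" line.
# Assumes the line layout shown in the dmesg output above.
nvrm_driver_version() {
    sed -n 's/.*NVRM: loading .*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}
```

It would be used as `dmesg | nvrm_driver_version`; lines that do not match the pattern produce no output.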

I attached two logs: one taken in idle mode and another taken over remote ssh while the X server hangs.

I also attached gdb to the process and ran bt full. When I compile with debug flags, that was not possible. For release mode the output is:

(gdb) bt full

#0  0x00007fc463814d53 in select () from /lib/libc.so.6

No symbol table info available.

#1  0x00007fc4614872b6 in ?? () from /usr/lib/libxcb.so.1

No symbol table info available.

#2  0x00007fc461488e5a in xcb_wait_for_reply () from /usr/lib/libxcb.so.1

No symbol table info available.

#3  0x00007fc461ae3f78 in _XReply () from /usr/lib/libX11.so.6

No symbol table info available.

#4  0x00007fc464a0475e in ?? () from /usr/lib/libGL.so.1

No symbol table info available.

#5  0x00007fc4649dae03 in ?? () from /usr/lib/libGL.so.1

No symbol table info available.

#6  0x00007fc462773150 in ?? () from /usr/lib/libGLcore.so.1

No symbol table info available.

#7  0x00007fc4649c8ae6 in ?? () from /usr/lib/libGL.so.1

No symbol table info available.

#8  0x00007fc4649ce961 in glXSwapBuffers () from /usr/lib/libGL.so.1

No symbol table info available.

#9  0x00007fc464257ac3 in glutSwapBuffers () from /usr/lib/libglut.so.3

No symbol table info available.

#10 0x00000000004054b8 in displayFunc ()

No locals.

#11 0x00007fc46425ef13 in ?? () from /usr/lib/libglut.so.3

No symbol table info available.

#12 0x00007fc464262169 in fgEnumWindows () from /usr/lib/libglut.so.3

No symbol table info available.

#13 0x00007fc46425f7df in glutMainLoopEvent () from /usr/lib/libglut.so.3

No symbol table info available.

#14 0x00007fc46425fc48 in glutMainLoop () from /usr/lib/libglut.so.3

No symbol table info available.

#15 0x0000000000405980 in main ()

No locals.
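Since the X server hangs, a backtrace like the one above may be easier to capture non-interactively from the ssh session. A rough sketch, assuming gdb is installed and the process is still alive; the function name, the output file and the GDB override variable are placeholders chosen here, not part of any NVIDIA tool:

```shell
# Hypothetical helper: attach gdb to a running process in batch mode and
# write a full backtrace of all threads to a file.
# $1 = process id, $2 = output file.
# GDB may be set to point at a different gdb binary (assumption for illustration).
capture_backtrace() {
    pid="$1"
    out="$2"
    "${GDB:-gdb}" -batch -p "$pid" -ex "thread apply all bt full" > "$out" 2>&1
}
```

For example, `capture_backtrace 12345 bt.txt`, with the PID taken from ps. Without debug symbols the frames will still show up as ?? entries, as in the output above.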

Not done yet. Is a remote ssh connection also OK? Or is the serial console needed so that the program is executed directly from there?

The newest SDK was used for testing.

mwerlberger@tweety:~$ nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2007 NVIDIA Corporation

Built on Tue_Jun_10_04:42:57_PDT_2008

Cuda compilation tools, release 1.1, V0.2.1221

I also did some testing with the nvcc flags for double precision. That just makes Mandelbrot freeze Xorg right at program start.

Thanks for any suggestions.

Manuel

Thanks. Are you able to add another GPU to the system such that the GTX280 is only used for CUDA, and not display rendering?

Hi!

Yes, this is possible. I already added an 8800GTX to the existing setup. For now I use the 8800GTX for CUDA calculations so that I can continue with the development. I will change my xorg setup tomorrow so that the GTX280 is used for calculation only.

Is there a particular test I should run?

Regards,

Manuel

Done… same problems as before. Mandelbrot again freezes the X server. Just change the color, zoom and pan a little, and the fun is over again (fast in every sense). The only positive thing is that with our own programs we sometimes get an unspecified launch failure instead of a hanging X. But that does not really make the situation much better…

The setup did not change, except of course that the 8800GTX is now device 0 and handles display rendering. Mandelbrot was adapted so that CUDA uses device 1 (the GTX280) for calculations.

Any further suggestions? In my opinion everything still points to a bug within CUDA 2.0b2.

[edit]

I played around with my two-card setup. It turns out that the 8800GTX also freezes when Mandelbrot is compiled with CUDA 2.0b. It takes much longer until the X server hangs, but it is still an issue because our programs need to run longer than just a few minutes. I attached a log of a freeze with the Mandelbrot application using the 8800GTX as the CUDA device.

[/edit]

Regards,

Manuel

From your bug report, it looks like you’re running both X and CUDA on the 8800GTX while the GTX280 sits 100% idle. Are you certain that you’re using the 8800GTX for X only?

I just tested with a cudaGetDevice call inside the fps calculation of the Mandelbrot application. It returns device 1, which is the GTX280 in my setup.

Manuel

Does nobody else have problems with the new cards in combination with CUDA? Since we can reproduce the error with different setups using CUDA Toolkit 2.0 (Win/Linux, 32/64-bit, 8800GTX/GTX280 and all variations), with an SDK example and with other algorithms, I do not think it is a problem with our specific setup.

It would be great if anybody else could report what behaviour they see.

Regards,
Manuel

I’m not able to reproduce any stability problems with Mandelbrot using 177.13 and a GTX280.

Were all of your tests using the same motherboard?
Have you verified that you’ve applied the most recent motherboard BIOS?

We tested the algorithms on three different systems with three different motherboards (Intel and NVIDIA chipsets). I also installed the most recent BIOS updates, but the problems remain.

Regards,
Markus

Could you try to lower memory/core clock by 20-40% and try again?

I lowered all clocks by 30% … the programs still crash.

As I already reported, cudaGetDevice is useless here: it reports whatever you last passed to e.g. cudaSetDevice, regardless of whether that call worked or not. For me, a more reliable method was to look at the GPU temperatures with nvidia-settings.
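The temperature check can be scripted so that both cards are sampled while Mandelbrot runs. This is a sketch only: it assumes the installed nvidia-settings supports the GPUCoreTemp attribute, the [gpu:N] target syntax and the -t (terse) flag, and that an X server is reachable via DISPLAY. The names gpu_temps and NVIDIA_SETTINGS are invented here.

```shell
# Hypothetical helper: print "<gpu index> <core temperature>" for GPUs 0..N-1.
# $1 = number of GPUs to query.
# NVIDIA_SETTINGS may point at a different nvidia-settings binary (assumption).
gpu_temps() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        # -t asks for the bare value, -q queries a single attribute
        temp=$("${NVIDIA_SETTINGS:-nvidia-settings}" -t -q "[gpu:$i]/GPUCoreTemp")
        echo "$i $temp"
        i=$((i + 1))
    done
}
```

Calling `gpu_temps 2` every few seconds while the CUDA program is busy should show which card heats up, which is the card actually doing the work.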

I looked at the temperatures too, and the right card was bound to the CUDA work. Maybe this is important and I did not mention it before: we use an EVGA FTW card. I don’t know whether there are known issues with specific cards. But normally EVGA knows what they are doing?

Thx,

Manuel

It does not work with a GTX 280 from ZOTAC either …

Still the same behaviour on all our machines, and no idea what causes the problem. When will the stable version of CUDA Toolkit 2 be released? Hopefully we can use the new cards then?!

Regards,
Manuel

I am also getting some freezes with Windows XP Professional 64-bit, the 177.35 driver and the CUDA SDK 2.0 beta. The hardware is an NVIDIA 8800GT.

The Mandelbrot demo stops updating its display window, although the window title still updates the FPS numbers. This occurs randomly, after some seconds of zooming and panning in the fractal.

Christian

We have the same problems on Linux (Asus GTX280, 32-bit Linux, 177.13 driver, CUDA 2.0 beta) with the Mandelbrot example; all other examples work fine.