Anatomy of a 4x C1060 Tyan 7025S system: X server, driver, toolkit and SDK installation

Hello All,

Are there any clear instructions for installing the necessary CUDA software components on a system that has C1060 Tesla cards only?

The system I am referring to is fully populated with Tesla cards, and the only video output it has comes from the Tyan motherboard’s onboard AST2050 VGA controller, whose X driver is “ast”. I find it very confusing that all the CUDA documentation asks for the installation of the nVidia display driver and, even more, for its configuration using nvidia-xconfig. Why would one need such a thing if there is no corresponding display hardware in the system?

Furthermore, the device nodes do not get created the way the documentation indicates: loading the kernel module does not trigger the creation of the appropriate device nodes. Running

> nvidia-xconfig --query-gpu-info

however, does lead to the creation of the nodes, along with producing the desired report. So what is actually needed to create the device nodes: udev rules?
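For reference, this is roughly the layout I expect to see on a four-GPU box once the nodes exist (the major/minor numbers are my assumption, based on the scheme the driver normally registers):

> ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 ... /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 ... /dev/nvidia1
crw-rw-rw- 1 root root 195,   2 ... /dev/nvidia2
crw-rw-rw- 1 root root 195,   3 ... /dev/nvidia3
crw-rw-rw- 1 root root 195, 255 ... /dev/nvidiactl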

Last, but not least, I am building the system from scratch, compiling every tiny part of the 64-bit Linux from source code. My starting points are LFS, CLFS and CBLFS. The CUDA toolkit is only downloadable for specific distributions, and I could not find any information about what is actually needed for it to work properly. Is there a list of requirements somewhere that I could use to put the necessary pieces (libraries, for example) in place?
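Absent such a list, here is a sketch of the approach I am trying: ask the dynamic linker what the toolkit’s own binaries and libraries expect (/usr/local/cuda is the installer’s default prefix, and lib64 assumes the 64-bit toolkit; adjust as needed):

# For every binary and shared library the toolkit ships, report any
# dependency the loader cannot resolve; whatever shows up as
# "not found" has to be supplied by the base system.
for f in /usr/local/cuda/bin/* /usr/local/cuda/lib64/*.so*; do
    echo "== $f =="
    ldd "$f" 2>/dev/null | grep "not found"
done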

Thanks in advance,

Tibor

This thread answers most of the device and installer questions you have.

As for trying to use Linux From Scratch: just don’t do it. My experience trying to get closed-source binary packages to work correctly with an LFS-style build tells me you won’t get it to work, and the enormous amount of effort you will waste trying could be spent doing something useful, like computation.

Best of luck.

3 consecutive days, 3 very similar questions… :)

Wow… avidday is so correct. Don’t try to compile all of Linux yourself, even if you can. I spent over six months on Linux kernel hacking back before Linux 1.0, making specialized distributions and kernel mods to make cheap legacy hardware (i.e., Pentium I CPUs!) act like a router… and today I just use a stock Ubuntu distribution.

If you want a system compiled from scratch (again, not much use these days), I have seen some people on the forum have success with Gentoo, but this is of course still an unsupported configuration.

Thank you All for your input.

In the meantime I’ve been restless and have worked almost everything out. The system is up and running, passing all tests except the ones needing GLX. I really enjoyed watching

perform, and start generating heat :)
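For anyone reproducing this, the tests I mean are the SDK samples, built and run roughly like this (paths per the 2.3 SDK layout I am using; adjust to your install):

# Build the SDK samples, then run the two classics.
cd ~/NVIDIA_GPU_Computing_SDK/C
make
./bin/linux/release/deviceQuery
./bin/linux/release/bandwidthTest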

(For now the system runs kernel 2.6.31.6, gcc 4.2.3 and Xorg 7.5, all in a multilib setting. In the near future I will rebuild it with kernel 2.6.32 and gcc 4.4.1, 64-bit only. I semi-automated the build process, and I will publish it so others can see my “adventure”.)

Having my system built from scratch is not a drawback. On the contrary: knowing every single tiny part of it will, I hope, help me later with the development work I am planning to undertake. This is why I did it. I believe most distributions are incomplete and buggy, and I do not trust them. Having things straight from the source at least leaves you with only the inherent bugs.

Regarding the creation of the nodes, I’ve been aware of the script you suggested, but I believe there must be a more elegant and robust way of doing it, without messing up or overcomplicating the init scripts. In fact, the command

nvidia-xconfig --query-gpu-info

I mentioned does the job very elegantly. I think it could be inserted very nicely into the init scripts; it’s a single command. On the other hand, I am thinking about exploring the possibility of writing udev rules.
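For completeness, the other route would be a tiny boot-time script modeled on the mknod example in the driver README; a minimal sketch for this four-GPU box (major 195 and minor 255 for nvidiactl are the numbers the driver normally registers, and the script naively assumes the nodes do not already exist):

#!/bin/sh
# Load the kernel module, then create the device nodes by hand.
/sbin/modprobe nvidia || exit 1
N_GPUS=4                          # one node per Tesla card
for i in $(seq 0 $((N_GPUS - 1))); do
    mknod -m 666 /dev/nvidia$i c 195 $i
done
mknod -m 666 /dev/nvidiactl c 195 255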

I maintain my opinion that the documentation is biased; that is, it addresses the needs of only those who drive their monitors with nVidia devices. It could use some clarification. nVidia employees, would it be possible to get some official guidelines?

Along the way I ran into lots of problems, with error messages questioning the very existence of the GPU devices. As far as I can tell, problems such as

or

have been experienced by many users, and, at least in my case, they boiled down to an improper pairing of driver, toolkit and SDK versions. I started with the 3.0 beta for Fedora 10, which did not work, and I ended up with a fresh download of 2.3 of everything, using the RHEL 5.3 toolkit (I did not test 2.3 for Fedora 10). It works like a charm now.
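For anyone chasing a similar mismatch, the pieces can be cross-checked directly; the locations below are the standard ones:

# Version of the kernel driver actually loaded
cat /proc/driver/nvidia/version
# Version of the toolkit's compiler
/usr/local/cuda/bin/nvcc --version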

Things are still not entirely worry-free, though:

1. cuda-gdb needs libtermcap.so.3, which I do not have, and which I will have to fix.

2. I allowed the installation of nVidia’s OpenGL(X) headers and libraries, which I suspect was a mistake, and which I believe is responsible for my inability to run any of the GL-demanding test codes (see the sketch after this list). Would that be a correct assumption?

3. Based on what I am reading, moving to gcc 4.4.1 might not be a good idea. I am not sure what the exact consequences would be, though.
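On point 2, a quick way to see whose GL stack is actually wired in (standard paths assumed; the Xorg log shows whether the stock libglx or nVidia’s replacement was loaded):

# Which libGL does the runtime linker resolve?
ldconfig -p | grep libGL
# Which GLX module did the X server load?
grep -i glx /var/log/Xorg.0.log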

Thanks in advance,

Tibor