Tesla S1070 With CUDA 6.5?

Hello all,

I recently acquired 2 Tesla S1070 1U units for my server and am trying to get them to work. My server runs Ubuntu 16.04 with a GeForce GT 710 for basic display output. I installed the 4 PCIe interface cards, and the S1070s power on properly.

My attempt to get the S1070’s to work started with installing the nvidia 340.96 driver with sudo apt-get install nvidia-340-dev. I then installed the CUDA 6.5 Toolkit by downloading the proper .run file. I extracted the Driver, Toolkit, and Sample Installers from the .run file so that the driver would not be installed. I then individually installed the toolkit and sample installers.

With the toolkit installed, I ran the included deviceQuery sample to detect all available CUDA-capable devices. Unfortunately, it only detected my GT 710 (the driver also only detected the GT 710). However, when I run lspci, it lists all 8 S1060s. I tried installing the 340.29 driver that I extracted along with the toolkit and samples, but it does not work with the Ubuntu 16.04 kernel.
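One way to narrow down a mismatch like this is to compare what the PCI bus reports with what the driver has actually bound. The sketch below counts NVIDIA display-class devices in lspci -nn output, using NVIDIA's PCI vendor ID (10de) and the VGA / 3D controller class codes (0300 / 0302). The function name count_nvidia_gpus is made up for illustration, and the /proc/driver/nvidia/gpus path in the usage comment is an assumption about how the installed driver exposes its GPUs.

```shell
#!/bin/sh
# Hypothetical helper: count NVIDIA GPUs in `lspci -nn` output read from stdin.
# Keeps only lines with vendor ID 10de, then counts those whose PCI class is
# VGA compatible controller [0300] or 3D controller [0302].
count_nvidia_gpus() {
    grep -i '\[10de:' | grep -c -E '\[03(00|02)\]'
}

# Usage on the affected machine:
#   lspci -nn | count_nvidia_gpus          # GPUs visible on the PCI bus
#   ls /proc/driver/nvidia/gpus | wc -l    # GPUs the driver has bound (assumed path)
```

If the first number is larger than the second, the cards are reachable over PCIe and the problem is most likely on the driver side rather than in the cabling or interface cards.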

Is there any way to get this system to work and detect the 8 S1060’s? Do I need to install a different CUDA toolkit (maybe 5.5)? Do the S1060’s even work with somewhat modern software and drivers?

Any input would be greatly appreciated

As far as I recall, CUDA 6.5 still had support for GPUs with compute capability 1.3, which is what the S1070 is. Note that according to the “Linux Getting Started” document, CUDA 6.5 is supported on Ubuntu 14.04, not 16.04.
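Once the toolkit is working, code for these GPUs has to be compiled for compute capability 1.3 explicitly. A minimal sketch, assuming nvcc from CUDA 6.5 is on the PATH; arch_flag is a hypothetical helper (not part of the toolkit) and vectorAdd.cu is a placeholder file name. As far as I know, nvcc 6.5 accepts sm_13 but prints a deprecation warning.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "1.3" into the
# matching nvcc architecture flag ("-arch=sm_13") by dropping the dot.
arch_flag() {
    printf -- '-arch=sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Usage (assumes nvcc is installed; vectorAdd.cu is a placeholder):
#   nvcc $(arch_flag 1.3) -o vectorAdd vectorAdd.cu
```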

I suspect that one challenge is finding an old driver that supports the S1070. Sorry, I cannot help you there, I don’t track old driver versions. The latest driver I see listed in NVIDIA’s legacy Linux driver archive is 340.98, but I do not have the faintest idea whether that is suitable for the S1070 or just consumer cards.

BTW, your post mentions S1060 and S1070, which is it? In my recollection there were no S1060s, only individual C1060 (actively cooled). Are these GPUs properly powered and cooled?

Thanks for the reply,

The Tesla S1070 is a unique setup where 4 GPU’s are stored in a 1U case (basically 4 passively cooled C1060’s). The GPU’s can be passively cooled since the 1U case is basically a wind tunnel + power supply. So when I refer to the S1070, it reflects the unit as a whole. The 1060 refers to the individual GPU’s inside. I know Nvidia didn’t actually make an S1060, but then again, each GPU does not actively cool itself, so I’m not sure if it deserves the “C” in front. Now, I shouldn’t pretend to be an expert, so you could very well be right and the individual GPU’s are simply called C1060’s. That being said, to avoid future confusion, I’ll simply refer to the S1070 as a whole.

I will try the route of installing Ubuntu 14.04 and CUDA 6.5, and attempt to find a driver to tie it all together. Also, I’m very happy that you mentioned that CUDA 6.5 supports compute capability 1.3 devices; that was one of my major concerns. I shall report back later this evening regarding the outcome of this attempt.

Thanks again

I know what an S1070 is, I was just puzzled by the mention of S1060. Passively cooled GPUs typically have an ‘M’ designation, so the use of M1060 for the individual component cards in the S1070 may be less confusing, although I don’t recall whether NVIDIA actually sold individual passively cooled cards under that name.

When you get your S1070 to work, I hope you won’t be disappointed by the relatively low performance (and massive power consumption!). Technology has come a long way in the past seven years. In practical terms people should be much better off with a modern mid-range Pascal-based GPU than ancient sm_1x devices that were high-end back in the day.

You are correct: in what little documentation I have, NVIDIA refers to the individual GPUs as M1060s; my mistake for saying S1060. Anyway, I installed Ubuntu 14.04. I then installed the NVIDIA 331.38 driver package for Ubuntu:

sudo apt-get install nvidia-331

However, my server apparently installed 340.96 (not sure how that happened). As a result, my GT 710 works fine, but the driver still does not see the S1070s. That is after I tried 5 different NVIDIA drivers that I downloaded, all of which failed to build the NVIDIA kernel module. So, while CUDA 6.5 may support the compute capability 1.3 S1070, my downfall may simply be that I cannot communicate with the S1070.
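To confirm which driver the kernel has actually loaded (as opposed to which package was requested), the version can be read from the driver's proc entry. This is a rough sketch: driver_version is a made-up helper name, and the parsing assumes the usual "NVRM version: ... Kernel Module  <version>  <date>" layout of /proc/driver/nvidia/version.

```shell
#!/bin/sh
# Hypothetical helper: extract the driver version number from the contents of
# /proc/driver/nvidia/version (read from stdin). Prints nothing if no
# "NVRM version:" line is present.
driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a machine with the driver loaded:
#   driver_version < /proc/driver/nvidia/version
#   apt-cache depends nvidia-331    # shows what the package actually pulls in
#   dmesg | grep NVRM               # kernel-side messages from the NVIDIA module
```

The apt-cache line may explain the unexpected 340.96 install if nvidia-331 turns out to depend on a newer driver package, and the NVRM lines in dmesg are where the module would report GPUs it refuses to handle.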

I realize that these old cards are really only good at single precision and are horribly inefficient, but they were a cheap way to learn CUDA programming on server-grade GPUs. However, after spending 30 hours so far trying to interface with these cards, it may be best to simply turn to a more mainstream GPU like the GTX 580.

I don’t know what your financial limitations are, but if you can afford it, my recommendation would be to look for at least a Maxwell-class GPU. Since the entire spectrum of Pascal-class GPUs is shipping at this point, you may be able to acquire a second-hand or refurbished Maxwell-based GPU quite cheaply.

Side remark: The S1070 has decent double-precision performance compared to modern consumer cards in which double-precision support has been reduced to the architecturally necessary minimum to reduce cost.