CUDA on-demand offerings

Recently, Penguin Computing announced an offer for renting time on their clusters optimized for HPC work, specifically on machines with Tesla units attached. I’m very interested in that kind of service - I work as an external consultant and prefer to do all of my development on a laptop, having arranged with my employers to give me remote access to a Tesla-equipped machine or two for occasional testing and profiling. This has worked very well for me so far, as I haven’t had to spend money, or space in my apartment, on bulky and noisy towers. However, this kind of arrangement often does not work well for my employers: they usually do not have people on board with much expertise in administering this kind of machine, and the machine does not get utilized very much anyway, so the setup is not that cost-effective for them. So I think having this kind of resource on demand would be great, and I have been crossing my fingers for more of these offerings to appear. I was therefore very excited to learn about the Penguin Computing offering, but I’d also like to know whether anyone else on the forum is aware of, or has any experience with, other similar offerings.

What is great about Penguin Computing entering this market is that these guys have great expertise in HPC clusters. And from what I have been able to learn so far, they did it right: the user gets remote access over SSH as expected, the hardware choice is very good (high-end Intel processors, InfiniBand connections if needed, GPU access if needed), and the cluster is managed by Scyld (logically, as Scyld is their product; but in my experience Scyld is also a very nice choice, from the user’s perspective, for that kind of setup). However, there is a hefty “setup” price attached ($5000, and it seems no hours are included in it), and core hours are not that cheap either (I was told pricing starts from $0.70 per hour). At the same time, it is rather hard to get more precise information about the various peculiarities of using that kind of system (for example, how exactly the hours used are calculated, what the policies are on driver and software stack updates, etc.): not many details about the offer are available, there is no public mailing list or forum to ask on, and the sales guys are very kind but somewhat slow in providing details.

As for other similar offerings, I have long been aware of hoopoe, but there seems to have been no progress beyond the alpha testing phase for a long time. More recently I came across a similar offer from ZetaExpress, but I have not inquired about details (their pricing info does seem to be publicly available, however). So I would be interested in any experience forum members have with systems of this type, as well as in general opinions on these offerings. Even with pricing as high as it seems to be for the Penguin Computing offering, I still think this kind of service could be cost-effective, at least for development purposes in the kind of setup I described above: a typical high-end multi-GPU machine would cost around $10000, and for that money one should be able to buy enough hours for a couple of years of periodic code testing and profiling, while still saving oneself the trouble of maintaining the hardware, the accompanying power bills, etc.
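
As a rough sketch of that arithmetic, the snippet below works out how many billed hours the price of a machine would buy. The $10000, $5000 and $0.70 figures are the ones quoted in this thread; the function name and the assumption that one hour of use is billed as exactly one core hour are purely illustrative, since how hours are actually counted is not known. Power and maintenance costs of an owned machine are ignored, which biases the comparison in favour of buying.

```python
def break_even_hours(machine_cost, hourly_rate, setup_fee=0.0):
    """Hours of rented time that cost as much as buying a machine.

    Assumption (not from any provider's documentation): the bill is
    simply setup_fee + hours * hourly_rate.
    """
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    # Whatever is left of the budget after the setup fee buys hours.
    remaining_budget = max(machine_cost - setup_fee, 0.0)
    return remaining_budget / hourly_rate


if __name__ == "__main__":
    # Figures quoted in the thread: ~$10000 machine, $5000 setup, $0.70/hour.
    with_setup = break_even_hours(10000, 0.70, setup_fee=5000)
    without_setup = break_even_hours(10000, 0.70)
    print(f"With the setup fee:    {with_setup:.0f} hours")
    print(f"Without the setup fee: {without_setup:.0f} hours")
```

Whether that many hours covers "a couple of years" depends entirely on how many cores or GPUs are billed per session and how often testing is done, so the result is only a ballpark.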

I too am very interested in developments here. With Amazon EC2’s new “Spot Pricing” feature, spare CPU can be had for nearly 8 cents per CPU-hour. This is getting very close to practical for a lot of our processing jobs, so I’m very curious to see how compute-on-demand services with CUDA will grow.

Unfortunately, 2 cents per second of GPU time (for ZetaExpress) is not very appealing unless your job makes only occasional use of the GPU. The $5000 setup for Penguin Computing (thanks for tracking that down, I was curious when I saw their announcement) rules out small scale use. I still hope that hoopoe improves their website (renders very oddly in my browser) and gets out of beta so we can see what the pricing will be.
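
To put the quoted prices on a common footing, here is a minimal sketch that converts them all to dollars per hour. The rates are the ones mentioned in this thread; the labels and the helper name are illustrative, and the units being billed differ (a GPU second, a cluster core hour, a spot CPU hour), so this compares sticker prices rather than equivalent amounts of compute.

```python
SECONDS_PER_HOUR = 3600


def per_second_to_per_hour(dollars_per_second):
    """Convert a per-second price into a per-hour price."""
    return dollars_per_second * SECONDS_PER_HOUR


# Rates as quoted in the thread; the billed units are NOT equivalent.
rates_per_hour = {
    "ZetaExpress GPU (2 cents per second)": per_second_to_per_hour(0.02),
    "Penguin Computing core (quoted starting price)": 0.70,
    "Amazon EC2 spot CPU (approximate)": 0.08,
}

if __name__ == "__main__":
    # List the options from cheapest to most expensive per hour.
    for name, rate in sorted(rates_per_hour.items(), key=lambda kv: kv[1]):
        print(f"{name}: ${rate:.2f} per hour")
```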

I’ve been talking with the Penguin Computing sales people in the meantime, and here is an update on their pricing. The good news is that the pricing options are better than what I mentioned in my previous message - specifically, it seems the “setup” fee is not as big as I reported. The bad news is that the pricing options still seem to be in flux, so I have received contradictory pricing info in successive e-mails. Still, the Penguin Computing sales people are very nice in that they really try hard to understand the customer’s needs and usage patterns and to come up with an acceptable offer, so if you need this kind of service, you should really get in touch with them (contact Emil Hsieh).

In any case - attached is their current pricing sheet.

(Edit: To admins: It’s not a big deal, but if possible this thread should probably be moved to the “General…” subforum; apologies for not starting it there in the first place.)
POD_Price_List___2_02_10.pdf (106 KB)

Another update on this topic: it seems SGI has come up with an offering of a similar type - see the announcement and corresponding service description on the SGI site, as well as the HPCwire article.

Yet another update on this topic: while the Penguin Computing guys are still working with NVIDIA to actually provide access to Tesla cards attached to the nodes of their cluster, this week I had an opportunity to try the Sabalcore on-demand offering. They have GeForce GTX 285 cards attached to nodes in one of their on-demand clusters (in a single-GPU-per-node configuration at the moment). I tried running some simple CUDA programs, and this worked like a charm. The job queuing system is Torque, which has already been discussed somewhat on this forum and which, I think, conforms to the Open Group batch environment services specification; it is thus very natural to use even for novices, and especially for anyone who has dealt with job batching on any kind of cluster. Dealing with GPUs fits naturally with Torque, and it is especially nice that the Sabalcore guys treat a GPU the same as a processor core in pricing (the full pricing sheet is attached). So, in my opinion, a great offer overall.
SCI_pricing_schedule_2010b.pdf (58 KB)

Sabalcore looks more like what I was trying to find. Thanks! (Not sure if I’m ready for $1.60/hr to run a CUDA app yet, but at least the option is there.)

If anyone is interested in Tesla computing in the cloud, please contact NVIDIA: