Distributed SPMD with CUDA Future

NVIDIA: I have come to realize that it could be very advantageous if, years from now, I’m able to do distributed CUDA processing. Let me explain:

If I have a machine at home doing 1 TFLOP/sec, it may be advantageous to be able to access it remotely. In addition, if some service sitting on the Internet were to identify “public” machines that have been volunteered for distributed CUDA processing, then the sky is the limit. I can see letting a remote CUDA kernel run on my machine when I’m not using it. This would be similar to the volunteer model currently used for distributed processing of protein folding, SETI, etc.

In any case, in my opinion, it would be very good for the future of CUDA if a Product Manager or Architect at NVIDIA were thinking about how to provide the API and infrastructure that would make distributed, Internet-based execution of CUDA kernels possible.

This should probably happen at a layer above CUDA. A platform like BOINC ( http://boinc.berkeley.edu/ ), which grew out of SETI@Home, would be an ideal setting. I don’t know what the issues would be for calling CUDA functions from BOINC, but that would be a good place to start.

As long as I can get to it from Microsoft tools, I’m happy. I’m not interested in Linux at this point.

I’m working on a business plan right now that has become possible because of the power of CUDA. I wish I didn’t have to spend time writing custom infrastructure code just to reach my CUDA server over the Internet.

NVIDIA: We need a CUDA Server with either Web Services access OR just an extension to the CUDA API. Maybe a Windows Service running on the Server that takes care of a stream of kernels and data to/from CUDA for an Internet App.

How is NVIDIA doing their S870 stuff? … Maybe an extension of that.

Maybe a service that runs within the Home Server concept from Microsoft.

Maybe Microsoft should buy NVIDIA

Though it’s an interesting idea, I’m sure it should not be part of CUDA. Distributed processing is a very complex beast. It can range from “distributed” between a few nodes in a cluster, to between clusters, to publicly over the Internet.
Solutions for those have been developed over and over again, so maybe you should look into extending one. Basically, distributing CUDA isn’t that much different from distributing normal CPU computation. And you still want to use the CPUs too, right?

I agree with wumpus.
CUDA and distributed computing are completely different things.
BOINC provides the API and infrastructure you’re talking about; maybe there are other Windows libraries that do the same.
There’s absolutely no difference between distributing CUDA work and CPU work. CUDA is just another ‘execution unit’, and all task allocation and partitioning is done at a higher level.
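To illustrate the point that the execution unit is interchangeable, here is a rough sketch of a higher-level layer that partitions work and hands chunks to workers without caring whether each one is a CPU or a CUDA device. Everything in it (`ExecutionUnit`, `partition`, `distribute`, the host names) is made up for illustration, and the "GPU" function is only a host-side stand-in, not a real CUDA launch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ExecutionUnit:
    """Hypothetical worker; `kind` is only a label ("cpu" or "cuda")."""
    name: str
    kind: str
    run: Callable[[Sequence[float]], List[float]]


def square_on_cpu(chunk: Sequence[float]) -> List[float]:
    return [x * x for x in chunk]


def square_on_gpu_stub(chunk: Sequence[float]) -> List[float]:
    # Assumption: stands in for a real CUDA kernel launch.
    # It computes the same result on the host so the sketch is runnable.
    return [x * x for x in chunk]


def partition(data: Sequence[float], parts: int) -> List[Sequence[float]]:
    """Split data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def distribute(data: Sequence[float], units: List[ExecutionUnit]) -> List[float]:
    """Give one chunk to each unit and concatenate the results in order.

    The scheduler never inspects `unit.kind`; allocation and partitioning
    happen here, above whatever does the actual computing.
    """
    results: List[float] = []
    for unit, chunk in zip(units, partition(data, len(units))):
        results.extend(unit.run(chunk))
    return results


units = [
    ExecutionUnit("host-a", "cpu", square_on_cpu),
    ExecutionUnit("host-b", "cuda", square_on_gpu_stub),
]
result = distribute([1.0, 2.0, 3.0, 4.0, 5.0], units)
```

A real system would run the units concurrently and ship chunks over the network (which is what a platform like BOINC handles); the sequential loop here only shows where the partitioning decision lives.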

BOINC (although it’s not immediately obvious from the website) supports Windows, Mac, Linux, and Solaris. Hopefully we’ll get to see NVIDIA’s port of CUDA to Mac soon, and then three of those four will be CUDA-capable platforms.

(There are many other distributed task libraries, but BOINC came to mind first since SETI@Home was mentioned as a model.)

I agree. A general practical solution for distributed SPMD for CUDA is the right way to go. Unfortunately, that may take years to implement. I think I’ll do a subset of the general solution just to get my system on the air.

I’m envisioning a Windows Service that manages a pipeline of kernels for CUDA. Obviously, the Service would have to take care of allocating heap memory as necessary to manage the data going to/from CUDA devices.
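Roughly, the kind of pipeline I have in mind could be sketched like this. It is only a minimal model, not a Windows Service and not real CUDA: `KernelJob`, `KernelPipeline`, and `scale_by_two` are hypothetical names, the "kernel" is an ordinary Python function, and the staging list stands in for the host buffers the service would allocate for transfers to and from the device.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KernelJob:
    """One unit of work: a kernel stand-in plus its input data."""
    kernel: Callable[[List[float]], List[float]]
    data: List[float]
    done: threading.Event = field(default_factory=threading.Event)
    result: List[float] = field(default_factory=list)


class KernelPipeline:
    """Single worker thread draining a FIFO queue of kernel jobs."""

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def submit(self, kernel, data) -> KernelJob:
        job = KernelJob(kernel, list(data))
        self._jobs.put(job)
        return job

    def _serve(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            # Assumption: this copy stands in for allocating a host buffer
            # and moving the data to the device.
            staging = list(job.data)
            job.result = job.kernel(staging)  # stand-in for the launch
            job.done.set()

    def shutdown(self) -> None:
        self._jobs.put(None)
        self._worker.join()


def scale_by_two(values: List[float]) -> List[float]:
    return [2.0 * v for v in values]


pipeline = KernelPipeline()
first = pipeline.submit(scale_by_two, [1.0, 2.0])
second = pipeline.submit(scale_by_two, [3.0])
first.done.wait(timeout=5)
second.done.wait(timeout=5)
pipeline.shutdown()
```

Jobs are processed strictly in submission order by one worker, which keeps the memory bookkeeping simple; error handling, multiple devices, and remote access are all left out.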

Ummmm … this can get complicated very quickly. But aside from the CPU overhead that CUDA itself needs, the local CPU would not be running any of the application code.