Sorry if this post rambles on a bit, but I had a few hardware-related ideas I wanted to share with nVidia (and see what everyone else thought as well). Here goes…
- Bandwidth-doubling add-in card: What about producing a small add-on card that would fit into an x16 slot and have an SLI connector? The idea here would be to connect this card to your CUDA-enabled GPU (for example, a GTX295), which would then be able to transfer data over SLI to/from the add-on card. The benefit would be that a single GPU could have access to (theoretically) x32 bandwidth. In some cases, people have kernels that are highly bandwidth-intensive but not very compute-intensive (since some algorithms/code don't parallelize as well as others). By simply dropping in this add-on card, they could roughly double their effective throughput. Perhaps the hardware engineers could find a way to daisy-chain these things, so that a single CUDA card (again, let's use the GTX295 as an example) could be connected to two of these cards, giving it a theoretical x48 bandwidth.
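To put a rough number on that "roughly double" claim, here's a back-of-envelope sketch. It assumes (hypothetically) a PCIe 2.0 x16 link at about 8 GB/s per direction and a simple serial model where total time is transfer time plus compute time; the function names and figures are mine, not anything official:

```python
# Back-of-envelope estimate of the speedup from doubling host<->GPU
# bandwidth. Assumes ~8 GB/s for PCIe 2.0 x16 and that transfers and
# compute happen serially (no overlap) - both are simplifications.

PCIE2_X16_GBPS = 8.0  # approximate effective PCIe 2.0 x16 bandwidth, GB/s

def total_time(transfer_gb, compute_s, link_gbps):
    """Simple serial model: transfer time plus compute time."""
    return transfer_gb / link_gbps + compute_s

def speedup(transfer_gb, compute_s, factor=2.0):
    """Speedup from multiplying link bandwidth by `factor`."""
    base = total_time(transfer_gb, compute_s, PCIE2_X16_GBPS)
    fast = total_time(transfer_gb, compute_s, PCIE2_X16_GBPS * factor)
    return base / fast

# Transfer-dominated workload: 8 GB moved, almost no compute.
print(round(speedup(8.0, 0.01), 2))   # -> 1.98 (approaches 2x)

# Compute-dominated workload: doubling the link barely helps.
print(round(speedup(0.5, 2.0), 2))    # -> 1.02
```

The point the model makes is the same one in the post: the closer a kernel is to purely transfer-bound, the closer the add-on card gets you to a full 2x.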
I’m not a hardware engineer (though I did take some computer engineering classes in school), but I don’t think this would even be a particularly complicated card to make…just a simple microprocessor to handle communications between the add-on card and the PCIe host, and between the add-on card and the GPU. Perhaps throw in a single memory chip to act as a transfer buffer. The cost should be under $100 for something like this (and that’s at launch…once they’d been out for a while, I’d expect them to cost much less than that).
This would also be useful to non-CUDA folks (e.g. gamers), since I imagine some games are fairly intensive when it comes to transferring textures back and forth to the graphics card, and a cheap add-on like this that doubled the transfer bandwidth would be a great way to take a lead over “that other company” (since they were/are already behind in that department).
- Mobile Tesla card: I don’t know how far off double-precision support is for ‘normal’ mobile GPUs, but over the past year or so, I’ve seen a lot of people asking when it will be available. CUDA users in highly technical fields (e.g. engineering, finance) really have to have double-precision support, and without it, mobile CUDA sometimes doesn’t ‘cut it’ for them.
What I’m thinking of is a Mini PCIe board (picture) with a GPU and some memory — basically a mobile Tesla card. As I mentioned, putting double-precision support in this would make it a killer product, especially if nVidia and ATI aren’t planning to have DP available in their normal mobile GPUs in the near future. I’d say take one of the higher-end 9M or GT100 series chips, put 1 GB of RAM on it, and allow the clock to be dynamically controlled by the user (to save battery power and keep it from overheating)…and you’d have a bunch of people who would like to upgrade their older laptops, use CUDA in their new slim laptops/netbooks, or who already have a higher-end mobile GPU (in a larger notebook) but simply want more mobile CUDA power. For DP support, use the GT200b from the new GTX260-216 and clock it way, way down (to something like 20% speed).
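For a sense of why down-clocking helps so much with battery and heat, here's a rough sketch using the common CMOS approximation that dynamic power scales with f·V² (so roughly f³ when voltage scales down with frequency, or linearly in f at fixed voltage). The 35 W baseline is a made-up figure for a hypothetical mobile part, not a real spec:

```python
# Rough estimate of how down-clocking trims GPU dynamic power draw.
# Approximation: dynamic power ~ f * V^2; if voltage can be lowered
# in step with frequency (V ~ f), power falls roughly as f^3.
# Baseline wattage below is purely illustrative.

def dynamic_power(base_watts, clock_fraction, voltage_scales=True):
    """Estimate dynamic power at a fraction of the full clock."""
    if voltage_scales:
        return base_watts * clock_fraction ** 3  # f * V^2 with V ~ f
    return base_watts * clock_fraction           # fixed voltage: ~ f

# At the 20% clock floated above, starting from a hypothetical 35 W:
print(round(dynamic_power(35.0, 0.2), 2))                        # -> 0.28
print(round(dynamic_power(35.0, 0.2, voltage_scales=False), 1))  # -> 7.0
```

Even in the pessimistic fixed-voltage case, 20% clock cuts dynamic power to a fifth — which is why a user-controlled clock seems workable in a thermally constrained Mini PCIe slot.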
If you haven’t noticed, lots of newer laptops/netbooks are coming with multiple Mini PCIe slots for things like WWAN, WLAN, and so forth, and unless you’ve totally maxed out your new computer (i.e. purchased every available option), chances are you have an available slot. If not, and you can live with an external wireless card, you could remove the internal WLAN card to make room for the mobile Tesla.
Honestly, the reason I haven’t bought a new laptop lately is that it seems impossible to find a reasonably small one (13.3" or 14.1" screen) with a decent GPU to do CUDA programming on and demonstrate GPU-accelerated applications. Being able to buy a GT200b-equipped mobile Tesla and just pop it into one of these machines would be ideal for me, and, it seems, for many others as well.