Here's a possibly annoying, unrelated question :)

Someone in my company suggested dropping the GPUs and moving to FPGAs (I know only very little about these)…
but I thought I'd ask here for opinions…
What do people here think about this? I know there are a few posters here who probably have a big
GPU cluster and have probably played a bit with/checked out FPGAs as well - what do you think?

for example:

many thanks

Without going into much detail, it is my opinion that programming environments for FPGAs are very difficult to work with compared to CUDA. I would wait for something like this to get picked up and released as a product:

Hi - thanks.

In their whitepaper they are again talking about a 4600x speedup over a Core2 Duo. Now, we've discussed those kinds of numbers over and over: the CPU algorithms are not optimized, the CPU machine is old, and they compare against highly optimized hardware.

However, they claim they can put a cluster of such FPGAs in one machine with reasonable power consumption, and so on, and that it can fit either on a PCI card or in a dedicated box (as far as I understood, much like you'd have a box with 100 GPUs in it)…

Does this seem reasonable?

They also claim they can put any algorithm on it (say it takes them 6 months to build) and that minor or big changes can then be applied with small effort.

This sounds "too good" - why don't we all move to it??? I'd appreciate hearing from anyone who checked out these kinds of solutions recently and decided to go with GPUs nevertheless (NVIDIA's official view on this would also be appreciated :) )



The rule of thumb for FPGAs is that they typically let you fit about 1/4 the logic in the same area at about 1/4 the clock speed. The advantages are:

  1. you don’t have to spend millions of dollars to fab your own chip

  2. you can change the hardware any time that you want

The disadvantage is that you still have to design the hardware that goes on the FPGA. Peter Hofstee (lead architect for Cell) gave a talk a few years ago where he mentioned that the total design cost for the Cell processor was about 500 engineers working for 4-6 years, for a total NRE of about $500 million. A significant portion of that type of design (layout) is not required for doing the equivalent on an FPGA, but you are still looking at around 2/3 of the work and cost going into everything above VLSI layout.

That type of effort is impossible for the target users of FPGAs, so FPGA companies usually bundle in a compiler that starts from a high level representation of a program and automatically generates hardware. The more abstract the compiler, the less efficient the hardware is on average. The advantage is that the hardware is specifically designed to run only a single application, so it can be optimized for that application. For example, if your application doesn’t use textures, the compiler can throw away texture caches and interpolators and use the area for more cores.
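To make that concrete, here is a sketch of the kind of input such a compiler consumes: plain C with tool hints. The pragma shown follows the style of Xilinx's HLS tools but is an assumption on my part; a normal C compiler simply ignores it, and the tool would synthesize only the datapath this loop actually needs - no texture units, no caches:

```c
#include <stddef.h>

/* HLS-style C sketch: a hardware compiler reads hints like the pragma
 * below and generates dedicated hardware for exactly this computation.
 * A plain C compiler ignores the pragma, so this also runs as software. */
void saxpy(float a, const float *x, const float *y, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1 /* hypothetical hint: aim for one result per clock */
        out[i] = a * x[i] + y[i];
    }
}
```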

The hardware has the potential to be faster than a GPU. The highest-end FPGA that I could find (a Xilinx XC6VSX475T) running at 600 MHz could hit over 1.2 TOP/s, where each operation is a 25x18-bit integer add/multiply, if it were configured as an array of multipliers. The problem lies in taking a high-level application (say, in C) and converting that into hardware. Some simple applications will map easily, but a lot of the time someone will be stuck doing manual hardware design to fill in the gaps where the compiler couldn't generate efficient hardware.

I personally think that hardware design is fun, but if I spent my time designing application specific processors for FPGAs it would take me years to do anything more complicated than matrix multiply.