[Help] K20 Performance with Hadoop?

Hi everyone,

I am looking to get some guidance and information regarding the use of CUDA with Hadoop/MapReduce.

I have been researching this topic for a few days, reading over benchmarks and white papers, the whole nine yards, but I am struggling to find relevant information that covers CUDA, the Tesla K20 card, and Hadoop together.

I would like to know what sort of performance increase we can expect from using a K20 with Hadoop compared with CPU-only computation.

If you have Hadoop experience, please let me know, as I would really appreciate the chance to ask some more questions!

For practical purposes, the GTX Titan's performance is equivalent to the Tesla K20's, if that helps. (The Titan would be slightly faster, actually.) Other good keywords would be GK110 or Kepler, which is what I used to find the link below.

I don’t have experience with that particular paradigm, but in case you haven’t seen this, it might be worth taking a look:

http://users.sdsc.edu/~jianwu/JianwuWang_files/A%20General%20Architecture%20Facilitating%20Data-Intensive%20Applications%20in%20Scientific%20Workflow%20Systems%20%28Jianwu-SDSC-Final%29.pdf

The TL;DR version is that they tried it and were successful in some cases; where it fell short, the cause was probably bad optimization.