Over the past few years, I’ve been toying with the idea of running database operations efficiently on GPUs. Rather than trying to implement something like SQL, I decided to implement Relational Algebra (RA) (https://en.wikipedia.org/wiki/Relational_algebra), which I consider to be equally expressive and slightly more general, in the sense that several database languages, including SQL, can be mostly or completely represented using RA.
I’m currently at the point where I have implementations that can beat sequential CPU versions [1] by at least an order of magnitude for all RA operators, and I’m wondering whether there would be interest in a publication on this topic.
None of the algorithms are fundamentally new; the individual operators are usually composed of several algorithms that are applied at different granularities, wherever they are most efficient from either an algorithmic or a hardware perspective.
I’m asking because I am dividing my time between this, writing my thesis, and writing another paper on a runtime system. If there isn’t a lot of interest in this topic, I will probably push off writing it up for six months to a year. So please let me know if having a reference implementation and a description of the GPU-friendly algorithms used to implement the RA operators would help you out in the short term (next 1-2 years). If I get enough responses, I’ll try to get this out by the end of the summer.
[1] I don’t compare against multi-core because efficient use of any parallel system would require parallel/vector implementations of these operations, which would probably follow the same form as my solutions. Furthermore, data dependencies among elements in a vector would make SSE/AVX implementations a nightmare.
I don’t know much about RA or SQL. But if RA is equally powerful, then we need to ask why SQL came about in the first place.
What was the motivating factor? Do those factors still exist? Or can GPU power make those factors vanish?
SQL is a programming language, whereas RA is a set of operations common to any such language. SQL can be completely translated into RA operators. For example, the SQL query:
SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name
are equivalent to the relational algebra inner-join operator:
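π_{column_name(s)}( table_name1 ⋈_{table_name1.column_name = table_name2.column_name} table_name2 )

(roughly, in the usual notation: an equi-join on the matching columns followed by a projection onto the selected columns).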
So the idea is that a database system that used SQL as a language would include a compiler that converts SQL to RA, then instantiates CUDA kernels for each RA operator.
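To make that concrete, here is a minimal sketch (my illustration only, not the actual implementation) of what a generated kernel for a single operator could look like, using selection with an equality predicate over a columnar layout; all names and the toy data are made up:

// Hedged sketch of the RA selection operator sigma_{key == c}(R) over a columnar table.
// A real implementation would typically use stream compaction (a prefix sum over the
// match flags) instead of atomics, so the output stays dense and ordered.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void select_eq(const int *key, const int *payload, int n, int c,
                          int *out_key, int *out_payload, int *out_count)
{
    // One thread per input row; rows where key == c are appended to the output.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && key[i] == c) {
        int pos = atomicAdd(out_count, 1);   // claim a slot in the output relation
        out_key[pos]     = key[i];
        out_payload[pos] = payload[i];
    }
}

int main()
{
    // Toy two-column relation; the query is essentially SELECT * FROM R WHERE key = 7.
    int key_h[]     = { 1,  7,  3,  7,  7,  2};
    int payload_h[] = {10, 20, 30, 40, 50, 60};
    int n = 6;

    int *key_d, *payload_d, *out_key_d, *out_payload_d, *count_d;
    cudaMalloc((void**)&key_d,         n * sizeof(int));
    cudaMalloc((void**)&payload_d,     n * sizeof(int));
    cudaMalloc((void**)&out_key_d,     n * sizeof(int));
    cudaMalloc((void**)&out_payload_d, n * sizeof(int));
    cudaMalloc((void**)&count_d,       sizeof(int));
    cudaMemcpy(key_d,     key_h,     n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(payload_d, payload_h, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(count_d, 0, sizeof(int));

    select_eq<<<(n + 255) / 256, 256>>>(key_d, payload_d, n, 7,
                                        out_key_d, out_payload_d, count_d);

    int count;
    cudaMemcpy(&count, count_d, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d rows passed the selection\n", count);
    return 0;
}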
Cool, so it would be possible to perform computation-heavy SQL operations on GPUs. If someone manages to port this to an open-source database (like MySQL or SQLite), it could probably beat the world’s fastest database systems.
This would be a longer-term, but possible, use of this work. There would be a huge engineering effort involved in actually building a compiler that could generate RA from SQL (although several proprietary compilers exist that do just this), then going through NVCC to generate CUDA code for individual queries, and integrating all of this with a runtime system that maintains the database and dispatches queries against it.
I’ll count this as a vote for yes. If I get maybe 4-5 more of these I’ll consider writing the paper in a couple of months…
Although databases are not an area of research I’m particularly interested in, it does seem like in-memory databases have become more popular (or at least more hyped) in recent years. (Certainly the latest one to cross my news reader was VoltDB.) An in-memory database would be an ideal use case for CUDA, since the overhead of moving data from disk to the GPU would not even be a consideration.
Not necessarily, if one can hide the disk-read latency behind the CUDA operations… Of course, the CUDA operations have to be expensive enough to cover the reads… (but that could be the case for O(N^2) operations like a WHERE column1 = column2 predicate)… No?
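A rough sketch of the kind of double-buffering that could hide the disk reads (illustrative names and file layout only, not code from this project): the host reads chunk k+1 of the table from disk while the GPU is still copying and processing chunk k.

// Double-buffered pipeline: two pinned host buffers and two streams, so the next
// fread overlaps the previous chunk's async copy and kernel. It only pays off when
// the per-chunk GPU work is expensive enough to cover the read time.
#include <cuda_runtime.h>
#include <cstdio>

#define CHUNK_ROWS (1 << 20)

__global__ void process_chunk(const int *rows, int n, unsigned long long *hits)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && rows[i] != 0)            // stand-in for the real per-row predicate
        atomicAdd(hits, 1ULL);
}

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");       // the on-disk table, read in chunks
    if (!f) return 1;

    int *host_buf[2], *dev_buf[2];
    cudaStream_t stream[2];
    unsigned long long *hits;
    cudaMallocManaged((void**)&hits, sizeof(*hits));
    *hits = 0;
    for (int b = 0; b < 2; ++b) {
        cudaMallocHost((void**)&host_buf[b], CHUNK_ROWS * sizeof(int));  // pinned, for async copies
        cudaMalloc((void**)&dev_buf[b], CHUNK_ROWS * sizeof(int));
        cudaStreamCreate(&stream[b]);
    }

    int b = 0;
    while (true) {
        cudaStreamSynchronize(stream[b]);           // buffer b's previous copy has finished
        size_t n = fread(host_buf[b], sizeof(int), CHUNK_ROWS, f);  // overlaps the other stream's work
        if (n == 0) break;
        cudaMemcpyAsync(dev_buf[b], host_buf[b], n * sizeof(int),
                        cudaMemcpyHostToDevice, stream[b]);
        process_chunk<<<(n + 255) / 256, 256, 0, stream[b]>>>(dev_buf[b], (int)n, hits);
        b ^= 1;                                     // next read fills the other buffer
    }
    cudaDeviceSynchronize();
    printf("matching rows: %llu\n", *hits);
    fclose(f);
    return 0;
}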
Hi guys!
I would be very interested to see a CUDA relational algebra implementation.
Btw, I’m developing a CUDA SQL implementation - it is called alenka. It translates joins, group-bys and sorts into CUDA operations on columns of data (so it is a columnar language!).
I tested a few TPC-H queries on it and it is very fast. The data is not limited by GPU memory - I implemented external (out-of-core) parallel algorithms.
A simple script for it looks like this (TPC-H query 1):
A := LOAD 'lineitem.tbl' USING ('|') AS (price{6}:float, qty{5}:int, discount{7}:float, tax{8}:float, rf{9}:varchar(1), lf{10}:varchar(1), shipdate{11}:int);
B := FILTER A BY shipdate <= 19980902;
D := SELECT rf AS rf, lf AS lf, SUM(qty) AS sum_qty, SUM(price) AS sum_base_price,
AVG(qty) AS avg_qty, AVG(price) AS avg_price, AVG(discount) AS avg_disc, COUNT(qty) AS cnt FROM B
GROUP BY rf, lf;
STORE D INTO 'mytest.txt' USING ('|');
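To illustrate the “translates group-bys and sorts into CUDA operations on columns” part, here is a minimal Thrust sketch of a sort-based GROUP BY (my illustration of the general approach, not alenka’s actual code; the toy data stands in for the filtered lineitem columns):

// The grouping columns (rf, lf) are packed into one key per row, then sort_by_key
// plus reduce_by_key produce SUM(qty) for every group.
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <cstdio>

int main()
{
    // Toy stand-ins for the filtered lineitem columns.
    char rf_h[]  = {'N', 'A', 'N', 'R', 'A', 'N'};
    char lf_h[]  = {'O', 'F', 'O', 'F', 'F', 'O'};
    int  qty_h[] = { 17,  36,   8,  28,   2,  24};
    const int n = 6;

    // Pack (rf, lf) into a single integer grouping key per row.
    int key_h[n];
    for (int i = 0; i < n; ++i)
        key_h[i] = (rf_h[i] << 8) | lf_h[i];

    thrust::device_vector<int> key(key_h, key_h + n);
    thrust::device_vector<int> qty(qty_h, qty_h + n);

    // Sort rows by key, then reduce runs of equal keys: GROUP BY rf, lf with SUM(qty).
    thrust::sort_by_key(key.begin(), key.end(), qty.begin());
    thrust::device_vector<int> out_key(n), sum_qty(n);
    auto new_end = thrust::reduce_by_key(key.begin(), key.end(), qty.begin(),
                                         out_key.begin(), sum_qty.begin());
    int groups = (int)(new_end.first - out_key.begin());

    for (int i = 0; i < groups; ++i) {
        int k = out_key[i];
        printf("rf=%c lf=%c sum_qty=%d\n", (k >> 8) & 0xff, k & 0xff, (int)sum_qty[i]);
    }
    return 0;
}

The same sort/reduce-by-key pattern extends to the other aggregates (AVG as SUM divided by COUNT, and so on).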
I’m also looking for anyone interested in processing some heavy volumes of data using alenka.