Not A Paper - Research Snippets that were not published.


PhD Thesis: Finding the Right Processor for the Job - Co-Processors in a DBMS

Today, more and more Database Management Systems (DBMSs) keep the data completely in memory during processing or even store the entire database there for fast access. In such systems, more algorithms are limited by the capacity of the processor because the bottleneck of Input/Output (I/O) to disk has vanished. At the same time, Graphics Processing Units (GPUs) have exceeded the Central Processing Unit (CPU) in terms of processing power. Research has shown that they can be used not only for graphics processing but also to solve problems in other domains. However, not every algorithm can be ported to the GPU’s architecture with benefit. First, algorithms have to be adapted to allow for parallel processing in a Single Instruction Multiple Data (SIMD) style. Second, there is a transfer bottleneck because high-performance GPUs are connected via the PCI-Express (PCIe) bus. In this work we explore which tasks can be offloaded to the GPU with benefit. We show that query optimization, query execution, and application logic can be sped up under certain circumstances, but also that not every task is suitable for offloading. By giving a detailed description, implementation, and evaluation of four different examples, we explain how suitable tasks can be identified and ported. Nevertheless, if there is not enough data to distribute a task over all available cores on the GPU, it makes no sense to use it. Also, if the input data or the data generated during processing does not fit into the GPU’s memory, it is likely that the CPU produces a result faster. Hence, the decision which processing unit to use has to be made at run-time. It depends on the available implementations, the hardware, the input parameters, and the input data. We present a self-tuning approach that continually learns which device to use and automatically chooses the right one for every execution.
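To make the idea of a run-time device decision more concrete, here is a minimal sketch in Python. It is not the learning model from the thesis; the class `DeviceChooser`, its methods, and the bucketing of input sizes by powers of two are all assumptions made for illustration. It simply remembers how long each device took for inputs of a similar size and picks the one with the lowest average.

```python
from collections import defaultdict


class DeviceChooser:
    """Toy run-time scheduler (hypothetical, not the thesis implementation)."""

    def __init__(self, devices):
        self.devices = list(devices)
        # (device, size bucket) -> [total seconds, number of runs]
        self.stats = defaultdict(lambda: [0.0, 0])

    @staticmethod
    def bucket(n_rows):
        # Assumption: inputs of the same power-of-two magnitude behave alike.
        return max(n_rows, 1).bit_length()

    def choose(self, n_rows):
        b = self.bucket(n_rows)
        # Try every device once per bucket before trusting the averages.
        for device in self.devices:
            if self.stats[(device, b)][1] == 0:
                return device
        # Otherwise pick the device with the lowest mean observed time.
        return min(
            self.devices,
            key=lambda d: self.stats[(d, b)][0] / self.stats[(d, b)][1],
        )

    def record(self, device, n_rows, seconds):
        # Feed the measured execution time back into the statistics.
        entry = self.stats[(device, self.bucket(n_rows))]
        entry[0] += seconds
        entry[1] += 1
```

A caller would ask `choose(n_rows)` before each execution, run the operator on the returned device, and report the measured time with `record(...)`. A real system would also have to take the implementation, the hardware, and the other input parameters into account, as the abstract says.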

GPUs for Query Execution: Ocelot, gpudb and our approach

At ADBIS we published a paper about query execution on GPUs. A week earlier, two papers on the same general topic were presented at VLDB. In this post I compare the approaches and the findings. In my opinion, the Ocelot approach is the right one if we see heterogeneous processors in the future. My conclusion for external GPUs connected via PCIe is different: although high-end GPUs can often, though not always, outperform comparable CPUs, they are not suitable for query execution in general because of the data transfer bottleneck. The gpudb paper, however, shows different results.

PGStrom for Postgres

PGStrom is a tool to execute a special type of SQL query on the GPU. The developers(?) claim to achieve a speedup of 50x compared to standard Postgres because they use the GPU. I do not doubt that PGStrom is faster than Postgres, but this is not because of the GPU. In this post I show that a multi-core CPU can process these queries as fast as the GPU. For the test case described on the PGStrom wiki page, the CPU is even a bit faster.

Get the feeling for data-intensive problems

Before you consider porting an algorithm to the GPU, you should check its characteristics. You may often have heard that it should be calculation/computation-intensive. In this post I try to describe what that means and to be more specific. Using the example of a simple re-encoding algorithm, I show that this is not the whole truth. Data-intensive algorithms can benefit from the high bandwidth of a GPU's RAM, but only if they fulfil certain conditions: they should support streaming and work in a massively parallel fashion.

Processing strings on the GPU

In contrast to most other fields where GPUs are used, database management systems usually require string processing. Our experiments show that data transfer is the major problem and that dictionary encoding is not a general solution, although it is often suggested as one.
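For readers who have not met dictionary encoding, a minimal sketch in Python follows. The function names `dictionary_encode` and `filter_equal` are made up for this illustration and do not come from the post. Each distinct string is stored once and the column itself becomes a list of small integer codes, which is the part that would be shipped to the GPU.

```python
def dictionary_encode(values):
    """Return (strings, codes): the distinct strings and one integer code per row."""
    dictionary = {}
    codes = []
    for value in values:
        # New strings get the next free code; known strings reuse theirs.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    strings = [None] * len(dictionary)
    for string, code in dictionary.items():
        strings[code] = string
    return strings, codes


def filter_equal(strings, codes, needle):
    """Row positions whose value equals needle, comparing integer codes only."""
    try:
        target = strings.index(needle)
    except ValueError:
        return []  # needle does not occur in the column at all
    return [row for row, code in enumerate(codes) if code == target]
```

An equality filter like this only touches integers. Operations that need the actual characters, such as a substring search, still have to look at the strings in the dictionary, which is one case where the encoding alone does not remove the string handling.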

About my topic: Co-Processors in DBMS

My topic is how we can use co-processors such as graphics cards in a database management system. I'm going to discuss the challenges of this topic and give some background.