FMKe: A realistic benchmark for key-value stores

Standard benchmarks are essential tools to evaluate and compare database management systems in terms of relevant semantic properties and performance. They provide the means to evaluate a system with workloads that mimic real applications. Although a number of realistic benchmarks already exist for r...

Full description

Bibliographic Details
Main Author: Tomás, Gonçalo Alexandre Pinto (author)
Format: masterThesis
Language:eng
Published: 2019
Subjects:
Online Access:http://hdl.handle.net/10362/59503
Country:Portugal
Oai:oai:run.unl.pt:10362/59503
Description
Summary:Standard benchmarks are essential tools to evaluate and compare database management systems in terms of relevant semantic properties and performance. They provide the means to evaluate a system with workloads that mimic real applications. Although a number of realistic benchmarks already exist for relational database systems, the same cannot be said for NoSQL databases. This latter class of data storage systems has become increasingly relevant for geo-distributed systems, and this has led developers and researchers to either rely on benchmarks that do not model realistic workloads or to adapt the aforementioned benchmarks for relational databases to work for NoSQL databases, in a somewhat ad-hoc fashion. Since these benchmarks assume an isolation and transactional model in the database, they are inherently inadequate to evaluate NoSQL databases. In this thesis, we propose a new benchmark that addresses the lack of realistic evaluation tools for distributed key-value stores. We consider a workload that is based on information we have acquired about a real world deployment of a large-scale application that operates over a distributed key-value store, that is responsible for managing patient prescriptions at a nation-wide level in Denmark. We design our benchmark to be extensible to a wide range of distributed key-value storage systems and some relational database systems with minimal effort for programmers, which only need to design and implement specific data storage drivers to benchmark different alternatives. We further present a study on the performance of multiple database management systems in different deployment scenarios.