Abstract: | Standard benchmarks are essential tools for evaluating and comparing database management systems in terms of relevant semantic properties and performance. They provide the means to evaluate a system with workloads that mimic real applications. Although a number of realistic benchmarks already exist for relational database systems, the same cannot be said for NoSQL databases. This latter class of data storage systems has become increasingly relevant for geo-distributed systems, which has led developers and researchers either to rely on benchmarks that do not model realistic workloads or to adapt the aforementioned relational benchmarks to NoSQL databases in a somewhat ad hoc fashion. Because these relational benchmarks assume that the database provides isolation and a transactional model, they are inherently inadequate for evaluating NoSQL databases. In this thesis, we propose a new benchmark that addresses the lack of realistic evaluation tools for distributed key-value stores. Its workload is based on information we have acquired about a real-world deployment of a large-scale application that operates over a distributed key-value store and is responsible for managing patient prescriptions at a nationwide level in Denmark. We design our benchmark to be extensible to a wide range of distributed key-value storage systems and some relational database systems with minimal effort for programmers, who only need to design and implement a specific data storage driver for each system they wish to benchmark. We further present a study of the performance of multiple database management systems in different deployment scenarios.
|