Summary: | Human Resources data modeling makes heavy use of graph-like structures to hold objects (positions, contracts, jobs, cost centers, org units, etc.) and their relationships (an org unit reports to another org unit, a position belongs to an org unit and reports to another position, etc.). Traditionally, Human Resources databases are relational, and querying graph data stored in a relational database is highly inefficient. The main objective of this dissertation is to compare a relational and a graph database, in terms of performance and flexibility, when both hold highly connected data, as is the case with Human Resources data. To that end, a database was modeled on a real scenario from a company that deals with this type of data on a daily basis. Queries were then formulated to interact with all stored entities and the relationships between them, and to collect performance results for both databases. The written queries were also compared in terms of readability and expressiveness, with the objective of determining in which language it is easier to understand what is being queried and how much simpler it is to formulate such a query. When executing queries that traverse a data structure almost completely, the graph database performed much better than the relational database, with execution times up to 200 times shorter than those obtained with SQL. For hierarchical queries, only when the number of manager relationships per employee was increased by a factor of ten did Neo4j perform up to 3.5 times faster than SQL. These results concern databases with a maximum of one million employees, and it is reasonable to expect that the differences in performance would grow for even larger and more connected databases.
Furthermore, it was concluded that modeling the database as a graph is more intuitive and immediate, and that queries are faster and simpler to formulate, and more readable, in the Cypher language than in SQL. This is because the analysed Cypher queries had half as many lines as their SQL equivalents, and because Cypher uses ASCII characters to represent nodes and relationships when pattern matching. More concise queries lower the probability of errors and make it easier for new developers to catch up on, understand and work with previously written queries.
|