Summary: | The marine environment is receiving increasing attention. It is dynamic and multidimensional, which makes data collection and updating very demanding and requires large amounts of data. These data are unique, since the campaigns in which they are gathered cannot be repeated, owing to a wide range of factors outside the researchers' control. Moreover, funding for these campaigns can be hard to come by. Nevertheless, the resulting datasets are often underused, either because they are not available to all the stakeholders involved or because they are stored in non-interoperable formats. Currently, metadata and some of the data are recorded on paper forms, which are later digitized or transcribed to spreadsheets, with researchers placing emphasis on publications rather than on the management of the collected data. The data typically originate from soil, water and biological samples, as well as sensors, ship routes, photos, videos, sounds and laboratory analyses. This problem is reflected in the large BIOMETORE project, which involves several teams of marine researchers led by Instituto Português do Mar e da Atmosfera (IPMA). The ultimate goal of BIOMETORE is to achieve and maintain the Good Environmental Status (GES) of European marine waters. The project comprises eight campaigns, which produce large amounts of marine data that should be organized to enable reuse by different stakeholders. The SeaBioData project, led by INESC TEC, aims to develop a georeferenced database for BIOMETORE that can integrate all available data and implement existing standards for data interoperability, as specified in directives such as INSPIRE. Building the database is essential to give local researchers and the international community uniform access to the data and, at the same time, to reduce the effort spent on data management, promoting faster and more accurate scientific results.
In order to comply with the INSPIRE directive, we adopted the data model of the OGC Sensor Observation Service (SOS). This data model has already been adopted by the international community, which ensures that the implementation relies on an interoperable approach. We surveyed the available technological options, as well as the datasets supplied by IPMA, and chose the open source implementation from 52°North, since it supports the majority of the SOS model's concepts and provides a native REST API and Web Services. However, the 52°North data model does not support the storage of all of the data that IPMA requires for internal use. One of the main data modelling challenges was therefore to extend the existing data model without altering the original tables, thus centralizing the data while ensuring that the model remains compliant with existing services. In particular, the SOS model only stores data about observations and disregards information about entities such as teams, campaigns, users, documents or responsible parties, so we extended the 52°North data model to address these local needs of BIOMETORE. We also had to follow the metadata structure defined by SNIMAR, which required studying and implementing SNIMAR's metadata profile, and we followed the Darwin Core standard in order to store more details of the taxonomic rank of the species.
|