Abstract: Data science projects differ from typical software development projects because of their focus on experimentation. Through trying different datasets, features, pre-processing steps, algorithms and hyperparameters, a data science project relies on an iterative methodology to continuously improve its results. In this kind of project it is essential to be able to compare experiments and to reproduce the results associated with each one, which raises challenges related to data versioning, experiment tracking and guaranteeing the reproducibility of results. There is also a need for data annotation, which facilitates the development of supervised approaches, and for the deployment of models in diverse environments. Several tools and platforms currently in development tackle some of these challenges, but no single one aims to solve them all. The objective of this work is therefore to develop an integrated solution that provides data science teams with data annotation, collaboration, data versioning, experiment tracking, result reproducibility and model deployment. This will be achieved by building on existing software and developing a middle layer between the user and selected features of these platforms, with the final goal of addressing all of the aforementioned challenges. Such a platform will allow data science teams to collaborate more easily and to develop projects more efficiently. The platform will be validated by testing its performance on a real project.