Summary: | The amount of information held in Portuguese archives has grown exponentially over the years. Most of this data is now available to the public in digital format; however, the records are stored as unstructured text, which makes them difficult to process. This work therefore aims to interpret these documents semantically by identifying and classifying Named Entities. To that end, it proposes using Natural Language Processing tools to train Machine Learning algorithms capable of accurately recognizing entities in this context. Finally, it presents a Web platform that implements all the models trained in this paper, along with some tools that supported the entity extraction process.
|