Homograph ambiguity resolution in front-end design for Portuguese TTS systems


Bibliographic Details
Main Author: Braga, Daniela (author)
Other Authors: Coelho, Luís (author), Resende Jr., Fernando Gil V. (author)
Format: conferenceObject
Language: eng
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10400.22/7628
Country: Portugal
Oai: oai:recipp.ipp.pt:10400.22/7628
Description
Summary: In this paper, a module for homograph disambiguation in Portuguese Text-to-Speech (TTS) is proposed. The module combines a part-of-speech (POS) parser, used to disambiguate homographs that belong to different parts of speech, with a semantic analyzer, used to disambiguate homographs that belong to the same part of speech. The proposed algorithms are meant to resolve a significant part of homograph ambiguity in European Portuguese (EP) (106 homograph pairs so far). The system is ready to be integrated into a Letter-to-Sound (LTS) converter. The algorithms were trained and tested with different corpora, and the experiments yielded an accuracy rate of 97.8%. The methodology is also valid for Brazilian Portuguese (BP), since 95 of the homograph pairs are exactly the same as in EP. A comparison with a probabilistic approach was also carried out, and the results are discussed.
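
The two-stage design described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lexicon entries, the cue-word lists, the reading labels, and the function name `disambiguate` are all hypothetical and chosen only for illustration, and the POS tag is assumed to come from an external parser.

```python
from typing import Iterable, Optional

# Hypothetical lexicon: homographs whose reading depends on part of speech.
# "gosto" is used here as an illustrative noun/verb pair.
POS_HOMOGRAPHS = {
    "gosto": {"NOUN": "closed o", "VERB": "open o"},
}

# Hypothetical lexicon: homographs with the same part of speech, resolved
# by cue words in the surrounding context (a stand-in for semantic analysis).
SEMANTIC_HOMOGRAPHS = {
    "sede": [
        ({"água", "beber", "matar"}, "closed e"),       # "thirst" sense
        ({"empresa", "partido", "edifício"}, "open e"),  # "headquarters" sense
    ],
}


def disambiguate(word: str, pos: str, context: Iterable[str]) -> Optional[str]:
    """Return a reading label for a homograph, or None if unresolved."""
    key = word.lower()

    # Stage 1: the POS tag alone decides when the readings differ by POS.
    if key in POS_HOMOGRAPHS:
        return POS_HOMOGRAPHS[key].get(pos)

    # Stage 2: same-POS homographs need evidence from the context words.
    if key in SEMANTIC_HOMOGRAPHS:
        context_words = {w.lower() for w in context}
        for cues, reading in SEMANTIC_HOMOGRAPHS[key]:
            if cues & context_words:
                return reading

    # Not a known homograph, or no cue matched: leave it to the LTS default.
    return None
```

Returning `None` when no rule applies lets a downstream Letter-to-Sound converter fall back to its default pronunciation, which is one possible design choice among several.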