Summary: Bioinformatics is a new research field that aims to use computer technology to uncover biological knowledge of high relevance to the biotechnology community. An important research topic in Bioinformatics is the exploration of the vast amount of biological and biomedical scientific literature (BioLiterature). Over the last few decades, text-mining systems have exploited BioLiterature to reduce the time researchers spend analyzing it. However, many of these systems rely on manually inserted domain knowledge, which is time-consuming to produce. This thesis proposes an approach in which domain knowledge is instead acquired automatically from publicly available biological databases. Based on this approach, innovative methods for the retrieval, extraction and validation of information published in BioLiterature were developed and evaluated. The results show that the proposed approach is an efficient alternative to domain knowledge explicitly provided by experts. The new methods were also integrated into a system for the automatic annotation of genes and proteins, which was successfully demonstrated in several applications.