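Incremental algorithms for predicting quantitative variables using mobile call data (original title: Algoritmos incrementais para previsão de variáveis quantitativas usando dados de chamadas móveis)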

Bibliographic details
Main author: Marta Carolina Madeira Bebiano (author)
Format: masterThesis
Language: por
Published in: 2015
Subjects:
Full text: https://repositorio-aberto.up.pt/handle/10216/83512
Country: Portugal
OAI: oai:repositorio-aberto.up.pt:10216/83512
Description
Abstract: The flow of information circulating today in both local and transnational data networks is huge. That information originates, for example, in the media or as a result of users' everyday activities. The mass storage of information in massive databases, at an increasing rate, creates growing difficulties for organizations in deciding how this information should be handled; at the same time, it holds a hidden potential that is often misunderstood and poorly acknowledged. With this growing accumulation of data, new problems and challenges have also arisen. How can one identify significant data, useful information and patterns of value amongst seemingly irrelevant information?
In most areas information is constantly being stored and, in this context, a new area of investigation, Data Mining, has evolved over the last three decades.
Telecommunication enterprises in particular have at their disposal millions of records of precious information that they could use to develop new services for their clients, provided they can find a clear way to use it properly. With that information they could perform several tasks, such as predicting the length of a call from the moment it begins, which is the goal of this study. This work aims to contribute to the knowledge of how to transform data coming from a large database into relevant information for businesses. Ways of adding value and knowledge to the available information were sought in order to boost businesses' profits.
Any study in this area is rapidly confronted with a great difficulty: the analysis of an enormous amount of data, which is a problem of computing capacity in data processing. The difficulty lies not only in identifying useful hidden information but also in the need to process that information in a reasonable amount of time.
Therefore, the main goal of this project is to study and compare incremental algorithms for predicting the length of a call from the moment it begins, and to identify the best algorithms, together with the associated preprocessing tasks, for this regression problem. It is a supervised learning problem in which regression techniques are used.
The following methods are used: distance-based methods (k-Nearest Neighbors); search-based methods (decision trees and VFDT, the Very Fast Decision Tree); and heterogeneous and homogeneous ensemble methods, in which several models are combined to make the best decisions. At the end of the study, evaluation methods are applied to allow a comparison of the algorithms' efficiency. The results are expected to identify which method is the most efficient in predicting the length of a call, the expected precision of the prediction, and the confidence interval within which the results fall.
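
To make the idea of an incremental regressor more concrete, the following is a minimal sketch of a k-Nearest Neighbors regressor that learns one example at a time and is scored in a test-then-train fashion. It is not the thesis's implementation: the class name `SlidingWindowKNNRegressor`, the function `prequential_mae`, the sliding-window design, the mean absolute error metric, and the two toy features (hour of day, weekend flag) are all assumptions chosen for illustration, and the data is invented.

```python
from collections import deque
import math


class SlidingWindowKNNRegressor:
    """Hypothetical incremental k-NN regressor that keeps only recent examples."""

    def __init__(self, k=3, window_size=1000):
        self.k = k
        # Bounded buffer of (features, target) pairs; old examples fall out.
        self.window = deque(maxlen=window_size)

    def predict_one(self, x):
        if not self.window:
            return 0.0  # nothing seen yet; arbitrary default prediction
        # Rank stored examples by Euclidean distance to x, keep the k nearest.
        nearest = sorted(self.window, key=lambda ex: math.dist(ex[0], x))[: self.k]
        # Predict the mean target of the nearest neighbours.
        return sum(y for _, y in nearest) / len(nearest)

    def learn_one(self, x, y):
        # Learning is just storing the example; no retraining pass is needed.
        self.window.append((tuple(x), float(y)))


def prequential_mae(model, stream):
    """Test-then-train: predict each example first, then learn from it."""
    total_error = 0.0
    n = 0
    for x, y in stream:
        total_error += abs(model.predict_one(x) - y)
        model.learn_one(x, y)
        n += 1
    return total_error / n if n else 0.0


# Invented toy stream: (hour of day, is_weekend) -> call length in seconds.
toy_stream = [
    ((9.0, 0.0), 60.0),
    ((9.5, 0.0), 80.0),
    ((21.0, 1.0), 300.0),
    ((10.0, 0.0), 70.0),
    ((22.0, 1.0), 320.0),
]

model = SlidingWindowKNNRegressor(k=2, window_size=100)
mae = prequential_mae(model, toy_stream)
print(f"Prequential MAE: {mae:.1f} seconds")
```

Because each call is predicted before its true length is revealed to the model, the error reflects how the model would behave on a live stream; the bounded window keeps memory use and prediction time constant regardless of how many calls have been processed.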