Towards an automated classification of spreadsheets
Main author: | |
---|---|
Other authors: | |
Format: | conferencePaper |
Language: | eng |
Published in: | 2016 |
Subjects: | |
Full text: | http://hdl.handle.net/1822/70215 |
Country: | Portugal |
OAI: | oai:repositorium.sdum.uminho.pt:1822/70215 |
Abstract: | Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as financial or database. In this paper, we introduce a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, reaching an accuracy of up to 89%. The best algorithm was then applied to the larger Enron corpus to gain some insight into it and to demonstrate the usefulness of this work. |
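
The workflow outlined in the abstract (feature vectors per spreadsheet, a classifier, and cross-validated accuracy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract names neither the algorithms nor the features, so the nearest-centroid classifier, the two features (share of formula cells, share of text cells), and the synthetic toy data are all hypothetical and do not come from the EUSES or Enron corpora.

```python
import random
from collections import defaultdict


def nearest_centroid_fit(rows, labels):
    """Return one mean feature vector (centroid) per class label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in grouped.items()
    }


def nearest_centroid_predict(centroids, row):
    """Predict the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def cross_validate(rows, labels, k=5, seed=0):
    """Plain k-fold cross-validation; returns the mean accuracy over the k folds."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        centroids = nearest_centroid_fit(
            [rows[i] for i in train], [labels[i] for i in train]
        )
        correct = sum(
            nearest_centroid_predict(centroids, rows[i]) == labels[i]
            for i in held_out
        )
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def make_toy_data(n_per_class=20, seed=1):
    """Synthetic stand-in data: [formula_ratio, text_ratio] per spreadsheet (hypothetical)."""
    rng = random.Random(seed)
    profiles = {"financial": (0.6, 0.1), "database": (0.05, 0.5)}
    rows, labels = [], []
    for label, (formula_ratio, text_ratio) in profiles.items():
        for _ in range(n_per_class):
            rows.append([
                formula_ratio + rng.uniform(-0.05, 0.05),
                text_ratio + rng.uniform(-0.05, 0.05),
            ])
            labels.append(label)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    print(f"mean accuracy: {cross_validate(rows, labels):.2f}")
```

On real data the feature vectors would be extracted from the spreadsheets themselves, and a data mining toolkit would normally supply both the classifiers and the cross-validation routine; the hand-written versions above only keep the sketch free of dependencies.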