Investment Banking Data Analysis - Screening: A Machine Learning Practical Case Study

Bibliographic details
Main author: Bruno Viana do Nascimento (author)
Format: masterThesis
Language: eng
Published in: 2021
Subjects:
Full text: https://hdl.handle.net/10216/135664
Country: Portugal
OAI: oai:repositorio-aberto.up.pt:10216/135664
Description
Abstract: The author conducts a Screening case study at a financial software house, building a Proof of Concept (PoC) for a new Screening process that uses machine learning algorithms and is intended for the company's future product offering. The dissertation presents the state of the art in anomaly and suspicion detection, discusses the concerns it raises, and defines requirements. It then describes the techniques and measurements used in the case study, including H2O AutoML, which generates and evaluates several candidate models to produce one that is quickly functional with little hyperparameter tuning. The author also uses H2O Driverless AI as an aid to interpretability; its options are presented together with the machine learning techniques and measurements, since most of the relevant libraries are available in that product. The experiment poses three questions, concerning AutoML's precision and recall performance, its optimality, and the impact of interpretability methods on people who have never seen them before. It also presents a proposed solution that is incomplete but has enough working steps to be regarded as a functional MVP, positioned before the data entry step of the Screening system. Finally, the author presents threats to validity and concludes that AutoML achieves a minimum reasonable performance but not an optimal one, even compared with its own algorithms. The conclusions also include a ranking of how well the Screening development team received the interpretation methods, and directions for future work: turning the PoC and the proposed solution into a real and complete implementation in which all the algorithms chosen here can be used, or even combined, with interpretability present so that the final customer does not need to trust the machine learning choices blindly.