Insights on neural networks

Bibliographic details
Main author: Jesus, Ricardo Jorge Bastos Cordeiro de (author)
Format: masterThesis
Language: eng
Published: 2020
Subjects:
Full text: http://hdl.handle.net/10773/29562
Country: Portugal
OAI: oai:ria.ua.pt:10773/29562
Description
Abstract: The many advances that machine learning, and especially its workhorse, deep learning, has provided to our society are undeniable. However, there is an increasing feeling that the field has become little understood, with researchers going as far as to draw an analogy between it and a form of alchemy. A deeper understanding of the tools being used is needed since, otherwise, one is only making progress in the dark, frequently relying on trial and error. In this thesis, we experiment with feedforward neural networks, trying to deconstruct the phenomena we observe and to find their root causes.

We start by experimenting with a synthetic dataset. Using this toy problem, we find that the weights of trained networks show correlations that can be well understood in terms of the structure of the data samples themselves. This insight may be useful in areas such as Explainable Artificial Intelligence, to explain why a model behaves the way it does. We also find that merely changing the activation function used in a layer may cause the nodes of the network to assume fundamentally different roles. This understanding may help to draw firm conclusions about the conditions under which Transfer Learning can be applied successfully. While testing with this problem, we also found that the initial configuration of a network's weights may, in some situations, ultimately determine the quality of the minimum (i.e., loss/accuracy) to which the network converges, more so than might initially be suspected. This observation motivated the remainder of our experiments.

We continued our tests with the real-world datasets MNIST and HASYv2. We devised an initialization strategy, which we call Dense sliced initialization, that combines the merits of a sparse initialization with those of a typical random initialization. Afterward, we found that the initial configuration of a network's weights "sticks" throughout training, suggesting that training does not imply substantial updates; instead, it is, to some extent, a fine-tuning process. We saw this by training networks marked with letters and observing that those marks last throughout hundreds of epochs. Moreover, our results suggest that the small scale of the deviations caused by the training process is a fingerprint (i.e., a necessary condition) of training: as long as training is successful, the marks remain visible. Based on these observations and our intuition about the reasons behind them, we developed what we call the Filter initialization strategy. It improved the training of the networks tested but, at the same time, worsened their generalization. Understanding the root cause of these observations may prove valuable for devising new initialization methods that generalize better.
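To make the first finding concrete, the sketch below correlates each class's mean input with the incoming weight vector of every first-layer unit of a trained network. It is a minimal illustration under our own assumptions (a fully connected first layer, NumPy arrays); the function name and shapes are ours, not the thesis's.

import numpy as np

def weight_data_correlations(w_first, x, y):
    """Correlate each class's mean input with every hidden unit's
    incoming weight vector. w_first has shape (n_inputs, n_hidden);
    x has shape (n_samples, n_inputs); y holds class labels."""
    # Standardize weight columns so dot products become correlations.
    wz = (w_first - w_first.mean(axis=0)) / (w_first.std(axis=0) + 1e-12)
    corrs = {}
    for c in np.unique(y):
        m = x[y == c].mean(axis=0)             # class-mean input pattern
        mz = (m - m.mean()) / (m.std() + 1e-12)
        corrs[c] = mz @ wz / len(mz)           # one correlation per hidden unit
    return corrs

High values for some (class, unit) pairs would indicate weights that mirror the structure of the data samples, in the spirit of the observation above.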
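The abstract states only that Dense sliced initialization combines a sparse initialization with a typical random one; the exact recipe is not given here. The following sketch is one hypothetical way such a combination could look, with a dense Glorot-style base and a slice of sparsely initialized units. The split, scales, and parameter names are all our assumptions, not the author's method.

import numpy as np

def dense_sliced_init(n_in, n_out, sparse_fraction=0.1, seed=None):
    """Illustrative 'dense sliced' initialization (hypothetical):
    a dense random base with a slice of sparsely initialized units."""
    rng = np.random.default_rng(seed)
    # Dense base: Glorot-style scaled Gaussian weights.
    w = rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))
    # Sparse slice: these units start with only a handful of large weights.
    n_sparse = int(sparse_fraction * n_out)
    for j in range(n_sparse):
        w[:, j] = 0.0
        hot = rng.choice(n_in, size=max(1, n_in // 10), replace=False)
        w[hot, j] = rng.normal(0.0, 1.0, size=hot.size)
    return w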
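The letter-marking experiment can be illustrated as follows: imprint a rasterized letter onto a weight matrix before training, then measure how strongly the trained weights still correlate with the mark. The function names and the marking amplitude below are illustrative assumptions, not the thesis's procedure.

import numpy as np

def mark_weights(w, letter_mask, amplitude=0.5):
    """Imprint a letter-shaped mark onto a weight matrix before training.
    letter_mask is a binary array with the same shape as w (e.g. a
    rasterized letter); the amplitude is an illustrative choice."""
    return w + amplitude * letter_mask

def mark_visibility(w_trained, letter_mask):
    """Correlation between trained weights and the original mark; values
    near 1 mean the mark survived training essentially intact."""
    a = w_trained.ravel() - w_trained.mean()
    b = letter_mask.ravel().astype(float) - letter_mask.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

Tracking mark_visibility across epochs would show whether the mark "sticks", consistent with the abstract's suggestion that training makes only small deviations from the initial weights.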