Summary: Ever since their inception, computers have been a great asset to mankind, primarily because of their ability to perform specific tasks at speeds no human can match. However, many tasks that humans consider easy are quite difficult for computers. For instance, a human shown a picture of an automobile and a picture of a bicycle can thereafter easily distinguish between automobiles and bicycles. For a computer to perform such a task with current algorithms, it must typically first be shown a large number of images of the two classes, with varying features and positions, and then spend a great deal of time learning to extract and identify features before it can successfully tell the two apart. Nevertheless, it is eventually able to perform the task and, once training is complete, can classify images of automobiles and bicycles faster, and sometimes better, than the human. Where humans truly outperform computers is when another class is added to the mix, e.g., “aeroplane”. A human can immediately add aeroplanes to their set of known objects, whereas a computer would typically have to go almost back to the start and re-learn all the classes from scratch. The network must be retrained because of a phenomenon known as Catastrophic Forgetting, in which the changes made to the system while acquiring new knowledge cause the loss of previously acquired knowledge.

In this dissertation we explore Continual Learning and propose a way to deal with Catastrophic Forgetting: a framework capable of learning new information without having to start from scratch, and even of “improving” its knowledge of what it already knows. With the above in mind, we implemented a Modular Dynamic Neural Network (MDNN) framework, composed primarily of modular sub-networks, which progressively grows and re-arranges itself as it learns continuously. The network is structured so that its internal components function independently of one another; when new information is learned, only specific sub-networks are altered, and most of the old information is preserved. The network is divided into two main blocks: a feature extraction component based on a ResNet50, and the modular dynamic classification sub-networks.

So far we have achieved results below the state of the art on ImageNet and CIFAR-10; nevertheless, we demonstrate that the framework fulfils its initial purpose: learning new information without having to start from scratch.
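To make the two-block structure concrete, the following is a minimal, hypothetical sketch in PyTorch, assuming a frozen ResNet50 backbone shared by all classes and one small, independently trained sub-network per class. The layer sizes, the `add_class` method, and the per-class binary heads are illustrative assumptions, not the dissertation's actual implementation.

```python
import torch
import torch.nn as nn
from torchvision import models


class MDNN(nn.Module):
    """Sketch of the two-block idea: shared features + per-class heads."""

    def __init__(self):
        super().__init__()
        # Block 1: feature extraction (ResNet50 minus its final fc layer).
        # weights=None keeps the sketch self-contained; in practice a
        # pretrained checkpoint would be loaded here.
        backbone = models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.features.parameters():
            p.requires_grad = False  # shared features stay fixed
        # Block 2: independent classification sub-networks, one per class.
        self.heads = nn.ModuleDict()

    def add_class(self, name: str) -> None:
        # Growing the network: only this new sub-network gets trained,
        # so the heads of previously learned classes are never altered.
        self.heads[name] = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2048, 256),  # 2048 = ResNet50 feature size
            nn.ReLU(),
            nn.Linear(256, 1),     # "does this image belong to my class?"
        )

    def forward(self, x: torch.Tensor) -> dict:
        z = self.features(x)
        return {name: head(z) for name, head in self.heads.items()}


model = MDNN()
model.add_class("automobile")
model.add_class("bicycle")
model.add_class("aeroplane")  # added later, without touching the other heads
scores = model(torch.randn(1, 3, 224, 224))
print({k: v.item() for k, v in scores.items()})
```

Because each head only sees the frozen feature vector, adding “aeroplane” trains new parameters without modifying those of “automobile” or “bicycle”, which is the mechanism by which the framework, under these assumptions, avoids most of the forgetting.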