Resumo: | Deep learning (DL) is now widely used, with many applications in image classification and object detection. Many of these applications rely on Convolutional Neural Networks (CNNs), which, given an input (an image) and an output (a label or class), learn representations that make it possible to distinguish different kinds of objects. Neural networks are computationally demanding, often taking hours to train. CNNs are even more demanding because their input data are usually images, a rich data type that holds a lot of information. The rapid progress in computer vision driven by deep learning techniques, together with recent gains in computing power, has made it possible to train CNNs that classify images with high accuracy. On car classifieds websites, images are one of the most important types of content. However, to date, little knowledge or metadata has been extracted from such images. To place an advert on the platform, the user must upload an image of the car for sale and fill in a number of fields, among them the vehicle category, the color of the car, and its make, model and version. In this dissertation, CNNs are used to recognize the make, model and version of cars, with transfer learning and fine-tuning as the two approaches for transferring the knowledge learned in one task and adapting it to another. We extend the work to also evaluate the efficacy of these networks on the tasks of vehicle category and car color recognition, with the aim of assessing how CNNs behave across these different tasks. Approaches such as background removal and data augmentation are explored to reduce overfitting.
We collected one of the largest datasets to date for the task of make, model and version recognition of cars, composed of 1.2 million images belonging to 790 labels. The results obtained in this dissertation set a new state-of-the-art performance for this type of task (accuracy of 92.7% with an ensemble method), considering the number of classes and the number of images used. They demonstrate the efficacy of recent advances in CNN architectures for fine-grained classification, where intra-class variation is small and viewpoint variation is high, when a large-scale dataset is used.
|