Summary: | Agricultural robots need image processing algorithms that are reliable under all weather conditions and computationally efficient. Several factors may degrade their performance, such as the terrain irregularities characteristic of vineyards or overfitting during neural network training. In parallel, Deep Learning models have become more complex and demand more computation, so not all processors can run them efficiently. Developing a system with real-time performance on low-power processors is therefore demanding, and it remains a research and development challenge because annotated real-world datasets and expeditious tools to support this work are lacking. To support the deployment of deep-learning technology in agricultural robots, this dissertation presents VineSet, the first large public collection of vine trunk images. The dataset was built from scratch and comprises 9481 real image frames, each with vine trunk annotations. VineSet is composed of RGB and thermal images from 5 different Douro vineyards: 952 frames initially collected by the AgRob V16 robot and a further 8529 frames produced by a large number of augmentation operations. To check the validity and usefulness of VineSet, this work presents an experimental baseline study using state-of-the-art Deep Learning models together with the Google Tensor Processing Unit. To simplify the task of augmentation in the creation of future datasets, we propose an assisted labelling procedure that uses our trained models to reduce labelling time, in some cases making it ten times faster per frame. 
This dissertation presents preliminary results to support future research on this topic. For example, VineSet makes it possible to train existing deep neural networks, through transfer learning, to an Average Precision (AP) higher than 80% for vine trunk detection; an AP of 84.16% was achieved for SSD MobileNet-V1. The models trained with VineSet also give good results in other environments, such as orchards or forests. Our automatic labelling tool demonstrates this, reducing annotation time by more than 30% in various areas of agriculture and by more than 70% in vineyards. In this dissertation, we also propose the segmentation of vine trunks. First, object detection models were used together with VineSet to perform trunk segmentation. To evaluate the performance of the different models, a script implementing several semantic segmentation metrics was built. The results showed that the object detection models trained with VineSet are suitable not only for trunk detection but also for trunk segmentation. For example, a Dice Similarity Index (DSI) of 70.78% was achieved for SSD MobileNet-V1. Finally, semantic segmentation was also briefly explored, with a subset of the VineSet images used to train several models. The results show that semantic segmentation can replace DL-based object detection models for pixel-based classification if a proper training set is provided. Taken together, this work will allow the integration of edge-AI algorithms into SLAM systems such as Vine-SLAM, which localise the robot and map its surroundings using natural markers in the vineyards.
|