Object detection for augmented reality applications
Format: masterThesis
Language: eng
Published: 2022
Online Access: http://hdl.handle.net/10773/35069
Country: Portugal
OAI: oai:ria.ua.pt:10773/35069
Summary: Object detection in digital images (2D) is a widely researched area due to its countless applications. The evolution in the performance of detection algorithms and the growth of new approaches is due to their integration with machine learning, namely the use of artificial neural networks in deep learning. The most commonly used methods are R-CNN (Region-based Convolutional Neural Networks) and its variants (Fast R-CNN and Faster R-CNN), while YOLO (You Only Look Once) is used for live-feed applications. Although a vast amount of research has been done in 2D object detection, a common problem that needs more attention is the pose estimation of the bounding boxes returned by the detection and classification process. The absence of an estimate of the camera's pose relative to the scene being analyzed affects the bounding box position: the box does not match the object perfectly when the object is not parallel or aligned with the camera's optical plane. Correcting the pose estimate is important because it allows text to be overlaid on the scene using augmented reality, an application of great benefit when aiding technicians in troubleshooting equipment or in learning how to perform difficult tasks. Three solutions are explored in this dissertation. The first uses information from sensors external to the camera in a mobile device, giving the algorithm the device's position so that it can make the needed correction. The second method no longer involves external sensors; instead, it requires prior knowledge of the usual bounding-box dimension ratios for each class, and corrects the box until its ratio is close to the predicted value. The third method requires prior knowledge of the local features of each object class in order to predict whether the object is aligned with the predicted bounding box, adjusting the box until the ratio given by the local features is within a threshold. After the correction, text is overlaid using augmented reality.
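The ratio-based correction described for the second method can be sketched in a few lines. This is a minimal illustration, not the dissertation's actual implementation: the class names, the expected ratios, and the `correct_box` helper are all hypothetical, and the sketch only rescales the box width around its centre until the width-to-height ratio matches the per-class value within a tolerance.

```python
# Hypothetical per-class aspect ratios (width / height); real values would
# come from prior knowledge of each object class, as the abstract describes.
EXPECTED_RATIO = {"monitor": 16 / 9, "keyboard": 4 / 1}

def correct_box(x, y, w, h, cls, tol=0.05):
    """Adjust a detected bounding box (x, y, w, h) so that its aspect
    ratio is within `tol` (relative error) of the expected ratio for
    the class `cls`. Width is resized around the box centre; height
    is kept fixed. Returns the corrected (x, y, w, h)."""
    target = EXPECTED_RATIO[cls]
    if abs(w / h - target) / target <= tol:
        return x, y, w, h          # ratio already close enough: keep the box
    new_w = target * h             # width that exactly matches the target ratio
    x += (w - new_w) / 2           # shift x so the box stays centred
    return x, y, new_w, h
```

In a full pipeline one would also decide whether to correct width or height (or both) depending on which dimension the misalignment distorts, but the fixed-height variant is enough to show the idea.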