Resumo: | In recent years, technologies based on artificial intelligence have developed at an unprecedented pace. Among them are computer vision and Deep Learning, which are now used in a wide range of sectors, including industry, healthcare, safety, marketing and personal applications. Their main purpose is to optimize and automate processes that would otherwise be very time-consuming or even impossible to carry out. This Dissertation, carried out within the Safe Cities project, a partnership between Universidade do Porto and Bosch Security Systems, S.A., proposes a solution based on computer vision and Deep Learning methods for detecting and counting individuals, and for quantifying crowds, in indoor and public spaces using video surveillance cameras. The review of the state of the art identified the main characteristics of computer-vision object detectors and supported the choice of the pre-trained YOLOv4 model, which was designed for generic object detection. This detector stands out for its high inference speed, which makes it suitable for processing video streams in real time, and for its precision in object detection. The model uses a neural network to extract features from images or video frames, and then classifies and detects the objects present in them. The detections produced by the algorithm were filtered so that only pedestrian detections were returned, together with their count.
The developed algorithm also provides a user-adjustable criterion for the level above which a gathering is considered a crowd, since this threshold can vary with the angle and installation position of the video surveillance camera and with the scenario being monitored. To validate the algorithm, it was applied to counting and detecting people in sets of images (i.e., datasets), in video excerpts and in real-time video captured by Bosch video surveillance cameras. A qualitative analysis of its performance was carried out under variation of several parameters of the input data, such as brightness, image quality and color scheme, among others.
|
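The filtering, counting and crowd criterion described in the abstract can be illustrated with a minimal sketch. It is not the Dissertation's implementation: the `Detection` structure, the function names, the `"person"` label (the class name used by the COCO label set, assumed here), the `min_confidence` cut-off and the choice of "count greater than or equal to the threshold" are all assumptions made for illustration. The detector itself is left out; the sketch starts from a list of detections that a generic object detector such as YOLOv4 might return.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One detector output (hypothetical structure, not from the source)."""
    label: str                       # class name, e.g. "person" (assumed label)
    confidence: float                # detector score in [0, 1]
    box: Tuple[int, int, int, int]   # (x, y, width, height) in pixels


def count_people(
    detections: List[Detection],
    min_confidence: float = 0.5,     # assumed cut-off, not stated in the source
    person_label: str = "person",    # assumed class name
) -> Tuple[List[Detection], int]:
    """Keep only pedestrian detections and return them with their count."""
    people = [
        d for d in detections
        if d.label == person_label and d.confidence >= min_confidence
    ]
    return people, len(people)


def is_crowd(count: int, crowd_threshold: int) -> bool:
    """User-adjustable criterion: a crowd is flagged when the count reaches
    the threshold (the >= comparison is an assumption)."""
    return count >= crowd_threshold


if __name__ == "__main__":
    frame_detections = [
        Detection("person", 0.91, (10, 20, 40, 90)),
        Detection("car", 0.88, (200, 50, 120, 60)),
        Detection("person", 0.73, (80, 25, 38, 85)),
    ]
    people, n = count_people(frame_detections)
    print(f"people: {n}, crowd: {is_crowd(n, crowd_threshold=2)}")
```

Keeping the threshold as a plain parameter reflects the point made in the abstract: the appropriate level depends on camera angle, installation position and scene, so it is left to the user rather than fixed in code.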