Application specific architecture for hardware accelerating HOG-SVM to achieve high throughput on HD frames

Bibliographic details
Main author: Ranawaka, Piyumal (author)
Other authors: Ekpanyapong, Mongkol (author), Tavares, Adriano (author), Cabral, Jorge (author), Athikulwongse, Krit (author), Silva, Vítor Alberto Teixeira (author)
Format: conferencePaper
Language: eng
Published in: 2019
Subjects:
Full text: http://hdl.handle.net/1822/71344
Country: Portugal
OAI: oai:repositorium.sdum.uminho.pt:1822/71344
Description
Abstract: Computer vision is an emerging field with diverse applications, encompassing many computationally heavy algorithms. Histogram of Oriented Gradients-Support Vector Machine (HOG-SVM) is one such versatile algorithm, used for object detection and image classification despite its heavy computational load. Processing such an algorithm in real time with adequate throughput is a challenging task for a general-purpose processor, and an embedded CPU with very limited processing power is even less able to handle such heavy processing. Therefore, our research in general focuses on developing application-specific architectures for hardware acceleration of computer vision algorithms. This paper continues a series of studies on hardware-accelerating the HOG-SVM algorithm on FPGA. We present a high-performance application-specific architecture for hardware acceleration of HOG-SVM that achieves a throughput of 240 fps on HD frames of size 1920x1080, a significant performance improvement over previous research. At the same time, both hardware utilization and power consumption are minimized. A mechanism based on Block RAM (BRAM) structures, together with deep pipelining, is the key architectural technique used to achieve high performance. The proposed design was deployed on the Zynq 7000 FPGA platform, which contains a hardwired ARM CPU along with the programmable FPGA fabric. The accelerator is deployed on the FPGA fabric and integrated with the ARM CPU using AXI memory interfaces. A hardware thread model and bare-metal device drivers were developed, which present the accelerator as a hardware thread to the applications running on the ARM CPU.
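
To put the reported throughput in perspective, the arithmetic behind 240 fps on 1920x1080 frames can be sketched as follows. Only the frame size and frame rate come from the abstract. The function names and the `pixels_per_cycle` parameter are hypothetical, and the clock estimate assumes an idealized pipeline with no stalls or blanking intervals; the abstract does not state the actual clock frequency or the number of pixels processed per cycle.

```python
# Back-of-the-envelope throughput figures for the numbers quoted in the abstract.
FRAME_WIDTH = 1920        # from the abstract
FRAME_HEIGHT = 1080       # from the abstract
FRAMES_PER_SECOND = 240   # from the abstract


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be consumed per second to sustain `fps` frames."""
    return width * height * fps


def frame_budget_ms(fps: int) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def required_clock_mhz(rate: int, pixels_per_cycle: int) -> float:
    """Clock frequency in MHz for an idealized pipeline.

    `pixels_per_cycle` is a hypothetical design parameter (an assumption,
    not a figure from the paper); stalls and blanking are ignored.
    """
    return rate / pixels_per_cycle / 1e6


if __name__ == "__main__":
    rate = pixel_rate(FRAME_WIDTH, FRAME_HEIGHT, FRAMES_PER_SECOND)
    print(f"pixel rate: {rate} pixels/s")
    print(f"frame budget: {frame_budget_ms(FRAMES_PER_SECOND):.3f} ms")
    for ppc in (1, 2, 4):
        print(f"{ppc} pixel(s)/cycle -> {required_clock_mhz(rate, ppc):.3f} MHz")
```

The point of the sketch is that sustaining roughly half a billion pixels per second leaves about 4.2 ms per frame, which suggests why a deeply pipelined datapath, rather than a general-purpose CPU loop, is the natural fit for this workload.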