Lite-CNN: a high-performance architecture to execute CNNs in low density FPGAs

Bibliographic Details
Main Author: Véstias, Mário (author)
Other Authors: Duarte, Rui (author), De Sousa, Jose (author), Cláudio de Campos Neto, Horácio (author)
Format: conferenceObject
Language: eng
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10400.21/8903
Country: Portugal
Oai: oai:repositorio.ipl.pt:10400.21/8903
Description
Summary: Due to the computational complexity of Convolutional Neural Networks (CNNs), high-performance platforms are generally considered for their execution. However, CNNs are very useful in embedded systems, and executing them right next to the source of data has many advantages, such as avoiding the need for data communication. In this paper, we propose an architecture for CNN inference (Lite-CNN) that can achieve high performance in low-density FPGAs. Lite-CNN adopts a fixed-point representation for both neurons and weights, which has already been shown to be sufficient for most CNNs. Also, with a simple and known dot product reorganization, the number of multiplications is halved. We show implementation results for 8-bit fixed-point in a ZYNQ7020 and extrapolate to other, larger FPGAs. Lite-CNN achieves 410 GOPs in a ZYNQ7020.
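
The summary does not say which dot product reorganization Lite-CNN uses. As an illustration only, the sketch below shows one well-known reorganization of this kind, the pairwise inner-product identity usually attributed to Winograd: (x0 + w1)(x1 + w0) - x0*x1 - w0*w1 = x0*w0 + x1*w1. It assumes an even vector length and integer inputs standing in for 8-bit fixed-point values. The function names `pair_products` and `dot_reorganized` are hypothetical and do not come from the paper.

```python
def pair_products(v):
    """Sum of v[i] * v[i+1] over consecutive pairs.

    This correction term depends on one vector only, so it can be
    computed once and reused (assumption: even length).
    """
    return sum(v[i] * v[i + 1] for i in range(0, len(v), 2))


def dot_reorganized(x, w, x_corr, w_corr):
    """Dot product of x and w using len(x) / 2 multiplications.

    x_corr and w_corr are the precomputed pair_products of x and w.
    """
    acc = 0
    for i in range(0, len(x), 2):
        # One multiplication covers two terms of the dot product.
        acc += (x[i] + w[i + 1]) * (x[i + 1] + w[i])
    return acc - x_corr - w_corr


# Illustrative integer values within the signed 8-bit range.
x = [3, -7, 12, 5]
w = [-2, 4, 9, -1]

w_corr = pair_products(w)  # could be computed offline, once per filter
x_corr = pair_products(x)  # could be shared by all filters on the same inputs

result = dot_reorganized(x, w, x_corr, w_corr)
direct = sum(a * b for a, b in zip(x, w))
```

In this sketch the saving only approaches one half when the correction terms are amortized: the weight term can be computed ahead of time, and the input term can be reused by every filter applied to the same input window. Whether Lite-CNN amortizes them this way is not stated in the summary.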