Practical embedded AI on FPGA: A compact CNN achieving high throughput and low power for digit recognition

Pham Van Khoa; La Cong Loc; Nguyen Duc Trong; Huynh Minh Quy; Ho Gia Huyen; Vo Quang Huy; Pham Ba Loc; Huynh Thi Thu Hien

doi:10.52111/qnjs.2026.20304

Journal: Quy Nhon University Journal of Science

Published: 2026/06/28

Volume/Issue: Vol. 20, Issue 3

Pages: 43-54

DOI: https://doi.org/10.52111/qnjs.2026.20304

Abstract

This work presents the design and deployment of a convolutional neural network (CNN) on an SoC–FPGA platform for handwritten digit classification on the MNIST dataset. The goal is a compact FPGA-based CNN accelerator with fewer than 1,000 parameters that communicates with the ARM processor on the PYNQ-Z2 board through DMA and AXI interfaces. The accelerator is implemented at the register-transfer level (RTL) and is verified in simulation, synthesized, and optimized for resource usage while preserving inference accuracy. On the 10,000 MNIST test images, the system attains 91.28% accuracy, about 5 percentage points below a CPU implementation running on the board's dual ARM Cortex-A9 cores (96.26%), but it delivers a 7–8× speedup and a 36% reduction in power consumption. The design relies on parallelizing and pipelining the convolution operations directly in the FPGA fabric, which keeps both resource usage and power draw low. These results provide a practical foundation for real-time embedded AI applications on SoC–FPGA platforms, such as character recognition, image monitoring, intelligent IoT systems, and edge computing.
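The abstract reports the speedup and the power reduction as separate figures. Combining them gives a rough energy-per-image comparison. The sketch below assumes (this is not stated in the source) that both figures were measured on the same workload, so that energy per image scales as relative power divided by speedup; the constant and function names are illustrative only.

```python
# Worked arithmetic for the headline figures reported in the abstract.
# ASSUMPTION (not stated in the source): the 36% power reduction and the
# 7-8x speedup were measured on the same workload, so energy per image
# scales as (relative power) / (speedup).

CPU_ACCURACY = 96.26    # % on the 10,000 MNIST test images (dual Cortex-A9)
FPGA_ACCURACY = 91.28   # % on the same test set (FPGA accelerator)
POWER_REDUCTION = 0.36  # fractional power reduction vs. the CPU run


def accuracy_gap(cpu: float, fpga: float) -> float:
    """Accuracy difference in percentage points, rounded to 2 decimals."""
    return round(cpu - fpga, 2)


def relative_energy(speedup: float, power_reduction: float) -> float:
    """Energy per image on the FPGA relative to the CPU (1.0 = same energy)."""
    relative_power = 1.0 - power_reduction  # FPGA power as a fraction of CPU power
    return relative_power / speedup          # shorter runtime at lower power


if __name__ == "__main__":
    print(f"Accuracy gap: {accuracy_gap(CPU_ACCURACY, FPGA_ACCURACY)} pp")
    for speedup in (7.0, 8.0):
        e = relative_energy(speedup, POWER_REDUCTION)
        print(f"{speedup:.0f}x speedup: {e:.3f} of CPU energy per image "
              f"({(1 - e) * 100:.1f}% less)")
```

Under that assumption the FPGA would spend roughly 0.08 to 0.09 of the CPU's energy per classified image (about 91 to 92% less). If the 36% figure refers to total board power rather than the inference path alone, the actual ratio could differ.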
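The abstract attributes the accelerator's performance to parallelizing and pipelining the convolutions but does not describe the datapath. One widely used FPGA pattern for this is a line-buffer (shift-register) window: pixels stream in once in raster order, a register chain of two image rows plus three pixels exposes a full 3×3 window, and all nine multiply-accumulates run in parallel, so one output emerges per input pixel once the buffer has filled. The Python model below is a hypothetical software sketch of that general pattern, not the authors' RTL; the function name `stream_conv3x3`, the 3×3 kernel size, and integer pixels are all assumptions.

```python
from collections import deque
from typing import Iterable, Iterator, List


def stream_conv3x3(pixels: Iterable[int], width: int,
                   kernel: List[List[int]]) -> Iterator[int]:
    """Hypothetical model of a line-buffer 3x3 convolution pipeline.

    Consumes pixels one per 'clock cycle' in row-major order and yields
    'valid' outputs (CNN-style cross-correlation, no kernel flip).
    """
    # Shift register of 2*width + 3 pixels: two line buffers plus a
    # 3-pixel window register, enough to expose a full 3x3 window.
    shift_reg = deque(maxlen=2 * width + 3)
    for n, px in enumerate(pixels):
        shift_reg.append(px)            # one new pixel enters per cycle
        row, col = divmod(n, width)
        if row < 2 or col < 2:
            continue                    # window not yet fully inside the image
        # shift_reg[0] is the window's top-left pixel; tap (i, j) is at i*width + j.
        acc = 0
        for i in range(3):
            for j in range(3):
                # In hardware these 9 MACs would run in parallel in one cycle.
                acc += kernel[i][j] * shift_reg[i * width + j]
        yield acc


if __name__ == "__main__":
    image = [[r * 5 + c for c in range(5)] for r in range(4)]  # 4x5 test image
    flat = [p for row in image for p in row]                   # raster-order stream
    print(list(stream_conv3x3(flat, 5, [[1, 0, -1]] * 3)))
```

Because the image is read only once and each window reuses the buffered rows, external memory traffic stays at one pixel per cycle, which is the property that makes this pattern attractive for low-power streaming accelerators.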
