FURI | Spring 2026

RTL Implementation and Verification of an FP4 Systolic Array for Edge AI Workloads

Today's AI models require enormous computational resources for training and inference, which makes efficient deployment difficult. This research project implements a systolic array using 4-bit floating-point (FP4) arithmetic, performing matrix multiplication with significantly less hardware complexity than higher-precision data types require. The design supports two FP4 formats with INT8 accumulation. This energy-efficient matrix multiplication hardware will advance edge AI by enabling AI capabilities on devices with tight power and size constraints.

Student researcher

Daniel J Pace-Farr

Computer systems engineering

Hometown: Phoenix, AZ, United States

Graduation date: Spring 2027