Hardware designs for machine learning acceleration
The FADES (Fused Architecture for DEnse and Sparse matrices) hardware architecture is optimized for sparse and dense matrix processing in TensorFlow Lite and is compatible with embedded heterogeneous devices that integrate CPU and FPGA resources. It offers multiple configuration options that trade off parallelism against complexity, and it uses a dataflow model with four stages that read, compute, scale, and write results. All stages are designed to support TensorFlow Lite operations, including asymmetric quantized activations, column-major matrix writes, per-filter/per-axis bias values, and the current scaling specifications.
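As a rough software illustration of the four-stage dataflow described above, the following Python sketch chains read, compute, scale, and write stages as generators. It is not the FADES hardware design or its API: all function names, the nonzero-only row format, and the floating-point per-row multiplier are assumptions made for illustration (TensorFlow Lite itself specifies a fixed-point multiplier and shift for rescaling). Weights are assumed to be symmetric int8 values, activations carry a zero point, and each output row (filter) has its own bias and multiplier.

```python
from typing import Iterator, List, Tuple

# Hypothetical compressed row: (column index, weight value) for nonzero weights.
Row = List[Tuple[int, int]]


def read_stage(weights: List[List[int]]) -> Iterator[Tuple[int, Row]]:
    """Stage 1: stream each weight row, keeping only its nonzero entries."""
    for r, row in enumerate(weights):
        yield r, [(k, w) for k, w in enumerate(row) if w != 0]


def compute_stage(rows, activations, act_zero_point, bias):
    """Stage 2: integer multiply-accumulate with the activation zero point removed."""
    n_cols = len(activations[0])
    for r, nonzeros in rows:
        for c in range(n_cols):
            acc = bias[r]  # per-filter bias: one value per output row
            for k, w in nonzeros:
                acc += w * (activations[k][c] - act_zero_point)
            yield r, c, acc


def scale_stage(accs, multipliers, out_zero_point):
    """Stage 3: requantize each accumulator to int8 with a per-row multiplier.

    Assumption: a float multiplier and Python's round() stand in for the
    fixed-point rescaling a real implementation would use.
    """
    for r, c, acc in accs:
        q = round(acc * multipliers[r]) + out_zero_point
        yield r, c, max(-128, min(127, q))  # saturate to the int8 range


def write_stage(results, n_rows, n_cols):
    """Stage 4: store results in a flat buffer in column-major order."""
    out = [0] * (n_rows * n_cols)
    for r, c, q in results:
        out[c * n_rows + r] = q
    return out


def run_pipeline(weights, activations, act_zero_point, bias, multipliers, out_zero_point):
    """Chain the four stages; each stage consumes the previous one lazily."""
    rows = read_stage(weights)
    accs = compute_stage(rows, activations, act_zero_point, bias)
    scaled = scale_stage(accs, multipliers, out_zero_point)
    return write_stage(scaled, len(weights), len(activations[0]))


if __name__ == "__main__":
    result = run_pipeline(
        weights=[[1, 0], [0, 2]],
        activations=[[3, 4], [5, 6]],
        act_zero_point=1,
        bias=[10, -10],
        multipliers=[0.25, 0.5],
        out_zero_point=3,
    )
    print(result)  # [6, 2, 6, 3]
```

Because the stages are generators, each result flows through compute, scale, and write one element at a time, which loosely mirrors how a streaming dataflow pipeline avoids materializing intermediate matrices. Skipping zero weights in the read stage is one simple way to let the same compute loop serve both dense and sparse inputs.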
Learn about accelerating the TensorFlow Lite inference engine on heterogeneous devices:
Jose Nunez-Yanez, 'Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite', IEEE Micro, 2022, doi: 10.1109/MM.2022.3196705.
Mohammad Hosseinabady, Jose Nunez-Yanez, 'A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication Using High-Level Synthesis', IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 6, pp. 1272-1285, June 2020, doi: 10.1109/TCAD.2019.2912923.
Jose Nunez-Yanez, Mohammad Hosseinabady, 'Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks', Array, vol. 12, 2021, doi: 10.1016/j.array.2021.100101.
Check out the GitHub site