M. V. Maceiras, M. Waqar Azhar and P. Trancoso, “VSA: A Hybrid Vector-Systolic Architecture,” 2022 IEEE 40th International Conference on Computer Design (ICCD), Olympic Valley, CA, USA, 2022, pp. 368-376, doi: 10.1109/ICCD56317.2022.00061

To deliver high performance efficiently, modern processors include dedicated hardware to accelerate different application domains. For example, several recent processors include dedicated Machine Learning (ML) accelerators. However, while adding dedicated hardware improves efficiency compared to general-purpose CPUs, it also requires more area, making it infeasible for smaller devices. In such settings, it is therefore desirable to reuse existing hardware for different functionalities. In this work, we explore reusing the components of a Vector Processing Unit (VPU) to provide the functionality of a Systolic Array (SA) for General Matrix Multiplication (GEMM), a kernel used extensively in machine learning, big data, and scientific computing. The resulting hybrid Vector-Systolic Architecture (VSA) supports Single Instruction Multiple Data (SIMD) instruction extensions through its VPU functionality and computes GEMM efficiently through its SA functionality. We present an implementation of VSA as a RISC-V co-processor that adds a hardware overhead of less than 0.1% compared to a baseline RISC-V implementation with a VPU. In our evaluation using different Deep Neural Network (DNN) models, VSA achieves a speedup of up to 3.5x and reduces energy consumption by up to 70%.
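The abstract does not describe VSA's internal dataflow. As a generic illustration of how a systolic array computes GEMM, the Python sketch below simulates an output-stationary array cycle by cycle. The dataflow choice, the input-skew scheme, and the function name `systolic_gemm` are assumptions made for illustration; this is not the paper's design. Each processing element (PE) at position (i, j) owns one output C[i][j]. Row i of A enters from the left delayed by i cycles, column j of B enters from the top delayed by j cycles, and operands move one PE per cycle, so A[i][k] and B[k][j] meet at PE (i, j) on cycle i + j + k.

```python
def systolic_gemm(A, B):
    """Hypothetical cycle-level sketch of an output-stationary systolic array.

    Computes C = A @ B for A (M x K) and B (K x N), given as lists of lists.
    Illustrative only; not the VSA implementation from the paper.
    """
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"

    C = [[0] * N for _ in range(M)]          # one accumulator per PE
    a_reg = [[None] * N for _ in range(M)]   # A operand each PE forwards right
    b_reg = [[None] * N for _ in range(M)]   # B operand each PE forwards down

    # The last pair (A[M-1][K-1], B[K-1][N-1]) meets at cycle M + N + K - 3.
    for t in range(M + N + K - 2):
        new_a = [[None] * N for _ in range(M)]
        new_b = [[None] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                # Left edge reads skewed A; other PEs read the left neighbour.
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if 0 <= k < K else None
                else:
                    a_in = a_reg[i][j - 1]
                # Top edge reads skewed B; other PEs read the upper neighbour.
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if 0 <= k < K else None
                else:
                    b_in = b_reg[i - 1][j]
                # Multiply-accumulate when both operands are present.
                if a_in is not None and b_in is not None:
                    C[i][j] += a_in * b_in
                # Latch operands so neighbours see them next cycle.
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b
    return C


if __name__ == "__main__":
    print(systolic_gemm([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The skew ensures that matching operands A[i][k] and B[k][j] reach the same PE on the same cycle. This is why a systolic array needs only nearest-neighbour links and local accumulators, which are the kinds of resources a hybrid design could plausibly borrow from existing vector lanes.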
