Design & Implementation of Embedded Hardware Acceleration for Transformers


Context

Transformers are widely adopted in Natural Language Processing (e.g., chatbots and autocompletion) and Computer Vision, where they often outperform Convolutional Neural Networks (CNNs). However, their computational complexity still poses a challenge for many computer systems today. FleXNNgine, ITIV's existing accelerator [1], so far supports only CNNs; the purpose of this master thesis is to extend its capabilities to transformers by supporting the flexible matrix multiplications required by self-attention mechanisms.
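For orientation, the workload in question is the scaled dot-product self-attention of standard transformers, which reduces to two dense matrix multiplications (the score computation QKᵀ and the subsequent product with V) around a row-wise softmax; Q, K, and V denote the usual query, key, and value matrices and d_k the key dimension (standard notation, not introduced elsewhere in this posting):

\[ \operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \]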

Targets
  • In this work, the state of the art (SOTA) in Transformer accelerators will be researched, with a focus on systolic array architectures (a minimal dataflow sketch follows this list)
  • FleXNNgine will be extended to perform the flexible matrix multiplications required by self-attention
  • The performance of the design will be evaluated on common datasets as well as against SOTA models
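
To illustrate the systolic-array focus, below is a minimal C sketch of the skewed schedule of an output-stationary N×N systolic array computing C = A·B, in which PE(i,j) consumes a[i][k] and b[k][j] at cycle t = i + j + k. The dataflow choice, array size, and values are illustrative assumptions and are not taken from FleXNNgine.

```c
#include <stdio.h>

#define N 3  /* illustrative array size: N x N PEs computing C = A * B */

int main(void) {
    int a[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
    int b[N][N] = {{9, 8, 7}, {6, 5, 4}, {3, 2, 1}};
    int c[N][N] = {0};  /* one output-stationary accumulator per PE */

    /* Cycle-by-cycle model of the skewed schedule only (not the wiring):
     * in hardware, rows of A enter from the left and columns of B from
     * the top, each delayed by one cycle per row/column, so that PE(i,j)
     * multiplies a[i][k] with b[k][j] at cycle t = i + j + k.
     * The diagonal wavefront needs 3N - 2 cycles to drain completely. */
    for (int t = 0; t < 3 * N - 2; t++) {
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                int k = t - i - j;  /* operand index reaching PE(i,j) now */
                if (k >= 0 && k < N)
                    c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

    /* Print the accumulated result, the matrix product A * B. */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf("%5d", c[i][j]);
        printf("\n");
    }
    return 0;
}
```

The output-stationary dataflow is chosen here purely for brevity, since it keeps each partial sum fixed in its PE; the FleXNNgine baseline itself uses a row-stationary dataflow [1], so the mapping explored in the thesis may differ.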

Requirements
  • Interest in the design of hardware accelerators and SoCs
  • Knowledge of VHDL/Verilog as well as C


[1] Fabian Lesniak, Annina Gutermann, Tanja Harbaum, and Jürgen Becker. 2024. Enhanced Accelerator Design for Efficient CNN Processing with Improved Row-Stationary Dataflow. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI '24). Association for Computing Machinery, New York, NY, USA, 151-157. https://doi.org/10.1145/3649476.3658737