Sebastian Zielinski
High-level Synthesis of Attention-based Networks for Reconfigurable Hardware

Abstract
This thesis investigates the viability of deploying lightweight attention-based networks on reconfigurable hardware, namely field-programmable gate arrays (FPGAs). Since its introduction in 2017, the self-attention-based transformer architecture has enjoyed great popularity for natural language processing (NLP) tasks, and in recent years transformers have been adapted to the computer vision (CV) domain. However, due to their size, these networks are difficult to deploy on resource-constrained edge devices. Hybrid lightweight vision transformer (ViT) architectures have been proposed that incorporate convolutional properties in order to reduce model size. Existing ViT hardware implementations rely on slow external memory to accommodate the architecture's high memory requirements. This thesis examines the viability of deploying the lightweight ViT network MobileViT using fast on-chip memory instead. Furthermore, optimization techniques that exploit the parallelism offered by FPGAs are applied to the MobileViT FPGA implementation developed in this thesis.