Sebastian Zielinski
High-level Synthesis of Attention-based Networks for Reconfigurable Hardware
Abstract
This thesis examines the viability of deploying lightweight attention-based networks
on reconfigurable hardware, namely field-programmable gate arrays (FPGAs). Since its
introduction in 2017, the self-attention-based transformer architecture has enjoyed
great popularity for natural language processing (NLP) tasks, and in recent years
transformers have also been adapted to the computer vision (CV) domain. However, due
to their size, these networks are difficult to deploy on resource-constrained edge
devices. Hybrid lightweight vision transformer (ViT) architectures have been proposed,
which incorporate convolutional properties in order to reduce model size. Existing ViT
hardware implementations use slow external memory to cope with the architecture's high
memory requirements. In this thesis, the viability of deploying the lightweight ViT
network MobileViT using fast on-chip memory is examined. Furthermore, optimization
techniques that exploit the parallelization advantages of FPGAs are applied to the
FPGA implementation of MobileViT created in this thesis.