On-device Training and Neural Architecture Search Hardware Accelerator

Context

The ability to adapt to a changing environment is a key aspect of future integrated AI systems. Current approaches to AI at the edge cover only the optimization of the inference stage of neural networks, delegating network training and optimization to the cloud. Despite the computational resources available there, training in the cloud raises data privacy concerns and introduces a dependency on the internet connection. In contrast, adapting the neural network locally, on-device, reduces data traffic and enables autonomous operation of integrated AI systems. This work aims to explore hardware architectures for on-device training and neural architecture search.

Goals

In this work, the state of the art in on-device DL training and neural architecture search (NAS) approaches will be surveyed. Based on these results, a methodology for building hybrid accelerators for neural network training and architecture search will be developed. The performance of the accelerator will be evaluated on classical computer vision datasets on an FPGA-based platform. Finally, the FPGA-based solution will be compared to server and embedded GPU targets, and based on these results the applicability of the proposed methodology will be analyzed.
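
To make the target workload concrete, the sketch below (illustrative only, not the project's actual design) shows a DARTS-style differentiable NAS step in PyTorch: weight gradients and architecture-parameter gradients flow through the same forward/backward passes, which is exactly the combined compute pattern a hybrid training-and-search accelerator must support. The MixedOp class, the candidate operations, and all hyperparameters are assumptions made for the example.

# Minimal sketch, assuming a DARTS-style differentiable NAS formulation.
# Names (MixedOp, alpha) and hyperparameters are illustrative, not the
# project's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Weighted sum of candidate operations; the softmax over alpha
    gives the searchable architecture weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

model = nn.Sequential(MixedOp(8), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 10))

# Separate optimizers: network weights vs. architecture parameters.
weight_params = [p for n, p in model.named_parameters() if "alpha" not in n]
arch_params = [p for n, p in model.named_parameters() if "alpha" in n]
w_opt = torch.optim.SGD(weight_params, lr=0.01)
a_opt = torch.optim.Adam(arch_params, lr=0.001)

x = torch.randn(4, 8, 16, 16)   # stand-in for a vision mini-batch
y = torch.randint(0, 10, (4,))

# One alternating step: weights are updated on a training batch, then
# architecture parameters on a validation batch (the same placeholder
# batch is reused here for brevity).
for opt in (w_opt, a_opt):
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

Each alternating step therefore requires a full forward pass, a full backward pass, and two kinds of parameter updates, all of which the accelerator would have to execute on-device.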

Requirements

  • Experience with hardware description languages such as VHDL or Verilog
  • Experience with deep learning and one of the DL frameworks, e.g. PyTorch or Keras/TensorFlow
  • Experience with Python and C++