On-device Training

Context

The ability to adapt to a changing environment is a key aspect of future integrated AI systems. Current approaches to AI at the edge cover only optimization techniques for the inference stage of neural networks, delegating network training and optimization to the cloud. Despite the availability of computational resources, training in the cloud raises data privacy concerns and introduces a dependency on an internet connection. In contrast, adapting the neural network locally, on the device, reduces data traffic and enables autonomous operation of integrated AI systems. This work aims to explore hardware architectures for on-device training.

Targets

In this work, the state of the art in on-device DL training approaches will be researched. Based on these results, a methodology for building accelerators for on-device neural network training will be designed and implemented. The performance of the accelerator will be evaluated on common datasets on an FPGA-based platform. Finally, the FPGA-based solution will be compared to server and embedded GPU targets, and based on these results, an analysis of the applicability of the proposed methodology will be conducted.
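To illustrate the kind of workload such an accelerator would target: a common strategy in the on-device training literature is to freeze the backbone of a pretrained network and update only a small classifier head, which bounds the memory needed for activations and gradients. The following is a minimal, framework-free sketch of that idea (the fixed projection standing in for a frozen backbone, the toy data, and all names are hypothetical, chosen purely for illustration), not a description of the methodology to be developed in this work.

```python
import math
import random

random.seed(0)

def frozen_features(x):
    # Stand-in for a frozen, pretrained backbone: a fixed nonlinear projection.
    # On a real device, only this forward pass would run for the backbone;
    # no backbone gradients or activations need to be stored.
    return [math.tanh(x[0] + x[1]), math.tanh(x[0] - x[1])]

# Trainable head: a binary logistic-regression classifier (2 weights + bias).
w, b = [0.0, 0.0], 0.0

def predict(feats):
    z = sum(wi * fi for wi, fi in zip(w, feats)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# Toy "local adaptation" data: label is 1 iff x0 + x1 > 0.
data = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
data = [(x, 1.0 if x[0] + x[1] > 0 else 0.0) for x in data]

lr = 0.5
for _ in range(50):                      # SGD epochs over the local data
    for x, y in data:
        f = frozen_features(x)
        p = predict(f)
        g = p - y                        # gradient of BCE loss w.r.t. the logit
        for i in range(len(w)):
            w[i] -= lr * g * f[i]        # update head weights only
        b -= lr * g

correct = sum((predict(frozen_features(x)) > 0.5) == (y > 0.5) for x, y in data)
accuracy = correct / len(data)
print(f"head-only adaptation accuracy: {accuracy:.2f}")
```

Because only the head's parameters receive updates, the memory and compute footprint of training stays close to that of inference, which is what makes such schemes candidates for FPGA acceleration.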

Requirements
  • Experience with hardware description languages like VHDL or Verilog

  • Experience with Deep Learning and one of the DL Frameworks like PyTorch or Keras/TensorFlow

  • Experience with Python and C++

  • Motivation and an independent working style