Authors
Cédric Maron (1,2), Virginie Fresse (1), Karynn Morand (2) and Freddy Havart (2)
(1) Laboratoire Hubert Curien, France; (2) SEGULA Technologie, France
Abstract
With the growing interest in neural network compression, several methods aimed at improving network accuracy have emerged. Data augmentation aims to enhance model robustness and generalization by increasing the diversity of the training dataset. Knowledge distillation aims to transfer knowledge from a teacher network to a student network. Knowledge distillation is generally carried out on high-end GPUs because teacher network architectures are often too heavy to be deployed on the limited computing resources available at the edge. This paper proposes a new distillation method adapted to an edge computing infrastructure. By employing multiple small monoclass teachers, the proposed distillation method becomes applicable even within the constrained computing resources of the edge. The proposed method is evaluated against classical knowledge distillation based on a larger teacher network, using different data augmentation methods and different amounts of training data.
Keywords
Neural network compression, knowledge distillation, edge infrastructure, data augmentation