Quantum enhanced knowledge distillation
Conference item
Simone, P., Lavagna, L., De Falco, F., Ceschini, A., Rosato, A., Windridge, D. and Panella, M. 2024. Quantum enhanced knowledge distillation. Quantum Techniques in Machine Learning 2024 Conference. Melbourne, Australia, 24-29 Nov 2024.
Title | Quantum enhanced knowledge distillation |
---|---|
Authors | Simone, P., Lavagna, L., De Falco, F., Ceschini, A., Rosato, A., Windridge, D. and Panella, M. |
Abstract | Knowledge distillation (KD) is the process of training a “student” machine learning system on the outputs of a pre-trained “teacher” model; it is a well-established practice in the optimization of classical Deep Neural Networks (DNNs) [1]. Usually, it is enacted by substituting the output softmax layer of the trained DNN teacher network with an equivalent layer of Boltzmann-temperature parameterized sigmoid functions, leveraging the gradient information implicit in the softened logits for the training of the smaller student network. The smaller network is thus trained to replicate the output sigmoid layer of the larger network during training [2]. However, KD remains relatively unexplored within the quantum machine learning domain, with only a few pioneering studies [3], [4]. Challenges in the quantum domain include the incongruence of the respective learning architectures and the transferability of gradient information in inter-domain approaches (e.g., classical-to-quantum distillation) or intra-domain transfer (e.g., quantum-to-quantum distillation). Additionally, the scarcity of quantum-to-quantum distillation research may be due to the current absence of quantum network architectures large and efficient enough to necessitate a distillation step a priori. In this work, we focus on the classical-to-quantum paradigm and investigate the extent to which a hybrid quantum-classical architecture can effectively learn from the softmax outputs of a classical Multi-Layer Perceptron (MLP) in multi-class classification tasks. The multi-class scenario is chosen both for its representativeness of typical DNN usage and for its inherently greater potential for meaningful gradient-information transfer. In doing so, we demonstrate substantial empirical efficiency gains for classical-to-quantum KD on an emblematic non-linearly separable 3-class problem. Our findings reveal that classical-to-quantum KD enhances the performance of standard hybrid quantum architectures and paves the way for the applicability of distillation techniques in the quantum realm. |
Sustainable Development Goals | 9 Industry, innovation and infrastructure |
Middlesex University Theme | Creativity, Culture & Enterprise |
Research Group | Artificial Intelligence group |
Conference | Quantum Techniques in Machine Learning 2024 Conference |
Publication process dates | |
Accepted | 01 Sep 2024 |
Completed | 29 Nov 2024 |
Deposited | 11 Oct 2024 |
Output status | Published |
Accepted author manuscript | File access level: Restricted |
Web address (URL) | https://indico.qtml2024.org/event/1/contributions/254/ |
Language | English |
https://repository.mdx.ac.uk/item/1v1097
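The distillation mechanism described in the abstract is the standard temperature-softened scheme: teacher logits are divided by a Boltzmann temperature before the softmax, and the student is trained against the resulting soft targets alongside the hard labels. The following PyTorch snippet is a minimal sketch of such a distillation loss, not the authors' implementation; the temperature `T`, mixing weight `alpha`, and function name `kd_loss` are illustrative assumptions.

```python
# Minimal sketch of temperature-scaled knowledge distillation (Hinton-style),
# not the paper's code. T, alpha, and all names below are assumptions.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: teacher logits softened by the Boltzmann temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # KL term scaled by T^2 so soft-target gradients stay comparable in
    # magnitude to the hard-label gradients as T varies.
    soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy batch for a 3-class problem, mirroring the abstract's setting;
    # student_logits would come from the hybrid quantum-classical student.
    student_logits = torch.randn(8, 3, requires_grad=True)
    teacher_logits = torch.randn(8, 3)  # from the pre-trained classical MLP
    labels = torch.randint(0, 3, (8,))
    loss = kd_loss(student_logits, teacher_logits, labels)
    loss.backward()  # gradients flow into the student's parameters
    print(float(loss))
```

At T=1 the soft-loss term reduces to ordinary cross-entropy against the teacher's softmax; larger T flattens the distribution and exposes the inter-class similarity structure that the student can exploit.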