- Title: MAIB-Talk-027: Knowledge Distillation for Efficient Learning in Heterogeneous Federated Systems
- Date: 10:00pm US East time, 08/26/2023
- Date: 10:00am Beijing time, 08/27/2023
- Zoom ID: 933 1613 9423
- Zoom PWD: 416262
- Zoom: https://uwmadison.zoom.us/meeting/register/tJcudu-prTIuGNda1MsF8PKyRQlnGn06TP2E
MAIB: Manifold learning, Artificial Intelligence, Biology Forum
Presentation Record (a previous presentation will be shown here if the video for this talk has not been released)
Dr. Zhuangdi Zhu, https://zhuangdizhu.github.io/
Zhuangdi is currently a senior data and application scientist at Microsoft. She received her Ph.D. in Computer Science from Michigan State University in August 2022. Zhuangdi has broad research interests in machine learning theory and its applications; her current research focuses on federated machine learning and reinforcement learning. She has rich industry experience in both IT and financial technology companies, where she has applied machine learning to practical problems in human-computer interaction, wireless networks, the Internet of Things, user recommendation, and digital markets.
Background
The rise of federated learning (FL) has brought machine learning to edge computing by leveraging data scattered across edge devices. However, the heterogeneous capacities of edge devices and the differences in their data distributions are two major obstacles to the widespread application of FL, leading to long convergence times and high communication costs. The research presented in this talk addresses these two fundamental challenges, commonly referred to as system heterogeneity and data heterogeneity. In particular, to address data heterogeneity, we propose a data-free knowledge distillation algorithm for FL. Our method benefits and calibrates on-device training by distilling inductive bias learned in a data-free manner. Next, to address system heterogeneity, we propose learning self-distilling neural networks for arbitrary FL devices, which can be flexibly pruned to different sizes and capture domain knowledge in a nested, progressive manner. Empirical studies echo the theoretical implications, showing the significant advantages of our method in achieving efficient and effective FL.
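To make the data-free distillation idea concrete, below is a minimal PyTorch sketch of the kind of mechanism the abstract describes: a server-side generator produces label-conditioned pseudo-features, and each client adds a distillation term on those features to its local objective, so raw data never leaves a device. The module names, dimensions, and the `kd_weight` coefficient are illustrative assumptions, not the exact algorithm from the talk.

```python
# A minimal sketch of data-free knowledge distillation for FL.
# All architectures and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, LATENT_DIM, FEATURE_DIM = 10, 32, 64

class Generator(nn.Module):
    """Server-side generator: maps (noise, label) to a latent feature,
    approximating the inductive bias of the global data distribution."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_CLASSES, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(2 * LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, FEATURE_DIM))

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

class ClientModel(nn.Module):
    """Client model split into a feature extractor and a classifier head;
    the head can consume generated features directly."""
    def __init__(self, in_dim=100):
        super().__init__()
        self.extractor = nn.Sequential(nn.Linear(in_dim, FEATURE_DIM), nn.ReLU())
        self.head = nn.Linear(FEATURE_DIM, NUM_CLASSES)

    def forward(self, x):
        return self.head(self.extractor(x))

def local_step(model, generator, x, y, opt, kd_weight=0.5):
    """One client update: supervised loss on local data plus a
    data-free KD term that calibrates the head on generated features."""
    opt.zero_grad()
    sup_loss = F.cross_entropy(model(x), y)
    # Sample pseudo-features for random labels; no raw data is shared.
    y_fake = torch.randint(0, NUM_CLASSES, (x.size(0),))
    z = torch.randn(x.size(0), LATENT_DIM)
    with torch.no_grad():
        feat = generator(z, y_fake)
    kd_loss = F.cross_entropy(model.head(feat), y_fake)
    (sup_loss + kd_weight * kd_loss).backward()
    opt.step()
    return sup_loss.item(), kd_loss.item()

# Toy usage: one local step on random data.
model, gen = ClientModel(), Generator()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 100), torch.randint(0, NUM_CLASSES, (8,))
print(local_step(model, gen, x, y, opt))
```

In the full method the generator itself is learned on the server without any raw data; the sketch only shows how a client would consume it to regularize local training.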
References
- McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. PMLR, 2017.
- Lin, T., Kong, L., Stich, S. U., and Jaggi, M. Ensemble distillation for robust model fusion in federated learning. arXiv preprint arXiv:2006.07242, 2020.
- Horvath, S., Laskaridis, S., Almeida, M., Leontiadis, I., Venieris, S. I., and Lane, N. D. FjORD: Fair and accurate federated learning under heterogeneous targets with ordered dropout. 35th Conference on Neural Information Processing Systems (NeurIPS), 2021.
- Diao, E., Ding, J., and Tarokh, V. HeteroFL: Computation and communication efficient federated learning for heterogeneous clients. arXiv preprint arXiv:2010.01264, 2020.
How will this work be useful for drug discovery and development?
The work on addressing system heterogeneity and data heterogeneity in federated learning (FL) can be highly beneficial for drug discovery and development in several ways:
- Efficient Data Utilization: Drug discovery often involves analyzing vast amounts of data from various sources, including clinical trials, genetic databases, and chemical libraries. FL allows pharmaceutical companies and researchers to collaborate and leverage data from multiple sources without centrally aggregating sensitive patient data. By addressing data heterogeneity in FL, the proposed data-free knowledge distillation algorithm can effectively utilize diverse datasets from different sources without the need to share raw data. This ensures privacy and data security while improving overall data utilization efficiency.
- Edge Computing for Drug Discovery: The use of edge devices in FL enables distributed data processing and model training closer to the data sources, reducing the need for data transfer to a central server. This is particularly useful in drug discovery, where data from hospitals, research labs, and clinics can be spread across different geographic locations. The self-distilling neural network approach tailored for arbitrary FL devices allows each edge device to train models that suit its computational capacity, optimizing the learning process for individual devices and potentially speeding up drug discovery efforts.
- Accelerated Model Training: Drug discovery requires the development of complex machine learning models for various tasks, such as predicting drug-protein interactions, analyzing molecular structures, or identifying potential drug candidates. System heterogeneity in FL can cause slow convergence and high communication costs. The proposed self-distilling neural network approach addresses this issue by adapting model complexity to the capabilities of each edge device (see the sketch after this list). This accelerates the model training process, leading to faster iterations and quicker results.
- Generalization across Diverse Data: Drug discovery often involves dealing with data from different patient populations, disease types, and genetic variations. Addressing data heterogeneity through the data-free knowledge distillation algorithm can help improve model generalization across diverse datasets. By transferring distilled knowledge across devices without sharing raw data, the FL system can learn from the collective experience of different sources and produce models that perform well on various data distributions.
- Privacy and Security: Drug discovery involves sensitive information about patients, molecular structures, and potential drug compounds. Centralizing data for model training can raise privacy and security concerns. FL, coupled with the proposed methods, allows different parties to collaborate without sharing raw data, protecting individual privacy while still benefiting from a collective learning process.
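As a rough illustration of the flexibly pruned, nested networks referenced in the third bullet above, the sketch below builds submodels of different widths by slicing a prefix of each layer's weights, in the spirit of ordered dropout in FjORD and width scaling in HeteroFL (see the references). The function names and width ratios are assumptions for demonstration only, not the talk's exact algorithm.

```python
# Illustrative sketch of width-sliced, nested submodels for devices
# of heterogeneous capacity. Names and ratios are assumed for demo.
import torch
import torch.nn as nn

def slice_linear(full: nn.Linear, in_keep: int, out_keep: int) -> nn.Linear:
    """Return a smaller Linear sharing the first rows/columns of the
    full layer's weights, so submodels are nested inside the global model."""
    sub = nn.Linear(in_keep, out_keep)
    with torch.no_grad():
        sub.weight.copy_(full.weight[:out_keep, :in_keep])
        sub.bias.copy_(full.bias[:out_keep])
    return sub

def make_submodel(full_hidden: nn.Linear, full_head: nn.Linear, ratio: float):
    """Prune the hidden width by `ratio`; input/output sizes stay fixed."""
    hidden = max(1, int(full_hidden.out_features * ratio))
    return nn.Sequential(
        slice_linear(full_hidden, full_hidden.in_features, hidden),
        nn.ReLU(),
        slice_linear(full_head, hidden, full_head.out_features))

# Global model held on the server.
full_hidden, full_head = nn.Linear(100, 256), nn.Linear(256, 10)

# Each device receives a submodel matched to its capacity; the 25%
# model's parameters are a prefix of the 100% model's parameters.
for ratio in (0.25, 0.5, 1.0):
    sub = make_submodel(full_hidden, full_head, ratio)
    n_params = sum(p.numel() for p in sub.parameters())
    print(f"width ratio {ratio:.2f}: {n_params} parameters")
```

Because each smaller model's parameters are a prefix of the larger one's, updates from devices of different capacities can be aggregated coordinate-wise on the shared prefix, which is what makes such nested training compatible with standard FL aggregation.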
Overall, the work on addressing system heterogeneity and data heterogeneity in FL can contribute significantly to drug discovery and development efforts. It enables more efficient and secure collaboration among researchers, accelerates model training, improves model generalization, and ensures the privacy of sensitive data. These advancements can lead to faster and more effective drug development processes, potentially accelerating the discovery of new treatments and therapies for various diseases.