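Keywords
Computer science
Artificial intelligence
Synthetic data
Machine learning
Matching
Distillation
Set
Data mining
Training set
Deep learning
Dataset
Sleep stages
Pattern recognition
Sleep
Data modeling
Training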
Authors
Hanfei Guo, Junhao Xu, Chang Li, Wei Zhao, Hu Peng, Zhihui Han, Yuanguo Wang, Xun Chen
Identifier
DOI: 10.1088/1741-2552/ae1f3c
Abstract
Objective. As deep learning has advanced, a growing number of researchers have developed end-to-end frameworks for automatic sleep stage classification. However, these frameworks typically need large electroencephalogram (EEG) datasets for training, which imposes a significant computational burden. Furthermore, EEG recordings contain private patient information, so training on them raises privacy concerns. To address these issues, we propose a hybrid data distillation method. It distills a large real dataset into a tiny, privacy-preserving synthetic set on which a single-channel EEG sleep staging model can be trained from scratch, at lower training cost and privacy risk.
Approach. We first optimize a randomly initialized synthetic dataset with gradient matching. Because it matches the gradient changes of the early stages of model training, this step quickly narrows the performance gap between the synthetic and source datasets. Then, to avoid oscillating near the optimal solution, we switch to distribution matching to further optimize the synthetic dataset. This aligns the data distributions at a global level and improves overall consistency. In addition, we adopt a novel mini-batch iteration method that helps the synthetic dataset learn temporal dependencies.
Main results. We validated the framework on three public datasets and obtained robust results.
Significance. This study proposes an efficient and robust hybrid data distillation algorithm, providing a feasible approach to privacy-preserving sleep staging.
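The two matching objectives in the Approach paragraph can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. A linear softmax classifier stands in for the sleep staging network, and a fixed random ReLU projection (the hypothetical `embed` and `P`) stands in for the feature extractor used in distribution matching. The data are random arrays shaped like single-channel 30-s epochs at an assumed 100 Hz. `stage_for_step` and its `switch_step` are a hypothetical switching rule. The gradient-matching distance uses a per-output-unit cosine distance, which is one common choice in the dataset condensation literature. Updating synthetic data through that distance means differentiating through the model gradient, which an autodiff framework would normally handle, so only the distance is computed here. The distribution-matching distance is simple enough to differentiate by hand, so `dm_step` runs explicit gradient steps on the synthetic epochs. The mini-batch iteration scheme is not sketched because the abstract does not describe it.

```python
import numpy as np


def softmax(logits):
    # Numerically stable row-wise softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def ce_gradient(W, X, y, num_classes):
    # Gradient of the mean cross-entropy w.r.t. W for a linear classifier X @ W.
    onehot = np.eye(num_classes)[y]
    return X.T @ (softmax(X @ W) - onehot) / X.shape[0]


def gradient_matching_distance(g_syn, g_real, eps=1e-8):
    # Sum over output units of (1 - cosine similarity) between gradient columns.
    dot = (g_syn * g_real).sum(axis=0)
    norms = np.linalg.norm(g_syn, axis=0) * np.linalg.norm(g_real, axis=0)
    return float(np.sum(1.0 - dot / (norms + eps)))


def embed(X, P):
    # Hypothetical random feature extractor: ReLU of a fixed random projection.
    return np.maximum(X @ P, 0.0)


def distribution_matching_distance(X_syn, y_syn, X_real, y_real, P, num_classes):
    # Squared distance between class-wise mean embeddings (linear-kernel MMD).
    total = 0.0
    for c in range(num_classes):
        diff = embed(X_syn[y_syn == c], P).mean(axis=0) - embed(X_real[y_real == c], P).mean(axis=0)
        total += float(np.sum(diff ** 2))
    return total


def dm_step(X_syn, y_syn, X_real, y_real, P, num_classes, lr=0.1):
    # One gradient-descent step on the distribution-matching distance w.r.t. X_syn.
    grad = np.zeros_like(X_syn)
    for c in range(num_classes):
        idx = np.flatnonzero(y_syn == c)
        H = X_syn[idx] @ P
        diff = np.maximum(H, 0.0).mean(axis=0) - embed(X_real[y_real == c], P).mean(axis=0)
        dH = (2.0 / len(idx)) * diff * (H > 0)  # chain rule through the mean and the ReLU
        grad[idx] = dH @ P.T
    return X_syn - lr * grad


def stage_for_step(step, switch_step):
    # Hypothetical hybrid schedule: gradient matching first, distribution matching after.
    return "gradient" if step < switch_step else "distribution"


rng = np.random.default_rng(0)
num_classes, d = 5, 3000  # 5 sleep stages; 30-s epochs at 100 Hz (assumed)
X_real = rng.standard_normal((200, d))  # placeholder "real" epochs, 40 per stage
y_real = np.arange(200) % num_classes
X_syn = rng.standard_normal((num_classes, d))  # one synthetic epoch per stage
y_syn = np.arange(num_classes)
W = 0.01 * rng.standard_normal((d, num_classes))
P = rng.standard_normal((d, 64)) / np.sqrt(d)

gm = gradient_matching_distance(ce_gradient(W, X_syn, y_syn, num_classes),
                                ce_gradient(W, X_real, y_real, num_classes))
dm_before = distribution_matching_distance(X_syn, y_syn, X_real, y_real, P, num_classes)
for _ in range(20):
    X_syn = dm_step(X_syn, y_syn, X_real, y_real, P, num_classes)
dm_after = distribution_matching_distance(X_syn, y_syn, X_real, y_real, P, num_classes)
print(f"GM distance {gm:.3f}, DM distance {dm_before:.3f} -> {dm_after:.3f}")
```

The contrast between the two functions mirrors the abstract. Gradient matching compares training signals at one particular set of model parameters. Distribution matching compares class-wise mean embeddings over whole classes, which is the kind of global alignment described in the Approach paragraph.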