张浩彬 (2022-09-21 11:01):
#paper https://doi.org/10.48550/arXiv.2106.00750 Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding. ICLR 2021, contrastive learning for time series. Code: https://github.com/sanatonek/TNC_representation_learning
Sampling idea: signals within a temporal neighborhood are assumed to be similar, while signals outside the neighborhood should be distinguishable from it.
Positive samples: neighboring windows are drawn from a Gaussian distribution whose mean is t* and whose variance is set by the window size and the neighborhood length; windows inside the neighborhood are positives. The neighborhood length itself is determined with the ADF stationarity test.
Negative samples: windows outside the neighborhood. The authors refine this point further in the loss function.
Loss function: the authors argue that non-neighboring windows cannot all be treated as negatives, because time series are often periodic, so they are instead regarded as positive-unlabeled samples (a mixture of positives and negatives). Borrowing from PU learning, a weight is introduced on the non-neighboring samples and incorporated into the loss.
Data: three datasets in total: a simulated dataset (4 classes, generated by an HMM), a clinical atrial fibrillation dataset (MIT-BIH; its classes alternate over time, are highly imbalanced, and a small number of individuals contribute very long recordings), and a human activity dataset (UCI HAR).
Downstream tasks: clustering and classification. Since the main goal is to compare representation quality, all models share the same simple encoder for a given task, while the encoder architecture varies across datasets because their characteristics differ. Clustering uses plain k-means and classification uses a simple KNN; TNC achieves the best results on both. (A minimal sketch of the sampling and loss ideas follows below.)
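Since the write-up above walks through the Gaussian neighborhood sampling, the ADF-based neighborhood size, and the PU-weighted contrastive loss, here is a minimal Python sketch of those ideas. It is not the authors' implementation from the linked repo: `find_neighborhood_size`, `sample_centers`, `tnc_loss`, the discriminator `disc`, and the default values (`max_eta`, `p_thresh`, `w`) are illustrative assumptions; only `statsmodels.tsa.stattools.adfuller` and the PyTorch calls are real library APIs.

```python
# Minimal sketch of the TNC sampling and loss ideas; NOT the authors' code.
# Assumptions: x is a 1-D numpy signal long relative to `window`; `disc` is any
# torch module scoring whether two encoded windows are temporal neighbors.
import numpy as np
import torch
import torch.nn.functional as F
from statsmodels.tsa.stattools import adfuller  # ADF unit-root (stationarity) test

def find_neighborhood_size(x, t_star, window, max_eta=8, p_thresh=0.01):
    """Grow the neighborhood around t_star until the ADF test stops rejecting
    the unit-root hypothesis, i.e. the segment no longer looks stationary."""
    eta = 1
    while eta < max_eta:
        lo = max(0, t_star - eta * window)
        hi = min(len(x), t_star + (eta + 1) * window)
        p_value = adfuller(x[lo:hi])[1]   # second return value is the p-value
        if p_value > p_thresh:            # cannot reject unit root -> stop growing
            break
        eta += 1
    return eta

def sample_centers(x, t_star, window, eta, n=4):
    """Positive window centers ~ N(t_star, eta * window); negative centers are
    drawn well outside the neighborhood."""
    pos = np.random.normal(t_star, eta * window, n).astype(int)
    pos = np.clip(pos, window, len(x) - window)
    neg = []
    while len(neg) < n:
        c = np.random.randint(window, len(x) - window)
        if abs(c - t_star) > 3 * eta * window:
            neg.append(c)
    return pos, np.array(neg)

def tnc_loss(disc, z_t, z_pos, z_neg, w=0.05):
    """Debiased contrastive objective: neighbors are positives; non-neighbors are
    treated as positive-unlabeled, so a fraction w of them is scored as positive
    (PU-learning style weighting) instead of forcing them all to be negatives."""
    pos_logits = disc(z_t, z_pos)   # discriminator: are these two windows neighbors?
    neg_logits = disc(z_t, z_neg)
    pos_loss = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits))
    neg_loss = ((1 - w) * F.binary_cross_entropy_with_logits(
                    neg_logits, torch.zeros_like(neg_logits))
                + w * F.binary_cross_entropy_with_logits(
                    neg_logits, torch.ones_like(neg_logits)))
    return pos_loss + neg_loss
```

Downstream, the frozen representations can then be evaluated as described above, e.g. with scikit-learn's KMeans for clustering and KNeighborsClassifier for classification.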
Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding
Abstract:
Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.