Papers shared by user 张浩彬.
29 paper shares found in total; this page shows items 1 - 20.
1.
张浩彬 (2024-06-30 10:34):
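@paper https://doi.org/10.48550/arXiv.2403.10131 RAFT: Adapting Language Model to Domain Specific RAG A paper I found very inspiring. Pretraining large language models (LLMs) on large text corpora has become a standard paradigm. When these LLMs are used in downstream applications, new knowledge (for example, time-sensitive news or private domain knowledge) is usually incorporated into the pretrained model either through RAG (Retrieval-Augmented Generation) based prompting or through fine-tuning. How the model can best acquire such new knowledge, however, remains an open question. This paper proposes Retrieval Augmented Fine Tuning (RAFT). Put simply, you fine-tune the model on the material you are going to use for RAG, and use chain-of-thought so the model becomes familiar with the task. Of course, RAG and fine-tuning are two separate approaches to begin with, and combining them seems to put the cart before the horse; this is the part I think the paper does not discuss clearly. Still, these unclear points are also room for new research.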
Abstract:
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake in new knowledge (e.g., time-critical news, or private domain knowledge) into the pretrained model either through RAG-based-prompting, or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in a "open-book" in-domain settings. In RAFT, given a question, and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call, distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This coupled with RAFT's chain-of-thought-style response helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs to in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.
2.
张浩彬 (2024-05-31 07:31):
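#paper doi:https://doi.org/10.48550/arXiv.2403.10131 RAFT: Adapting Language Model to Domain Specific RAG A simple but effective idea. To adapt a general large model to a domain application we can either fine-tune or use RAG, but Microsoft says we can, and should, fine-tune on top of RAG. RAFT is a general recipe for fine-tuning a pretrained large language model to a domain-specific RAG setting. In domain-specific RAG, the model has to answer questions based on a set of documents from a particular domain, such as private files inside an enterprise. This differs from general RAG, where the model does not know which domain it will be tested on. Put simply, fine-tuning is a closed-book exam: you answer from memory. RAG is an open-book exam: I have not memorized anything, but I can flip through the book during the exam. RAFT is then studying the textbook before the open-book exam: I have not read all of it, but I roughly know what the questions look like, and that is fine because I can still consult the book during the exam.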
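To make the "study before the open-book exam" idea concrete, here is a minimal sketch of how one RAFT-style fine-tuning example might be assembled: the relevant document is mixed with distractor documents, and the target answer quotes the relevant passage before concluding. The function name `build_raft_example`, the prompt layout, and the sample texts are all hypothetical; the released code may organize this differently.

```python
import random


def build_raft_example(question, oracle_doc, distractor_docs, answer, rng):
    """Assemble one fine-tuning example: shuffled context + question -> answer."""
    docs = list(distractor_docs) + [oracle_doc]
    rng.shuffle(docs)  # the model should not learn the position of the useful document
    context = "\n".join(f"[doc {i}] {d}" for i, d in enumerate(docs))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "completion": " " + answer}


rng = random.Random(0)
example = build_raft_example(
    question="What does RAFT train the model to ignore?",
    oracle_doc="RAFT trains the model to ignore distractor documents.",
    distractor_docs=[
        "PatchTST splits a series into patches.",
        "TCN uses dilated convolutions.",
    ],
    # Chain-of-thought style target that cites the relevant passage verbatim.
    answer='The context says: "RAFT trains the model to ignore distractor documents." '
           "So the answer is: distractor documents.",
    rng=rng,
)
print(example["prompt"])
```

At inference time the same prompt layout would be filled with whatever the retriever returns, which is why the training context deliberately contains irrelevant documents.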
3.
张浩彬 (2024-04-29 20:35):
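#paper doi: https://doi.org/10.48550/arXiv.2211.14730 A Time Series is Worth 64 Words: Long-term Forecasting with Transformers An ICLR 2023 paper that proposes PatchTST. Inspired by the Vision Transformer, it brings the patch technique to time series. It also responds to an earlier paper arguing that Transformers are actually no better than traditional linear models on time series (Are transformers effective for time series forecasting? (2022)), and regains SOTA. However, at the end of 2023 new methods appeared again, arguing that the key is actually not the Transformer but the patch technique.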
Abstract:
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer; (ii) channel-independence where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. Patching design naturally has three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend longer history. Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring of masked pre-trained representation on one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
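The patching step described in this post can be sketched in a few lines. This is only an illustration of splitting a series into subseries-level patches and applying the same rule to every channel; the names `make_patches` and `patch_channels` and the toy patch length and stride are assumptions, not the authors' code.

```python
def make_patches(series, patch_len, stride):
    """Split one univariate series into (possibly overlapping) patches."""
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


def patch_channels(multivariate, patch_len, stride):
    """Channel independence: every channel is patched separately with the same rule."""
    return [make_patches(channel, patch_len, stride) for channel in multivariate]


series = list(range(10))
patches = make_patches(series, patch_len=4, stride=2)
print(patches)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]

two_channels = patch_channels([series, [v * 10 for v in series]], patch_len=4, stride=2)
print(len(two_channels), len(two_channels[0]))  # 2 channels, 4 patches each
```

Each patch becomes one input token, so a look-back window of 10 points yields 4 tokens here instead of 10; since attention cost grows with the square of the token count, this is where the quadratic saving mentioned in the abstract comes from.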
4.
张浩彬 (2023-06-30 11:45):
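#paper The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting doi: https://doi.org/10.48550/arXiv.2304.05206 The paper studies multivariate time series forecasting and examines the difference between independent (channel-independent) forecasting and joint (channel-dependent) forecasting. It shows that, because of distribution shift, independent forecasting works better, since it is better at mitigating distribution shift and improves the model's generalization. The paper also shows that the choice between independent and joint forecasting is a trade-off between model capacity and model robustness. It then proposes several ways to improve the accuracy of joint forecasting, including regularization, low-rank decomposition, using MAE instead of MSE, and adjusting the sequence length.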
Abstract:
Multivariate time series data comprises various channels of variables. The multivariate forecasting models need to capture the relationship between the channels to accurately predict future values. However, recently, there has been an emergence of methods that employ the Channel Independent (CI) strategy. These methods view multivariate time series data as separate univariate time series and disregard the correlation between channels. Surprisingly, our empirical results have shown that models trained with the CI strategy outperform those trained with the Channel Dependent (CD) strategy, usually by a significant margin. Nevertheless, the reasons behind this phenomenon have not yet been thoroughly explored in the literature. This paper provides comprehensive empirical and theoretical analyses of the characteristics of multivariate time series datasets and the CI/CD strategy. Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series. In contrast, the CI approach trades capacity for robust prediction. Practical measures inspired by these analyses are proposed to address the capacity and robustness dilemma, including a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy. We hope our findings can raise awareness among researchers about the characteristics of multivariate time series and inspire the construction of better forecasting models.
5.
张浩彬 (2023-05-30 11:48):
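#paper:doi:10.48550/arXiv.2010.04515 Principal Component Analysis using Frequency Components of Multivariate Time Series Proposes a new spectral decomposition method for multivariate time series (second-order stationary, i.e. wide-sense stationary), such that the resulting subseries have non-zero spectral coherence within a group and zero spectral coherence across groups. In terms of writing, it follows the typical structure: problem introduction, method description, proofs of asymptotic properties, numerical simulation, and an empirical study, with a large amount of derivation.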
Abstract:
Dimension reduction techniques for multivariate time series decompose the observed series into a few useful independent/orthogonal univariate components. We develop a spectral domain method for multivariate second-order stationary time series that linearly transforms the observed series into several groups of lower-dimensional multivariate subseries. These multivariate subseries have non-zero spectral coherence among components within a group but have zero spectral coherence among components across groups. The observed series is expressed as a sum of frequency components whose variances are proportional to the spectral matrices at the respective frequencies. The demixing matrix is then estimated using an eigendecomposition on the sum of the variance matrices of these frequency components and its asymptotic properties are derived. Finally, a consistent test on the cross-spectrum of pairs of components is used to find the desired segmentation into the lower-dimensional subseries. The numerical performance of the proposed method is illustrated through simulation examples and an application to modeling and forecasting wind data is presented.
6.
张浩彬 (2023-04-28 13:45):
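#paper An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling DOI:arXiv:1803.01271 . I have been giving a lot of talks on time series lately, so I read the original TCN paper carefully. Apart from the RNN family, TCN is still used quite often. To get a large receptive field without adding too many layers, it uses dilated convolutions, and it avoids data leakage through padding and cropping. A TCN block consists of two dilated causal convolutions, activation layers, normalization layers, and one residual connection. The experiments show that TCN is relatively insensitive to hyperparameters, but the kernel size k is critical; dropout and gradient clipping also help considerably.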
Abstract:
For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL .
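A small pure-Python sketch may help show why padding makes a dilated convolution causal and how dilation enlarges the receptive field. It pads only on the left, which has the same effect as padding both sides and cropping the tail; the function names and the toy kernel are illustrative assumptions, and `receptive_field` counts one convolution per dilation level rather than the two per block used in a full TCN.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], with zeros before the series starts."""
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)  # left padding only, so no future value can leak in
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w * padded[t + pad - j * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one causal convolution per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


y = causal_dilated_conv1d([1, 2, 3, 4, 5], [1.0, 1.0], dilation=2)
print(y)                              # [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(3, [1, 2, 4]))  # 15
```

Changing the last input value only changes the last output, which is the no-leakage property the post refers to.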
7.
张浩彬 (2023-03-27 15:40):
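#paper 10.1109/ijcnn52387.2021.9533426 Self-Supervised Pre-training for Time Series Classification One of the few papers on transfer learning for time series. It uses DTW distances to set up a pretext task that constructs positive and negative samples for learning. The encoder is a Transformer, so the novelty is somewhat limited.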
Abstract:
Recently, significant progress has been made in time series classification with deep learning. However, using deep learning models to solve time series classification generally suffers from expensive calculations and difficulty of data labeling. In this work, we study self-supervised time series pre-training to overcome these challenges. Compared with the existing works, we focus on the universal and unlabeled time series pretraining. To this end, we propose a novel end-to-end neural network architecture based on self-attention, which is suitable for capturing long-term dependencies and extracting features from different time series. Then, we propose two different self-supervised pretext tasks for time series data type: Denoising and Similarity Discrimination based on DTW (Dynamic Time Warping). Finally, we carry out extensive experiments on 85 time series datasets (also known as UCR2015 [2]). Empirical results show that the time series model augmented with our proposed self-supervised pretext tasks achieves state-of-the-art / highly competitive results.
8.
张浩彬 (2023-02-28 15:49):
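#paper 10.5555/2503308.2188396 Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics Working through NCE. NCE mainly addresses one problem: when there are too many classes, computing the softmax normalization factor is too expensive, so the authors propose NCE as an alternative. They cleverly design a pretext task that turns the original classification problem into a binary classification problem of telling the target apart from noise samples, which sidesteps the cost of computing the normalization factor. The authors also prove that as the number of noise samples grows, NCE behaves like MLE.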
Abstract:
We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities.
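The "data versus noise" binary classification can be written down directly. Below is a toy sketch, under assumptions of my own choosing: a one-dimensional model log p(x; c) = -x²/2 + c in which the log-normalizer c is treated as an ordinary parameter, and Gaussian noise with standard deviation 2. The helper names (`nce_loss`, `make_model`) are hypothetical.

```python
import math
import random


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def nce_loss(data, noise, log_model, log_noise):
    """Negative NCE objective: logistic loss for classifying data against noise."""
    nu = len(noise) / len(data)  # noise-to-data ratio

    def logit(u):
        return log_model(u) - log_noise(u) - math.log(nu)

    pos = sum(log_sigmoid(logit(x)) for x in data)    # data should be labeled "real"
    neg = sum(log_sigmoid(-logit(y)) for y in noise)  # noise should be labeled "noise"
    return -(pos + neg) / len(data)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
noise = [rng.gauss(0.0, 2.0) for _ in range(4000)]


def log_noise(u):
    return -0.5 * math.log(2 * math.pi * 4.0) - u * u / 8.0


def make_model(c):
    # Unnormalized Gaussian shape plus a free log-normalizer c.
    return lambda u: -0.5 * u * u + c


true_c = -0.5 * math.log(2 * math.pi)
losses = {
    name: nce_loss(data, noise, make_model(true_c + shift), log_noise)
    for name, shift in (("too_small", -1.0), ("true", 0.0), ("too_large", 1.0))
}
print(losses)
```

The loss is lowest near the true log-normalizer, which illustrates the point in the abstract that the partition function can be estimated like any other parameter without ever computing the integral.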
9.
张浩彬 (2023-01-30 13:34):
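#paper https://doi.org/10.48550/arXiv.2202.01575 COST: CONTRASTIVE LEARNING OF DISENTANGLED SEASONAL-TREND REPRESENTATIONS FOR TIME SERIES FORECASTING
1. The paper assumes a time series is made up of three parts: trend + seasonality + error. What we need to learn are the trend and seasonal components.
2. Structurally, an encoder (TCN) maps the original series into a latent space, after which two separate branches disentangle the trend and seasonal components and apply contrastive learning to each.
a. For the trend, the latent representation is fed into a mixture of auto-regressive experts to extract the trend, and a contrastive loss is applied in the time domain. The time-domain contrastive loss follows MoCo.
b. For the seasonality, a discrete Fourier transform maps the latent representation to the frequency domain, where the loss is defined on amplitude and phase.
3. The final overall loss is the sum of the time-domain and frequency-domain losses.
4. The method is compared on 5 datasets against multiple baselines, including TS2Vec, TNC, MoCo, Informer, LogTrans, and TCN, and achieves SOTA on most of them.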
Abstract:
Deep learning has been actively studied for time series forecasting, and the mainstream paradigm is based on the end-to-end training of neural network architectures, ranging from classical LSTM/RNNs to more recent TCNs and Transformers. Motivated by the recent success of representation learning in computer vision and natural language processing, we argue that a more promising paradigm for time series forecasting, is to first learn disentangled feature representations, followed by a simple regression fine-tuning step -- we justify such a paradigm from a causal perspective. Following this principle, we propose a new time series representation learning framework for time series forecasting named CoST, which applies contrastive learning methods to learn disentangled seasonal-trend representations. CoST comprises both time domain and frequency domain contrastive losses to learn discriminative trend and seasonal representations, respectively. Extensive experiments on real-world datasets show that CoST consistently outperforms the state-of-the-art methods by a considerable margin, achieving a 21.3% improvement in MSE on multivariate benchmarks. It is also robust to various choices of backbone encoders, as well as downstream regressors. Code is available at this https URL.
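As a rough illustration of the frequency-domain branch, the sketch below computes a plain discrete Fourier transform and splits each coefficient into amplitude and phase, the two quantities the post says the seasonal loss is defined on. It works on a raw toy signal rather than a learned latent representation, and the function names are assumptions for illustration only.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def amplitude_phase(x):
    """Amplitude and phase for the non-redundant half of the spectrum."""
    half = dft(x)[: len(x) // 2 + 1]
    return [abs(c) for c in half], [cmath.phase(c) for c in half]


# A pure seasonal signal with 2 cycles over 8 time steps.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
amp, phase = amplitude_phase(signal)
print([round(a, 6) for a in amp])
```

All the energy sits in frequency bin 2, so representations of series with the same seasonality would have similar amplitude and phase there. Phase values at bins whose amplitude is numerically zero are not meaningful.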
10.
张浩彬 (2022-12-31 23:07):
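#paper doi:10.1145/3447548.3467401 A transformer-based framework for multivariate time series representation learning
1. The multiple heads of a Transformer can correspond to the multiple periodicities of a time series.
2. In the general framework, the raw data is first projected and combined with positional information, giving the first encoding that includes position.
3. Only the Transformer encoder is used to extract features, not the decoder, which makes it more adaptable to different downstream tasks.
4. In addition, because the Transformer is insensitive to order, the model also encodes position into the input vectors.
5. For variable-length data, the paper pads with arbitrary values and adds a large negative value to the attention scores at padded positions, forcing the model to ignore them (this mask is an initial value; could it later be updated to a non-negative value?).
6. Some tricks are used in applying the mask in practice. Also, predicting the masked values effectively turns this into an NLP-style fill-in-the-blank problem rather than a time series problem.
7. Pre-training: for a multivariate time series, a segment of each variable is masked randomly and independently. The loss only considers the masked segments.
8. The final tasks are regression and classification. Regression here is not forecasting future values; it is more like predicting a house's energy consumption for the day from its air pressure, humidity, and wind speed, using MSE. Classification uses cross-entropy.
9. The downstream head appears to be just a simple fully connected layer.
10. The baselines are ROCKET, LSTM, and XGBoost; this comparison is rather underwhelming.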
Abstract:
We present a novel framework for multivariate time series representation learning based on the transformer encoder architecture. The framework includes an unsupervised pre-training scheme, which can offer substantial performance benefits over fully supervised learning on downstream tasks, both with but even without leveraging additional unlabeled data, i.e., by reusing the existing data samples. Evaluating our framework on several public multivariate time series datasets from various domains and with diverse characteristics, we demonstrate that it performs significantly better than the best currently available methods for regression and classification, even for datasets which consist of only a few hundred training samples. Given the pronounced interest in unsupervised learning for nearly all domains in the sciences and in industry, these findings represent an important landmark, presenting the first unsupervised method shown to push the limits of state-of-the-art performance for multivariate time series regression and classification.
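Points 5 and 7 of the post can be illustrated with two tiny helpers: a softmax in which padded positions receive a large negative score, and an MSE that is averaged only over masked positions. This is a sketch under my own assumptions (function names, the constant `-1e9`, list-based inputs), not the paper's implementation.

```python
import math


def masked_softmax(scores, is_padding, neg=-1e9):
    """Softmax over attention scores; padded positions get a large negative offset."""
    shifted = [s + neg if pad else s for s, pad in zip(scores, is_padding)]
    m = max(shifted)
    exps = [math.exp(v - m) for v in shifted]  # padded entries underflow to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def masked_mse(pred, target, is_masked):
    """Mean squared error computed only over the positions hidden during pre-training."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, is_masked) if m]
    return sum(errors) / len(errors)


weights = masked_softmax([1.0, 2.0, 3.0], [False, False, True])
print(weights)  # the padded third position gets weight 0.0

loss = masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [False, True, False])
print(loss)     # 4.0, only the masked middle position contributes
```

In this sketch the negative offset is a fixed constant rather than a trainable parameter, so gradient updates would not change it; whether that matches the paper's code is not something the post settles.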
11.
张浩彬 (2022-11-10 00:03):
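#paper Momentum Contrast for Unsupervised Visual Representation Learning doi:10.1109/cvpr42600.2020.00975 The famous MoCo. I only had a rough understanding of it before; today I finally read it closely. Influenced by NLP, CV has also started a new round of competition in self-supervised methods.
Self-supervised learning is really a form of unsupervised learning; the name "self-supervised learning" was coined to set it apart from earlier work. The MoCo paper largely closes the gap between supervised and unsupervised learning, and for the first time an unsupervised method achieves better results than supervised pre-training. Manual labeling is expensive, so if models can be pre-trained with unsupervised methods, the performance bottleneck caused by limited labeled data is greatly reduced.
Back to the technique: contrastive learning is one of the current mainstream self-supervised approaches (the other is generative). In contrastive learning the key elements are (1) the pretext task and (2) the loss function. The main contribution of this paper, though, is the momentum update.
1. The pretext task in MoCo is the fairly simple instance discrimination: for a sample x_i in the original data, two different data augmentations give the anchor sample and the positive sample, while all other samples are negatives relative to it. The authors call the anchor q (query), and the positive and negative samples k (key).
2. The loss function is InfoNCE, which is essentially similar to softmax. Since there are so many negatives, there are effectively that many classes, so InfoNCE is used; its hyperparameter is the "temperature".
3. Next come the paper's two contributions. Returning to contrastive learning, the authors note that it is constrained by two issues: dictionary size and dictionary consistency (the dictionary can be thought of as the set of negatives; in contrastive learning, the larger the negative set the better. Also, MoCo uses one positive sample, but other work shows that multiple positives work better). (1) Dictionary size: in methods like SimCLR, each batch serves as the dictionary, which guarantees consistency but limits the dictionary size. Backpropagating through such a large dictionary needs a very large amount of GPU memory, and optimization with a large batch size is also harder. (2) Dictionary consistency: compared with end-to-end methods like SimCLR, another approach is a memory bank. We still keep a large dictionary and sample k negatives from it at each step for the gradient update. The problem is that only the k sampled entries are updated each time, so the features across the dictionary are highly inconsistent (each step updates the selected k samples with a relatively large gradient, so for a large dictionary, after one epoch the features from the first update and the last update differ greatly).
4. To address these problems, the authors first use a queue as the dictionary: after each update, the oldest features are removed from the dictionary and the new features are enqueued.
5. The second part is the momentum update. The encoder for q is still updated by gradient descent, but the encoder for k is no longer updated by gradients; it is updated by momentum, i.e. θ_k ← m·θ_k + (1 − m)·θ_q (at initialization, the k encoder is copied directly from the q encoder). m is chosen to be very large, e.g. 0.999, so each update changes the key encoder only slightly, which keeps the dictionary consistent.
6. Then the downstream tasks. The authors freeze the backbone and use only the Linear Classification Protocol (a fully connected layer and a softmax layer) as the final classifier, comparing against other models pre-trained on ImageNet. Except for a few tasks, MoCo achieves SOTA results.
7. Another interesting point: for MoCo's classifier, grid search finds the best learning rate to be 30. The authors explain that this shows self-supervised features are indeed very different from supervised features. In later comparisons, since grid search is inconvenient, the authors use normalization.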
Abstract:
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
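The three mechanisms discussed in this post (the queue as dictionary, the momentum update of the key encoder, and the InfoNCE loss) fit into a short sketch. Parameters are plain lists of floats standing in for encoder weights, and the temperature value 0.07, the queue length, and all function names are assumptions made for the example.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder slowly toward the query encoder (no gradients involved)."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def info_nce(q, k_pos, negatives, temperature=0.07):
    """Cross-entropy over [positive, negatives] similarities, with the positive as target."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(q, k_pos) / temperature] + [dot(q, n) / temperature for n in negatives]
    mx = max(logits)
    log_z = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_z - logits[0]


# Queue as dictionary: once full, the oldest key is dropped when a new one arrives.
queue = deque(maxlen=4)
for step in range(5):
    queue.append([float(step), 0.0])

updated = momentum_update([1.0], [0.0])
good = info_nce([1.0, 0.0], k_pos=[1.0, 0.0], negatives=[[0.0, 1.0]])
bad = info_nce([1.0, 0.0], k_pos=[0.0, 1.0], negatives=[[1.0, 0.0]])
print(updated, good, bad)
```

With m = 0.999 each step moves the key encoder by only 0.1% of the gap to the query encoder, so keys that entered the queue a few steps apart were produced by nearly the same encoder.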
12.
张浩彬 (2022-10-20 16:20):
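#paper Unsupervised Scalable Representation Learning for Multivariate Time Series, https://doi.org/10.48550/arXiv.1901.10738
The key elements of the paper are the construction of positive and negative samples, the triplet loss, and causal dilated convolutions.
Applicability: this unsupervised model works on variable-length series, both short and long.
Code: https://github.com/White-Link/UnsupervisedScalableRepresentationLearningTimeSeries
Positive/negative sample construction: given N series, for a given series, pick a random length and build a subseries ref. Within this subseries, randomly sample a subseries as the positive sample pos; from other series (if any), randomly sample K subseries as negative samples neg, where K is a hyperparameter.
The encoder has three requirements: (1) it can extract features from series; (2) it accepts variable-length input; (3) it saves time and memory (personally I think these are just reasons found to justify using convolutions). Exponentially dilated causal convolutions are therefore used as the feature extractor in place of traditional RNNs and LSTMs.
A modified triplet loss is used.
On time series classification, the results show it outperforms existing unsupervised methods and is not worse than supervised methods. On forecasting tasks, few comparisons were made.
For univariate series classification, all time series classification tasks in the UCR archive were used.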
Abstract:
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable length and multivariate time series.
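The sampling rule for ref, pos, and neg described in this post can be sketched as follows. The minimum reference length of 2, the fallback to the same series when no other series exists, and the name `sample_triplet` are assumptions of this example; the linked repository may choose lengths differently.

```python
import random


def sample_triplet(series_list, index, k_neg, rng):
    """Return (ref, pos, negs): pos is a subseries of ref, negs come from other series."""
    source = series_list[index]
    n = len(source)

    ref_len = rng.randint(2, n)              # random length of the reference subseries
    ref_start = rng.randint(0, n - ref_len)
    ref = source[ref_start:ref_start + ref_len]

    pos_len = rng.randint(1, ref_len)        # the positive lies inside the reference
    pos_start = rng.randint(0, ref_len - pos_len)
    pos = ref[pos_start:pos_start + pos_len]

    # Negatives: K random subseries from other series (or the same one if it is alone).
    others = [i for i in range(len(series_list)) if i != index] or [index]
    negs = []
    for _ in range(k_neg):
        other = series_list[rng.choice(others)]
        neg_len = rng.randint(1, len(other))
        neg_start = rng.randint(0, len(other) - neg_len)
        negs.append(other[neg_start:neg_start + neg_len])
    return ref, pos, negs


rng = random.Random(0)
dataset = [list(range(0, 20)), list(range(100, 130)), list(range(200, 215))]
ref, pos, negs = sample_triplet(dataset, index=0, k_neg=3, rng=rng)
print(len(ref), len(pos), [len(n) for n in negs])
```

The triplet loss then pulls the encodings of ref and pos together and pushes ref away from each neg; because every piece can have a different length, the encoder has to accept variable-length input.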
13.
张浩彬 (2022-09-21 11:01):
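#paper https://doi.org/10.48550/arXiv.2106.00750 Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding An ICLR 2021 paper on contrastive learning for time series.
Code: https://github.com/sanatonek/TNC_representation_learning
Sample selection idea: signals within a neighborhood are assumed to be similar, while signals outside the neighborhood need to be distinguished.
Positive samples: signals in the neighborhood follow a Gaussian distribution with mean t* and variance determined by the window size and the neighborhood length. Samples within the neighborhood are positives. The neighborhood is determined using the ADF test.
Negative samples: anything outside the neighborhood is a negative, but the authors refine this further in the loss function.
Loss function: the authors argue that samples outside a neighborhood cannot all be treated as negatives, because time series are periodic, so they should be treated as unlabeled samples (a mix of positive and negative classes). Following ideas from PU (positive-unlabeled) learning, weights are introduced for these negative samples in the loss.
Data: 3 datasets in total: 1 simulated dataset (4 classes, generated by an HMM), 1 clinical atrial fibrillation dataset (MIT-BIH; classes alternate, are highly imbalanced, and a small number of individuals have very long recordings), and 1 human activity dataset (UCI-HAR).
Downstream tasks: clustering and classification. The main goal is to compare the learned representations as fairly as possible, so for the same task all models use the same simple encoder architecture. Because the datasets differ, the encoder differs across tasks.
Clustering uses simple k-means and classification uses simple kNN; TNC achieves the best results in all cases.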
Abstract:
Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.
14.
张浩彬 (2022-08-24 09:56):
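#paper doi: 10.1007/s11222-022-10130-1 Merlo, L., Maruotti, A., Petrella, L., & Punzo, A. (2022). Quantile hidden semi-Markov models for multivariate time series. Statistics and Computing, 32(4). https://doi.org/10.1007/s11222-022-10130-1
Model keywords:
Problem addressed: multivariate time series, quantile regression.
Techniques: hidden semi-Markov models (they address the bias that arises when sojourn times do not follow a geometric distribution; the model can choose from more distribution forms and is therefore more flexible) and the multivariate asymmetric Laplace distribution (which addresses extending ordinary quantile regression to higher dimensions).
Estimation: maximum likelihood via the EM algorithm.
Empirical study: air quality prediction in Italy, especially the estimation of extreme quantiles.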
Abstract:
This paper develops a quantile hidden semi-Markov regression to jointly estimate multiple quantiles for the analysis of multivariate time series. The approach is based upon the Multivariate Asymmetric Laplace (MAL) distribution, which allows to model the quantiles of all univariate conditional distributions of a multivariate response simultaneously, incorporating the correlation structure among the outcomes. Unobserved serial heterogeneity across observations is modeled by introducing regime-dependent parameters that evolve according to a latent finite-state semi-Markov chain. Exploiting the hierarchical representation of the MAL, inference is carried out using an efficient Expectation-Maximization algorithm based on closed form updates for all model parameters, without parametric assumptions about the states' sojourn distributions. The validity of the proposed methodology is analyzed both by a simulation study and through the empirical analysis of air pollutant concentrations in a small Italian city.
15.
张浩彬 (2022-08-23 15:36):
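#paper doi: 10.1080/10618600.2021.1909601 Moon, S. J., Jeon, J.-J., Lee, J. S. H., & Kim, Y. (2021). Learning Multiple Quantiles With Neural Networks. Journal of Computational and Graphical Statistics, 30(4), 1238–1248. Proposes a neural network model for estimating multiple conditional quantiles that satisfy the non-crossing property. Traditional quantile regression can suffer from quantile crossing, e.g. the 85% quantile being larger than the 90% quantile. There are generally two strategies: (1) adjust the model parameters; (2) restrict the model space to non-crossing quantiles. This paper takes the second route. Borrowing from linear non-crossing quantile regression (a strategy from non-crossing SVR whose drawback is that it can be computationally expensive), it proposes a non-crossing quantile neural network model with inequality constraints (the inequality constraints are applied to the hidden layer of the network). With the crossing problem solved, the second contribution is computational efficiency. To use first-order optimization, the paper develops a new algorithm for fitting the proposed model. The algorithm gives a nearly optimal solution without the projected gradient step, which requires polynomial computation time.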
Abstract:
We present a neural network model for estimation of multiple conditional quantiles that satisfies the noncrossing property. Motivated by linear noncrossing quantile regression, we propose a noncrossing quantile neural network model with inequality constraints. In particular, to use the first-order optimization method, we develop a new algorithm for fitting the proposed model. This algorithm gives a nearly optimal solution without the projected gradient step that requires polynomial computation time. We compare the performance of our proposed model with that of existing neural network models on simulated and real precipitation data. Supplementary materials for this article are available online.
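For readers new to quantile regression, the sketch below shows the standard pinball (check) loss, a test for quantile crossing, and one simple way to make crossing impossible by construction. The last function, `monotone_quantiles`, builds quantiles from non-negative increments; it is a stand-in chosen for illustration and is not the inequality-constrained network or the fitting algorithm proposed in the paper.

```python
def pinball_loss(y, pred, tau):
    """Quantile (check) loss for a single observation at quantile level tau."""
    diff = y - pred
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def has_crossing(preds_by_tau):
    """True if a higher quantile level has a lower prediction than a lower level."""
    ordered = sorted(preds_by_tau)
    return any(hi[1] < lo[1] for lo, hi in zip(ordered, ordered[1:]))


def monotone_quantiles(base, raw_increments):
    """Hypothetical non-crossing construction: lowest quantile plus clipped increments."""
    out = [base]
    for r in raw_increments:
        out.append(out[-1] + max(r, 0.0))
    return out


print(pinball_loss(10.0, 8.0, 0.9))   # under-prediction is penalized heavily at tau=0.9
print(pinball_loss(10.0, 12.0, 0.9))  # over-prediction is penalized lightly
print(has_crossing([(0.85, 5.0), (0.90, 4.0)]))  # True: the 85% quantile exceeds the 90%
print(monotone_quantiles(1.0, [0.5, -3.0, 2.0]))  # [1.0, 1.5, 1.5, 3.5]
```

Fitting each quantile level separately with the pinball loss is what allows crossing in the first place; restricting the model so that its outputs are ordered removes the problem at the cost of a constrained optimization.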
16.
张浩彬 (2022-08-11 16:10):
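#paper 10.48550/arXiv.1901.10738 Unsupervised Scalable Representation Learning for Multivariate Time Series The key elements of the paper are the construction of positive and negative samples, the triplet loss, and causal dilated convolutions. Applicability: this unsupervised model works on variable-length series, both short and long. 1. Positive/negative sample construction: for a given series, pick a random length and build a subseries. Within this subseries, randomly sample a subseries as the positive sample; from other series, randomly sample one as a negative sample. 2. A modified triplet loss. 3. Exponentially dilated causal convolutions as the feature extractor in place of traditional RNNs and LSTMs. The results show it outperforms existing unsupervised methods and is not worse than supervised methods.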
Abstract:
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable length and multivariate time series.
17.
张浩彬 (2022-08-11 16:09):
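#paper https://doi.org/10.48550/arXiv.2103.07719 Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting The input is passed through a "Latent Correlation Layer" that automatically generates the graph structure; the graph is then fed into StemGNN layers. Each layer first applies the GFT (Graph Fourier Transform) to convert the graph into a spectral matrix representation (in which the univariate time series of each node becomes linearly independent), then applies the discrete Fourier transform to move each univariate component into the frequency domain, extracts feature patterns with 1D convolutions and GLU, and finally returns to the time domain via the inverse discrete Fourier transform. In addition, the model produces a forecasting loss (on future values) and a backcasting loss (on historical values), and the two are combined into a joint loss function.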
Abstract:
Multivariate time-series forecasting plays a crucial role in many real-world applications. It is a challenging problem as one needs to consider both intra-series temporal correlations and inter-series correlations simultaneously. Recently, there have been multiple works trying to capture both correlations, but most, if not all of them only capture temporal correlations in the time domain and resort to pre-defined priors as inter-series relationships. In this paper, we propose Spectral Temporal Graph Neural Network (StemGNN) to further improve the accuracy of multivariate time-series forecasting. StemGNN captures inter-series correlations and temporal dependencies jointly in the spectral domain. It combines Graph Fourier Transform (GFT) which models inter-series correlations and Discrete Fourier Transform (DFT) which models temporal dependencies in an end-to-end framework. After passing through GFT and DFT, the spectral representations hold clear patterns and can be predicted effectively by convolution and sequential learning modules. Moreover, StemGNN learns inter-series correlations automatically from the data without using pre-defined priors. We conduct extensive experiments on ten real-world datasets to demonstrate the effectiveness of StemGNN. Code is available at this https URL
18.
张浩彬 (2022-08-11 12:06):
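#paper 10.1137/1.9781611976700.60 Attention-Based Autoregression for Accurate and Efficient Multivariate Time Series Forecasting AttnAR: proposes a new model that uses an attention mechanism to turn the correlations between variables into time-invariant attention maps. Because it uses shared parameters, its parameter count is reduced to about 1% of that of typical deep neural network time series models, and the model is also fairly interpretable. In summary, all models were run on a single 1080 Ti, which feels very relatable.
Structure:
(1) Deep convolution layers and shallow fully connected layers extract patterns from each series separately (weights are presumably shared here).
(2) An attention mechanism generates attention maps from these series patterns (the patterns can be fed in directly or after an embedding).
Finally, the series patterns u_i and the attention-extracted v_i are concatenated and passed through a fully connected layer to produce the final output.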
Abstract:
Given a multivariate time series, how can we forecast all of its variables efficiently and accurately? The multivariate forecasting, which is to predict the future observations of a multivariate time series, is a fundamental problem closely related to many real-world applications. However, previous multivariate models suffer from large model sizes due to the inefficiency of capturing complex intra-variable patterns and inter-variable correlations, resulting in poor accuracy. In this work, we propose AttnAR (attention-based autoregression), a novel approach for general multivariate forecasting which maximizes its model efficiency via separable structure. AttnAR first extracts variable-wise patterns by a mixed convolution extractor that efficiently combines deep convolution layers and shallow dense layers. Then, AttnAR aggregates the patterns by learning time-invariant attention maps between the target variables. AttnAR accomplishes the state-of-the-art forecasting accuracy in four datasets with up to 117.3 times fewer parameters than the best competitors.
19.
张浩彬 (2022-08-10 22:51):
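#paper 10.1609/aaai.v34i04.6056 Time series with missing values are common in practice. For forecasting, we usually impute the missing values first, using local information around the gap or the global mean, and then forecast. When the missing ratio is high or values are missing consecutively, these methods may not be enough. This paper proposes a network called LGnet. Built on LSTM and aimed at multivariate time series forecasting, it draws on information from the other series to build both local and global imputations for a series' missing values, and uses a GAN to strengthen the global estimate. Local features: the empirical mean and the nearest earlier observation. Global features: patterns are identified over the whole series (the number of patterns is a hyperparameter); the local features of a data point then serve as the index for finding similar series patterns, which are combined by weighting. The final estimate of a missing value is the average of four parts: the empirical mean, the nearest value, the prediction of the original LSTM network, and the global feature. In addition, the paper introduces a GAN to improve the predicted output. Experimental findings: (1) LGnet improves forecasting accuracy; (2) in experiments that vary the missing ratio, LGnet is fairly robust to the proportion of missing data. Ablation study: (1) the global features built by the memory module matter a lot for robustness to missing data; (2) adding the GAN improves forecasting accuracy by 2%-10%, and on datasets with high missing ratios in particular, the GAN helps capture the global data distribution.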
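The four-part average described above can be written out as a small sketch. This is a simplified single-series illustration under assumptions, not LGnet itself: the memory is a plain key/value table queried with softmax weights, `lstm_pred` is a placeholder number standing in for the LSTM's prediction, and all function names are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def local_features(x, mask, t):
    """Empirical mean of the observed values and the last observation before t."""
    mean = x[mask].mean()
    earlier = np.flatnonzero(mask[:t])
    last = x[earlier[-1]] if earlier.size else mean
    return mean, last

def global_feature(query, mem_keys, mem_values):
    """Use the local features as the index into a memory of K patterns."""
    weights = softmax(mem_keys @ query)    # similarity to each stored pattern
    return weights @ mem_values            # weighted combination of patterns

def impute(x, mask, t, lstm_pred, mem_keys, mem_values):
    mean, last = local_features(x, mask, t)
    g = global_feature(np.array([mean, last]), mem_keys, mem_values)
    return (mean + last + lstm_pred + g) / 4.0

x = np.array([1.0, 2.0, np.nan, 4.0])
mask = np.array([True, True, False, True])   # True where a value was observed
mem_keys = np.zeros((2, 2))                  # K = 2 patterns; K is a hyperparameter
mem_values = np.array([2.0, 4.0])
lstm_pred = 3.0                              # placeholder for the LSTM output

estimate = impute(x, mask, t=2, lstm_pred=lstm_pred,
                  mem_keys=mem_keys, mem_values=mem_values)
```

With all-zero keys the memory weights are uniform, so the global feature is simply the mean of the stored values; in a trained model the keys and values would be learned.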
Abstract:
Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them solely rely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies/patterns would become less useful when the missing ratio is high, or the data have consecutive missing values; while exploring global patterns can alleviate such problem. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework LGnet, which leverages memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of global temporal distribution. Experimental results on real-world datasets show the effectiveness of LGnet for MTS forecasting with missing values and its robustness under various missing ratios.
20.
张浩彬 (2022-08-09 17:26):
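#paper 10.48550/arXiv.2203.03423 Multivariate Time Series Forecasting with Latent Graph Inference. A 2022 paper. What I find interesting is that the authors seem to have wrapped something simple in a sophisticated framework (a writing approach worth learning from). The paper splits multivariate forecasting into two routes: global univariate modeling (one model shared across variables), and direct global modeling with global prediction. The authors describe their method as a modular extension of the first route. Concretely, each individual series is fed to an encoder that produces a latent variable. The latent variables then enter a graph structure, which outputs predictions for the latent variables. Those outputs are decoded into the final output. For the graph structure in the middle, the authors offer two options: a fully connected graph network (FC-GNN) and a bipartite graph network (BP-GNN) (I read the latter as a variant of clustering inside a GNN, with the number of clusters as a hyperparameter). This design obviously improves efficiency a lot. Even the global GNN the authors mention is more efficient because the graph is built only over latent variables, let alone the case where subgraphs are built by sampling. So it is entirely understandable that the method is the most efficient among the baselines. On accuracy, however, the proposed networks are not always the best (two datasets: best in most cases on one, not on the other). There is a simple ablation study, but the authors do not explain much. Takeaways: (1) Wrap the work in a big framework: multivariate forecasting split into two kinds; embeddings recast as latent variables; a performance-efficiency trade-off between fully connected and bipartite graphs in the graph module. (2) When experiments are not enough, add simulations (this really resembles discussions of the oracle property in statistics, and seems relatively rare at deep learning conferences).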
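The encoder, graph, decoder pipeline with a bipartite graph can be sketched as below. This is a rough sketch under assumptions rather than the paper's model: the encoder and decoder are single shared linear maps, the message passing is one softmax-weighted exchange through K auxiliary nodes, and `bipartite_exchange` and `forecast` are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bipartite_exchange(z, aux_emb):
    """z: (N, d) latents, aux_emb: (K, d) auxiliary node embeddings.
    Series talk to each other only through the K auxiliary nodes,
    so there are N*K edges instead of N*N."""
    scores = softmax(z @ aux_emb.T / np.sqrt(z.shape[1]), axis=1)   # (N, K)
    aux_state = (scores.T @ z) / scores.sum(axis=0)[:, None]        # series -> aux
    return z + scores @ aux_state, scores                           # aux -> series

def forecast(windows, w_enc, aux_emb, w_dec):
    z = np.tanh(windows @ w_enc)          # shared encoder, one latent per series
    z_mixed, scores = bipartite_exchange(z, aux_emb)
    return z_mixed @ w_dec, scores        # shared decoder, one forecast per series

rng = np.random.default_rng(0)
N, L, d, K = 6, 12, 4, 2                  # K auxiliary nodes is a hyperparameter
w_enc = rng.standard_normal((L, d)) * 0.3
w_dec = rng.standard_normal(d) * 0.3
aux_emb = rng.standard_normal((K, d))
windows = rng.standard_normal((N, L))

out, scores = forecast(windows, w_enc, aux_emb, w_dec)

# Perturbing series 0 also moves the forecasts of the other series,
# because information flows through the auxiliary nodes.
windows_shifted = windows.copy()
windows_shifted[0] += 1.0
out_shifted, _ = forecast(windows_shifted, w_enc, aux_emb, w_dec)
```

A fully connected variant would instead score every pair of latents with an N by N matrix, which is the accuracy end of the trade-off the abstract describes.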
Abstract:
This paper introduces a new approach for Multivariate Time Series forecasting that jointly infers and leverages relations among time series. Its modularity allows it to be integrated with current univariate methods. Our approach allows to trade-off accuracy and computational efficiency gradually via offering on one extreme inference of a potentially fully-connected graph or on another extreme a bipartite graph. In the potentially fully-connected case we consider all pair-wise interactions among time-series which yields the best forecasting accuracy. Conversely, the bipartite case leverages the dependency structure by inter-communicating the N time series through a small set of K auxiliary nodes that we introduce. This reduces the time and memory complexity w.r.t. previous graph inference methods from O(N^2) to O(NK) with a small trade-off in accuracy. We demonstrate the effectiveness of our model in a variety of datasets where both of its variants perform better or very competitively to previous graph inference methods in terms of forecasting accuracy and time efficiency.