张浩彬
(2024-04-29 20:35):
#paper doi:
https://doi.org/10.48550/arXiv.2211.14730
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
An ICLR 2023 paper that proposes PatchTST. Inspired by the Vision Transformer, it brings the patching technique to time series problems. It also responds to an earlier paper arguing that Transformers are actually no better than traditional linear models for time series ("Are Transformers Effective for Time Series Forecasting?", 2022) and reclaims SOTA. By late 2023, however, newer methods had appeared arguing that the key ingredient is not the Transformer itself but the patching technique.
arXiv, 2022.
Abstract:
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer; (ii) channel-independence where each channel contains a single univariate time series that shares the same embedding and Transformer weights across all the series. Patching design naturally has three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend longer history. Our channel-independent patch time series Transformer (PatchTST) can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring of masked pre-trained representation on one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
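A minimal PyTorch sketch of the two ideas the abstract names, not the authors' implementation: (i) each univariate channel's look-back window is unfolded into subseries-level patches that become Transformer tokens, and (ii) channel-independence, i.e. all channels share one embedding and one Transformer encoder. The class name and hyperparameters (patch_len, stride, d_model) are illustrative assumptions; the official code at the repo above differs (positional encoding, forecasting head, padding to reach 64 patches, etc.).

```python
# Sketch of patching + channel-independence, assuming PyTorch.
import torch
import torch.nn as nn


class PatchTSTSketch(nn.Module):
    def __init__(self, patch_len=16, stride=8, d_model=128, n_heads=8, n_layers=3):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        # One shared linear embedding for every channel (channel-independence).
        self.embed = nn.Linear(patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, x):
        # x: (batch, n_channels, look_back_len)
        B, C, L = x.shape
        # (i) Patching: unfold the look-back window into subseries-level patches.
        # n_patches = (L - patch_len) // stride + 1, e.g. L=512, P=16, S=8 -> 63
        # (the paper pads the end of the window to reach 64, hence "64 words").
        patches = x.unfold(dimension=-1, size=self.patch_len, step=self.stride)
        # patches: (B, C, n_patches, patch_len)
        # (ii) Channel-independence: fold channels into the batch dimension so each
        # univariate series is encoded separately but with the same weights.
        tokens = self.embed(patches.reshape(B * C, -1, self.patch_len))
        z = self.encoder(tokens)                  # (B*C, n_patches, d_model)
        return z.reshape(B, C, -1, z.shape[-1])   # per-channel patch representations


# Usage: a 7-channel multivariate series with a 512-step look-back window.
model = PatchTSTSketch()
out = model(torch.randn(32, 7, 512))
print(out.shape)  # torch.Size([32, 7, 63, 128])
```

Because attention is applied over patches rather than individual time steps, the token count drops from L to roughly L/S, which is where the claimed quadratic reduction in attention cost for the same look-back window comes from.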