尹志 (2025-05-31 21:23):
#paper https://doi.org/10.48550/arXiv.2012.07436 Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. This is a classic AAAI 2021 paper on long-sequence time-series modeling. It improves on the vanilla Transformer with a new model, Informer: through a refined self-attention mechanism, self-attention distilling, and a generative-style decoder, it addresses the vanilla Transformer's problems in both time and space complexity, and it performs well across multiple datasets. These ideas have been reused frequently in later time-series modeling work, which makes the paper highly instructive.
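For a concrete sense of the ProbSparse idea, here is a minimal single-head PyTorch sketch: queries are ranked by a max-minus-mean sparsity score, only the top-u "active" queries get full softmax attention, and the remaining "lazy" queries fall back to the mean of V. For clarity this version scores every query against every key; the paper instead samples O(ln L) keys per query to reach the stated O(L log L) cost. Names such as `probsparse_attention` and `factor` are illustrative, not the authors' implementation.

```python
# Minimal sketch of ProbSparse query selection (single head, batch-first).
# NOTE: computes full scores for clarity; the paper samples keys to estimate
# the sparsity measure and keep the whole step at O(L log L).
import math
import torch

def probsparse_attention(q, k, v, factor=5):
    # q: (B, L_Q, d), k: (B, L_K, d), v: (B, L_K, d)
    B, L_Q, d = q.shape
    L_K = k.shape[1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)      # (B, L_Q, L_K)

    # Sparsity measure M(q_i, K): max score minus mean score per query.
    # Large M means the query's attention is far from uniform ("active").
    M = scores.max(dim=-1).values - scores.mean(dim=-1)  # (B, L_Q)

    # Keep only the top-u queries; u ~ factor * ln(L_Q), as in the paper.
    u = min(L_Q, int(factor * math.ceil(math.log(L_Q + 1))))
    top_idx = M.topk(u, dim=-1).indices                  # (B, u)

    # Lazy queries output the mean of V (the paper's self-attention default).
    out = v.mean(dim=1, keepdim=True).expand(B, L_Q, d).clone()

    # Active queries get full softmax attention.
    top_scores = scores.gather(1, top_idx.unsqueeze(-1).expand(-1, -1, L_K))
    attn = torch.softmax(top_scores, dim=-1)             # (B, u, L_K)
    out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, d), attn @ v)
    return out

x = torch.randn(2, 96, 64)
print(probsparse_attention(x, x, x).shape)  # torch.Size([2, 96, 64])
```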
arXiv, 2020-12-14T11:43:09Z. DOI: 10.48550/arXiv.2012.07436
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang
Abstract:
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage, and has comparable performance on sequences' dependency alignment. (ii) the self-attention distilling highlights dominating attention by halving cascading layer input, and efficiently handles extreme long input sequences. (iii) the generative style decoder, while conceptually simple, predicts the long time-series sequences at one forward operation rather than a step-by-step way, which drastically improves the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
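The other two components are also easy to sketch. Self-attention distilling, item (ii) above, is a convolution + ELU + max-pool block between encoder layers that halves the sequence length seen by each subsequent layer; a minimal PyTorch version, with module names that are mine rather than the authors':

```python
# Sketch of the self-attention distilling step between encoder layers:
# Conv1d + BatchNorm + ELU + max-pool that halves the sequence length.
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm1d(d_model)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):              # x: (B, L, d_model)
        x = x.transpose(1, 2)          # Conv1d expects (B, d_model, L)
        x = self.pool(self.act(self.norm(self.conv(x))))
        return x.transpose(1, 2)       # (B, ceil(L/2), d_model)

x = torch.randn(2, 96, 64)
print(DistillingLayer(64)(x).shape)    # torch.Size([2, 48, 64])
```

And the generative-style decoder, item (iii), avoids step-by-step decoding by feeding a "start token" of known history concatenated with zero placeholders for the whole horizon, so all predicted steps come out of a single forward pass. Roughly, with lengths made up for illustration:

```python
# Decoder input for one-shot generation: known history + zero placeholders.
L_token, L_pred, d = 48, 24, 64        # illustrative lengths, not the paper's
history = torch.randn(2, L_token, d)
dec_in = torch.cat([history, torch.zeros(2, L_pred, d)], dim=1)  # (2, 72, 64)
```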