Papers from the journal arXiv.
142 paper shares found in total; this page shows items 121 - 140.
121.
前进 (2022-08-24 22:22):
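#paper arXiv:2208.04939v1, 2022, U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration? Transformer-based networks have become increasingly popular in deformable image registration thanks to their long-range modeling capability. This paper argues, however, that the receptive field of a U-Net with 5 convolutional layers is already sufficient to capture accurate deformations without relying on long-range modeling. The question it asks is whether U-Net is outdated compared with modern Transformer-based methods when applied to medical image registration. To answer it, the authors propose a large-kernel U-Net (LKU-Net), which enlarges the network's receptive field by embedding parallel convolutional blocks in a vanilla U-Net. In atlas-based registration experiments on the public 3D IXI brain dataset, LKU-Net matches or even surpasses today's state-of-the-art Transformer-based methods while using only 1.12% of TransMorph's parameters and 10.8% of its computation. The authors further apply the method to the MICCAI 2021 registration challenge, where it again outperforms TransMorph and currently ranks first. With only simple modifications to U-Net, a U-Net-based registration method still reaches state-of-the-art results, which shows that U-Net-based registration networks are not outdated.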
Abstract:
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration, our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at this https URL.
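To make the receptive-field argument concrete, here is a minimal sketch that computes the theoretical receptive field of a stack of convolutions using the standard recurrence. The layer lists are illustrative assumptions (five stride-2 stages with one convolution each), not the actual U-Net or LKU-Net configuration; for a block with parallel branches, the largest kernel is the one that sets the receptive field.

```python
def receptive_field(layers):
    """Theoretical receptive field (along one axis) of stacked conv layers.

    `layers` is a list of (kernel_size, stride) pairs, applied in order.
    """
    rf = 1      # receptive field of a single input voxel
    jump = 1    # spacing between adjacent outputs, measured in input voxels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical encoders: five stride-2 stages, one convolution per stage.
small_kernels = [(3, 2)] * 5
large_kernels = [(5, 2)] * 5

print(receptive_field(small_kernels))  # 63
print(receptive_field(large_kernels))  # 125
```

Under these assumed settings, swapping 3-wide kernels for 5-wide ones roughly doubles the receptive field per axis at the same depth.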
122.
张浩彬 (2022-08-11 16:10):
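#paper 10.48550/arXiv.1901.10738 Unsupervised Scalable Representation Learning for Multivariate Time Series. The key ingredients of the paper are the construction of positive and negative samples, a triplet loss, and causal dilated convolutions. Applicability: this unsupervised model handles variable-length series, and works for both short and long series. 1. Positive/negative sample construction: for a given series, pick a random length and draw a subseries; within that subseries, randomly sample a further subseries as the positive sample; randomly sample from other series to obtain a negative sample. 2. A modified triplet loss. 3. Exponentially dilated causal convolutions as the feature extractor, replacing the traditional RNN or LSTM. The results show that the method outperforms existing unsupervised methods and is no worse than supervised ones.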
Abstract:
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable length and multivariate time series.
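The sampling scheme and loss can be sketched in a few lines. This is a minimal illustration under assumptions: series are plain Python lists, the encoder is left out (the loss takes ready-made embedding vectors), and the function names `sample_triplet` and `triplet_loss` are hypothetical rather than taken from the authors' code.

```python
import math
import random


def sample_triplet(series_list, rng):
    """Pick a reference subseries, a positive inside it, and a negative elsewhere."""
    i = rng.randrange(len(series_list))
    y = series_list[i]
    # Reference: random-length contiguous window of the chosen series.
    ref_len = rng.randint(2, len(y))
    ref_start = rng.randint(0, len(y) - ref_len)
    ref = y[ref_start:ref_start + ref_len]
    # Positive: random contiguous window inside the reference.
    pos_len = rng.randint(1, ref_len)
    pos_start = rng.randint(0, ref_len - pos_len)
    pos = ref[pos_start:pos_start + pos_len]
    # Negative: random contiguous window of a different series.
    j = rng.choice([k for k in range(len(series_list)) if k != i])
    z = series_list[j]
    neg_len = rng.randint(1, len(z))
    neg_start = rng.randint(0, len(z) - neg_len)
    neg = z[neg_start:neg_start + neg_len]
    return ref, pos, neg


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def triplet_loss(f_ref, f_pos, f_negs):
    """Logistic triplet loss on embedding vectors (lists of floats)."""
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    loss = -log_sigmoid(dot(f_ref, f_pos))       # pull the positive closer
    for f_neg in f_negs:
        loss -= log_sigmoid(-dot(f_ref, f_neg))  # push each negative away
    return loss
```

The loss is small when the reference embedding aligns with the positive and points away from the negatives, which is what lets the encoder train without labels.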
123.
张浩彬 (2022-08-11 16:09):
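#paper https://doi.org/10.48550/arXiv.2103.07719 Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting. A "Latent Correlation Layer" automatically generates the graph structure from the input, and the graph is then fed into the StemGNN layers. Each layer first applies the GFT (Graph Fourier Transform) to turn the graph into a spectral matrix (in which the univariate time series of each node become linearly independent), then applies the discrete Fourier transform to take each univariate component into the frequency domain, extracts feature patterns with 1D convolution and GLU, and finally returns to the time domain via the inverse discrete Fourier transform. In addition, the model produces a forecasting loss (on future values) and a backcasting loss (on historical values), and the two are combined into a joint loss function.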
Abstract:
Multivariate time-series forecasting plays a crucial role in many real-world applications. It is a challenging problem as one needs to consider both intra-series temporal correlations and inter-series correlations simultaneously. Recently, there have been multiple works trying to capture both correlations, but most, if not all of them only capture temporal correlations in the time domain and resort to pre-defined priors as inter-series relationships. In this paper, we propose Spectral Temporal Graph Neural Network (StemGNN) to further improve the accuracy of multivariate time-series forecasting. StemGNN captures inter-series correlations and temporal dependencies jointly in the spectral domain. It combines Graph Fourier Transform (GFT) which models inter-series correlations and Discrete Fourier Transform (DFT) which models temporal dependencies in an end-to-end framework. After passing through GFT and DFT, the spectral representations hold clear patterns and can be predicted effectively by convolution and sequential learning modules. Moreover, StemGNN learns inter-series correlations automatically from the data without using pre-defined priors. We conduct extensive experiments on ten real-world datasets to demonstrate the effectiveness of StemGNN. Code is available at this https URL
124.
王昊 (2022-08-10 11:27):
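#paper 10.48550/arXiv.2109.07872 TAN S, GE M, GUO D, et al. Knowledge-based Embodied Question Answering[J/OL]. 2021[2022-08-09]. https://arxiv.org/abs/2109.07872v1. A paper from Fuchun Sun's group at Tsinghua. It is about an embodied agent in the AI2-THOR environment answering questions about its surroundings, where the questions can only be answered with support from an external knowledge base. Problems with prior work: embodied question answering (EQA) cannot answer questions that require an external knowledge graph (although this has already been done in the KB-VQA field) and lacks reasoning ability (what counts as reasoning is admittedly hard to define); multi-hop question answering is a relatively hard problem; and current EQA systems cannot use past memory to save the agent the time of re-exploring. Contributions: 1. proposes the knowledge-EQA task, built on the AI2-THOR virtual environment; 2. builds a dataset (its question types are all fairly simple, not very hard); 3. proposes a method that combines neural program synthesis, 3D scene graphs, 3D reconstruction, conversion of questions into SQL statements, Monte Carlo tree search and other techniques to address the problems above.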
Abstract:
In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions with the knowledge. Different from explicitly specifying the target object in the question as existing EQA work, the agent can resort to external knowledge to understand more complicated question such as "Please tell me what are objects used to cut food in the room?", in which the agent must know the knowledge such as "knife is used for cutting food". To address this K-EQA problem, a novel framework based on neural program synthesis reasoning is proposed, where the joint reasoning of the external knowledge and 3D scene graph is performed to realize navigation and question answering. Especially, the 3D scene graph can provide the memory to store the visual information of visited scenes, which significantly improves the efficiency for the multi-turn question answering. Experimental results have demonstrated that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
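As a rough illustration of answering a question by joining observed objects with external knowledge, here is a minimal sketch using Python's built-in `sqlite3`. The table names, columns and rows are all hypothetical stand-ins (a flat table in place of a real 3D scene graph), not the schema used in the paper; only the "knife is used for cutting food" example comes from the abstract.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Objects the agent has observed so far (stand-in for a 3D scene graph).
    CREATE TABLE scene_objects (name TEXT, room TEXT);
    -- External knowledge triples: (object, relation, value).
    CREATE TABLE knowledge (object TEXT, relation TEXT, value TEXT);
    INSERT INTO scene_objects VALUES
        ('knife', 'kitchen'), ('mug', 'kitchen'), ('pillow', 'bedroom');
    INSERT INTO knowledge VALUES
        ('knife', 'used_for', 'cutting food'),
        ('scissors', 'used_for', 'cutting food'),
        ('mug', 'used_for', 'drinking');
""")

# "What objects in the kitchen are used to cut food?"
query = """
    SELECT s.name
    FROM scene_objects AS s
    JOIN knowledge AS k ON k.object = s.name
    WHERE s.room = ? AND k.relation = 'used_for' AND k.value = ?
"""
answer = [row[0] for row in conn.execute(query, ("kitchen", "cutting food"))]
print(answer)  # ['knife']
```

Scissors are known to cut food but were never observed, so they do not appear in the answer; keeping observed objects in a table is also what lets a later question be answered without re-exploring.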
125.
张浩彬 (2022-08-09 17:26):
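#paper 10.48550/arXiv.2203.03423 Multivariate Time Series Forecasting with Latent Graph Inference. A 2022 paper. What I find interesting is that the authors seem to wrap something simple in a sophisticated framework (a writing strategy worth learning from). The paper splits multivariate forecasting into two routes: global univariate modeling (one model shared across variables), and direct global modeling with global prediction. The authors present their method as a modular extension of the first route. Concretely, each individual series is fed to an encoder to produce a latent variable; the latent variables then pass through a graph structure that outputs predicted latent variables; these outputs are decoded into the final forecast. For the graph structure in the middle there are two options: a fully connected graph network (FC-GNN) and a bipartite graph network (BP-GNN) (I read the latter as a variant of clustering within a GNN, with the number of clusters as a hyperparameter). This design clearly brings a large efficiency gain: even the global GNN the authors mention builds its graph only over latent variables, so it is cheaper too, let alone constructing subgraphs by sampling. So it is entirely understandable that it is the most efficient compared with the baselines. On accuracy, however, the proposed networks are not best everywhere (of two datasets, they are best in most settings on one and not on the other). There is a simple ablation study, but the authors do not explain much. Summary: (1) put a big framework on top: split multivariate forecasting into two kinds; turn embeddings into latent variables; offer a performance-efficiency trade-off between the fully connected and bipartite graphs. (2) when experiments are not enough, add simulations (this really resembles the discussion of oracle properties in statistics, and seems relatively rare at deep learning conferences).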
Abstract:
This paper introduces a new approach for Multivariate Time Series forecasting that jointly infers and leverages relations among time series. Its modularity allows it to be integrated with current univariate methods. Our approach allows to trade-off accuracy and computational efficiency gradually via offering on one extreme inference of a potentially fully-connected graph or on another extreme a bipartite graph. In the potentially fully-connected case we consider all pair-wise interactions among time-series which yields the best forecasting accuracy. Conversely, the bipartite case leverages the dependency structure by inter-communicating the N time series through a small set of K auxiliary nodes that we introduce. This reduces the time and memory complexity w.r.t. previous graph inference methods from O(N^2) to O(NK) with a small trade-off in accuracy. We demonstrate the effectiveness of our model in a variety of datasets where both of its variants perform better or very competitively to previous graph inference methods in terms of forecasting accuracy and time efficiency.
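The O(N^2) versus O(NK) trade-off is easy to see by counting edges, and one round of bipartite message passing fits in a few lines. This is a toy sketch under assumptions: latents are scalars, the weights are given rather than learned, and the function names are hypothetical rather than from the paper's code.

```python
def fully_connected_edges(n):
    """Directed edges when every series exchanges messages with every other."""
    return n * (n - 1)


def bipartite_edges(n, k):
    """Directed edges when n series talk only to k auxiliary nodes, both ways."""
    return 2 * n * k


def bipartite_round(latents, weights):
    """Series -> K auxiliary nodes -> series, with scalar latents.

    weights[j][i] is an assumed (normally learned) weight between series i
    and auxiliary node j; the same weights are reused on the way back.
    """
    aux = [sum(w * z for w, z in zip(row, latents)) for row in weights]
    return [
        z + sum(weights[j][i] * aux[j] for j in range(len(weights)))
        for i, z in enumerate(latents)
    ]


print(fully_connected_edges(1000))  # 999000
print(bipartite_edges(1000, 4))     # 8000
```

With 1000 series and 4 auxiliary nodes, the bipartite graph uses about 1% of the edges of the fully connected one, which matches the efficiency argument in the comment above.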
126.
林海onrush (2022-08-07 22:47):
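#paper arXiv:2207.03530v1 [cs.RO] 7 Jul 2022, VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning, https://deepai.org/publication/vmas-a-vectorized-multi-agent-simulator-for-collective-robot-learning The University of Cambridge proposes VMAS, a framework for multi-agent reinforcement learning. Although many multi-robot coordination problems can be solved optimally by exact algorithms, the solutions often do not scale with the number of robots. Multi-agent reinforcement learning (MARL) is attracting growing attention in the robotics community as a promising way to tackle such problems. However, tools that can quickly and efficiently find solutions to large-scale collective learning tasks are still lacking. This work introduces VMAS, an open-source framework designed for efficient MARL benchmarking. It consists of a vectorized 2D physics engine written in PyTorch and a set of 12 challenging multi-robot scenarios. Additional scenarios can be implemented through a simple modular interface. The paper shows how vectorization enables parallel simulation on accelerated hardware without added complexity, and compares VMAS with OpenAI MPE, the current leading framework: VMAS can run 30,000 parallel simulations in under 10 seconds, more than 100x faster than MPE. Using VMAS's RLlib interface, the authors benchmark the multi-robot scenarios with various MARL algorithms based on Proximal Policy Optimization (PPO), and the results expose the challenges facing existing algorithms: the VMAS scenarios prove difficult in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available, and the results can be reproduced, at: https://github.com/proroklab/VectorizedMultiAgentSimulator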
arXiv, 2022. DOI: arXiv:2207.03530
Abstract:
While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, proving more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at this https URL. A video of VMAS scenarios and experiments is available at this https URL.
127.
尹志 (2022-07-30 22:41):
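#paper https://doi.org/10.48550/arXiv.2205.01529 Masked Generative Distillation, ECCV 2022. This is a knowledge distillation paper that performs distillation by generating features in a manner reminiscent of contrastive learning. Knowledge distillation is a general-purpose technique that has been used across machine learning tasks, including vision tasks such as classification, segmentation and detection. Distillation algorithms usually improve the representation power of the student's features by making the student imitate the teacher's features. This paper proposes instead that the student need not imitate the teacher's features and can simply generate them itself: randomly mask the student's features, then use the remaining part of the student's features to generate the teacher's features. This gives the student's features stronger representation power. It is an interesting idea. An analogy (perhaps not a perfect one): originally you would learn by copying the teacher's every move, but now the teacher rarely shows up and cannot be imitated directly, so the student, under supervision, guesses blindly what the teacher's features look like; after enough guesses, once every guess is accurate, the student knows the teacher well, which means the student's representation power is fairly strong. With this approach the authors run extensive experiments on image classification, object detection, semantic segmentation, instance segmentation and other tasks, across different datasets and models, and find improved performance throughout (roughly 2-3 points in most cases; see the paper for exact figures).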
Abstract:
Knowledge distillation has been applied to various tasks successfully. The current distillation algorithm usually improves students' performance by imitating the output of the teacher. This paper shows that teachers can also improve students' representation power by guiding students' feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method, which can be utilized on various tasks, including image classification, object detection, semantic segmentation and instance segmentation. We experiment on different models with extensive datasets and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with ResNet-50 backbone from 37.4 to 41.0 Boundingbox mAP, SOLO based on ResNet-50 from 33.1 to 36.2 Mask mAP and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our codes are available at this https URL.
128.
王昊 (2022-07-28 09:51):
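#paper doi:10.48550/arXiv.2207.04630 Yi Ma, Doris Tsao, and Heung-Yeung Shum. 2022. On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. Yi Ma, who has a strong mathematical background, co-authored this paper with the neuroscientist Doris Tsao; it lays out what they regard as two important basic principles of AI. The paper proposes a new framework for understanding deep neural networks, compressive closed-loop transcription, and answers two questions: what is the objective of learning from data and how can it be measured (information and coding theory), and how can that objective be achieved with efficient and effective computation (control)? It puts forward two basic principles for understanding AI: parsimony and self-consistency.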
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
129.
张德祥 (2022-07-19 18:49):
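#paper https://doi.org/10.48550/arXiv.2207.04630 On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. This paper by Yi Ma has already been covered by WeChat public accounts. Building on two of his earlier works, LDR data compression and deep networks as closed-loop generative models, Ma distills compression and closed-loop generation into the intelligence principles of parsimony and self-consistency. The paper goes on to propose more general ideas, extends them to 3D vision and reinforcement learning, and predicts their impact on neuroscience and higher-level intelligence.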
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
130.
王昊 (2022-06-30 17:08):
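#paper doi:https://doi.org/10.48550/arXiv.2201.12086 BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. arXiv:2201.12086 [cs]. BLIP is a unified vision-language pre-training (VLP) framework that learns from noisy image-text pairs. By bootstrapping the captions, BLIP makes effective use of noisy web data: a captioner generates captions and a filter removes the noisy ones. As of June 2022 it is among the better-performing open-source vision-language models; it can be deployed directly with Docker and applied to many downstream vision-language tasks. In our trials it achieved zero-shot behavior to some extent, and it performs well on the VQA 2.0 dataset. As a next step we are considering using it as a pre-trained model and fine-tuning it for other real-world downstream tasks.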
Abstract:
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at this https URL.
131.
Ricardo (2022-05-30 23:39):
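#paper https://arxiv.org/abs/2102.04159v3 Deep Residual Learning in Spiking Neural Networks. Published at NeurIPS 2021. Modern deep learning based on artificial neural networks (ANNs) has made considerable progress in many fields, but because of its mathematical black-box lack of interpretability and its high power consumption, some research has turned to spiking neural networks (SNNs), which are built from biological spiking neurons. SNNs offer better biological interpretability, event-driven computation and low power consumption, and are seen as potential competitors to ANNs. SNNs still face many theoretical and engineering problems, though, and lag behind ANNs on some complex tasks. Given the great success of residual learning in ANNs, it is natural to study how to train SNNs with residual learning. Earlier work mimicked the standard ANN residual block and simply replaced the ReLU activation with spiking neurons, but such networks degrade as depth increases, which makes residual learning hard to realize. In this paper the authors prove that the earlier residual learning approach for SNNs leads to exploding/vanishing gradients and therefore can hardly implement identity mapping. They propose a method that solves this exploding/vanishing gradient problem. The experimental results are good too: better than previous SNN methods on several datasets, though of course not as good as ANN results. SNN performance also improves as the network is made deeper, and for the first time SNNs with more than 100 layers can be trained directly.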
Abstract:
Deep Spiking Neural Networks (SNNs) present optimization difficulties for gradient-based approaches due to discrete binary activation and complex spatial-temporal dynamics. Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. Previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers the degradation problem and can hardly implement residual learning. In this paper, we propose the spike-element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet. We evaluate our SEW ResNet on ImageNet, DVS Gesture, and CIFAR10-DVS datasets, and show that SEW ResNet outperforms the state-of-the-art directly trained SNNs in both accuracy and time-steps. Moreover, SEW ResNet can achieve higher performance by simply adding more layers, providing a simple method to train deep SNNs. To our best knowledge, this is the first time that directly training deep SNNs with more than 100 layers becomes possible.
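The identity-mapping point can be illustrated with a toy comparison of the two block layouts. This is a heavily simplified sketch: the "neuron" below is a stateless threshold with no membrane dynamics, the residual branch is a stand-in function, and the threshold of 1.5 is an assumption picked to expose the failure (with a threshold of at most 1, this toy version of the older block would also pass spikes through).

```python
def spiking_neuron(inputs, threshold):
    """Stateless stand-in for a spiking neuron: emit 1 if the input reaches threshold."""
    return [1 if x >= threshold else 0 for x in inputs]


def spiking_resnet_block(spikes, residual_fn, threshold):
    # Earlier layout: add the shortcut *before* the final spiking neuron.
    summed = [f + s for f, s in zip(residual_fn(spikes), spikes)]
    return spiking_neuron(summed, threshold)


def sew_block(spikes, residual_fn, threshold):
    # SEW layout: apply the element-wise function (here ADD) *after* the neuron.
    fired = spiking_neuron(residual_fn(spikes), threshold)
    return [a + s for a, s in zip(fired, spikes)]


def silent_branch(spikes):
    """Residual branch that outputs nothing, as needed for an identity mapping."""
    return [0.0] * len(spikes)


spikes = [1, 0, 1, 1]
print(sew_block(spikes, silent_branch, threshold=1.5))             # [1, 0, 1, 1]
print(spiking_resnet_block(spikes, silent_branch, threshold=1.5))  # [0, 0, 0, 0]
```

In the SEW layout a silent residual branch always yields the identity, regardless of the neuron's threshold; in the older layout the shortcut has to pass through the neuron, so identity depends on the neuron's parameters. Note that with ADD the SEW output can exceed 1 when both paths fire.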
132.
张浩彬 (2022-05-30 19:14):
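#paper Wen, Ruofeng, et al. A Multi-Horizon Quantile Recurrent Forecaster. DOI: 10.48550/arXiv.1711.11053 MQRNN, another time-series paper from Amazon. I previously read DeepAR, which can model multiple series and is also quite robust. Compared with Prophet and DeepAR, MQRNN takes a different route: quantile-based forecasting. Rather than predicting the distribution of the series at time t, it predicts quantiles at time t, following the quantile regression approach. In addition, unlike DeepAR, MQRNN uses direct multi-horizon forecasting: instead of predicting multiple steps iteratively, it produces all steps at once. According to the paper, this improves forecasting efficiency (since it can be parallelized) and reduces accumulated error (personally I think this point is debatable; in essence the two are the same).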
Abstract:
We propose a framework for general probabilistic multi-step time series regression. Specifically, we exploit the expressiveness and temporal nature of Sequence-to-Sequence Neural Networks (e.g. recurrent and convolutional structures), the nonparametric nature of Quantile Regression and the efficiency of Direct Multi-Horizon Forecasting. A new training scheme, *forking-sequences*, is designed for sequential nets to boost stability and performance. We show that the approach accommodates both temporal and static covariates, learning across multiple related series, shifting seasonality, future planned event spikes and cold-starts in real life large-scale forecasting. The performance of the framework is demonstrated in an application to predict the future demand of items sold on this http URL, and in a public probabilistic forecasting competition to predict electricity price and load.
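Quantile regression rests on the pinball (quantile) loss, which is short enough to write out. The sketch below shows the standard loss for one quantile level and a sum over horizons and quantile levels; the function names and the data layout (`forecasts[h][i]`) are assumptions for illustration, not the paper's code.

```python
def quantile_loss(y_true, y_pred, q):
    """Pinball loss for a single quantile level q in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction costs q per unit, over-prediction costs (1 - q) per unit.
    return max(q * diff, (q - 1) * diff)


def multi_horizon_quantile_loss(targets, forecasts, quantiles):
    """Sum the pinball loss over all horizons and quantile levels.

    targets[h] is the observed value at horizon h;
    forecasts[h][i] is the predicted quantiles[i]-quantile at horizon h.
    """
    return sum(
        quantile_loss(y, forecasts[h][i], q)
        for h, y in enumerate(targets)
        for i, q in enumerate(quantiles)
    )
```

For q = 0.9, predicting 8 when the truth is 10 costs about 1.8, while predicting 12 costs about 0.2; this asymmetry is what pushes the forecast toward the 90th percentile. Because every horizon gets its own output, all steps are produced in one pass rather than fed back iteratively.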
133.
尹志 (2022-05-30 13:31):
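#paper https://doi.org/10.48550/arXiv.1907.05600 Generative Modeling by Estimating Gradients of the Data Distribution, NeurIPS 2019 (Oral). More on generative models: in this paper the authors propose a score-based generative model. Today's mainstream generative models fall roughly into likelihood-based models and implicit models that, like GANs, use adversarial training without computing an explicit probability density function. The former are represented by VAEs, normalizing flows and so on, and the model in this paper also falls in that camp. In this class of models the normalizing constant Z is very hard to compute because it requires an integral, which has led to a variety of workarounds. The core idea here is to model and estimate the gradient of the probability density (more precisely, of the log probability density function). The gradient of the log density is defined as the score function, and the authors estimate it by score matching. Once the generative model is built, samples are generated by sampling with Langevin dynamics. I am still working through some of the details and reproducing the code. It looks like an effective class of generative models with high image quality; improved versions can already compete with GANs on generation quality. The biggest problem right now is that it burns through GPUs, badly; I hope to do some work later on improving its training and sampling efficiency.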
Abstract:
We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields of gradients of the perturbed data distribution for all noise levels. For sampling, we propose an annealed Langevin dynamics where we use gradients corresponding to gradually decreasing noise levels as the sampling process gets closer to the data manifold. Our framework allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons. Our models produce samples comparable to GANs on MNIST, CelebA and CIFAR-10 datasets, achieving a new state-of-the-art inception score of 8.87 on CIFAR-10. Additionally, we demonstrate that our models learn effective representations via image inpainting experiments.
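Sampling with Langevin dynamics only needs the score, which is why the normalizing constant never has to be computed. Below is a minimal one-dimensional sketch: the exact score of a Gaussian stands in for a learned score network, and the step size, step count and target mean of 3.0 are arbitrary assumptions. It shows plain Langevin dynamics, not the annealed variant proposed in the paper.

```python
import math
import random


def score_gaussian(x, mean=3.0, std=1.0):
    """Exact score d/dx log p(x) of a 1D Gaussian (stand-in for a learned model)."""
    return -(x - mean) / (std ** 2)


def langevin_sample(score, steps=1000, step_size=0.01, rng=None):
    """Run x <- x + (step/2) * score(x) + sqrt(step) * noise from a random start."""
    rng = rng or random.Random(0)
    x = rng.uniform(-10.0, 10.0)  # arbitrary starting point
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + 0.5 * step_size * score(x) + math.sqrt(step_size) * noise
    return x


rng = random.Random(0)
samples = [langevin_sample(score_gaussian, rng=rng) for _ in range(500)]
print(sum(samples) / len(samples))  # should land near the target mean of 3.0
```

Each sample costs a thousand score evaluations here, which gives a feel for why the comment above complains about compute at sampling time.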
134.
张德祥 (2022-05-02 09:28):
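#paper https://doi.org/10.48550/arXiv.2001.04385 Universal Differential Equations for Scientific Machine Learning. We provide best-in-class tools for solving differential equations. We provide tools for deriving and fitting scientific models. We provide high-level domain-specific modeling tools to make scientific modeling more accessible. We provide high-level implementations of the latest algorithms in scientific machine learning. We provide users of all common scientific programming languages with the ability to use our tools. We provide tools for researching methods in scientific machine learning. What our goals are: everything we build is compatible with automatic differentiation. Performance is considered a priority, and performance issues are considered bugs. Our packages are routinely and robustly tested with tools from both scientific simulation and machine learning. We keep up with advances in computational hardware to ensure compatibility with the latest high-performance computing tools. https://mp.weixin.qq.com/s/jR_2A1IqqZ1J8idmXb9Tpg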
Abstract:
In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.
135.
张德祥 (2022-05-01 09:56):
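#paper https://doi.org/10.48550/arXiv.2204.07953 Learning with Signatures. 100% recognition on MNIST and other datasets: this result caused an immediate uproar, with a wave of skepticism and disparagement on Reddit, and the discussion at https://github.com/decurtoydiaz/learning_with_signatures/issues is heated as well. But the authors have released the code and responded to the criticism, and https://www.kaggle.com/code/mlsnatcher/replicate-results-signature-model/notebook also has code that can be run directly. In the discussion in issue 5 the authors also acknowledged one shortcoming. Beyond simply rejecting it, could we look more deeply at the method this technique actually uses? The paper does not use deep learning; it uses the following: The signature was first defined for smooth paths by Chen in the 60s (Chen, 1957; 1958; 1977) and was rediscovered in the 90s in the context of rough path theory. The mathematics is hard, and really getting to the bottom of this paper is a big challenge; understanding it would be an achievement in itself, and if the technique really holds up, it would mean being a step ahead.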
Abstract:
In this work we investigate the use of the Signature Transform in the context of Learning. Under this assumption, we advance a supervised framework that potentially provides state-of-the-art classification accuracy with the use of few labels without the need of credit assignment and with minimal or no overfitting. We leverage tools from harmonic analysis by the use of the signature and log-signature, and use as a score function RMSE and MAE Signature and log-signature. We develop a closed-form equation to compute probably good optimal scale factors, as well as the formulation to obtain them by optimization. Techniques of Signal Processing are addressed to further characterize the problem. Classification is performed at the CPU level orders of magnitude faster than other methods. We report results on AFHQ, MNIST and CIFAR10, achieving 100% accuracy on all tasks assuming we can determine at test time which probably good optimal scale factor to use for each category.
136.
Ricardo (2022-04-30 20:39):
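#paper https://doi.org/10.48550/arXiv.1806.09055 DARTS: differentiable architecture search, ICLR (2019). Neural Architecture Search (NAS) is notoriously compute-hungry, easily consuming thousands of GPU hours, so this kind of research has largely been limited to top research institutions. Instead of searching over a discrete, non-differentiable space with methods such as evolutionary algorithms or reinforcement learning, this paper relaxes the representation of the network architecture, turning NAS into a differentiable problem so that gradient descent can search for architectures in a continuous space. The authors formulate it as a bilevel optimization problem and propose an EM-like optimization procedure that alternately optimizes the architecture parameters \alpha and the model weights w to find a good architecture \alpha. Because the optimization involves second-order derivatives, the authors further relax that computation into a first-order estimate, reducing the complexity of the method further. The results are strong as well: compared with earlier methods that often needed thousands of GPU days, DARTS needs only a few GPU days and achieves comparable results.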
Abstract:
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.
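The continuous relaxation can be shown with a toy mixed operation: instead of picking one operation per edge, every candidate is applied and the outputs are blended with softmax weights over the architecture parameters. The three scalar operations below are hypothetical stand-ins for real candidates such as convolutions or pooling, and the gradient-based update of the parameters is left out.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy candidate operations on a scalar "feature" (stand-ins for conv, pooling, etc.).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def mixed_op(x, alpha):
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate ops."""
    weights = softmax([alpha[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alpha):
    """After search, keep the op with the largest architecture weight."""
    return max(alpha, key=alpha.get)


alpha = {"identity": 0.1, "double": 2.0, "zero": -1.0}
print(mixed_op(3.0, alpha))
print(discretize(alpha))  # double
```

Because `mixed_op` is a smooth function of `alpha`, the architecture parameters can be updated by gradient descent alongside the ordinary weights, which is the alternating scheme the comment above describes.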
137.
尹志 (2022-04-28 22:10):
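#paper https://doi.org/10.48550/arXiv.1503.03585 Deep Unsupervised Learning using Nonequilibrium Thermodynamics, ICML (2015). I have not fully understood this paper yet, but it is very interesting. The diffusion model in this paper may not be familiar to everyone, but OpenAI's recent hit DALL-E 2 probably is. Indeed, the earliest inspiration for DALL-E 2 is this paper. Inspired by non-equilibrium thermodynamics, it designs a generative model called the diffusion model. In machine learning, estimating the distribution of a collection of data is very challenging, especially when the model must be both flexible and tractable. If the task is to model the mapping from a latent variable z to an observation x, the idea of the diffusion model is to assume the whole mapping is a Markov chain (MC): the data start in their initial state and Gaussian noise is added step by step until some final form is reached; conversely, the denoising process can be viewed as the generative process. We train on this MC process, and the reverse process then serves as a generative model that produces data following the distribution. Yes, it is a lot like a VAE. Given that this class of generative models, through continual improvement, has reached the quality of DALL-E 2, it is worth understanding the mechanism behind it in depth, and asking whether it can produce better results in data synthesis.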
Abstract:
A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.
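The forward (noising) half of the process is simple enough to simulate directly. The sketch below applies a Gaussian diffusion step repeatedly to a scalar data point; the constant schedule of 0.02 over 500 steps and the starting value 5.0 are assumptions for illustration, and the learned reverse process, which is the actual generative model, is not shown.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Gradually noise a scalar: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps."""
    x = x0
    for beta in betas:
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
betas = [0.02] * 500  # hypothetical constant noise schedule
finals = [forward_diffusion(5.0, betas, rng) for _ in range(1000)]
mean = sum(finals) / len(finals)
var = sum((v - mean) ** 2 for v in finals) / len(finals)
print(mean, var)  # should be close to 0 and 1
```

After enough steps the starting value is forgotten and the samples look like standard Gaussian noise, which is the "slowly destroy structure" idea from the abstract; generation then amounts to learning to run this chain backwards.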
138.
张德祥 (2022-03-24 23:05):
#paper https://doi.org/10.48550/arXiv.2112.14045 Learning from What’s Right and Learning from What’s Wrong. A recent paper on Bayesian inference; for details see the post: https://mp.weixin.qq.com/s/OEcXvyqxYNTCbTK7KUrEjw
Abstract:
The concept of updating (or conditioning or revising) a probability distribution is fundamental in (machine) learning and in predictive coding theory. The two main approaches for doing so are called Pearl's rule and Jeffrey's rule. Here we make, for the first time, mathematically precise what distinguishes them: Pearl's rule increases validity (expected value) and Jeffrey's rule decreases (Kullback-Leibler) divergence. This forms an instance of a more general distinction between learning from what's right and learning from what's wrong. The difference between these two approaches is illustrated in a mock cognitive scenario.
139.
Vincent (2022-02-28 15:50):
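#paper What are the most important statistical ideas of the past 50 years? #Link: https://arxiv.org/abs/2012.00174 Overview: the author, Andrew Gelman, is a professor in the Department of Statistics at Columbia University and a senior statistical consultant for publications such as The Economist; he was elected to the American Academy of Arts and Sciences in 2020. In 2021 he posted this paper on arXiv, and it drew a lot of attention from statisticians. It summarizes the eight most important ideas in statistics over the past 50 years (he thinks): 1. causal inference; 2. bootstrapping and simulation-based inference; 3. overparameterized models and regularization; 4. multilevel (hierarchical) models; 5. generic computation algorithms; 6. adaptive decision analysis; 7. robust inference; 8. exploratory data analysis. I find the first and third points especially apt. The third essentially covers many machine learning algorithms. The first directly shapes how people decide and think: much of the time we mistake correlation for causation (especially in the social sciences). If you happen to watch the various arguments online, try looking at them from this angle and check whether they make this basic mistake in their reasoning.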
140.
物品师 (2022-02-21 05:03):
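#paper doi.10.48550 [arxiv.2111.08575] Title: GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving. Authors: Raphael Chekroun, Marin Toromanoff, Sascha Hornauer, Fabien Moutarde. Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV). Link: https://arxiv.org/abs/2111.08575 Cite as: arXiv:2111.08575 [cs.RO] (or arXiv:2111.08575v1 [cs.RO] for this version) https://doi.org/10.48550/arXiv.2111.08575 Abstract: Deep reinforcement learning (DRL) has been shown to be effective for several complex decision-making applications such as autonomous driving and robotics. However, DRL is notoriously limited by its high sample complexity and its lack of stability. Prior knowledge, for example in the form of expert demonstrations, is often available but hard to leverage to mitigate these issues. In this paper, we propose General Reinforced Imitation (GRI), a novel method that combines the benefits of exploration and expert data and is straightforward to implement on top of any off-policy RL algorithm. We make one simplifying hypothesis: expert demonstrations can be seen as perfect data whose underlying policy gets a constant high reward. Based on this assumption, GRI introduces the notion of an offline demonstration agent. This agent sends expert data, which are processed concurrently and indistinguishably with the experiences coming from the online RL exploration agent. We show that our approach enables major improvements on vision-based autonomous driving in urban environments. We further validate the GRI method on Mujoco continuous control tasks with different off-policy RL algorithms. Our method ranks first on the CARLA leaderboard and outperforms World on Rails, the previous state of the art, by 17%.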
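The core mechanism, mixing expert demonstrations into the training batch with a constant reward, can be sketched as follows. This is a minimal illustration under assumptions: transitions are plain tuples, and the function name, the `demo_ratio` share and the `demo_reward` value are hypothetical parameters rather than settings from the paper.

```python
import random


def make_batch(online_buffer, demo_buffer, batch_size, demo_ratio, demo_reward, rng):
    """Mix online transitions with expert demonstrations in one training batch.

    Transitions are (state, action, reward, next_state) tuples. Demonstration
    transitions carry no environment reward, so each is assigned the constant
    `demo_reward`, following the simplifying hypothesis in the abstract.
    """
    n_demo = round(batch_size * demo_ratio)
    demos = [
        (s, a, demo_reward, s2)
        for (s, a, _, s2) in rng.choices(demo_buffer, k=n_demo)
    ]
    online = rng.choices(online_buffer, k=batch_size - n_demo)
    batch = demos + online
    rng.shuffle(batch)  # the learner cannot tell the two sources apart
    return batch
```

Since the batch has the same shape as an ordinary replay sample, any off-policy learner can consume it unchanged, which is what makes the approach easy to bolt onto existing algorithms.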