Papers from the journal arXiv.
132 shared papers found in total; this page shows items 101-120.
101.
张德祥
(2022-10-18 10:58):
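#paper https://doi.org/10.48550/arXiv.2208.10601 Deriving time-averaged active inference from control principles. A theoretical realization of planning that is adjusted at any time through observation feedback. Assuming a fixed action space and feedforward planning can lead to very high-dimensional recursive optimization problems. These assumptions are problematic both empirically and computationally. Organisms are not born knowing [9]; they learn [40]. Noise [13,32], uncertainty [23], and variability [47] make motor control imperfect, so it must be stabilized through online feedback.
Stochastic optimal feedback control requires an optimality principle that allows observations to be integrated between action steps. Rather than recursively optimizing individual actions, the planned sequence is adjusted at any time through observation feedback.
Although it optimizes a "global" (indefinite) surprise rate (equation), it only needs to plan and adjust behavior within a context.
Tadepalli and Ok [55] published the first model-based RL algorithm in 1998, while Baxter and Bartlett [5] gave a biased policy gradient estimator. It took Alexander and Brown [2] another decade to give a recursive decomposition for average-cost temporal-difference learning. Zhang and Ross [61] only recently published the first adaptation of "deep" reinforcement learning algorithms (based on function approximation) to the average-cost criterion, and it remains model-free. Jafarnia-Jahromi et al. [26] recently gave the first algorithm for solving infinite-horizon, average-cost, partially observable problems with known observation densities and unknown dynamics.
Conclusion. This concludes the derivation of the infinite-horizon, average-surprise formulation of active inference. Because our formulation places behavioral episodes in context, it only needs to plan and adjust behavior within a context (e.g., from time step 1 to T), despite optimizing a "global" (indefinite) surprise rate (Equation 15). We believe this formulation of active inference can advance model-based, probabilistic approaches to hierarchical feedback control [40,33].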
arXiv,
2022.
DOI: 10.48550/arXiv.2208.10601
Abstract:
>>>
Active inference offers a principled account of behavior as minimizing average sensory surprise over time. Applications of active inference to control problems have heretofore tended to focus on finite-horizon or discounted-surprise problems, despite deriving from the infinite-horizon, average-surprise imperative of the free-energy principle. Here we derive an infinite-horizon, average-surprise formulation of active inference from optimal control principles. Our formulation returns to the roots of active inference in neuroanatomy and neurophysiology, formally reconnecting active inference to optimal feedback control. Our formulation provides a unified objective functional for sensorimotor control and allows for reference states to vary over time.
<<<
102.
Arwen
(2022-09-30 23:41):
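#paper doi:https://doi.org/10.48550/arXiv.2202.02000, Cross-Modality Multi-Atlas Segmentation via Deep Registration and Label Fusion. Multi-atlas segmentation is a fairly effective approach to medical image segmentation. In general, it nonlinearly registers multiple atlases to the subject image, warps the corresponding atlas label maps into the subject image space, and fuses the warped label maps with a fusion algorithm to obtain the subject's segmentation. Conventional multi-atlas segmentation has two limitations, however: the registration step is computationally expensive, and the label fusion algorithm affects the accuracy of the final segmentation. This paper builds two neural networks: one generates the deformation field that maps each atlas into the subject space, and the other computes a fusion weight for each atlas's segmentation labels, used in the subsequent label fusion. That said, I find the paper mediocre and personally do not think much of it. The registration network could have produced reasonable deformation fields simply by using the scaling and squaring algorithm, yet it insists on an unnecessary novelty, probably just to pad the paper.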
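To make the fusion step above concrete, here is a minimal sketch of weighted-vote label fusion in plain Python. It is not the paper's implementation: the function name `fuse_labels`, the toy label maps, and the weights are all hypothetical, and the atlas labels are assumed to be already warped into the target space. In the paper the weights would come from the learned similarity network rather than being fixed by hand.

```python
from collections import defaultdict

def fuse_labels(warped_labels, weights):
    """Weighted-vote label fusion over flattened label maps.

    warped_labels: list of label maps (one per atlas), each a list of ints,
                   assumed already warped into the target image space.
    weights:       one non-negative fusion weight per atlas (hypothetical values).
    """
    if len(warped_labels) != len(weights):
        raise ValueError("need exactly one weight per atlas")
    n_voxels = len(warped_labels[0])
    fused = []
    for v in range(n_voxels):
        votes = defaultdict(float)
        for labels, w in zip(warped_labels, weights):
            votes[labels[v]] += w  # accumulate this atlas's weight for its label
        # pick the label with the largest total weight (ties go to the smallest label)
        fused.append(min(votes, key=lambda lab: (-votes[lab], lab)))
    return fused

# Three toy atlases over four voxels, with made-up fusion weights.
atlases = [[0, 1, 1, 2], [0, 1, 2, 2], [0, 0, 2, 1]]
weights = [0.4, 0.35, 0.25]
print(fuse_labels(atlases, weights))
```

With equal weights this reduces to plain majority voting; the learned weights only change which atlases dominate each vote.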
arXiv,
2022.
DOI: 10.48550/arXiv.2202.02000
Abstract:
>>>
Multi-atlas segmentation (MAS) is a promising framework for medical image segmentation. Generally, MAS methods register multiple atlases, i.e., medical images with corresponding labels, to a target image; and the transformed atlas labels can be combined to generate target segmentation via label fusion schemes. Many conventional MAS methods employed the atlases from the same modality as the target image. However, the number of atlases with the same modality may be limited or even missing in many clinical applications. Besides, conventional MAS methods suffer from the computational burden of registration or label fusion procedures. In this work, we design a novel cross-modality MAS framework, which uses available atlases from a certain modality to segment a target image from another modality. To boost the computational efficiency of the framework, both the image registration and label fusion are achieved by well-designed deep neural networks. For the atlas-to-target image registration, we propose a bi-directional registration network (BiRegNet), which can efficiently align images from different modalities. For the label fusion, we design a similarity estimation network (SimNet), which estimates the fusion weight of each atlas by measuring its similarity to the target image. SimNet can learn multi-scale information for similarity estimation to improve the performance of label fusion. The proposed framework was evaluated by the left ventricle and liver segmentation tasks on the MM-WHS and CHAOS datasets, respectively. Results have shown that the framework is effective for cross-modality MAS in both registration and label fusion.
<<<
103.
Ricardo
(2022-09-30 23:32):
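#paper doi:https://doi.org/10.48550/arXiv.2202.03563, Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning with Pairwise Alignment. Atlas building and image registration are important tasks in medical image analysis, but atlas estimation and nonparametric deformations are extremely expensive to compute. In addition, previous atlas-building methods usually drive model optimization with a similarity measure between a fuzzy atlas and each individual image, which can make registration between the estimated atlas and individual images harder, because the fuzzy atlas has less clear anatomical structure than the individual images. Based on a forward model, this paper constrains the space of generated atlases from several angles and provides ample theoretical analysis. However, because the model is fairly complex and involves optimizing over all images simultaneously, it is not well suited to 3D image data; so far the experiments are only on 2D image data.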
arXiv,
2022.
DOI: 10.48550/arXiv.2202.03563
Abstract:
>>>
Atlas building and image registration are important tasks for medical image analysis. Once one or multiple atlases from an image population have been constructed, commonly (1) images are warped into an atlas space to study intra-subject or inter-subject variations or (2) a possibly probabilistic atlas is warped into image space to assign anatomical labels. Atlas estimation and nonparametric transformations are computationally expensive as they usually require numerical optimization. Additionally, previous approaches for atlas building often define similarity measures between a fuzzy atlas and each individual image, which may cause alignment difficulties because a fuzzy atlas does not exhibit clear anatomical structures in contrast to the individual images. This work explores using a convolutional neural network (CNN) to jointly predict the atlas and a stationary velocity field (SVF) parameterization for diffeomorphic image registration with respect to the atlas. Our approach does not require affine pre-registrations and utilizes pairwise image alignment losses to increase registration accuracy. We evaluate our model on 3D knee magnetic resonance images (MRI) from the OAI-ZIB dataset. Our results show that the proposed framework achieves better performance than other state-of-the-art image registration algorithms, allows for end-to-end training, and for fast inference at test time.
<<<
104.
林海onrush
(2022-09-30 22:25):
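#paper arXiv, 2209.00796 (2022), Diffusion Models: A Comprehensive Survey of Methods and Applications. Diffusion models perform very well in many fields, and because they have taken different forms in different application areas, the paper systematically surveys applied research on them, covering computer vision, NLP, waveform signal processing, multi-modal modeling, molecular graph modeling, time series modeling, and adversarial purification. The main contributions are as follows. A new taxonomy: the authors propose a new, systematic classification of diffusion models and their applications. Models are divided into three categories: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. Applications are divided into the seven areas listed above. The survey gives a comprehensive overview of modern diffusion models and their applications, presents the main improvement of each model, compares it with the original model where needed, and summarizes the corresponding papers. The basic idea of a diffusion model is to systematically perturb the data distribution through a forward diffusion process and then recover the data distribution by learning the reverse diffusion process, which yields a highly flexible and tractable generative model.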
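The forward "perturbation" half of that idea is easy to sketch. The snippet below is an illustration only, assuming the common DDPM-style closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise with a linear beta schedule; the schedule endpoints, step count, and function names are assumptions for the example, not something taken from the survey or the comment above.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (num_steps > 1)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_diffuse(x0, alpha_bar_t, rng):
    """Sample x_t from N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
x0 = [1.0, -1.0, 0.5]
x_early = forward_diffuse(x0, alpha_bars[0], rng)   # barely perturbed
x_late = forward_diffuse(x0, alpha_bars[-1], rng)   # essentially pure noise
print(x_early, x_late)
```

The learned part of a diffusion model is the reverse direction, which this sketch deliberately leaves out.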
arXiv,
2022.
DOI: 10.48550/arXiv.2209.00796
Abstract:
>>>
Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Despite demonstrated success than state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve the performance of diffusion models in various aspects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide the taxonomy of diffusion models and categorize them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce the other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flow, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these generative models. Then we review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of generative models. Github: this https URL.
<<<
105.
尹志
(2022-09-30 11:06):
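#paper doi:10.48550/arXiv.1907.10830 U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation, ICLR 2020. This is another image-to-image translation paper, again with an effective improvement to the network architecture. The authors achieve unsupervised image translation by proposing a new attention module and a new normalization function. The proposed attention module handles geometric deformation of images well, which also makes the architecture perform very well on many kinds of artistic style change.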
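For reference, the normalization function mentioned above (AdaLIN) blends instance-normalized and layer-normalized activations with a learned ratio. The form below is written from memory of the paper and should be checked against the original; the symbols (rho for the blend ratio, tau for the learning rate, epsilon for numerical stability) follow my recollection of its notation.

```latex
\hat{a}_I = \frac{a - \mu_I}{\sqrt{\sigma_I^2 + \epsilon}}, \qquad
\hat{a}_L = \frac{a - \mu_L}{\sqrt{\sigma_L^2 + \epsilon}}

\mathrm{AdaLIN}(a, \gamma, \beta)
  = \gamma \cdot \bigl(\rho \cdot \hat{a}_I + (1 - \rho) \cdot \hat{a}_L\bigr) + \beta,
\qquad
\rho \leftarrow \mathrm{clip}_{[0,1]}\bigl(\rho - \tau \, \Delta\rho\bigr)
```

A rho near 1 leans on instance normalization and a rho near 0 leans on layer normalization, which is how the model can adjust the amount of shape and texture change per dataset.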
arXiv,
2019.
DOI: 10.48550/arXiv.1907.10830
Abstract:
>>>
We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. The attention module guides our model to focus on more important regions distinguishing between source and target domains based on the attention map obtained by the auxiliary classifier. Unlike previous attention-based method which cannot handle the geometric changes between domains, our model can translate both images requiring holistic changes and images requiring large shape changes. Moreover, our new AdaLIN (Adaptive Layer-Instance Normalization) function helps our attention-guided model to flexibly control the amount of change in shape and texture by learned parameters depending on datasets. Experimental results show the superiority of the proposed method compared to the existing state-of-the-art models with a fixed network architecture and hyper-parameters. Our code and datasets are available at this https URL or this https URL.
<<<
106.
前进
(2022-09-29 12:12):
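#paper Affine Medical Image Registration with Coarse-to-Fine Vision Transformer, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 20835-20844
Affine registration is an indispensable part of a comprehensive medical image registration pipeline. However, few studies address fast, robust affine registration algorithms. Most of them are CNN models for joint affine and deformable registration, and the standalone performance of the affine subnetwork is less studied. Moreover, existing CNN-based affine registration methods focus either on local misalignment or on the global orientation and position of the input when predicting the affine transformation matrix; such methods are sensitive to spatial initialization and generalize poorly. This paper proposes C2FViT, a fast, robust, learning-based algorithm for 3D affine medical image registration. The method naturally exploits the global connectivity of the Transformer, the locality of CNNs, and a multi-resolution strategy to learn global affine registration, and it is evaluated on 3D brain atlas registration. The results show that it performs well in registration accuracy, robustness, registration speed, and generalizability.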
arXiv,
2022.
DOI: 10.48550/arXiv.2203.15216
Abstract:
>>>
Affine registration is indispensable in a comprehensive medical image registration pipeline. However, only a few studies focus on fast and robust affine registration algorithms. Most of these studies utilize convolutional neural networks (CNNs) to learn joint affine and non-parametric registration, while the standalone performance of the affine subnetwork is less explored. Moreover, existing CNN-based affine registration approaches focus either on the local misalignment or the global orientation and position of the input to predict the affine transformation matrix, which are sensitive to spatial initialization and exhibit limited generalizability apart from the training dataset. In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. Our method naturally leverages the global connectivity and locality of the convolutional vision transformer and the multi-resolution strategy to learn the global affine registration. We evaluate our method on 3D brain atlas registration and template-matching normalization. Comprehensive results demonstrate that our method is superior to the existing CNNs-based affine registration methods in terms of registration accuracy, robustness and generalizability while preserving the runtime advantage of the learning-based methods. The source code is available at this https URL.
<<<
107.
张浩彬
(2022-09-21 11:01):
#paper https://doi.org/10.48550/arXiv.2106.00750
Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding
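ICLR 2021 paper on contrastive learning for time series.
Code: https://github.com/sanatonek/TNC_representation_learning
The idea behind sample selection is that signals within a neighborhood are similar, while signals outside the neighborhood need to be distinguished.
Positive samples: signals in the neighborhood follow a Gaussian distribution with mean t* and a variance set by the window size and the neighborhood length. Samples inside the neighborhood are positives. The neighborhood is determined with the ADF test.
Negative samples: anything outside the neighborhood is a negative, although the authors refine this point further in the loss function.
Loss function: the authors argue that samples outside a neighborhood cannot all be treated as negatives, because time series can be periodic; they should instead be treated as positive-unlabeled samples (a mixture of the positive and negative classes). Drawing on PU learning, the method puts a weight on the negative samples above and carries that weight into the loss function.
Data: three datasets in total: one simulated dataset (4 classes, generated by an HMM), one clinical atrial fibrillation dataset (MIT-BIH; classes alternate, are highly imbalanced, and a small number of individuals have very long recordings), and one human activity dataset (UCI-HAR).
Downstream tasks: clustering and classification. Since the main goal is to compare the learned representations as fairly as possible, all models use the same simple encoder architecture for a given task. Because the datasets differ in character, the encoder differs across tasks.
Clustering uses simple k-means and classification uses simple kNN; the paper's TNC achieves the best results on both.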
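The PU weighting in the loss can be written out as a small function. This is a sketch of my reading of the debiased objective, not code from the linked repository: `tnc_loss` and its argument names are illustrative, and the discriminator outputs are assumed to be probabilities in (0, 1).

```python
import math

def tnc_loss(d_pos, d_unl, w):
    """PU-weighted contrastive loss for one (anchor, neighbor, non-neighbor) triple.

    d_pos: discriminator probability D(z_t, z_l) for a neighboring window.
    d_unl: discriminator probability D(z_t, z_k) for a non-neighboring window.
    w:     assumed probability that a non-neighbor is actually a positive.
    """
    eps = 1e-12  # guard against log(0)
    pos_term = math.log(d_pos + eps)
    # Non-neighbors count as positive with weight w and as negative with weight 1 - w.
    unl_term = w * math.log(d_unl + eps) + (1.0 - w) * math.log(1.0 - d_unl + eps)
    return -(pos_term + unl_term)

print(tnc_loss(0.9, 0.1, 0.0))   # w = 0: ordinary binary cross-entropy
print(tnc_loss(0.9, 0.1, 0.05))  # small w: confident "negative" calls cost a bit more
```

Setting w to 0 recovers the plain "inside is positive, outside is negative" objective, so the weight is the only place the periodicity argument enters.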
arXiv,
2021.
DOI: 10.48550/arXiv.2106.00750
Abstract:
>>>
Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.
<<<
108.
张德祥
(2022-09-19 19:40):
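#paper https://doi.org/10.48550/arXiv.2206.00426 Semantic Probabilistic Layers for Neuro-Symbolic Learning. The paper designs a predictive layer for structured-output prediction that can be embedded in a neural network and guarantees that predictions are consistent with the label constraints; by modeling complex correlations and constraints, it combines probabilistic reasoning with logical reasoning. It is currently the only implementation that satisfies all six criteria (probabilistic; highly expressive; guaranteed consistency with logical constraints; general, supporting constraints expressed in various formal languages; modular, embeddable in neural networks and trainable end to end; efficient, running in linear time). At its core, the paper achieves this with constrained probabilistic circuits. Applications: pathfinding (with obstacles, waterways, and other restrictions), hierarchical multi-label training, and so on.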
arXiv,
2022.
DOI: 10.48550/arXiv.2206.00426
Abstract:
>>>
We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network guaranteeing its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations, and hard constraints, over a structured output space all while being amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to solutions of the constraint. As such, they can faithfully, and efficiently, model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on challenging SOP tasks including hierarchical multi-label classification, pathfinding and preference learning, while retaining perfect constraint satisfaction.
<<<
109.
song
(2022-09-09 09:04):
#paper https://doi.org/10.48550/arXiv.2206.13236 Pruned RNN-T for fast, memory-efficient ASR training
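From Xiaomi's next-generation Kaldi team. RNN-T is one of the mainstream paradigms for end-to-end speech recognition, and among streaming decoding models it currently performs best and is the easiest to deploy industrially. Its drawback is that training uses at least an order of magnitude more memory than other mainstream models. The reason is that, compared with models such as CTC and attention models, its memory footprint has an extra dimension: the number of decoder output frames, U. U is typically between a few tens and a few hundred. This paper proposes a way to prune the model so as to reduce U without degrading model performance. The team first observed that not every computation node actually takes part in the RNN-T loss computation. The number of nodes is proportional to U, so selecting and keeping only the nodes that matter for training reduces memory use and speeds up training. During gradient computation, only a contiguous band of nodes in the middle takes part in training; depending on the scenario, the number of contiguous nodes, S, is 4 or 5. In the experiments, training time is about one sixteenth of the previous SOTA and memory use about one fifth, while model performance drops by only 0.05%. In my own trial, 4 V100 GPUs and relatively little hyperparameter tuning were enough to fully reproduce and deploy it. This should greatly reduce the cost and manpower small and mid-sized companies need to put SOTA models into products.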
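A quick back-of-the-envelope calculation shows why shrinking U to S matters. The sizes below are hypothetical, chosen only for illustration, and the helper `joiner_output_bytes` is an assumption of this sketch: it counts just the joiner output tensor of shape (batch, T, U, vocab) in float32, not the full training memory.

```python
def joiner_output_bytes(batch, frames_t, frames_u, vocab, bytes_per_value=4):
    """Size in bytes of a joiner output tensor of shape (batch, T, U, vocab)."""
    return batch * frames_t * frames_u * vocab * bytes_per_value

# Hypothetical sizes, chosen only for illustration.
B, T, U, V, S = 30, 500, 100, 500, 5

full = joiner_output_bytes(B, T, U, V)    # every (t, u) pair is evaluated
pruned = joiner_output_bytes(B, T, S, V)  # only S symbol positions kept per frame
print(f"full:   {full / 2**30:.2f} GiB")
print(f"pruned: {pruned / 2**30:.2f} GiB")
print(f"ratio:  {full // pruned}x")
```

The saving on this one tensor scales as U / S, so the larger the vocabulary and the longer the transcripts, the more pruning helps.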
arXiv,
2022.
DOI: 10.48550/arXiv.2206.13236
Abstract:
>>>
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
<<<
110.
张德祥
(2022-09-01 22:03):
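#paper https://doi.org/10.48550/arXiv.2208.11970 Understanding Diffusion Models: A Unified Perspective. Recently popular image generation models such as DALL-E are all built on diffusion models. This paper carefully explains where diffusion models come from, going from the ELBO to VAEs to hierarchical VAEs to diffusion models, and covers the three perspectives on diffusion models as well as their limitations. The derivations throughout are clear and readable, which makes it good material for learning about diffusion models.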
arXiv,
2022.
DOI: 10.48550/arXiv.2208.11970
Abstract:
>>>
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
<<<
111.
前进
(2022-08-24 22:22):
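#paper arXiv:2208.04939v1, 2022, U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?
Transformer-based networks are increasingly popular in deformable image registration because of their long-range modeling ability. This paper argues, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate image deformations without relying on long-range modeling. It asks whether U-Net, applied to medical image registration, is outdated compared with modern Transformer-based methods. To answer this, the authors propose a large-kernel U-Net (LKU-Net), which embeds parallel convolutional blocks in a vanilla U-Net to enlarge the network's receptive field. In atlas-based registration experiments on the public 3D IXI brain dataset, the authors show that LKU-Net matches or even exceeds today's most advanced Transformer-based methods while using only 1.12% of TransMorph's parameters and 10.8% of its computation. They further apply the algorithm to the MICCAI 2021 registration challenge, where it also outperforms TransMorph and ranked first on the public leaderboard as of the paper's submission. With only simple modifications to U-Net, a U-Net-based registration algorithm can still reach state-of-the-art results, which shows that U-Net-based registration networks are not outdated.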
arXiv,
2022.
DOI: 10.48550/arXiv.2208.04939
Abstract:
>>>
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration, our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at this https URL.
<<<
112.
张浩彬
(2022-08-11 16:10):
#paper 10.48550/arXiv.1901.10738
Unsupervised Scalable Representation Learning for Multivariate Time Series
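Key points of the paper: positive and negative sample construction, triplet loss, and causal dilated convolutions.
Applicability: this unsupervised model handles variable-length sequences, and it works for both short and long sequences.
1. Positive and negative sample construction: for a given series, pick a random length and draw a subseries. Within that subseries, randomly sample a subseries as the positive; randomly sample from other series to get a negative.
2. A modified triplet loss.
3. Exponentially dilated causal convolutions as the feature extractor, replacing traditional RNNs and LSTMs.
The results show that it outperforms existing unsupervised methods and is no worse than supervised methods.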
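The sampling in step 1 can be sketched in a few lines of plain Python. This follows the description in the list above rather than the authors' code; `sample_triplet` is a hypothetical helper, series are plain lists, and at least two series are assumed so that a different one can supply the negative.

```python
import random

def sample_triplet(series_list, rng):
    """Draw (reference, positive, negative) subseries by random cropping.

    series_list: list of sequences (lists of floats), possibly of different lengths.
    The positive is a crop of the reference; the negative is a crop of another series.
    """
    i = rng.randrange(len(series_list))
    series = series_list[i]

    # Reference: a random-length crop of the chosen series.
    ref_len = rng.randint(1, len(series))
    ref_start = rng.randint(0, len(series) - ref_len)
    ref = series[ref_start:ref_start + ref_len]

    # Positive: a random crop taken inside the reference.
    pos_len = rng.randint(1, ref_len)
    pos_start = rng.randint(0, ref_len - pos_len)
    pos = ref[pos_start:pos_start + pos_len]

    # Negative: a random crop of a different series.
    j = rng.choice([k for k in range(len(series_list)) if k != i])
    other = series_list[j]
    neg_len = rng.randint(1, len(other))
    neg_start = rng.randint(0, len(other) - neg_len)
    neg = other[neg_start:neg_start + neg_len]
    return ref, pos, neg

rng = random.Random(0)
data = [[float(x) for x in range(10)],
        [float(x) for x in range(100, 120)],
        [float(x) for x in range(200, 205)]]
ref, pos, neg = sample_triplet(data, rng)
print(len(ref), len(pos), len(neg))
```

Because every crop has its own random length, the encoder sees inputs of many sizes during training, which is what lets the method handle variable-length series.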
arXiv,
2019.
DOI: 10.48550/arXiv.1901.10738
Abstract:
>>>
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable length and multivariate time series.
<<<
113.
张浩彬
(2022-08-11 16:09):
#paper https://doi.org/10.48550/arXiv.2103.07719
Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting
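A "Latent Correlation Layer" is applied to the input to generate the graph structure automatically; the graph structure is then fed into the StemGNN layer.
This layer first uses the GFT (Graph Fourier Transform) to turn the graph into a spectral matrix (in which each node's univariate time series becomes linearly independent), then applies the discrete Fourier transform to take each univariate component into the frequency domain, extracts feature patterns with 1D convolution and GLU, and returns to the time domain with the inverse discrete Fourier transform. In addition, the model produces a forecasting loss (on future values) and a backcasting loss (on historical values), and the two are combined into a joint loss function.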
arXiv,
2021.
DOI: 10.48550/arXiv.2103.07719
Abstract:
>>>
Multivariate time-series forecasting plays a crucial role in many real-world applications. It is a challenging problem as one needs to consider both intra-series temporal correlations and inter-series correlations simultaneously. Recently, there have been multiple works trying to capture both correlations, but most, if not all of them only capture temporal correlations in the time domain and resort to pre-defined priors as inter-series relationships. In this paper, we propose Spectral Temporal Graph Neural Network (StemGNN) to further improve the accuracy of multivariate time-series forecasting. StemGNN captures inter-series correlations and temporal dependencies jointly in the spectral domain. It combines Graph Fourier Transform (GFT) which models inter-series correlations and Discrete Fourier Transform (DFT) which models temporal dependencies in an end-to-end framework. After passing through GFT and DFT, the spectral representations hold clear patterns and can be predicted effectively by convolution and sequential learning modules. Moreover, StemGNN learns inter-series correlations automatically from the data without using pre-defined priors. We conduct extensive experiments on ten real-world datasets to demonstrate the effectiveness of StemGNN. Code is available at this https URL
<<<
114.
王昊
(2022-08-10 11:27):
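#paper 10.48550/arXiv.2109.07872 TAN S, GE M, GUO D, et al. Knowledge-based Embodied Question Answering[J/OL]. 2021[2022-08-09]. https://arxiv.org/abs/2109.07872v1. A paper from Sun Fuchun's group at Tsinghua. It is about an embodied agent in the AI2-THOR environment answering questions about its surroundings, where the questions can only be answered with support from an external knowledge base.
Previous problems: embodied question answering (EQA) could not answer questions that require an external knowledge graph (although this has already been done in the KB-VQA field) and lacked reasoning ability (although what counts as reasoning is hard to pin down); multi-hop question answering is a hard problem; and current EQA systems cannot use past memory to save the agent the time of re-exploring.
Contributions of the paper:
1. Proposes the knowledge-EQA task, based on the AI2-THOR virtual environment;
2. Builds a dataset (its question types are limited to fairly simple questions and are not very hard);
3. Proposes a method that combines neural program synthesis, 3D scene graphs, 3D reconstruction, conversion of questions into SQL statements, and Monte Carlo tree search to solve the problems above.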
arXiv,
2021.
DOI: 10.48550/arXiv.2109.07872
Abstract:
>>>
In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions with the knowledge. Different from explicitly specifying the target object in the question as existing EQA work, the agent can resort to external knowledge to understand more complicated question such as "Please tell me what are objects used to cut food in the room?", in which the agent must know the knowledge such as "knife is used for cutting food". To address this K-EQA problem, a novel framework based on neural program synthesis reasoning is proposed, where the joint reasoning of the external knowledge and 3D scene graph is performed to realize navigation and question answering. Especially, the 3D scene graph can provide the memory to store the visual information of visited scenes, which significantly improves the efficiency for the multi-turn question answering. Experimental results have demonstrated that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
<<<
115.
张浩彬
(2022-08-09 17:26):
#paper 10.48550/arXiv.2203.03423
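Multivariate Time Series Forecasting with Latent Graph Inference. A 2022 paper. What I find interesting is that the authors seem to have wrapped something simple in a sophisticated framework (a writing approach worth learning from). The paper splits multivariate forecasting into two routes: global univariate modeling (shared across variables), and direct global modeling with global prediction. The authors say their method is a modular extension of the first route. Specifically, each individual series is fed to an encoder to produce a latent variable. The latent variables then pass through a graph structure that yields the predicted latent outputs, which are decoded into the final output. For the graph structure in the middle, the authors offer two options: a fully connected graph network (FC-GNN) and a bipartite graph network (BP-GNN) (I read it as a variant of clustering inside a GNN, where the number of clusters is a hyperparameter). This design obviously improves efficiency a lot; even the fully connected GNN gains efficiency because the graph is built only over latent variables, let alone when subgraphs are built by sampling. So it is entirely understandable that it is the most efficient compared with the baselines. On accuracy, however, the proposed networks are not always the best (of two datasets, one is mostly best and the other is not). There is a simple ablation study, but the authors do not explain much.
Summary points:
(1) Wrap it in a big framework: multivariate forecasting is split into two kinds; embeddings become latent variables; the graph model offers a performance-efficiency trade-off between fully connected and bipartite graphs.
(2) When the experiments are not enough, add simulations (this really resembles the discussion of oracle properties in statistics, and seems relatively rare at deep learning conferences).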
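The efficiency argument comes down to counting edges. The sketch below compares a fully connected graph over N series with a bipartite graph that routes messages through K auxiliary nodes; the function names and the choice K = 4 are assumptions made for the example, not values from the paper.

```python
def fc_edges(n):
    """Directed edges in a fully connected graph over n series (no self-loops)."""
    return n * (n - 1)

def bipartite_edges(n, k):
    """Directed edges when n series communicate only through k auxiliary nodes."""
    return 2 * n * k  # series -> auxiliary plus auxiliary -> series

for n in (10, 100, 1000):
    print(n, fc_edges(n), bipartite_edges(n, k=4))
```

With K fixed, the bipartite count grows linearly in N while the fully connected count grows quadratically, which matches the O(N^2) versus O(NK) complexity stated in the abstract.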
arXiv,
2022.
DOI: 10.48550/arXiv.2203.03423
Abstract:
>>>
This paper introduces a new approach for Multivariate Time Series forecasting that jointly infers and leverages relations among time series. Its modularity allows it to be integrated with current univariate methods. Our approach allows to trade-off accuracy and computational efficiency gradually via offering on one extreme inference of a potentially fully-connected graph or on another extreme a bipartite graph. In the potentially fully-connected case we consider all pair-wise interactions among time-series which yields the best forecasting accuracy. Conversely, the bipartite case leverages the dependency structure by inter-communicating the N time series through a small set of K auxiliary nodes that we introduce. This reduces the time and memory complexity w.r.t. previous graph inference methods from O(N^2) to O(NK) with a small trade-off in accuracy. We demonstrate the effectiveness of our model in a variety of datasets where both of its variants perform better or very competitively to previous graph inference methods in terms of forecasting accuracy and time efficiency.
<<<
116.
林海onrush
(2022-08-07 22:47):
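#paper arXiv:2207.03530v1 [cs.RO] 7 Jul 2022, VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning, https://deepai.org/publication/vmas-a-vectorized-multi-agent-simulator-for-collective-robot-learning
The University of Cambridge proposes VMAS, a multi-agent reinforcement learning framework.
Although many multi-robot coordination problems can be solved optimally by exact algorithms, the solutions often do not scale with the number of robots. Multi-agent reinforcement learning (MARL) is gaining attention in the robotics community as a promising way to tackle such problems. However, tools that can quickly and efficiently find solutions to large-scale collective learning tasks are still lacking. This work introduces VMAS, an open-source framework designed for efficient MARL benchmarking. It consists of a vectorized 2D physics engine written in PyTorch and a set of 12 challenging multi-robot scenarios. Additional scenarios can be implemented through a simple modular interface.
The paper shows how vectorization enables parallel simulation on accelerated hardware without added complexity, and compares VMAS with OpenAI MPE, the current best framework, showing that VMAS is more than 100x faster than MPE. It also runs a range of benchmarks with VMAS, which expose the challenges existing algorithms face.
VMAS can execute 30,000 parallel simulations in under 10 seconds, a speedup of more than 100x. Using VMAS's RLlib interface, the authors benchmark their multi-robot scenarios with various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available, and reproducible, at: https://github.com/proroklab/VectorizedMultiAgentSimulator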
arXiv,
2022.
DOI: 10.48550/arXiv.2207.03530
Abstract:
>>>
While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, proving more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at this https URL. A video of VMAS scenarios and experiments is available at this https URL.
<<<
翻译
117.
尹志
(2022-07-30 22:41):
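#paper https://doi.org/10.48550/arXiv.2205.01529 Masked Generative Distillation, ECCV 2022. This is a knowledge distillation paper that distills by generating features, in a way loosely similar to contrastive learning. Knowledge distillation is a general-purpose technique that is already used across machine learning tasks, and in vision for classification, segmentation, detection, and so on. Distillation algorithms usually strengthen the student's feature representations by making the student imitate the teacher's features. This paper argues that the student does not need to imitate the teacher's features directly and can generate them instead: the student's features are randomly masked, and the remaining partial features are used to generate the teacher's features. The student's features thereby gain stronger representational power. It is an interesting idea. An analogy (perhaps not a perfect one): originally you learn by copying the teacher's every move, but now the teacher rarely shows up, so direct imitation is inconvenient. Instead the student, under supervision, guesses blind what the teacher's features look like; after enough rounds, once every guess lands, the student knows this teacher well, which means the student's representations are strong. With this approach the authors ran extensive experiments on image classification, object detection, semantic segmentation, and instance segmentation, across different datasets and models, and found that performance improved in every case (mostly by about 2-3 points; see the paper for exact numbers).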
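The masking idea can be sketched in a few lines. This is an illustrative reduction under stated assumptions, not the authors' implementation: the names `mgd_loss`, `generate`, and `mask_ratio` are hypothetical, the generation block is passed in as an arbitrary callable (the example uses a simple per-pixel channel mixing), and the loss is a mean squared error chosen for simplicity.

```python
import numpy as np


def mgd_loss(student_feat, teacher_feat, generate, mask_ratio=0.5, rng=None):
    """Masked-generation distillation loss for one feature map of shape (C, H, W).

    generate: hypothetical callable mapping a masked student feature map
    to a prediction of the teacher's full feature map.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = student_feat.shape
    # Spatial mask shared across channels: 1 keeps a pixel, 0 hides it.
    keep = (rng.random((1, h, w)) >= mask_ratio).astype(student_feat.dtype)
    masked = student_feat * keep
    generated = generate(masked)
    # Compare against the teacher's full (unmasked) feature map.
    return float(np.mean((generated - teacher_feat) ** 2))


# Example with a stand-in generation block: per-pixel channel mixing.
rng = np.random.default_rng(0)
student = rng.normal(size=(4, 8, 8))
teacher = rng.normal(size=(6, 8, 8))
weights = rng.normal(size=(6, 4))  # hypothetical, untrained parameters
loss = mgd_loss(student, teacher,
                lambda x: np.einsum("oc,chw->ohw", weights, x), rng=rng)
```

In training, this loss would be minimized with respect to both the student network and the generation block, so the student is pushed to encode enough in its visible pixels to recover the teacher's whole feature map.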
arXiv,
2022.
DOI: 10.48550/arXiv.2205.01529
Abstract:
Knowledge distillation has been applied to various tasks successfully. The current distillation algorithm usually improves students' performance by imitating the output of the teacher. This paper shows that teachers can also improve students' representation power by guiding students' feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method, which can be utilized on various tasks, including image classification, object detection, semantic segmentation and instance segmentation. We experiment on different models with extensive datasets and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with ResNet-50 backbone from 37.4 to 41.0 Boundingbox mAP, SOLO based on ResNet-50 from 33.1 to 36.2 Mask mAP and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our codes are available at this https URL.
118.
王昊
(2022-07-28 09:51):
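#paper doi:10.48550/arXiv.2207.04630 Yi Ma, Doris Tsao, and Heung-Yeung Shum. 2022. On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. Yi Ma, who has a strong mathematical background, co-wrote this paper with neuroscientist Doris Tsao; it sets out what they regard as two important basic principles of AI. The paper proposes a new framework for understanding deep neural networks, compressive closed-loop transcription, and answers two questions: what is the objective of learning from data and how should it be measured (information and coding theory), and how can that objective be achieved through efficient and effective computation (control)? It puts forward two basic principles for understanding AI: parsimony and self-consistency.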
arXiv,
2022.
DOI: 10.48550/arXiv.2207.04630
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
119.
张德祥
(2022-07-19 18:49):
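#paper https://doi.org/10.48550/arXiv.2207.04630 On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. This paper by Yi Ma has already been covered by WeChat official accounts. Building on two of his earlier works, LDR data compression and closed-loop generative models for deep networks, Ma distills compression and closed-loop generation into the intelligence principles of parsimony and self-consistency. The paper goes on to propose more general ideas, extends them to 3D vision and reinforcement learning, and predicts their implications for neuroscience and higher-level intelligence.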
arXiv,
2022.
DOI: 10.48550/arXiv.2207.04630
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
120.
王昊
(2022-06-30 17:08):
#paper doi:https://doi.org/10.48550/arXiv.2201.12086 BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. arXiv:2201.12086 [cs].
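BLIP is a unified vision-language pre-training (VLP) framework that learns from noisy image-text pairs. By bootstrapping the captions, BLIP makes effective use of noisy web data: a captioner generates captions and a filter removes the noisy ones. As of June 2022 it is among the better-performing open-source vision-language models; it can be deployed directly with Docker and applied to multiple downstream vision-language tasks. In our own trials it achieved zero-shot behavior to some extent, and it performs well on the VQA 2.0 dataset. As a next step we are considering using it as a pre-trained model and fine-tuning it for other real-world downstream tasks.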
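The captioner-plus-filter loop can be sketched as plain control flow. This is a hedged illustration, not BLIP's code: `bootstrap_captions`, `captioner`, and `is_match` are hypothetical names standing in for the trained models, and applying the filter to both the original web text and the synthetic caption is an assumption of this sketch.

```python
from typing import Callable, Iterable, List, Tuple


def bootstrap_captions(
    pairs: Iterable[Tuple[str, str]],
    captioner: Callable[[str], str],
    is_match: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Build a cleaner training set from noisy (image, web_text) pairs.

    captioner: hypothetical model that writes a synthetic caption for an image.
    is_match:  hypothetical filter that accepts a text only if it fits the image.
    """
    cleaned = []
    for image, web_text in pairs:
        # Candidate texts: the original web text and one synthetic caption.
        for text in (web_text, captioner(image)):
            if is_match(image, text):
                cleaned.append((image, text))
    return cleaned


# Toy stand-ins: images are just labels, and a text "matches" if it names the label.
noisy = [("cat", "buy cheap shoes"), ("dog", "my dog at the park")]
result = bootstrap_captions(
    noisy,
    captioner=lambda image: f"a photo of a {image}",
    is_match=lambda image, text: image in text,
)
```

With these toy stand-ins the unrelated web text for the first image is dropped and replaced by its synthetic caption, while the second image keeps both texts.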
arXiv,
2022.
DOI: 10.48550/arXiv.2201.12086
Abstract:
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at this https URL.