Papers from arXiv.
110 paper shares found in total; this page shows entries 81 to 100.
81.
Ricardo
(2022-09-30 23:32):
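#paper doi:https://doi.org/10.48550/arXiv.2202.03563, Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning with Pairwise Alignment. Atlas building and image registration are important tasks in medical image analysis, but estimating an atlas and computing nonparametric transformations are computationally very expensive. In addition, previous atlas-building methods usually drive model optimization with a similarity measure between a fuzzy atlas and each individual image. This can make registration between the estimated atlas and the individual images harder, because a fuzzy atlas lacks the clear anatomical structures that individual images show. This paper uses a forward model to constrain the space in which the atlas is generated from several angles, and provides a thorough theoretical analysis. However, because the model is fairly complex and optimizes all images jointly, it is not well suited to 3D image data, and the experiments so far are only on 2D image data.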
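Per its abstract, Aladdin parameterizes the diffeomorphic registration with a stationary velocity field (SVF). A common way to turn an SVF into a diffeomorphism is scaling and squaring: divide the field by 2^steps, then compose the resulting small transformation with itself `steps` times. The following is a minimal 1D sketch in pure Python, assuming linear interpolation with border clamping and a hypothetical default of `steps=6`; it is not the authors' implementation, which operates on 3D fields inside a CNN.

```python
import math


def interp(field, x):
    """Linearly interpolate a 1D field sampled at integer grid points (clamped at the borders)."""
    n = len(field)
    x = min(max(x, 0.0), n - 1.0)
    i = min(int(math.floor(x)), n - 2)
    frac = x - i
    return (1.0 - frac) * field[i] + frac * field[i + 1]


def exp_svf(velocity, steps=6):
    """Integrate a stationary velocity field by scaling and squaring.

    Returns the displacement u such that phi(x) = x + u(x) approximates exp(v).
    Requires at least two grid points.
    """
    # Scaling: start from the small displacement v / 2^steps.
    disp = [v / (2 ** steps) for v in velocity]
    # Squaring: phi <- phi o phi, i.e. u(x) <- u(x) + u(x + u(x)).
    for _ in range(steps):
        disp = [u + interp(disp, x + u) for x, u in enumerate(disp)]
    return disp
```

For example, `exp_svf([0.3 * math.sin(x / 4.0) for x in range(32)])` returns a smooth displacement field; squaring keeps the cost logarithmic in the number of integration steps, which is why it is the usual choice for SVF-based registration.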
arXiv,
2022.
DOI: 10.48550/arXiv.2202.03563
Abstract:
Atlas building and image registration are important tasks for medical image analysis. Once one or multiple atlases from an image population have been constructed, commonly (1) images are warped into an atlas space to study intra-subject or inter-subject variations or (2) a possibly probabilistic atlas is warped into image space to assign anatomical labels. Atlas estimation and nonparametric transformations are computationally expensive as they usually require numerical optimization. Additionally, previous approaches for atlas building often define similarity measures between a fuzzy atlas and each individual image, which may cause alignment difficulties because a fuzzy atlas does not exhibit clear anatomical structures in contrast to the individual images. This work explores using a convolutional neural network (CNN) to jointly predict the atlas and a stationary velocity field (SVF) parameterization for diffeomorphic image registration with respect to the atlas. Our approach does not require affine pre-registrations and utilizes pairwise image alignment losses to increase registration accuracy. We evaluate our model on 3D knee magnetic resonance images (MRI) from the OAI-ZIB dataset. Our results show that the proposed framework achieves better performance than other state-of-the-art image registration algorithms, allows for end-to-end training, and for fast inference at test time.
82.
林海onrush
(2022-09-30 22:25):
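#paper arXiv, 2209.00796 (2022), Diffusion Models: A Comprehensive Survey of Methods and Applications. Diffusion models perform very well in many fields, and they have been adapted in different ways for applications in different fields. This paper systematically reviews applied research on diffusion models in computer vision, NLP, waveform signal processing, multi-modal modeling, molecular graph modeling, time-series modeling, and adversarial purification. Its main contributions are: (1) a new taxonomy: the authors propose a new, systematic classification of diffusion models and their applications, dividing the models into three categories (sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement) and the applications into the seven areas listed above; (2) a comprehensive overview of modern diffusion models and their applications, showing the main improvements of each variant, comparing it with the original model where needed, and summarizing the corresponding papers. The basic idea of a diffusion model is to systematically perturb the data distribution with a forward diffusion process and then recover the distribution by learning a reverse diffusion process, which yields a highly flexible and tractable generative model.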
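In the DDPM formulation, the forward (noising) process described above has a closed form: x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * eps, where ab_t is the cumulative product of (1 - beta_s) and eps is standard Gaussian noise. A minimal sketch, assuming the common linear beta schedule from 1e-4 to 0.02 over 1000 steps (these values are not taken from the survey):

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products ab_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ab_t) x_0, (1 - ab_t) I) in one step."""
    ab = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]
    return xt, noise
```

Because q(x_t | x_0) is Gaussian, training can jump to any step t directly instead of simulating the chain; the reverse process is what the network has to learn.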
arXiv,
2022.
DOI: 10.48550/arXiv.2209.00796
Abstract:
Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Despite demonstrated success than state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve the performance of diffusion models in various aspects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide the taxonomy of diffusion models and categorize them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce the other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flow, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these generative models. Then we review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of generative models. Github: this https URL.
83.
尹志
(2022-09-30 11:06):
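#paper doi:10.48550/arXiv.1907.10830 U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation, ICLR 2020. This is another image-translation paper, again with effective improvements to the network architecture. The authors achieve unsupervised image-to-image translation by proposing a new attention module and a new normalization function. The proposed attention module handles geometric changes between images well, which gives the architecture strong results on many artistic-style transformations.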
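The new normalization function is AdaLIN, which mixes instance-normalized and layer-normalized activations with a ratio rho clipped to [0, 1] and then applies a scale gamma and shift beta. The sketch below handles one sample in pure Python. In the paper gamma and beta are produced by fully connected layers from the attention features and rho is learned; here all three are passed in directly, which is an assumption made for illustration.

```python
import math


def _stats(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(values, mean, var, eps):
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def adalin(feature, rho, gamma, beta, eps=1e-5):
    """AdaLIN for one sample: feature[c] is the flattened spatial map of channel c.

    rho, gamma and beta are per-channel lists; rho is clipped to [0, 1].
    """
    flat = [v for channel in feature for v in channel]
    ln_mean, ln_var = _stats(flat)  # layer-norm statistics over all channels and positions
    out = []
    for c, channel in enumerate(feature):
        in_mean, in_var = _stats(channel)  # instance-norm statistics for this channel only
        a_in = _normalize(channel, in_mean, in_var, eps)
        a_ln = _normalize(channel, ln_mean, ln_var, eps)
        r = min(max(rho[c], 0.0), 1.0)
        out.append([gamma[c] * (r * i + (1.0 - r) * l) + beta[c] for i, l in zip(a_in, a_ln)])
    return out
```

With rho near 1 the layer behaves like instance normalization (preserving per-channel content, useful for texture changes); with rho near 0 it behaves like layer normalization, which the paper associates with larger shape changes.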
arXiv,
2019.
DOI: 10.48550/arXiv.1907.10830
Abstract:
We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. The attention module guides our model to focus on more important regions distinguishing between source and target domains based on the attention map obtained by the auxiliary classifier. Unlike previous attention-based method which cannot handle the geometric changes between domains, our model can translate both images requiring holistic changes and images requiring large shape changes. Moreover, our new AdaLIN (Adaptive Layer-Instance Normalization) function helps our attention-guided model to flexibly control the amount of change in shape and texture by learned parameters depending on datasets. Experimental results show the superiority of the proposed method compared to the existing state-of-the-art models with a fixed network architecture and hyper-parameters. Our code and datasets are available at this https URL or this https URL.
84.
前进
(2022-09-29 12:12):
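#paper Affine Medical Image Registration with Coarse-to-Fine Vision Transformer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 20835-20844
Affine registration is an indispensable part of a comprehensive medical image registration pipeline, yet few studies address fast and robust affine registration algorithms. Most of them are CNN models that jointly learn affine and deformable registration, and the standalone performance of the affine subnetwork is rarely examined. Moreover, existing CNN-based affine registration methods predict the affine transformation matrix by focusing either on the local misalignment or on the global orientation and position of the inputs; such methods are sensitive to spatial initialization and generalize poorly. This paper proposes C2FViT, a fast and robust learning-based algorithm for 3D affine medical image registration. It naturally combines the global connectivity of Transformers, the locality of CNNs, and a multi-resolution strategy to learn global affine registration, and it is evaluated on 3D brain atlas registration. The results show good registration accuracy, robustness, speed, and generalizability.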
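C2FViT predicts an affine transformation at several resolutions. Assuming each finer stage predicts a correction that is applied after the transform of the coarser stages, the overall transform is a product of 4x4 homogeneous matrices. The sketch below shows only that bookkeeping in pure Python; the helper names are hypothetical and the network that predicts each matrix is omitted.

```python
def affine_matrix(linear, translation):
    """Build a 4x4 homogeneous matrix from a 3x3 linear part and a 3-vector translation."""
    rows = [list(linear[i]) + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def apply(m, point):
    """Map a 3D point through a homogeneous affine matrix."""
    x, y, z = point
    return tuple(m[i][0] * x + m[i][1] * y + m[i][2] * z + m[i][3] for i in range(3))


IDENTITY3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def compose_coarse_to_fine(stage_matrices):
    """Compose per-stage affines ordered coarse -> fine; the coarse stage is applied first."""
    total = affine_matrix(IDENTITY3, [0, 0, 0])
    for m in stage_matrices:
        total = matmul(m, total)  # left-multiply: later (finer) stages act after earlier ones
    return total
```

Keeping the result as a single matrix means the moving image only needs to be resampled once at the end, regardless of how many refinement stages were used.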
arXiv,
2022.
DOI: 10.48550/arXiv.2203.15216
Abstract:
Affine registration is indispensable in a comprehensive medical image registration pipeline. However, only a few studies focus on fast and robust affine registration algorithms. Most of these studies utilize convolutional neural networks (CNNs) to learn joint affine and non-parametric registration, while the standalone performance of the affine subnetwork is less explored. Moreover, existing CNN-based affine registration approaches focus either on the local misalignment or the global orientation and position of the input to predict the affine transformation matrix, which are sensitive to spatial initialization and exhibit limited generalizability apart from the training dataset. In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. Our method naturally leverages the global connectivity and locality of the convolutional vision transformer and the multi-resolution strategy to learn the global affine registration. We evaluate our method on 3D brain atlas registration and template-matching normalization. Comprehensive results demonstrate that our method is superior to the existing CNNs-based affine registration methods in terms of registration accuracy, robustness and generalizability while preserving the runtime advantage of the learning-based methods. The source code is available at this https URL.
85.
张浩彬
(2022-09-21 11:01):
#paper https://doi.org/10.48550/arXiv.2106.00750
Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding
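ICLR 2021 paper on contrastive learning for time series.
Code: https://github.com/sanatonek/TNC_representation_learning
Sampling idea: signals within a neighborhood are assumed to be similar, while signals outside it should be distinguishable.
Positive samples: the centers of neighboring windows are drawn from a Gaussian centered at t*, whose spread is determined by the window size and the neighborhood range; windows inside the neighborhood are positives. The neighborhood range is determined with the ADF (Augmented Dickey-Fuller) test.
Negative samples: windows outside the neighborhood are negatives, although the authors refine this further in the loss function.
Loss function: the authors argue that windows outside a neighborhood cannot all be treated as negatives, because time series are often periodic; they should instead be treated as positive-unlabeled samples (a mixture of positives and negatives). Following practice from PU learning, the loss assigns a weight to these "negative" samples.
Data: three datasets in total: one simulated dataset (4 classes, generated by an HMM), one clinical atrial fibrillation dataset (MIT-BIH; the classes alternate over time and are highly imbalanced, and there are few individuals, each with very long recordings), and one human activity dataset (UCI-HAR).
Downstream tasks: clustering and classification. Since the main goal is to compare representation learning methods as directly as possible, all models use the same simple encoder for a given task; because the datasets have different characteristics, the encoder differs across tasks.
Clustering uses plain k-means and classification uses plain k-NN; TNC achieves the best results in both.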
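The sampling and PU-weighted loss described above can be sketched as follows. This is a minimal illustration in the spirit of TNC, not the released code: `d_neighbors` and `d_non_neighbors` stand for discriminator outputs D(z_t, z) in (0, 1), `w` is the assumed probability that a non-neighboring window is actually similar, and treating eta * window_size as the standard deviation of the neighbor centers is an assumption of this sketch.

```python
import math
import random


def sample_neighbor_centers(t, window, eta, count, length, rng=random):
    """Draw neighbor window centers around t (std eta * window is an assumption), clipped to the series."""
    centers = []
    for _ in range(count):
        c = int(round(rng.gauss(t, eta * window)))
        centers.append(min(max(c, 0), length - 1))
    return centers


def tnc_loss(d_neighbors, d_non_neighbors, w):
    """Debiased (positive-unlabeled) contrastive loss in the spirit of TNC."""
    # Neighbors are labeled positives: standard log-likelihood of "similar".
    pos = -sum(math.log(d) for d in d_neighbors) / len(d_neighbors)
    # Non-neighbors are unlabeled: weight w as positive and 1 - w as negative.
    unl = -sum(w * math.log(d) + (1.0 - w) * math.log(1.0 - d)
               for d in d_non_neighbors) / len(d_non_neighbors)
    return pos + unl
```

With `w = 0` this reduces to an ordinary binary cross-entropy that treats every non-neighbor as negative; a positive `w` softens the penalty on non-neighbors that look similar, which is how the periodicity concern above is handled.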
arXiv,
2021.
DOI: 10.48550/arXiv.2106.00750
Abstract:
Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.
86.
张德祥
(2022-09-19 19:40):
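#paper https://doi.org/10.48550/arXiv.2206.00426 Semantic Probabilistic Layers for Neuro-Symbolic Learning. The paper designs a predictive layer for structured-output prediction that can be plugged into a neural network and guarantees that its predictions are consistent with label constraints. By modeling complex correlations and constraints, it combines probabilistic inference with logical reasoning. According to the paper, it is currently the only approach that satisfies six desiderata: it is probabilistic, highly expressive, guarantees consistency with the logical constraints, is general (supports constraints expressed in a formal language), is modular (can be embedded in a neural network and trained end to end), and is efficient (linear time). At its core, the layer is implemented with probabilistic circuits that encode the constraints. Applications include pathfinding (with constraints such as obstacles and water) and hierarchical multi-label classification.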
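What "restricting the distribution's support to solutions of the constraint" means can be shown by brute force: start from a distribution over label vectors, zero out assignments that violate the constraint, and renormalize. SPL achieves this efficiently with probabilistic circuits; the sketch below instead enumerates all 2^n assignments and uses independent Bernoulli marginals, so it only illustrates the semantics. The three-label hierarchy is hypothetical.

```python
from itertools import product


def constrained_distribution(marginals, constraint):
    """Distribution over binary label vectors restricted to those satisfying `constraint`.

    Brute force over 2^n assignments, for illustration only.
    """
    weights = {}
    for y in product((0, 1), repeat=len(marginals)):
        p = 1.0
        for yi, pi in zip(y, marginals):
            p *= pi if yi else 1.0 - pi
        weights[y] = p if constraint(y) else 0.0
    z = sum(weights.values())
    return {y: p / z for y, p in weights.items()}


def hierarchy(y):
    # Hypothetical hierarchy: label 0 ("animal") is the parent of 1 ("dog") and 2 ("cat").
    animal, dog, cat = y
    return (not dog or animal) and (not cat or animal)


def most_likely(dist):
    return max(dist, key=dist.get)
```

With marginals `[0.4, 0.9, 0.1]`, an unconstrained per-label decision would predict "dog" without "animal"; the constrained distribution assigns that assignment zero mass, so the most likely valid prediction includes both labels.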
arXiv,
2022.
DOI: 10.48550/arXiv.2206.00426
Abstract:
We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network guaranteeing its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations, and hard constraints, over a structured output space all while being amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to solutions of the constraint. As such, they can faithfully, and efficiently, model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on challenging SOP tasks including hierarchical multi-label classification, pathfinding and preference learning, while retaining perfect constraint satisfaction.
87.
song
(2022-09-09 09:04):
#paper https://doi.org/10.48550/arXiv.2206.13236 Pruned RNN-T for fast, memory-efficient ASR training
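From Xiaomi's Next-gen Kaldi team. RNN-T is one of the mainstream paradigms for end-to-end speech recognition; among streaming models it currently performs best and is the easiest to deploy industrially. Its drawback is that training uses at least an order of magnitude more memory than other mainstream models. The reason is that, compared with models such as CTC and attention-based models, the RNN-T loss has an extra dimension: the length of the output label sequence, U, which usually ranges from tens to hundreds. This paper proposes a pruning method that reduces the effective U without degrading model performance. The team first observed that not every node in the RNN-T loss computation contributes meaningfully to training. Since the number of nodes is proportional to U, keeping only the nodes that matter reduces memory and speeds up training. During gradient computation, only a contiguous band of nodes in the middle participates in training; depending on the scenario, the number of contiguous nodes, S, is 4 or 5. In the experiments, training time was about 1/16 of the previous state of the art and memory usage about 1/5, with performance dropping by only 0.05%. In my own attempt, I was able to fully reproduce and deploy it with only 4 V100 GPUs and little hyperparameter tuning. This greatly reduces the cost and effort for small and medium-sized companies to put state-of-the-art models into products.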
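The memory argument above is simple arithmetic on the joiner output, a dense tensor of shape (batch, frames, symbol positions, vocabulary). The full lattice has U + 1 symbol positions per frame, while the pruned version keeps only S. The sizes below are hypothetical (a 5000-symbol vocabulary loosely resembles character-based Chinese ASR), and only the joiner logits are counted, so the ratio is not the paper's measured end-to-end memory saving.

```python
def joiner_logits_bytes(batch, frames, symbols, vocab, bytes_per_float=4):
    """Size in bytes of a dense float32 (batch, frames, symbols, vocab) joiner output."""
    return batch * frames * symbols * vocab * bytes_per_float


def pruned_vs_full(batch, frames, label_len, vocab, s_range):
    """Joiner output size for the full lattice (U + 1 positions) and a pruned band of s_range positions."""
    full = joiner_logits_bytes(batch, frames, label_len + 1, vocab)
    pruned = joiner_logits_bytes(batch, frames, s_range, vocab)
    return full, pruned


full, pruned = pruned_vs_full(batch=8, frames=500, label_len=99, vocab=5000, s_range=5)
print(f"full: {full / 1e9:.1f} GB, pruned: {pruned / 1e9:.1f} GB, ratio: {full // pruned}x")
```

The ratio is (U + 1) / S, so the saving grows with the label length and is largest exactly when the vocabulary is large, which matches the motivation in the abstract.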
arXiv,
2022.
DOI: 10.48550/arXiv.2206.13236
Abstract:
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
88.
张德祥
(2022-09-01 22:03):
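#paper https://doi.org/10.48550/arXiv.2208.11970 Understanding Diffusion Models: A Unified Perspective. Recently popular image generation models such as DALL-E 2 are built on diffusion models. This paper explains diffusion models carefully from the ground up: from the ELBO to VAEs, to hierarchical VAEs, to diffusion models, followed by three equivalent interpretations of diffusion models and their limitations. The derivations are clear and easy to follow, making it a good resource for learning about diffusion models.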
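The three equivalent training targets in the abstract (the clean input x_0, the noise eps, and the score) are related in closed form through x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps, with Tweedie's formula linking the score back to x_0. A minimal scalar sketch of these conversions, using the conditional Gaussian q(x_t | x_0) (for a trained model, Tweedie's formula gives the posterior mean instead of the exact x_0):

```python
import math


def eps_from_x0(xt, x0, ab):
    """Noise implied by a prediction of the clean input."""
    return (xt - math.sqrt(ab) * x0) / math.sqrt(1.0 - ab)


def x0_from_eps(xt, eps, ab):
    """Clean input implied by a prediction of the noise."""
    return (xt - math.sqrt(1.0 - ab) * eps) / math.sqrt(ab)


def score_from_eps(eps, ab):
    """grad_x log q(x_t | x_0) for a Gaussian with variance (1 - ab)."""
    return -eps / math.sqrt(1.0 - ab)


def x0_from_score(xt, score, ab):
    """Tweedie's formula: sqrt(ab) * x0 = x_t + (1 - ab) * score."""
    return (xt + (1.0 - ab) * score) / math.sqrt(ab)
```

Because each target is an affine function of the others given x_t and ab, a network trained to predict any one of them implicitly predicts all three; the choice mainly changes how the loss is weighted across noise levels.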
arXiv,
2022.
DOI: 10.48550/arXiv.2208.11970
Abstract:
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
89.
前进
(2022-08-24 22:22):
#paper arXiv:2208.04939v1, 2022, U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?
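Transformer-based networks are becoming increasingly popular in deformable image registration because of their ability to model long-range dependencies. This paper argues, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without relying on long-range modeling. It asks whether U-Net is outdated for medical image registration compared with modern Transformer-based methods. To answer this, the authors propose a large-kernel U-Net (LKU-Net), which enlarges the receptive field by embedding a parallel convolutional block into a vanilla U-Net. In atlas-based registration experiments on the public 3D IXI brain dataset, they show that LKU-Net matches or even surpasses state-of-the-art Transformer-based methods while using only 1.12% of the parameters and 10.8% of the computation of TransMorph. They further apply it to the MICCAI Learn2Reg 2021 registration challenge, where it again outperforms TransMorph and ranked first on the public leaderboard at the time of submission. With only simple modifications to the U-Net, a U-Net-based registration algorithm can still reach state-of-the-art performance, which shows that U-Net-based registration networks are not outdated.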
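The receptive-field argument can be checked with the standard recurrence for a chain of convolution and pooling layers: r <- r + (k - 1) * j and j <- j * s, where j is the spacing in input pixels between adjacent outputs. A parallel block whose branches are summed has the theoretical receptive field of its largest kernel. The sketch below computes the theoretical receptive field only (the paper talks about the effective receptive field, which is smaller), and the layer configurations and kernel sets are hypothetical.

```python
def receptive_field(layers):
    """Theoretical receptive field of a chain of (kernel_size, stride) layers, per spatial axis."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j  # each kernel widens the field by (k - 1) input-pixel steps at spacing j
        j *= s            # striding increases the spacing between neighboring outputs
    return r


def parallel_block(kernel_sizes):
    """Summed parallel branches (e.g. 1x1, 3x3, large kernel) act like their largest kernel."""
    return (max(kernel_sizes), 1)


# Hypothetical encoder: four levels of two 3x3 convs plus 2x2 pooling, then two 3x3 convs.
vanilla = [(3, 1), (3, 1), (2, 2)] * 4 + [(3, 1), (3, 1)]
large_kernel = [parallel_block((1, 3, 5)), (3, 1), (2, 2)] * 4 + [parallel_block((1, 3, 5)), (3, 1)]
print(receptive_field(vanilla), receptive_field(large_kernel))
```

Because the spacing j doubles at every downsampling, kernels placed at deeper levels widen the field far more than the same kernels at the input resolution, which is why a few large kernels go a long way in a U-Net.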
arXiv,
2022.
DOI: 10.48550/arXiv.2208.04939
Abstract:
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration, our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at this https URL.
90.
张浩彬
(2022-08-11 16:10):
#paper 10.48550/arXiv.1901.10738
Unsupervised Scalable Representation Learning for Multivariate Time Series
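Key ideas of the paper: construction of positive and negative samples, a triplet loss, and causal dilated convolutions.
Applicability: this unsupervised model handles variable-length sequences and works for both short and long series.
1. Positive/negative sampling: for a given series, choose a random length and extract a subseries. Within this subseries, randomly sample a shorter subseries as the positive sample; randomly sample a subseries from other series as a negative sample.
2. A modified triplet loss.
3. Exponentially dilated causal convolutions replace traditional RNNs and LSTMs as the feature extractor.
The results show that it outperforms existing unsupervised methods and is no worse than supervised methods.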
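Steps 1 and 2 can be sketched in pure Python. The sampler follows the description above (reference window, positive inside it, negative from another series); the exact length ranges are assumptions. The loss uses the form -log sigmoid(f(ref) . f(pos)) - sum_k log sigmoid(-f(ref) . f(neg_k)) on already-computed embeddings; the dilated causal convolution encoder is omitted.

```python
import math
import random


def sample_triplet(series_list, index, rng=random):
    """Return (reference, positive, negative) subseries for series_list[index].

    Length ranges are assumptions of this sketch.
    """
    series = series_list[index]
    ref_len = rng.randint(1, len(series))
    ref_start = rng.randint(0, len(series) - ref_len)
    ref = series[ref_start:ref_start + ref_len]
    # Positive: a random window inside the reference.
    pos_len = rng.randint(1, ref_len)
    pos_start = rng.randint(0, ref_len - pos_len)
    pos = ref[pos_start:pos_start + pos_len]
    # Negative: a random window of a different series.
    other = series_list[rng.choice([i for i in range(len(series_list)) if i != index])]
    neg_len = rng.randint(1, len(other))
    neg_start = rng.randint(0, len(other) - neg_len)
    neg = other[neg_start:neg_start + neg_len]
    return ref, pos, neg


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def triplet_loss(z_ref, z_pos, z_negs):
    """Pull the positive embedding toward the reference and push negatives away."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    loss = -log_sigmoid(dot(z_ref, z_pos))
    loss -= sum(log_sigmoid(-dot(z_ref, z_n)) for z_n in z_negs)
    return loss
```

Because the positive is always contained in the reference, no labels or data augmentations are needed, and the random lengths are what let the encoder handle variable-length inputs.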
arXiv,
2019.
DOI: 10.48550/arXiv.1901.10738
Abstract:
Time series constitute a challenging data type for machine learning algorithms, due to their highly variable lengths and sparse labeling in practice. In this paper, we tackle this challenge by proposing an unsupervised method to learn universal embeddings of time series. Unlike previous works, it is scalable with respect to their length and we demonstrate the quality, transferability and practicability of the learned representations with thorough experiments and comparisons. To this end, we combine an encoder based on causal dilated convolutions with a novel triplet loss employing time-based negative sampling, obtaining general-purpose representations for variable length and multivariate time series.
91.
张浩彬
(2022-08-11 16:09):
#paper https://doi.org/10.48550/arXiv.2103.07719
Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting
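A "Latent Correlation Layer" automatically generates the graph structure from the input, and the graph is then fed into StemGNN layers.
Each StemGNN layer first applies the Graph Fourier Transform (GFT) to turn the graph into a spectral matrix (in which the univariate time series of the nodes become linearly independent). It then applies the Discrete Fourier Transform (DFT) to move each univariate component into the frequency domain, extracts feature patterns with 1D convolutions and GLUs, and returns to the time domain with the inverse DFT. In addition, the model produces a forecasting loss (on future values) and a backcasting loss (on historical values), and the two are combined into a joint loss function.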
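The per-node frequency-domain path (DFT, gating, inverse DFT) can be illustrated with a toy version. This sketch omits the GFT step and replaces StemGNN's learned 1D convolutions with fixed, hypothetical gate logits; gating the real and imaginary parts separately with a GLU is an assumption of the sketch. It uses a naive O(n^2) DFT for clarity.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def glu(a, b):
    """Gated linear unit: a * sigmoid(b), elementwise."""
    return [ai / (1.0 + math.exp(-bi)) for ai, bi in zip(a, b)]


def spectral_block(x, gate_logits):
    """Toy block: DFT -> gate real and imaginary parts per frequency -> inverse DFT."""
    spectrum = dft(x)
    real = glu([c.real for c in spectrum], gate_logits)
    imag = glu([c.imag for c in spectrum], gate_logits)
    return [v.real for v in idft([complex(r, i) for r, i in zip(real, imag)])]
```

Large positive gate logits pass a frequency through almost unchanged and large negative ones suppress it, so the block acts as a learnable frequency filter once the gates are produced by trainable layers.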
arXiv,
2021.
DOI: 10.48550/arXiv.2103.07719
Abstract:
Multivariate time-series forecasting plays a crucial role in many real-world applications. It is a challenging problem as one needs to consider both intra-series temporal correlations and inter-series correlations simultaneously. Recently, there have been multiple works trying to capture both correlations, but most, if not all of them only capture temporal correlations in the time domain and resort to pre-defined priors as inter-series relationships. In this paper, we propose Spectral Temporal Graph Neural Network (StemGNN) to further improve the accuracy of multivariate time-series forecasting. StemGNN captures inter-series correlations and temporal dependencies jointly in the spectral domain. It combines Graph Fourier Transform (GFT) which models inter-series correlations and Discrete Fourier Transform (DFT) which models temporal dependencies in an end-to-end framework. After passing through GFT and DFT, the spectral representations hold clear patterns and can be predicted effectively by convolution and sequential learning modules. Moreover, StemGNN learns inter-series correlations automatically from the data without using pre-defined priors. We conduct extensive experiments on ten real-world datasets to demonstrate the effectiveness of StemGNN. Code is available at this https URL.
92.
王昊
(2022-08-10 11:27):
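#paper 10.48550/arXiv.2109.07872 TAN S, GE M, GUO D, et al. Knowledge-based Embodied Question Answering[J/OL]. 2021[2022-08-09]. https://arxiv.org/abs/2109.07872v1. A paper from Fuchun Sun's group at Tsinghua University. It studies an embodied agent that answers questions about its surroundings in the AI2-THOR environment, where answering the questions requires support from an external knowledge base.
Problems with prior work: embodied question answering (EQA) systems cannot answer questions that require an external knowledge graph (although this has already been done in KB-VQA), and they lack reasoning ability (though what counts as reasoning is hard to define); multi-hop question answering remains difficult; and current EQA systems cannot reuse memory of previously explored scenes to save the agent from re-exploring.
Contributions:
1. Proposes the knowledge-based EQA (K-EQA) task, built on the AI2-THOR virtual environment;
2. Builds a dataset (its question types are fairly simple, not very difficult);
3. Proposes a method that combines neural program synthesis, 3D scene graphs, 3D reconstruction, translation of questions into SQL statements, and Monte Carlo tree search to solve the problem above.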
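The "questions translated into SQL" step can be illustrated with Python's built-in `sqlite3`: one table stands in for objects recorded in the 3D scene graph, another for the external knowledge base, and the example question from the abstract ("what objects in the room are used to cut food?") becomes a join. The schema, rows, and query are hypothetical; the paper's actual program synthesis and representation are not reproduced here.

```python
import sqlite3


def build_memory():
    """In-memory tables standing in for the scene-graph memory and an external knowledge base."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE scene (object TEXT, room TEXT)")
    db.execute("CREATE TABLE knowledge (object TEXT, used_for TEXT)")
    db.executemany("INSERT INTO scene VALUES (?, ?)",
                   [("knife", "kitchen"), ("apple", "kitchen"), ("sofa", "living_room")])
    db.executemany("INSERT INTO knowledge VALUES (?, ?)",
                   [("knife", "cutting food"), ("sofa", "sitting"), ("scissors", "cutting paper")])
    return db


def objects_used_for(db, room, purpose):
    """Hypothetical SQL for 'what objects in <room> are used for <purpose>?'."""
    rows = db.execute(
        "SELECT s.object FROM scene s JOIN knowledge k ON s.object = k.object "
        "WHERE s.room = ? AND k.used_for = ? ORDER BY s.object",
        (room, purpose),
    )
    return [r[0] for r in rows]
```

Storing what the agent has already seen in a queryable memory is what lets follow-up questions be answered without re-exploring, which addresses the last problem listed above.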
arXiv,
2021.
DOI: 10.48550/arXiv.2109.07872
Abstract:
In this paper, we propose a novel Knowledge-based Embodied Question Answering (K-EQA) task, in which the agent intelligently explores the environment to answer various questions with the knowledge. Different from explicitly specifying the target object in the question as existing EQA work, the agent can resort to external knowledge to understand more complicated question such as "Please tell me what are objects used to cut food in the room?", in which the agent must know the knowledge such as "knife is used for cutting food". To address this K-EQA problem, a novel framework based on neural program synthesis reasoning is proposed, where the joint reasoning of the external knowledge and 3D scene graph is performed to realize navigation and question answering. Especially, the 3D scene graph can provide the memory to store the visual information of visited scenes, which significantly improves the efficiency for the multi-turn question answering. Experimental results have demonstrated that the proposed framework is capable of answering more complicated and realistic questions in the embodied environment. The proposed method is also applicable to multi-agent scenarios.
93.
张浩彬
(2022-08-09 17:26):
#paper 10.48550/arXiv.2203.03423
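Multivariate Time Series Forecasting with Latent Graph Inference, a 2022 paper. What I find interesting is that the authors seem to wrap a simple idea in a sophisticated framework (a writing strategy worth learning from). The paper divides multivariate forecasting into two approaches: global univariate modeling (parameters shared across variables) and direct global modeling with global prediction. The authors present their method as a modular extension of the first approach. Specifically, each individual series is fed into an encoder to produce a latent variable; the latent variables pass through a graph structure to produce latent predictions, which are then decoded into the final output. For the graph in the middle, the authors offer two options: a fully connected graph network (FC-GNN) and a bipartite graph network (BP-GNN) (which I understand as a clustering-like variant of GNNs, with the number of clusters as a hyperparameter). This design clearly improves efficiency a lot: even the fully connected GNN only operates on the latent variables, not to mention building subgraphs by sampling. So it is no surprise that it is the most efficient compared with the baseline models. In terms of accuracy, however, the proposed networks are not always the best (of two datasets, they are best on most metrics for one but not for the other). The authors run a simple ablation study but do not explain this much.
Takeaways:
(1) Wrap the work in a larger framework: multivariate forecasting is split into two approaches; embeddings become latent variables; the graph model offers a performance-efficiency trade-off between fully connected and bipartite graphs.
(2) When experiments are insufficient, add simulations (this is quite similar to discussing oracle properties in statistics, which seems relatively rare at deep learning conferences).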
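The complexity claim in the abstract (O(N^2) for the fully connected graph versus O(NK) for the bipartite graph with K auxiliary nodes) can be seen in a toy message-passing round: N series send messages to K auxiliary nodes and receive messages back, so every loop touches N x K pairs. The weighted-mean messages and the positive edge weights below are hypothetical simplifications; the paper uses learned message and update functions.

```python
def edge_counts(n_series, k_aux):
    """Directed messages per propagation step for each graph variant."""
    fully_connected = n_series * n_series  # every series exchanges messages with every series
    bipartite = 2 * n_series * k_aux       # series -> auxiliary nodes, then auxiliary -> series
    return fully_connected, bipartite


def bipartite_step(states, weights):
    """One toy round through auxiliary nodes.

    states[i]: scalar latent state of series i.
    weights[k][i] > 0: hypothetical strength of the edge between series i and auxiliary node k.
    """
    k_aux, n = len(weights), len(states)
    # Series -> auxiliary nodes: each auxiliary node takes a weighted mean of the series states.
    aux = [sum(w[i] * states[i] for i in range(n)) / sum(w) for w in weights]
    # Auxiliary nodes -> series: each series adds a weighted mean of the auxiliary states.
    updated = []
    for i in range(n):
        col = [weights[k][i] for k in range(k_aux)]
        updated.append(states[i] + sum(c * a for c, a in zip(col, aux)) / sum(col))
    return updated
```

For N = 100 series and K = 4 auxiliary nodes, this is 800 messages per round instead of 10,000, which is the efficiency trade-off the commentary describes; information between two series has to pass through the small set of auxiliary nodes, which is where some accuracy can be lost.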
arXiv,
2022.
DOI: 10.48550/arXiv.2203.03423
Abstract:
This paper introduces a new approach for Multivariate Time Series forecasting that jointly infers and leverages relations among time series. Its modularity allows it to be integrated with current univariate methods. Our approach allows to trade-off accuracy and computational efficiency gradually via offering on one extreme inference of a potentially fully-connected graph or on another extreme a bipartite graph. In the potentially fully-connected case we consider all pair-wise interactions among time-series which yields the best forecasting accuracy. Conversely, the bipartite case leverages the dependency structure by inter-communicating the N time series through a small set of K auxiliary nodes that we introduce. This reduces the time and memory complexity w.r.t. previous graph inference methods from O(N^2) to O(NK) with a small trade-off in accuracy. We demonstrate the effectiveness of our model in a variety of datasets where both of its variants perform better or very competitively to previous graph inference methods in terms of forecasting accuracy and time efficiency.
94.
林海onrush
(2022-08-07 22:47):
#paper arXiv:2207.03530v1 [cs.RO] 7 Jul 2022, VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning, https://deepai.org/publication/vmas-a-vectorized-multi-agent-simulator-for-collective-robot-learning
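The University of Cambridge proposes VMAS, a vectorized multi-agent simulator for collective robot learning with multi-agent reinforcement learning.
Although many multi-robot coordination problems can be solved optimally by exact algorithms, the solutions often do not scale with the number of robots. Multi-agent reinforcement learning (MARL) is attracting growing attention in the robotics community as a promising way to tackle such problems. However, tools that can quickly and efficiently find solutions to large-scale collective learning tasks are still lacking. This work introduces VMAS, an open-source framework designed for efficient MARL benchmarking. It consists of a vectorized 2D physics engine written in PyTorch and a set of 12 challenging multi-robot scenarios; additional scenarios can be implemented through a simple modular interface.
The paper shows how vectorization enables parallel simulation on accelerated hardware without added complexity, compares VMAS with OpenAI MPE, the previous state-of-the-art framework, showing that VMAS is more than 100 times faster, and uses VMAS to run various benchmarks that reveal challenges for existing algorithms.
VMAS can execute 30,000 parallel simulations in under 10 seconds. Using VMAS's RLlib interface, the authors benchmark their multi-robot scenarios with various MARL algorithms based on Proximal Policy Optimization (PPO). VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available, and the results can be reproduced, at: https://github.com/proroklab/VectorizedMultiAgentSimulator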
arXiv,
2022.
DOI: 10.48550/arXiv.2207.03530
Abstract:
While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, proving more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at this https URL. A video of VMAS scenarios and experiments is available at this https URL.
95.
尹志
(2022-07-30 22:41):
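#paper https://doi.org/10.48550/arXiv.2205.01529 Masked Generative Distillation, ECCV 2022. This is a knowledge distillation paper that performs distillation by generating features, in a manner similar to masked feature modeling. Knowledge distillation, as a general technique, has been applied to many machine learning tasks, such as classification, segmentation, and detection in vision. Distillation algorithms typically improve the representation power of the student by making it imitate the teacher's features. This paper instead proposes that the student does not need to imitate the teacher's features but can generate them: it randomly masks the student's feature and uses the remaining part to generate the teacher's feature, which gives the student feature stronger representation power. The idea is interesting. An analogy (perhaps not a perfect one): normally a student learns by copying the teacher's every move, but if the teacher rarely shows up and direct imitation is inconvenient, the student can instead, under supervision, guess what the teacher's features look like. After many rounds, once the guesses are consistently accurate, the student knows the teacher well, which means the student's representation power is strong. With this approach, the authors run extensive experiments on image classification, object detection, semantic segmentation, and instance segmentation, across different datasets and models, and find consistent gains (roughly 2 to 3 points, ranging from about 1.8 to 3.6 points in the examples reported in the abstract; see the paper for details).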
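The masking-and-generation objective can be sketched in pure Python following the abstract: mask random pixels of the student's feature (the same positions across all channels), pass the masked feature through a generation block, and match the teacher's full feature with a mean squared error. The `generator` callable is a hypothetical stand-in for the paper's "simple block", and the default mask ratio is an assumption.

```python
import random


def random_mask(feature, ratio, rng=random):
    """Zero each spatial position of a (channels, H*W) feature with probability `ratio`, shared across channels."""
    positions = len(feature[0])
    keep = [rng.random() >= ratio for _ in range(positions)]
    return [[v if k else 0.0 for v, k in zip(channel, keep)] for channel in feature]


def mse(a, b):
    """Mean squared error between two (channels, H*W) features."""
    flat_a = [v for ch in a for v in ch]
    flat_b = [v for ch in b for v in ch]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def mgd_loss(student, teacher, generator, ratio=0.5, rng=random):
    """Mask the student feature, regenerate a full feature, and match it to the teacher's feature."""
    return mse(generator(random_mask(student, ratio, rng)), teacher)
```

The student never sees the masked positions, so minimizing this loss forces its remaining features to carry enough context to reconstruct the teacher's full feature, which is the intuition behind the analogy above.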
arXiv,
2022.
DOI: 10.48550/arXiv.2205.01529
Abstract:
Knowledge distillation has been applied to various tasks successfully. The current distillation algorithm usually improves students' performance by imitating the output of the teacher. This paper shows that teachers can also improve students' representation power by guiding students' feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method, which can be utilized on various tasks, including image classification, object detection, semantic segmentation and instance segmentation. We experiment on different models with extensive datasets and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with ResNet-50 backbone from 37.4 to 41.0 Boundingbox mAP, SOLO based on ResNet-50 from 33.1 to 36.2 Mask mAP and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our codes are available at this https URL.
96.
王昊
(2022-07-28 09:51):
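#paper doi:10.48550/arXiv.2207.04630 Yi Ma, Doris Tsao, and Heung-Yeung Shum. 2022. On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. Yi Ma, who has a strong mathematical background, wrote this paper with neuroscientist Doris Tsao to lay out what they consider two fundamental principles of AI. The paper proposes a new framework for understanding deep neural networks, compressive closed-loop transcription, and answers two questions: what is the goal of learning from data and how can it be measured (information and coding theory), and how can this goal be achieved through efficient and effective computation (control). It proposes two fundamental principles for understanding AI: parsimony and self-consistency.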
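In Ma and colleagues' rate-reduction line of work, parsimony is made measurable with the coding rate R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T) for d-dimensional features of n samples. The following minimal pure-Python sketch computes it; the distortion `eps=0.5` is a hypothetical default, and the log-determinant uses Gaussian elimination without pivoting, which is valid here because the matrix is symmetric positive definite.

```python
import math


def logdet_spd(m):
    """log-determinant of a symmetric positive-definite matrix via Gaussian elimination."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for i in range(n):
        pivot = a[i][i]
        total += math.log(pivot)
        for r in range(i + 1, n):
            factor = a[r][i] / pivot
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return total


def coding_rate(z, eps=0.5):
    """R(Z) = 1/2 logdet(I + d / (n eps^2) Z Z^T); z has d rows (feature dims) of n samples."""
    d, n = len(z), len(z[0])
    scale = d / (n * eps ** 2)
    gram = [[(1.0 if i == j else 0.0) + scale * sum(z[i][k] * z[j][k] for k in range(n))
             for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(gram)
```

Features that spread over more directions need more bits to encode, so they have a higher coding rate than features collapsed onto a single direction; the framework uses differences of such rates to define what a compact yet discriminative representation is.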
arXiv,
2022.
DOI: 10.48550/arXiv.2207.04630
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
97.
张德祥
(2022-07-19 18:49):
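#paper https://doi.org/10.48550/arXiv.2207.04630 On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. This paper by Yi Ma has already been covered by WeChat public accounts. Building on two of his earlier works, LDR data compression and deep networks as closed-loop generative models, Ma distills compression and closed-loop generation into the intelligence principles of parsimony and self-consistency. The paper goes on to propose more general ideas, extends them to 3D vision and reinforcement learning, and predicts their implications for neuroscience and higher-level intelligence.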
arXiv,
2022.
DOI: 10.48550/arXiv.2207.04630
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
<<<
翻译
98.
王昊
(2022-06-30 17:08):
#paper doi:https://doi.org/10.48550/arXiv.2201.12086 BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. arXiv:2201.12086 [cs].
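BLIP is a unified vision-language pre-training (VLP) framework that learns from noisy image-text pairs. It makes effective use of noisy web data by bootstrapping the captions: a captioner generates synthetic captions, and a filter removes the noisy ones. As of June 2022 it is one of the better-performing open-source vision-language models. It can be deployed directly with Docker and applied to a range of vision-language downstream tasks. In our own trials it achieved zero-shot results to some extent, and it performs well on the VQA 2.0 dataset. As a next step, we plan to use it as a pre-trained model and fine-tune it for other downstream tasks in production.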
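The captioner/filter bootstrapping step is essentially a data-cleaning loop. Here is a minimal sketch of that loop. `captioner` and `itm_score` are hypothetical callables standing in for BLIP's captioner and its image-text-matching filter; they are not BLIP's real API. The 0.5 threshold is also our assumption. As we read the paper, the filter is applied to both the original web text and the synthetic caption.

```python
def capfilt(web_pairs, captioner, itm_score, threshold=0.5):
    """Bootstrap a cleaner image-text dataset from noisy web pairs.
    web_pairs: list of (image, web_text) tuples.
    captioner(image) -> str: hypothetical stand-in for BLIP's captioner.
    itm_score(image, text) -> float in [0, 1]: hypothetical stand-in for the filter's
    image-text matching score; pairs below `threshold` are treated as noisy."""
    kept = []
    for image, web_text in web_pairs:
        synthetic = captioner(image)
        # the filter checks both the original web text and the synthetic caption
        for text in (web_text, synthetic):
            if itm_score(image, text) >= threshold:
                kept.append((image, text))
    return kept

# toy stand-ins: "images" are just labels, and a text "matches" if it mentions the label
def toy_captioner(image):
    return f"a photo of a {image}"

def toy_itm_score(image, text):
    return 1.0 if image in text else 0.0

web_pairs = [("cat", "buy cheap shoes online"), ("dog", "my dog at the beach")]
print(capfilt(web_pairs, toy_captioner, toy_itm_score))
```

In this toy run the spam text attached to the cat image is dropped but its synthetic caption survives. For the dog image, both texts are kept. This is how bootstrapping can recover usable supervision from an image whose web text was useless.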
arXiv,
2022.
DOI: 10.48550/arXiv.2201.12086
Abstract:
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to video-language tasks in a zero-shot manner. Code, models, and datasets are released at this https URL.
99.
Ricardo
(2022-05-30 23:39):
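#paper https://arxiv.org/abs/2102.04159v3 Deep Residual Learning in Spiking Neural Networks. Published at NeurIPS 2021. Modern deep learning based on artificial neural networks (ANNs) has made considerable progress in many fields. However, ANNs are mathematical black boxes that are hard to interpret, and they consume a lot of power, so some researchers have turned to spiking neural networks (SNNs) built from biologically inspired spiking neurons. SNNs offer better biological interpretability, event-driven computation, and low power consumption, and are regarded as potential competitors to ANNs. Even so, SNNs still face many theoretical and engineering problems and still lag behind ANNs on some complex tasks. Given the huge success of residual learning in ANNs, it is natural to ask how residual learning can be used to train SNNs. Earlier work mimicked the standard ANN residual block and simply replaced the ReLU activations with spiking neurons. Such networks degrade as depth increases, which makes residual learning hard to achieve. In this paper, the authors prove that this earlier residual scheme for SNNs suffers from vanishing/exploding gradients and can hardly implement identity mapping. They propose the spike-element-wise (SEW) ResNet to solve the problem. The experimental results are quite strong: SEW ResNet beats previous SNN methods on several datasets, though naturally it still trails ANNs. Its performance also improves simply by making the network deeper. Finally, this is the first time SNNs with more than 100 layers have been trained directly.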
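The identity-mapping argument can be checked on a toy example. As we understand the paper, Spiking ResNet places the spiking neuron after the addition and computes SN(F(S) + S). A SEW block instead computes g(SN(F(S)), S), where g is an element-wise function such as ADD, AND or IAND. Below is a minimal NumPy sketch built on our own simplifying assumptions. It uses a multi-step LIF neuron with tau = 2, threshold 1 and hard reset (parameters we chose), and a hypothetical residual branch F that outputs all zeros. That zero-output case is exactly when a block ought to reduce to an identity mapping. The function names are ours, not the paper's code.

```python
import numpy as np

def lif_multistep(x_seq, tau=2.0, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron run over T steps; x_seq has shape [T, ...].
    Charge: v <- v + (x - (v - v_reset)) / tau; fire when v >= v_th; hard reset to v_reset."""
    v = np.full(x_seq.shape[1:], v_reset, dtype=np.float64)
    spikes = np.zeros(x_seq.shape, dtype=np.float64)
    for t in range(x_seq.shape[0]):
        v = v + (x_seq[t] - (v - v_reset)) / tau
        spikes[t] = (v >= v_th).astype(np.float64)
        v = np.where(spikes[t] > 0, v_reset, v)
    return spikes

def spiking_resnet_block(s_seq, residual_branch):
    # Spiking ResNet: spiking neuron applied after the addition, SN(F(S) + S)
    return lif_multistep(residual_branch(s_seq) + s_seq)

def sew_block(s_seq, residual_branch, g="ADD"):
    # SEW block: spike the residual branch first, then combine with the input spikes via g
    a = lif_multistep(residual_branch(s_seq))
    if g == "ADD":
        return a + s_seq
    if g == "AND":
        return a * s_seq
    if g == "IAND":
        return (1.0 - a) * s_seq  # (NOT A) AND S
    raise ValueError(f"unknown element-wise function: {g}")

def zero_branch(x):
    # hypothetical residual branch whose output is all zeros
    return np.zeros_like(x)

# input spike train: T = 3 time steps, 2 neurons
S = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

print(spiking_resnet_block(S, zero_branch))  # all zeros: the input is lost
print(sew_block(S, zero_branch, "ADD"))      # equals S: identity mapping
print(sew_block(S, zero_branch, "IAND"))     # equals S: identity mapping
```

With tau = 2, each binary input only moves the membrane potential halfway toward 1, so the LIF neuron never reaches threshold. The Spiking ResNet block therefore silences the spike train. SEW with ADD or IAND passes the spike train through unchanged, because the zero branch produces no spikes.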
arXiv,
2022.
DOI: 10.48550/arXiv.2102.04159
Abstract:
Deep Spiking Neural Networks (SNNs) present optimization difficulties for gradient-based approaches due to discrete binary activation and complex spatial-temporal dynamics. Considering the huge success of ResNet in deep learning, it would be natural to train deep SNNs with residual learning. Previous Spiking ResNet mimics the standard residual block in ANNs and simply replaces ReLU activation layers with spiking neurons, which suffers the degradation problem and can hardly implement residual learning. In this paper, we propose the spike-element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet. We evaluate our SEW ResNet on ImageNet, DVS Gesture, and CIFAR10-DVS datasets, and show that SEW ResNet outperforms the state-of-the-art directly trained SNNs in both accuracy and time-steps. Moreover, SEW ResNet can achieve higher performance by simply adding more layers, providing a simple method to train deep SNNs. To our best knowledge, this is the first time that directly training deep SNNs with more than 100 layers becomes possible.
100.
张浩彬
(2022-05-30 19:14):
#paper Wen, Ruofeng, et al. A Multi-Horizon Quantile Recurrent Forecaster. DOI: 10.48550/arXiv.1711.11053
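MQRNN is another time-series paper from Amazon. I previously read DeepAR, which can model many series jointly and is quite robust. Compared with Prophet and DeepAR, though, MQRNN takes a different route: quantile-based forecasting. Instead of predicting the distribution of the series at time t, it predicts quantiles at time t directly, following the quantile regression approach. In addition, unlike DeepAR, MQRNN uses direct multi-horizon forecasting. Rather than producing a multi-step forecast iteratively, one step at a time, it generates all horizons in a single pass. According to the paper, this improves forecasting efficiency, since the horizons can be computed in parallel. It also reduces error accumulation, though I find this point debatable, since the two approaches are essentially the same.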
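MQRNN is trained with the quantile (pinball) loss, summed over all forecast horizons and quantiles. The following is a minimal NumPy sketch. It assumes the network has already emitted a [K horizons × Q quantiles] matrix in one shot, as in the direct multi-horizon setup described above. The function names and toy numbers are ours, not the paper's code.

```python
import numpy as np

def quantile_loss(y, y_hat, q):
    # pinball loss: q * (y - y_hat) when under-predicting, (1 - q) * (y_hat - y) when over-predicting
    diff = y - y_hat
    return np.maximum(q * diff, (q - 1) * diff)

def multi_horizon_quantile_loss(y_future, y_pred, quantiles):
    """y_future: shape [K], true values for horizons 1..K.
    y_pred: shape [K, Q], predicted quantiles for every horizon, produced in one forward pass.
    Returns the loss summed over horizons and quantiles."""
    q = np.asarray(quantiles, dtype=np.float64)[None, :]   # [1, Q]
    y = np.asarray(y_future, dtype=np.float64)[:, None]    # [K, 1]
    return float(np.sum(quantile_loss(y, np.asarray(y_pred, dtype=np.float64), q)))

# toy example: 2 horizons, quantiles 0.1 / 0.5 / 0.9
loss = multi_horizon_quantile_loss(
    y_future=[10.0, 12.0],
    y_pred=[[9.0, 10.0, 11.0],
            [11.0, 12.0, 14.0]],
    quantiles=[0.1, 0.5, 0.9],
)
print(loss)  # approximately 0.5
```

Each quantile gets its own asymmetric penalty. As a result, the 0.9 column is pushed above most observations and the 0.1 column below them, so the model outputs a prediction interval without assuming a parametric distribution.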
arXiv,
2017.
DOI: 10.48550/arXiv.1711.11053
Abstract:
We propose a framework for general probabilistic multi-step time series regression. Specifically, we exploit the expressiveness and temporal nature of Sequence-to-Sequence Neural Networks (e.g. recurrent and convolutional structures), the nonparametric nature of Quantile Regression and the efficiency of Direct Multi-Horizon Forecasting. A new training scheme, *forking-sequences*, is designed for sequential nets to boost stability and performance. We show that the approach accommodates both temporal and static covariates, learning across multiple related series, shifting seasonality, future planned event spikes and cold-starts in real life large-scale forecasting. The performance of the framework is demonstrated in an application to predict the future demand of items sold on this http URL, and in a public probabilistic forecasting competition to predict electricity price and load.