Papers from the journal arXiv.
137 paper shares found in total; this page shows items 61-80.
61.
符毓 (2023-08-31 22:39):
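#paper doi.org/10.48550/arXiv.2303.09165 2023, A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation. To address the cost of large-scale manual annotation in computer vision, the team turns to synthetic data. After generating synthetic data from a set of rules, the paper shows that pre-training on synthetic data can potentially outperform pre-training on real data, and can also outperform the results of several data-augmentation approaches. The future applications leave a lot of room for imagination.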
Abstract:
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data. However, exhaustive data annotation is impracticable for each task of all domains of interest, due to high labor costs and unguaranteed labeling accuracy. Besides, the uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist. All these nuisances may hinder the verification of typical theories and exposure to new findings. To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization. We in this work push forward along this line by doing profound and extensive research on bare supervised learning and downstream domain adaptation. Specifically, under the well-controlled, IID data setting enabled by 3D rendering, we systematically verify the typical, important learning insights, e.g., shortcut learning, and discover the new laws of various data regimes and network architectures in generalization. We further investigate the effect of image formation factors on generalization, e.g., object scale, material texture, illumination, camera viewpoint, and background in a 3D scene. Moreover, we use the simulation-to-reality adaptation as a downstream task for comparing the transferability between synthetic and real data when used for pre-training, which demonstrates that synthetic data pre-training is also promising to improve real test results. Lastly, to promote future research, we develop a new large-scale synthetic-to-real benchmark for image classification, termed S2RDA, which provides more significant challenges for transfer from simulation to reality. The code and datasets are available at this https URL.
62.
尹志 (2023-08-31 22:11):
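#paper https://doi.org/10.48550/arXiv.1812.07907 PnP-AdaNet: Plug-and-Play Adversarial Domain Adaptation Network at Unpaired Cross-Modality Cardiac Segmentation. I came across this paper while surveying efficient generative models and found it fairly interesting. It proposes a network architecture, PnP-AdaNet, that performs unsupervised domain adaptation for segmentation across different modalities. Since this is an older paper from 2018, its ideas of swapping out part of the network and using adversarial learning are now fairly common, but I think the idea of replacing network components carries deeper meaning today, when large models dominate. One of my own research topics follows this thread, and some of the experimental results so far look quite good.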
Abstract:
Deep convolutional networks have demonstrated the state-of-the-art performance on various medical image computing tasks. Leveraging images from different modalities for the same analysis task holds clinical benefits. However, the generalization capability of deep models on test data with different distributions remain as a major challenge. In this paper, we propose the PnPAdaNet (plug-and-play adversarial domain adaptation network) for adapting segmentation networks between different modalities of medical images, e.g., MRI and CT. We propose to tackle the significant domain shift by aligning the feature spaces of source and target domains in an unsupervised manner. Specifically, a domain adaptation module flexibly replaces the early encoder layers of the source network, and the higher layers are shared between domains. With adversarial learning, we build two discriminators whose inputs are respectively multi-level features and predicted segmentation masks. We have validated our domain adaptation method on cardiac structure segmentation in unpaired MRI and CT. The experimental results with comprehensive ablation studies demonstrate the excellent efficacy of our proposed PnP-AdaNet. Moreover, we introduce a novel benchmark on the cardiac dataset for the task of unsupervised cross-modality domain adaptation. We will make our code and database publicly available, aiming to promote future studies on this challenging yet important research topic in medical imaging.
63.
尹志 (2023-07-31 22:52):
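#paper doi: https://doi.org/10.48550/arXiv.2210.13695 Structure-based Drug Design with Equivariant Diffusion Models. I read this paper once more. Using equivariant diffusion models for structure-based drug design is indeed an effective approach, and more and more work keeps demonstrating its value. This is a fairly classic piece of work (even though it was apparently rejected by ICLR). It uses an SE(3)-equivariant diffusion model to generate molecules conditioned on protein pockets. Extensive experiments show that it generates novel and diverse drug molecules efficiently and effectively. The paper also discusses using the method to optimize existing molecules and to design molecules via inpainting. Although the results still have many shortcomings, these ideas are very valuable for small-molecule drug design and for improving existing methods.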
Abstract:
Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. In this paper, we formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant 3D-conditional diffusion model that generates novel ligands conditioned on protein pockets. Comprehensive in silico experiments demonstrate the efficiency and effectiveness of DiffSBDD in generating novel and diverse drug-like ligands with competitive docking scores. We further explore the flexibility of the diffusion framework for a broader range of tasks in drug design campaigns, such as off-the-shelf property optimization and partial molecular design with inpainting.
64.
Ricardo (2023-07-31 22:16):
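#paper doi: https://doi.org/10.48550/arXiv.2112.05149 DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model. Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually need costly iterative optimization. Although deep-learning-based methods have been used for fast image registration, obtaining realistic continuous deformations from the moving image to the fixed image with little topological folding remains challenging. To address this, the paper proposes DiffuseMorph, a new diffusion-model-based image registration method. DiffuseMorph can not only generate synthetic deformed images through reverse diffusion but also perform registration via deformation fields. Specifically, the deformation field is generated from the conditional score function of the deformation between the moving and fixed images, so registration along a continuous deformation can be performed by simply scaling the latent feature of the score. Experiments on 2D face and 3D medical image registration tasks show that the method offers flexible deformations and topology preservation.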
Abstract:
Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also allows image registration by deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that the registration can be performed from continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology preservation capability.
65.
符毓 (2023-07-31 16:41):
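#paper doi: 10.48550/arXiv.2307.05973 2023, Composable 3D Value Maps for Robotic Manipulation with Language Models. The latest paper from Fei-Fei Li's team, combining language models with robotic manipulation. Combining with large language models improves the efficiency of human-robot interaction and enables vision-based real-time trajectory planning. By my rough visual estimate, the arm moves at about one eighth of the working speed of a typical robot arm, and stability needs further improvement before real-world use (the error rate exceeds 25%).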
Abstract:
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website: this https URL
66.
Ricardo (2023-06-30 23:49):
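#paper Denoising Diffusion Probabilistic Models. doi: https://doi.org/10.48550/arXiv.2006.11239 The famous DDPM model. The algorithm is surprisingly simple, consisting of a forward noising process and a reverse denoising process. The forward process adds a small amount of noise at each of many time steps; the reverse process removes noise at each time step, using a network that learns the noise distribution. After a long chain of derivations, the final loss function turns out to be quite simple: just an MSE. The whole thing looks like many VAEs stacked together. One drawback of DDPM is that sampling takes many steps, typically 1000 or more; the later DDIM model reduces this to around 50 steps, at the cost of sample diversity. DDIM explains its principle through a model called the drift-diffusion equation (a model often adopted in research on behavioral decision-making and similar areas). The original DDPM really contains only the diffusion part of the drift-diffusion equation, whereas DDIM adds the drift part, which moves the model toward regions where the data density is higher.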
Abstract:
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at this https URL
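The forward noising step and the MSE loss mentioned in the note on entry 66 can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names are hypothetical, the data is a plain list of floats, and the linear schedule values (1e-4 to 0.02 over 1000 steps) are used here only as illustrative defaults. No network is included; `simple_loss` just shows the quantity a noise-prediction network would be trained to minimize.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (illustrative defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    Returns (x_t, noise) so the noise can serve as the regression target.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def simple_loss(predicted_noise, true_noise):
    """Mean squared error between predicted and true noise."""
    total = sum((p - e) ** 2 for p, e in zip(predicted_noise, true_noise))
    return total / len(true_noise)
```

A training step would pick a random `t`, call `q_sample`, feed `x_t` and `t` to the network, and minimize `simple_loss` between the network output and the returned noise.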
67.
张浩彬 (2023-06-30 11:45):
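#paper The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting doi: https://doi.org/10.48550/arXiv.2304.05206 The paper specifically studies multivariate time series forecasting and examines the difference between independent (per-channel) forecasting and joint forecasting. It shows that, because of distribution shift, the independent approach works better, since it helps mitigate distribution shift and improves the model's generalization. The paper also shows that the choice between independent and joint forecasting is a trade-off between model capacity and model robustness. It then proposes measures to improve the accuracy of joint forecasting, including regularization, low-rank decomposition, replacing MSE with MAE, and adjusting the sequence length.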
Abstract:
Multivariate time series data comprises various channels of variables. The multivariate forecasting models need to capture the relationship between the channels to accurately predict future values. However, recently, there has been an emergence of methods that employ the Channel Independent (CI) strategy. These methods view multivariate time series data as separate univariate time series and disregard the correlation between channels. Surprisingly, our empirical results have shown that models trained with the CI strategy outperform those trained with the Channel Dependent (CD) strategy, usually by a significant margin. Nevertheless, the reasons behind this phenomenon have not yet been thoroughly explored in the literature. This paper provides comprehensive empirical and theoretical analyses of the characteristics of multivariate time series datasets and the CI/CD strategy. Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series. In contrast, the CI approach trades capacity for robust prediction. Practical measures inspired by these analyses are proposed to address the capacity and robustness dilemma, including a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy. We hope our findings can raise awareness among researchers about the characteristics of multivariate time series and inspire the construction of better forecasting models.
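The difference between the CI and CD strategies in entry 67 is easiest to see in how training samples are built. The sketch below is an illustration under simple assumptions, not the paper's code: the function names are hypothetical, and a multivariate series is represented as a list of equal-length channels. Under CD, one sample contains all channels jointly; under CI, every channel contributes its own univariate samples to a single shared model.

```python
def cd_windows(series, lookback, horizon):
    """Channel-dependent samples: each (x, y) pair holds all channels jointly.

    series: list of channels, each a list of numbers of equal length.
    """
    length = len(series[0])
    samples = []
    for start in range(length - lookback - horizon + 1):
        mid = start + lookback
        x = [channel[start:mid] for channel in series]
        y = [channel[mid:mid + horizon] for channel in series]
        samples.append((x, y))
    return samples


def ci_windows(series, lookback, horizon):
    """Channel-independent samples: each channel is treated as a univariate series."""
    samples = []
    for channel in series:
        for start in range(len(channel) - lookback - horizon + 1):
            mid = start + lookback
            samples.append((channel[start:mid], channel[mid:mid + horizon]))
    return samples
```

With C channels, `ci_windows` yields C times as many samples as `cd_windows`, each of lower dimension, which is one informal way to read the capacity versus robustness trade-off the note describes.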
68.
Ricardo (2023-05-31 23:53):
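#paper DOI:https://doi.org/10.48550/arXiv.2304.00217 DrDisco: Deep Registration for Distortion Correction of Diffusion MRI with single phase-encoding. Diffusion-weighted magnetic resonance imaging (DW-MRI) is a non-invasive way of imaging white matter tracts in the human brain. DW-MRI is usually acquired with echo-planar imaging (EPI) using high gradient fields, which introduces severe geometric distortions that affect further analysis. Most distortion-correction tools require two minimally weighted DW-MRI images (B0) acquired with different phase-encoding directions, and processing each subject can take hours. Because a large amount of diffusion data is acquired with only a single phase-encoding direction, the applicability of existing methods is limited. The paper proposes a deep-learning-based registration method that corrects distortion using only the B0 acquired with a single phase-encoding direction. A deep learning model registers the undistorted T1-weighted image to the distorted B0 image to remove the distortion. A differentiable mutual information loss is applied during training to improve inter-modality alignment. Experiments on the Human Connectome Project dataset show that the proposed method outperforms SyN and VoxelMorph on several metrics and takes only a few seconds per subject.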
Abstract:
Diffusion-weighted magnetic resonance imaging (DW-MRI) is a non-invasive way of imaging white matter tracts in the human brain. DW-MRIs are usually acquired using echo-planar imaging (EPI) with high gradient fields, which could introduce severe geometric distortions that interfere with further analyses. Most tools for correcting distortion require two minimally weighted DW-MRI images (B0) acquired with different phase-encoding directions, and they can take hours to process per subject. Since a great amount of diffusion data are only acquired with a single phase-encoding direction, the application of existing approaches is limited. We propose a deep learning-based registration approach to correct distortion using only the B0 acquired from a single phase-encoding direction. Specifically, we register undistorted T1-weighted images and distorted B0 to remove the distortion through a deep learning model. We apply a differentiable mutual information loss during training to improve inter-modality alignment. Experiments on the Human Connectome Project dataset show the proposed method outperforms SyN and VoxelMorph on several metrics, and only takes a few seconds to process one subject.
69.
符毓 (2023-05-31 22:40):
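#paper doi.org/10.48550/arXiv.2212.12669 Nature, 2023, On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective. The paper discusses the challenges that uncertainty and dynamic environments pose for achieving machine-driven Intelligent Decision-Making (IDM) in real-world settings. The authors propose the idea of a Foundation Decision Model (FDM) to overcome these challenges and enable broad adoption of IDM. The paper also presents various approaches by which AI could enhance IDM, along with their theoretical feasibility.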
Abstract:
The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundational model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks, such as text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.
70.
周周复始 (2023-05-31 22:29):
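#paper doi:https://doi.org/10.48550/arXiv.2201.00308. DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents. 2022. Diffusion probabilistic models currently produce state-of-the-art results on several competitive image synthesis benchmarks, but they lack a low-dimensional, interpretable latent space and are slow at generation. Variational autoencoders (VAEs), by contrast, usually have a low-dimensional latent space but generate samples of poorer quality. Based on this, the paper proposes DiffuseVAE, a new generative framework that integrates a VAE into the diffusion model framework and uses it to design new conditional parameterizations for diffusion models. The paper shows that the resulting model equips diffusion models with a low-dimensional, VAE-inferred latent code that can be used for downstream tasks such as conditional generation.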
Abstract:
Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, standard Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework, and leverage this to design novel conditional parameterizations for diffusion models. We show that the resulting model equips diffusion models with a low-dimensional VAE inferred latent code which can be used for downstream tasks like controllable synthesis. The proposed method also improves upon the speed vs quality tradeoff exhibited in standard unconditional DDPM/DDIM models (for instance, FID of 16.47 vs 34.36 using a standard DDIM on the CelebA-HQ-128 benchmark using T=10 reverse process steps) without having explicitly trained for such an objective. Furthermore, the proposed model exhibits synthesis quality comparable to state-of-the-art models on standard image synthesis benchmarks like CIFAR-10 and CelebA-64 while outperforming most existing VAE-based methods. Lastly, we show that the proposed method exhibits inherent generalization to different types of noise in the conditioning signal. For reproducibility, our source code is publicly available at this https URL.
71.
张浩彬 (2023-05-30 11:48):
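#paper:doi:10.48550/arXiv.2010.04515 Principal Component Analysis using Frequency Components of Multivariate Time Series. The paper proposes a new spectral decomposition method for multivariate time series (second-order stationary, i.e. wide-sense stationary), such that the resulting subseries have non-zero spectral coherence within a group and zero spectral coherence across groups. In terms of writing, it follows the typical structure: problem introduction, method description, proofs of the asymptotic properties, numerical simulation, and an empirical study, with a large amount of derivation throughout.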
Abstract:
Dimension reduction techniques for multivariate time series decompose the observed series into a few useful independent/orthogonal univariate components. We develop a spectral domain method for multivariate second-order stationary time series that linearly transforms the observed series into several groups of lower-dimensional multivariate subseries. These multivariate subseries have non-zero spectral coherence among components within a group but have zero spectral coherence among components across groups. The observed series is expressed as a sum of frequency components whose variances are proportional to the spectral matrices at the respective frequencies. The demixing matrix is then estimated using an eigendecomposition on the sum of the variance matrices of these frequency components and its asymptotic properties are derived. Finally, a consistent test on the cross-spectrum of pairs of components is used to find the desired segmentation into the lower-dimensional subseries. The numerical performance of the proposed method is illustrated through simulation examples and an application to modeling and forecasting wind data is presented.
72.
张德祥 (2023-05-16 08:14):
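#paper https://doi.org/10.48550/arXiv.2203.11740 We can picture the brain as the Earth: the generation of magma in the Earth's core is like the formation of short-term memory in the hippocampus, and the process is quantum; earthquakes at the surface release potential energy, which is like strong short-term memories being selected to become long-term memories that are stored in memory engram cells of different cortices and can later be released. This is a combination of AI, brain science, and quantum mechanics. We propose PNN, but it is more than a simple time-series model. In addition to the shared weights of the synaptic connections, the new neural network we propose includes synaptic effective range weights, which also take part in the forward and backward computation. Many of its simulations cannot be achieved by an RNN. Brain plasticity for positive and negative memory is quantum and produces short-term memory; the wave function shows exponential decay over a period of time, and this arises in the hippocampus. The exponential decay is due to barriers, and the barriers may be related to astrocytes. Working-memory brain plasticity flows through the brain from the hippocampus to different cortices via directional derivatives. Strong working-memory plasticity turning into long-term memory corresponds to the maximum directional derivative, and the maximum directional derivative is the gradient. Long-term memory is thus the gradient of working-memory brain plasticity. The process by which short-term memory becomes long-term memory is the process by which non-classical mechanics becomes classical mechanics. The PNN simulations agree with the brain-science experiments and hypotheses of 6 papers in the main CNS journals, 6 papers in CNS family journals, and 1 paper in a top physics journal. For more, see: https://mp.weixin.qq.com/s/k-KD1KcQo9FiYcQvSypBjQ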
Abstract:
In addition to the shared weights of the synaptic connections, we proposed a new neural network that includes the synaptic effective range weights for both the forward and back propagation. And lots of simulations were used which RNN cannot be achieved. The simulations of PNN fit very well in experiments and hypotheses of 6 papers CNS Journals, 6 papers of CNS family Journals and 1 paper top Physics Journal [14-26]. The brain plasticity in positive or negative memory may be quantum and produce short-term memory, and exhibits an exponential decay in the wave function over a period of time, produced in the hippocampus. And exponential decay occurs due to barriers, and barriers can refer to astrocytes. Brain plasticity in working memory flows through the brain, from the hippocampus to the cortex, through directional derivatives. The strong working memory brain plasticity turns to long-term memory means maximum of directional derivatives, and maximum of directional derivatives is gradient. Thus, long-term memory signifies the gradient of brain plasticity in working memory. The process of short-term memory turns to long-term memory is the process of non-classically turns to classically. Astrocytic cortex memory persistence factor also inhibits local synaptic accumulation, and the model inspires experiments. This could be the process of astrocytes phagocytose synapses is driven by both positive and negative memories of plasticity in the brain. In simulation, it is possible that thicker cortices and more diverse individuals within the brain could have high IQ, but thickest cortices and most diverse individuals may have low IQ in simulation. PSO considers global solution or best previous solution, but also considers relatively good and relatively inferior solution. And PNN modified ResNet to consider memory gradient. The simple PNN only considers astrocytes phagocytosed synapses.
73.
姗姗来迟 (2023-05-14 19:34):
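#paper Multimodal Graph Transformer for Multimodal Question Answering https://arxiv.org/abs/2305.00581 This work aims to get the best of both worlds (Transformer models and structured graph learning) and proposes a new Multimodal Graph Transformer for question answering tasks that require reasoning across multiple modalities. It introduces a graph-involved, plug-and-play quasi-attention mechanism that incorporates multimodal graph information, obtained from text and visual data, into vanilla self-attention as an effective prior. Specifically, the paper builds a text graph, a dense region graph, and a semantic graph to generate adjacency matrices, and then combines them with the input vision and language features for downstream reasoning. Study notes: https://blog.csdn.net/weixin_44845357/article/details/130577459?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22130577459%22%2C%22source%22%3A%22weixin_44845357%22%7D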
Abstract:
Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly. On the other hand, structured learning approaches such as graph neural networks (GNNs) that integrate prior information can barely compete with Transformer models. In this work, we aim to benefit from both worlds and propose a novel Multimodal Graph Transformer for question answering tasks that requires performing reasoning across multiple modalities. We introduce a graph-involved plug-and-play quasi-attention mechanism to incorporate multimodal graph information, acquired from text and visual data, to the vanilla self-attention as effective prior. In particular, we construct the text graph, dense region graph, and semantic graph to generate adjacency matrices, and then compose them with input vision and language features to perform downstream reasoning. Such a way of regularizing self-attention with graph information significantly improves the inferring ability and helps align features from different modalities. We validate the effectiveness of Multimodal Graph Transformer over its Transformer baselines on GQA, VQAv2, and MultiModalQA datasets.
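One common way to inject an adjacency matrix into self-attention as a prior, in the spirit of entry 73, is to add a large negative bias to the attention scores of token pairs that are not connected in the graph. The sketch below shows that generic masking idea only. It is an assumption for illustration and is not claimed to be the paper's exact quasi-attention formulation; the function names and the `neg` bias value are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_masked_attention(q, k, v, adjacency, neg=-1e9):
    """Scaled dot-product attention with an additive graph mask.

    q, k, v: lists of token vectors (lists of floats).
    adjacency[i][j] truthy means token i may attend to token j.
    Non-adjacent pairs receive a large negative bias before the softmax
    (an assumed, generic way of using the graph as a prior).
    """
    dim = len(q[0])
    outputs = []
    for i, query in enumerate(q):
        scores = []
        for j, key in enumerate(k):
            score = sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if not adjacency[i][j]:
                score += neg
            scores.append(score)
        weights = softmax(scores)
        outputs.append([
            sum(weights[j] * v[j][c] for j in range(len(v)))
            for c in range(len(v[0]))
        ])
    return outputs
```

With an identity adjacency matrix each token attends only to itself, so the output equals `v`; with a fully connected graph the function reduces to ordinary self-attention.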
74.
muton (2023-04-30 23:19):
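#paper Amygdala and cortical gamma-band responses to emotional faces depend on the attended to valence https://arxiv.org/pdf/2304.05700.pdf The amygdala is thought to contribute to a bottom-up attentional bias in the visual processing of emotional faces, but how its response to emotion interacts with top-down attention is unclear. How similar the amygdala's responses to emotion and attention are to those seen in scalp EEG also remains to be explored. The authors therefore recorded gamma-band activity both from intracranial electrodes in the amygdala and from scalp EEG to study the interaction of emotion and attention during face processing. In the emotion detection task, the high-frequency gamma effect in the amygdala appeared when neutral faces were the targets. When negative faces were the targets, low-frequency gamma increased significantly when negative faces were shown, not only in the amygdala but also in scalp EEG over posterior regions, with an earlier time window on the scalp than in the amygdala. These results are consistent with a multi-pathway model of emotion processing and reveal the role of gamma activity in processing emotional faces from the perspective of (top-down) attention.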
Abstract:
The amygdala is assumed to contribute to a bottom-up attentional bias during visual processing of emotional faces. Still, how its response to emotion interacts with top-down attention is not fully understood. It is also unclear if amygdala activity and scalp EEG respond to emotion and attention in a similar way. Therefore, we studied the interaction of emotion and attention during face processing in oscillatory gamma-band activity (GBA) in the amygdala and on the scalp. Amygdala signals were recorded via intracranial EEG (iEEG) in 9 patients with epilepsy. Scalp recordings were collected from 19 healthy participants. Three randomized blocks of angry, neutral, and happy faces were presented, and either negative, neutral, or positive expressions were denoted as targets. Both groups detected happy faces fastest and most accurately. In the amygdala, the earliest effect was observed around 170 ms in high GBA (105-117.5 Hz) when neutral faces served as targets. Here, GBA was higher for emotional than neutral faces. During attention to negative faces, low GBA (< 90 Hz) increased specifically for angry faces both in the amygdala and over posterior scalp regions, albeit earlier on the scalp (60 ms) than in the amygdala (210 ms). From 570 ms, amygdala high GBA (117.5-145 Hz) was also increased for both angry and neutral, compared to happy, faces. When positive faces were the targets, GBA did not differentiate between expressions. The present data reveal that attention-independent emotion detection in amygdala high GBA may only occur during a neutral focus of attention. Top-down threat vigilance coordinates widespread low GBA, biasing stimulus processing in favor of negative faces. These results are in line with a multi-pathway model of emotion processing and help specify the role of GBA in this process by revealing how attentional focus can tune timing and amplitude of emotional GBA responses.
75.
周周复始 (2023-04-30 23:12):
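#paper doi: https://doi.org/10.48550/arXiv.2112.05149. DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model. Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually incur a high computational cost for iterative optimization. Although deep-learning-based methods for fast image registration have been developed, obtaining realistic continuous deformations from the moving image to the fixed image with little topological folding is still challenging. To solve this problem, the paper proposes a new diffusion-model-based image registration method called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through the reverse diffusion process but also performs image registration via deformation fields. Specifically, the deformation field is generated by the conditional score function of the deformation between the moving and fixed images, so registration along a continuous deformation can be performed by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks show that the method provides flexible deformations and topology preservation.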
Abstract:
Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also allows image registration by deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that the registration can be performed from continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology preservation capability.
76.
张浩彬 (2023-04-28 13:45):
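#paper An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling DOI:arXiv:1803.01271. I have been giving a lot of talks on time-series problems lately and read the original TCN paper carefully. Besides the RNN family, TCN is still used quite a lot. To obtain a large receptive field without adding too many layers, it uses dilated convolutions, and it avoids data leakage from future time steps through padding and cropping. A TCN block consists of two dilated causal convolutions, activation layers, normalization layers, and a residual connection. The experiments show that TCN is relatively insensitive to hyperparameters, but the kernel size k is key; dropout and gradient clipping also help considerably.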
Abstract:
For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL.
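The dilated causal convolution described in the note on entry 76 can be written out directly. The following is a minimal standard-library sketch, not the paper's code: function names are hypothetical, and the input is left-padded with zeros, which gives the same result as padding both sides and cropping the tail. The receptive-field helper assumes two causal convolutions per block with one dilation value per block, as in the note.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """1D causal convolution with dilation.

    Output at time t depends only on x[t], x[t - d], x[t - 2d], ...
    (d = dilation), so no future value can leak into the prediction.
    weights[-1] multiplies the current sample x[t].
    """
    kernel_size = len(weights)
    pad = (kernel_size - 1) * dilation
    padded = [0.0] * pad + list(x)  # left padding only keeps the filter causal
    output = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            acc += w * padded[t + j * dilation]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Number of input steps visible to one output step of a stack of blocks."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)
```

For example, with kernel size 3 and block dilations 1, 2, 4, `receptive_field` gives 29 time steps from only three blocks, which illustrates why dilation lets the receptive field grow quickly without many layers.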
77.
姗姗来迟 (2023-04-19 13:44):
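#paper arXiv:2103.00020 Learning Transferable Visual Models From Natural Language Supervision. The day before yesterday I read the CLIP paper and looked into the prompts it mentions. My reading notes are in the blog post "CLIP论文拜读及理解" (reading and understanding the CLIP paper). Link: https://blog.csdn.net/weixin_44845357/article/details/130206779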
Abstract:
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at this https URL.
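The pre-training task in this abstract, predicting which caption goes with which image, is commonly written as a symmetric cross-entropy over an image-text similarity matrix. The sketch below is a toy version under several assumptions: embeddings are plain lists standing in for encoder outputs, the temperature is a fixed constant (0.07 is chosen here only for illustration), and all function names are hypothetical.

```python
import math

# Toy sketch: embeddings are hand-written lists, not outputs of real encoders.

def normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def similarity_logits(image_embs, text_embs, temperature=0.07):
    """Cosine similarity between every image and every caption, scaled by 1/temperature."""
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in texts] for i in images]


def diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[r], i.e. the correct match for row r is column r."""
    total = 0.0
    for r, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[r]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of the image-to-text and text-to-image cross-entropies."""
    logits = similarity_logits(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def zero_shot_predict(image_emb, caption_embs):
    """Return the index of the caption embedding most similar to the image."""
    row = similarity_logits([image_emb], caption_embs)[0]
    return max(range(len(row)), key=row.__getitem__)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs
    print(zero_shot_predict([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

Zero-shot transfer then reuses the same similarity: each class name is turned into a caption, embedded, and the image is assigned to the closest caption, which is what `zero_shot_predict` mimics.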
78.
张德祥 (2023-04-16 11:20):
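#paper https://doi.org/10.48550/arXiv.2302.10051 An established normative approach to understanding the algorithmic basis of neural computation is to derive online algorithms from principled computational objectives and then assess how well they agree with anatomical and physiological observations. Similarity matching objectives have been a successful starting point for deriving online algorithms that map onto neural networks (NNs) with point neurons and Hebbian/anti-Hebbian plasticity. These NN models explain many anatomical and physiological observations; however, the objectives have limited computational power, and the derived NNs cannot account for the multi-compartmental neuronal structures and non-Hebbian forms of plasticity that are common throughout the brain. This paper reviews and unifies recent extensions of the similarity matching approach that tackle more complex objectives, including a wide range of unsupervised and self-supervised learning tasks that can be formulated as generalized eigenvalue problems or nonnegative matrix factorization problems. Interestingly, the online algorithms derived from these objectives map naturally onto NNs with multi-compartmental neurons and local, non-Hebbian learning rules. The unified extension therefore offers a normative framework for understanding the multi-compartmental neuronal structures and non-Hebbian plasticity found throughout the brain.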
Abstract:
An established normative approach for understanding the algorithmic basis of neural computation is to derive online algorithms from principled computational objectives and evaluate their compatibility with anatomical and physiological observations. Similarity matching objectives have served as successful starting points for deriving online algorithms that map onto neural networks (NNs) with point neurons and Hebbian/anti-Hebbian plasticity. These NN models account for many anatomical and physiological observations; however, the objectives have limited computational power and the derived NNs do not explain multi-compartmental neuronal structures and non-Hebbian forms of plasticity that are prevalent throughout the brain. In this article, we review and unify recent extensions of the similarity matching approach to address more complex objectives, including a broad range of unsupervised and self-supervised learning tasks that can be formulated as generalized eigenvalue problems or nonnegative matrix factorization problems. Interestingly, the online algorithms derived from these objectives naturally map onto NNs with multi-compartmental neurons and local, non-Hebbian learning rules. Therefore, this unified extension of the similarity matching approach provides a normative framework that facilitates understanding the multi-compartmental neuronal structures and non-Hebbian plasticity found throughout the brain.
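The abstract does not write out a similarity matching objective. A form often used in this literature is the squared Frobenius distance between the input Gram matrix and the output Gram matrix, ||XᵀX − YᵀY||²; taking that form (without any normalization constant) is an assumption of this sketch, as are the function names and the toy data. It only evaluates the offline cost and does not implement the online Hebbian/anti-Hebbian algorithms the paper reviews.

```python
# Sketch under an assumed objective: cost(X, Y) = || X^T X - Y^T Y ||_F^2.
# Matrices are lists of rows; each column is one sample.

def gram(M):
    """Return M^T M, the matrix of pairwise dot products between samples (columns)."""
    n_samples = len(M[0])
    return [
        [sum(row[s] * row[t] for row in M) for t in range(n_samples)]
        for s in range(n_samples)
    ]


def similarity_matching_cost(X, Y):
    """Squared Frobenius distance between input and output similarity matrices."""
    gx, gy = gram(X), gram(Y)
    return sum((a - b) ** 2 for rx, ry in zip(gx, gy) for a, b in zip(rx, ry))


if __name__ == "__main__":
    # Two 2-D samples: (3, 1) and (-3, 1). Most of the variance lies along the first axis.
    X = [[3.0, -3.0],
         [1.0, 1.0]]
    Y_first_axis = [[3.0, -3.0]]   # 1-D output keeping the high-variance coordinate
    Y_second_axis = [[1.0, 1.0]]   # 1-D output keeping the low-variance coordinate
    print(similarity_matching_cost(X, Y_first_axis))
    print(similarity_matching_cost(X, Y_second_axis))
```

In this toy case the output that keeps the high-variance coordinate preserves pairwise similarities far better, which is consistent with the idea that matching similarities in a lower-dimensional output favors the principal subspace.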
79.
林海onrush (2023-03-31 23:17):
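#paper, BloombergGPT: A Large Language Model for Finance, doi:10.48550/arXiv.2303.17564. The AI boom set off by ChatGPT has now reached finance: Bloomberg has released BloombergGPT, a large language model (LLM) built for the financial sector. According to a report Bloomberg published on March 30, the company constructed what may be the largest domain-specific dataset to date and used it to train a 50-billion-parameter LLM specialized for finance. Per the report, the model draws on Bloomberg's extensive financial data sources to build a 363-billion-token dataset that supports a range of tasks in the financial industry. The model outperforms existing models on financial tasks by a wide margin while remaining competitive with them on general-purpose benchmarks. In the reported tests, BloombergGPT ranks first on four of five tasks (ConvFinQA, FiQA SA, FPB, and Headline) and second on NER (Named Entity Recognition), so it has a clear advantage in this domain.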
Abstract:
The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT.
80.
Vincent (2023-03-31 15:34):
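#paper https://doi.org/10.48550/arXiv.1904.10098 ICML 2019 DAG-GNN: DAG Structure Learning with Graph Neural Networks. Structure learning for directed acyclic graphs (DAGs) is very challenging because the search space grows super-exponentially with the number of nodes. A common approach is to recast structure learning as a score optimization problem. To keep the problem tractable, traditional methods usually assume a linear structural equation model (linear SEM). Building on the linear SEM framework, this paper develops a DAG learning method based on a variational autoencoder (VAE) and a graph neural network (GNN). Thanks to the nonlinear fitting capacity of neural networks, the method can handle some nonlinear problems while doing at least as well as linear SEM. On simulated and real data, the paper reports higher accuracy and a lower false discovery rate than linear SEM.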
Abstract:
Learning a faithful directed acyclic graph (DAG) from samples of a joint distribution is a challenging combinatorial problem, owing to the intractable search space superexponential in the number of graph nodes. A recent breakthrough formulates the problem as a continuous optimization with a structural constraint that ensures acyclicity (Zheng et al., 2018). The authors apply the approach to the linear structural equation model (SEM) and the least-squares loss function that are statistically well justified but nevertheless limited. Motivated by the widespread success of deep learning that is capable of capturing complex nonlinear mappings, in this work we propose a deep generative model and apply a variant of the structural constraint to learn the DAG. At the heart of the generative model is a variational autoencoder parameterized by a novel graph neural network architecture, which we coin DAG-GNN. In addition to the richer capacity, an advantage of the proposed model is that it naturally handles discrete variables as well as vector-valued ones. We demonstrate that on synthetic data sets, the proposed method learns more accurate graphs for nonlinearly generated samples; and on benchmark data sets with discrete variables, the learned graphs are reasonably close to the global optima. The code is available at this https URL.
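The "structural constraint that ensures acyclicity" can be illustrated with a small function. Zheng et al. (2018) use tr(exp(A∘A)) − d = 0 for a weighted adjacency matrix A with d nodes; as I understand it, the variant in this paper replaces the matrix exponential with a polynomial, tr[(I + α A∘A)^d] − d = 0 for α > 0. The sketch below evaluates that polynomial form in plain Python. The choice α = 1/d and the function names are assumptions made for illustration, not the authors' code.

```python
# Sketch of a polynomial acyclicity penalty: h(A) = tr[(I + alpha * A∘A)^d] - d.
# h(A) is zero for an acyclic graph and positive when A contains a cycle.

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def acyclicity_penalty(A, alpha=None):
    """Evaluate h(A) for a square weighted adjacency matrix given as a list of rows."""
    d = len(A)
    if alpha is None:
        alpha = 1.0 / d  # illustrative choice; any positive value keeps the property
    # Element-wise square makes every entry non-negative, so cycles cannot cancel out.
    M = [
        [(1.0 if i == j else 0.0) + alpha * A[i][j] ** 2 for j in range(d)]
        for i in range(d)
    ]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, M)
    return sum(power[i][i] for i in range(d)) - d


if __name__ == "__main__":
    chain = [[0.0, 1.5, 0.0],
             [0.0, 0.0, -2.0],
             [0.0, 0.0, 0.0]]   # 0 -> 1 -> 2, no cycle
    cycle = [[0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0],
             [1.0, 0.0, 0.0]]   # 0 -> 1 -> 2 -> 0
    print(acyclicity_penalty(chain))
    print(acyclicity_penalty(cycle))
```

Because the penalty is a smooth function of A, it can be added to a score or likelihood and driven to zero with continuous optimization, instead of searching over graphs combinatorially.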