Papers from the journal arXiv.
142 paper shares found in total; this page shows entries 61-80.
61.
前进 (2023-12-27 15:11):
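#paper arXiv:2312.11514v1, 2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory. Large language models (LLMs) play an important role in modern natural language processing, but their high compute and memory requirements are a challenge for devices with limited memory. To run LLMs that exceed the available DRAM capacity efficiently, the paper stores the model parameters in flash memory and loads them into DRAM on demand. The method builds an inference cost model that matches the behavior of flash memory and optimizes in two key areas: reducing the volume of data transferred from flash, and reading data in larger, more contiguous chunks. Within this framework it introduces two main techniques: "windowing" reduces data transfer by reusing previously activated neurons, and "row-column bundling" exploits flash memory's strength in sequential access to increase the size of the chunks read from flash. Together these methods make it possible to run models up to twice the size of the available DRAM, with inference 4-5x faster on CPU and 20-25x faster on GPU than naive loading.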
Abstract:
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.
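The "windowing" idea can be illustrated with a toy cache. This is only a sketch under stated assumptions, not the paper's implementation: the class name `NeuronWindowCache`, the integer neuron ids, and the window length are all hypothetical, and real flash reads are replaced by set arithmetic. It shows that when consecutive tokens activate overlapping neurons, only the newly needed ones have to be read from flash.

```python
from collections import deque


class NeuronWindowCache:
    """Toy model of a sliding-window neuron cache (hypothetical helper)."""

    def __init__(self, window):
        self.window = window      # number of recent tokens to remember
        self.history = deque()    # one set of active neuron ids per token
        self.resident = set()     # neuron ids currently held in DRAM

    def step(self, active):
        """Process one token; return (ids to load from flash, ids evicted)."""
        active = set(active)
        to_load = active - self.resident          # not in DRAM yet
        self.history.append(active)
        if len(self.history) > self.window:
            self.history.popleft()                # forget the oldest token
        new_resident = set().union(*self.history)
        evicted = (self.resident | active) - new_resident
        self.resident = new_resident
        return to_load, evicted


cache = NeuronWindowCache(window=2)
for active in [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]:
    to_load, evicted = cache.step(active)
    print(sorted(to_load), sorted(evicted))
```

In this toy run the second and third tokens each load a single new neuron instead of three, which is the data-transfer saving the abstract describes.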
62.
符毓 (2023-11-30 23:11):
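#paper doi.org/10.48550/arXiv.2311.05332, 2023, On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving. A recent paper from the WeRide team that applies GPT to autonomous driving. The tests show that GPT is fairly accurate at image recognition, point-cloud recognition, weather recognition, V2X images, simulated-image recognition, and multi-view image recognition, but error-prone at traffic-light recognition and at telling left from right.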
Abstract:
The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving. The advent of Visual Language Models (VLM) represents a novel frontier in realizing fully autonomous vehicle driving. This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios. We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the capacity of a driver. Our comprehensive tests span from basic scene recognition to complex causal reasoning and real-time decision-making under varying conditions. Our findings reveal that GPT-4V demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. It showcases the potential to handle out-of-distribution scenarios, recognize intentions, and make informed decisions in real driving contexts. However, challenges remain, particularly in direction discernment, traffic light recognition, vision grounding, and spatial reasoning tasks. These limitations underscore the need for further research and development. Project is now available on GitHub for interested parties to access and utilize: https://github.com/PJLab-ADG/GPT4V-AD-Exploration
63.
Vincent (2023-11-30 16:34):
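#paper Contrastive Variational Autoencoder Enhances Salient Features, arXiv, 2019, https://arxiv.org/abs/1902.04601 The recent contrastive PCA adopts the idea of contrastive learning: it captures the differences between a target dataset and a background, giving unsupervised dimensionality reduction that preserves the contrastive signal. Like PCA, however, contrastive PCA can only reduce dimensionality through linear combinations of variables and cannot capture nonlinear relationships among them. This paper extends contrastive PCA with a variational autoencoder (VAE) to capture nonlinear relationships; the method is called the contrastive VAE (cVAE). By explicitly modeling both the features shared between the datasets and the features enriched in the target data, the cVAE isolates and enhances the salient latent features of the target data. Its run time is similar to a VAE's, and it is robust to noise and dataset purity. The paper validates the method's ability to capture salient latent features on several datasets (for example MNIST handwritten digits), with consistent gains over a standard VAE. As a generative model, once trained it can also use these salient latent features to generate new data.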
Abstract:
Variational autoencoders are powerful algorithms for identifying dominant latent structure in a single dataset. In many applications, however, we are interested in modeling latent structure and variation that are enriched in a target dataset compared to some background, e.g. enriched in patients compared to the general population. Contrastive learning is a principled framework to capture such enriched variation between the target and background, but state-of-the-art contrastive methods are limited to linear models. In this paper, we introduce the contrastive variational autoencoder (cVAE), which combines the benefits of contrastive learning with the power of deep generative models. The cVAE is designed to identify and enhance salient latent features. The cVAE is trained on two related but unpaired datasets, one of which has minimal contribution from the salient latent features. The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features. The algorithm is straightforward to implement, has a similar run-time to the standard VAE, and is robust to noise and dataset purity. We conduct experiments across diverse types of data, including gene expression and facial images, showing that the cVAE effectively uncovers latent structure that is salient in a particular analysis.
64.
Ricardo (2023-10-31 22:15):
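#paper https://doi.org/10.48550/arXiv.2308.01316 Patched Denoising Diffusion Models For High-Resolution Image Synthesis. I have recently been looking into using generative models to map brain segmentation images back to T1w/T2w images. Most medical image generation algorithms are patch-based and stitch the patches back together in voxel space, which produces discontinuities at the patch boundaries. This paper trains a diffusion model on patches and removes the boundary effect in feature space, so I am now trying to apply the method to my own work. My current project is building brain template images across the full age range; I would be happy to talk about it with everyone when there is a chance.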
Abstract:
We propose an effective denoising diffusion model for generating high-resolution images (e.g., 1024$\times$512), trained on small-size image patches (e.g., 64$\times$64). We name our algorithm Patch-DM, in which a new feature collage strategy is designed to avoid the boundary artifact when synthesizing large-size images. Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space. Patch-DM produces high-quality image synthesis results on our newly collected dataset of nature images (1024$\times$512), as well as on standard benchmarks of smaller sizes (256$\times$256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our method with previous patch-based generation methods and achieve state-of-the-art FID scores on all four datasets. Further, Patch-DM also reduces memory complexity compared to the classic diffusion models.
65.
Vincent (2023-08-31 23:50):
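#paper https://doi.org/10.48550/arXiv.2306.03301. arXiv 2023, Estimating Conditional Mutual Information for Dynamic Feature Selection. Dynamic feature selection involves learning a feature selection policy and making predictions from arbitrary feature subsets; learning the selection policy is usually the hard part. This paper prioritizes features by their conditional mutual information (CMI) with the prediction target. A neural network is trained to estimate, given the set of features already acquired, the predictive value (CMI) of each remaining feature, and at each step the most informative feature is added to the set. The process iterates until a stopping criterion is met (for example a given number of features, an uncertainty level, or a cost budget). The framework can also make use of prior information. The paper shows that the method performs well on both tabular and image datasets.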
Abstract:
Dynamic feature selection, where we sequentially query features to make accurate predictions with a minimal budget, is a promising paradigm to reduce feature acquisition costs and provide transparency into the prediction process. The problem is challenging, however, as it requires both making predictions with arbitrary feature sets and learning a policy to identify the most valuable selections. Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable. The main challenge is learning this selection policy, and we design a straightforward new modeling approach that estimates the mutual information in a discriminative rather than generative fashion. Building on our learning approach, we introduce several further improvements: allowing variable feature budgets across samples, enabling non-uniform costs between features, incorporating prior information, and exploring modern architectures to handle partial input information. We find that our method provides consistent gains over recent state-of-the-art methods across a variety of datasets.
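The greedy selection loop described above can be sketched as follows. This is a minimal illustration, not the paper's code: `greedy_select`, `score_fn`, `budget`, and `min_gain` are hypothetical names, and the learned CMI estimator is replaced by a made-up scoring function. In the paper that score would come from a trained network.

```python
def greedy_select(num_features, score_fn, budget, min_gain=0.0):
    """Greedily pick features by an estimated information gain.

    score_fn(selected, j) stands in for a learned CMI estimate of
    feature j given the already selected features (an assumption).
    """
    selected = []
    remaining = set(range(num_features))
    while remaining and len(selected) < budget:
        # Evaluate every remaining candidate given what we already have.
        best = max(sorted(remaining), key=lambda j: score_fn(selected, j))
        if score_fn(selected, best) <= min_gain:
            break                      # stop: nothing informative is left
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy score: fixed per-feature gains that shrink as more features are known.
gains = [0.1, 0.9, 0.5]


def toy_score(selected, j):
    return gains[j] * 0.5 ** len(selected)


print(greedy_select(3, toy_score, budget=2))
```

The stopping rule here combines a feature budget with a minimum gain; other criteria mentioned in the summary, such as prediction uncertainty or acquisition cost, would slot into the same loop.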
66.
符毓 (2023-08-31 22:39):
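#paper doi.org/10.48550/arXiv.2303.09165 2023, A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation. To address the cost of large-scale manual annotation in computer vision, the team turns to synthetic data. After generating synthetic data according to a set of rules, the paper shows that pre-training on synthetic data can outperform pre-training on real data, and can also beat the results of several data augmentation approaches. There is a lot of room for future applications.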
Abstract:
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data. However, exhaustive data annotation is impracticable for each task of all domains of interest, due to high labor costs and unguaranteed labeling accuracy. Besides, the uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist. All these nuisances may hinder the verification of typical theories and exposure to new findings. To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization. We in this work push forward along this line by doing profound and extensive research on bare supervised learning and downstream domain adaptation. Specifically, under the well-controlled, IID data setting enabled by 3D rendering, we systematically verify the typical, important learning insights, e.g., shortcut learning, and discover the new laws of various data regimes and network architectures in generalization. We further investigate the effect of image formation factors on generalization, e.g., object scale, material texture, illumination, camera viewpoint, and background in a 3D scene. Moreover, we use the simulation-to-reality adaptation as a downstream task for comparing the transferability between synthetic and real data when used for pre-training, which demonstrates that synthetic data pre-training is also promising to improve real test results. Lastly, to promote future research, we develop a new large-scale synthetic-to-real benchmark for image classification, termed S2RDA, which provides more significant challenges for transfer from simulation to reality. The code and datasets are available at this https URL.
67.
尹志 (2023-08-31 22:11):
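#paper https://doi.org/10.48550/arXiv.1812.07907 PnP-AdaNet: Plug-and-Play Adversarial Domain Adaptation Network at Unpaired Cross-Modality Cardiac Segmentation. I came across this paper while surveying efficient generative models and found it rather interesting. It proposes a network architecture, PnP-AdaNet, for unsupervised domain adaptation of segmentation across imaging modalities. It is an older paper from 2018, and its ideas of swapping out part of the network and using adversarial learning are now fairly common, but I think the idea of replaceable network modules has deeper implications today, when large models are everywhere. One of my own research topics follows this line, and some of the experimental results so far look quite good.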
Abstract:
Deep convolutional networks have demonstrated the state-of-the-art performance on various medical image computing tasks. Leveraging images from different modalities for the same analysis task holds clinical benefits. However, the generalization capability of deep models on test data with different distributions remain as a major challenge. In this paper, we propose the PnPAdaNet (plug-and-play adversarial domain adaptation network) for adapting segmentation networks between different modalities of medical images, e.g., MRI and CT. We propose to tackle the significant domain shift by aligning the feature spaces of source and target domains in an unsupervised manner. Specifically, a domain adaptation module flexibly replaces the early encoder layers of the source network, and the higher layers are shared between domains. With adversarial learning, we build two discriminators whose inputs are respectively multi-level features and predicted segmentation masks. We have validated our domain adaptation method on cardiac structure segmentation in unpaired MRI and CT. The experimental results with comprehensive ablation studies demonstrate the excellent efficacy of our proposed PnP-AdaNet. Moreover, we introduce a novel benchmark on the cardiac dataset for the task of unsupervised cross-modality domain adaptation. We will make our code and database publicly available, aiming to promote future studies on this challenging yet important research topic in medical imaging.
68.
尹志 (2023-07-31 22:52):
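#paper doi: https://doi.org/10.48550/arXiv.2210.13695 Structure-based Drug Design with Equivariant Diffusion Models. I read this paper again. Using equivariant diffusion models for structure-based drug design is indeed an effective approach, and more and more work keeps confirming its value. This paper is something of a classic (although it was apparently rejected by ICLR): it generates molecules conditioned on protein pockets with an SE(3)-equivariant diffusion model. Extensive experiments show that it generates novel and diverse drug molecules efficiently and effectively. The paper also discusses using the method to optimize existing molecules and to design molecules by inpainting. The results still have many shortcomings, but these ideas are very valuable for small-molecule drug design and for improving existing methods.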
Abstract:
Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. In this paper, we formulate SBDD as a 3D-conditional generation problem and present DiffSBDD, an SE(3)-equivariant 3D-conditional diffusion model that generates novel ligands conditioned on protein pockets. Comprehensive in silico experiments demonstrate the efficiency and effectiveness of DiffSBDD in generating novel and diverse drug-like ligands with competitive docking scores. We further explore the flexibility of the diffusion framework for a broader range of tasks in drug design campaigns, such as off-the-shelf property optimization and partial molecular design with inpainting.
69.
Ricardo (2023-07-31 22:16):
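#paper doi: https://doi.org/10.48550/arXiv.2112.05149 DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model. Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually need costly iterative optimization. Deep-learning-based methods have been used for fast registration, but obtaining realistic, continuous deformations from the moving image to the fixed image with little topological folding remains challenging. To address this, the paper proposes DiffuseMorph, a new registration method based on diffusion models. DiffuseMorph can both generate synthetic deformed images through reverse diffusion and register images through deformation fields. Specifically, the deformation field is generated from the conditional score function of the deformation between the moving and fixed images, and registration along a continuous deformation is obtained simply by scaling the latent feature of the score. Experiments on 2D facial and 3D medical image registration show that the method provides flexible deformations with topology preservation.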
Abstract:
Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also allows image registration by deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that the registration can be performed from continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology preservation capability.
70.
符毓 (2023-07-31 16:41):
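#paper doi: 10.48550/arXiv.2307.05973 2023, Composable 3D Value Maps for Robotic Manipulation with Language Models. The latest paper from Fei-Fei Li's team, combining language models with robot manipulation. With a large language model in the loop, human-robot interaction becomes more efficient, and vision-based real-time trajectory planning becomes possible. Judging by eye, the arm moves at roughly one eighth of the usual working speed of a robot arm, and stability would need to improve further for real-world use (the error rate exceeds 25%).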
Abstract:
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website: this https URL
71.
Ricardo (2023-06-30 23:49):
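#paper Denoising Diffusion Probabilistic Models. doi: https://doi.org/10.48550/arXiv.2006.11239 The famous DDPM model. The algorithm is surprisingly simple and consists of a forward noising process and a reverse denoising process. The forward process adds a small amount of noise at each of many time steps; the reverse process removes noise at each time step, using a network that learns the noise. After a long chain of derivations, the final loss function turns out to be very simple: just an MSE. It looks like many VAEs stacked together. One drawback of DDPM is that sampling takes many steps, typically 1000 or more; the later DDIM model reduces this to about 50 steps, at the cost of sample diversity. DDIM explains its principle through a model called the drift-diffusion equation (a model often used in research on behavioral decision-making). The original DDPM in effect has only the diffusion part of the drift-diffusion equation, while DDIM adds the drift part, which moves the model toward regions where the data density is higher.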
Abstract:
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at this https URL
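The forward noising process and the "just an MSE" loss mentioned in the summary can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the noise-prediction network is omitted entirely, and the linear beta schedule from 1e-4 to 0.02 over T=1000 steps is the setting reported in the DDPM paper. The closed form used is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

```python
import numpy as np


def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, noise):
    """Draw x_t directly from x_0 without simulating every step."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def simple_loss(pred_noise, noise):
    """Simplified objective: MSE between predicted and true noise."""
    return float(np.mean((pred_noise - noise) ** 2))


rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((4, 4))       # stand-in for an image
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t=500, alpha_bar=alpha_bar, noise=noise)

# A real model would predict `noise` from (x_t, t); a zero prediction is
# used here only so that the loss call runs.
print(simple_loss(np.zeros_like(noise), noise))
```

Because alpha_bar decays toward zero, x_t at the last step is almost pure Gaussian noise, which is what lets sampling start from a standard normal draw.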
72.
张浩彬 (2023-06-30 11:45):
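#paper The Capacity and Robustness Trade-off: Revisiting the Channel Independent Strategy for Multivariate Time Series Forecasting doi: https://doi.org/10.48550/arXiv.2304.05206 The paper studies multivariate time series forecasting and examines the difference between forecasting each channel independently and forecasting the channels jointly. It shows that, because of distribution shift, independent forecasting works better: it helps mitigate distribution shift and improves the model's generalization. The paper also shows that the choice between independent and joint forecasting is a trade-off between model capacity and model robustness. It then proposes ways to improve the accuracy of joint forecasting, including regularization, low-rank decomposition, using MAE instead of MSE, and adjusting the sequence length.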
Abstract:
Multivariate time series data comprises various channels of variables. The multivariate forecasting models need to capture the relationship between the channels to accurately predict future values. However, recently, there has been an emergence of methods that employ the Channel Independent (CI) strategy. These methods view multivariate time series data as separate univariate time series and disregard the correlation between channels. Surprisingly, our empirical results have shown that models trained with the CI strategy outperform those trained with the Channel Dependent (CD) strategy, usually by a significant margin. Nevertheless, the reasons behind this phenomenon have not yet been thoroughly explored in the literature. This paper provides comprehensive empirical and theoretical analyses of the characteristics of multivariate time series datasets and the CI/CD strategy. Our results conclude that the CD approach has higher capacity but often lacks robustness to accurately predict distributionally drifted time series. In contrast, the CI approach trades capacity for robust prediction. Practical measures inspired by these analyses are proposed to address the capacity and robustness dilemma, including a modified CD method called Predict Residuals with Regularization (PRReg) that can surpass the CI strategy. We hope our findings can raise awareness among researchers about the characteristics of multivariate time series and inspire the construction of better forecasting models.
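The capacity side of the CI/CD trade-off is easy to see with linear forecasters. The sketch below is an assumption-laden illustration and not the paper's experimental setup: `fit_cd` and `fit_ci` are hypothetical helpers, the data are random, and plain least squares stands in for a trained model. The CD model maps all channels jointly, while the CI model shares one map across channels treated as separate univariate series.

```python
import numpy as np


def fit_cd(X, Y):
    """Channel-dependent: one map from all (L*C) inputs to all (H*C) outputs."""
    N = X.shape[0]
    W, *_ = np.linalg.lstsq(X.reshape(N, -1), Y.reshape(N, -1), rcond=None)
    return W                                   # shape (L*C, H*C)


def fit_ci(X, Y):
    """Channel-independent: one shared L -> H map applied to every channel."""
    N, L, C = X.shape
    H = Y.shape[1]
    Xc = X.transpose(0, 2, 1).reshape(N * C, L)   # each channel is a sample
    Yc = Y.transpose(0, 2, 1).reshape(N * C, H)
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return W                                   # shape (L, H)


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8, 3))   # N=50 windows, lookback L=8, C=3 channels
Y = rng.standard_normal((50, 4, 3))   # horizon H=4
W_cd, W_ci = fit_cd(X, Y), fit_ci(X, Y)
print(W_cd.size, W_ci.size)           # parameter counts of the two models
```

With C channels the CD map has C squared times as many parameters as the CI map, which gives it more capacity but also more ways to overfit patterns that do not survive a distribution shift.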
73.
Ricardo (2023-05-31 23:53):
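#paper DOI:https://doi.org/10.48550/arXiv.2304.00217 DrDisco: Deep Registration for Distortion Correction of Diffusion MRI with single phase-encoding. Diffusion-weighted magnetic resonance imaging (DW-MRI) is a non-invasive way of imaging white matter tracts in the human brain. DW-MRI is usually acquired with echo-planar imaging (EPI) under high gradient fields, which introduces severe geometric distortions that affect further analysis. Most distortion-correction tools need two minimally weighted DW-MRI images (B0) acquired with different phase-encoding directions, and can take hours per subject. Because a large amount of diffusion data is acquired with only a single phase-encoding direction, existing methods are of limited use. The paper proposes a deep-learning-based registration method that corrects distortion using only the B0 from a single phase-encoding direction. A deep learning model registers the undistorted T1-weighted image to the distorted B0 image to remove the distortion, and a differentiable mutual information loss is applied during training to improve inter-modality alignment. Experiments on the Human Connectome Project dataset show that the method outperforms SyN and VoxelMorph on several metrics and takes only a few seconds per subject.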
Abstract:
Diffusion-weighted magnetic resonance imaging (DW-MRI) is a non-invasive way of imaging white matter tracts in the human brain. DW-MRIs are usually acquired using echo-planar imaging (EPI) with high gradient fields, which could introduce severe geometric distortions that interfere with further analyses. Most tools for correcting distortion require two minimally weighted DW-MRI images (B0) acquired with different phase-encoding directions, and they can take hours to process per subject. Since a great amount of diffusion data are only acquired with a single phase-encoding direction, the application of existing approaches is limited. We propose a deep learning-based registration approach to correct distortion using only the B0 acquired from a single phase-encoding direction. Specifically, we register undistorted T1-weighted images and distorted B0 to remove the distortion through a deep learning model. We apply a differentiable mutual information loss during training to improve inter-modality alignment. Experiments on the Human Connectome Project dataset show the proposed method outperforms SyN and VoxelMorph on several metrics, and only takes a few seconds to process one subject.
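Why mutual information suits cross-modality alignment can be seen from a plain histogram estimate. Note the assumptions: this version is not differentiable, so it is not the loss used in the paper, and `mutual_information` with its `bins` parameter is a hypothetical helper on synthetic data. It only shows that MI stays high when one image is an intensity-remapped copy of the other, and drops when the two are unrelated.

```python
import numpy as np


def mutual_information(a, b, bins=32):
    """Histogram estimate of the mutual information (in nats) of two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of b
    nz = pxy > 0                              # skip empty bins (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
img = rng.random((64, 64))
inverted = 1.0 - img                # same structure, different "contrast"
unrelated = rng.random((64, 64))    # no shared structure

print(mutual_information(img, inverted), mutual_information(img, unrelated))
```

A training loss needs gradients with respect to the warped image, so differentiable variants replace the hard histogram bins with smooth kernels; the summary does not say which variant the paper uses.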
74.
符毓 (2023-05-31 22:40):
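#paper doi.org/10.48550/arXiv.2212.12669 Nature, 2023, On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective. The paper discusses the challenges that uncertainty and dynamic environments pose for machine-driven intelligent decision-making (IDM) in real-world settings. The authors propose the idea of a Foundation Decision Model (FDM) to overcome these challenges and enable wide adoption of IDM. The paper also presents various approaches by which AI could enhance IDM and argues for their theoretical feasibility.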
Abstract:
The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundational model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study demonstrating our FDM implementation, DigitalBrain (DB1) with 1.3 billion parameters, achieving human-level performance in 870 tasks, such as text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.
75.
周周复始 (2023-05-31 22:29):
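#paper doi:https://doi.org/10.48550/arXiv.2201.00308. DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents. 2022. Diffusion probabilistic models currently produce state-of-the-art results on several competitive image synthesis benchmarks, but they lack a low-dimensional, interpretable latent space and are slow at generation. Variational autoencoders (VAEs), by contrast, usually have a low-dimensional latent space but produce lower-quality samples. Motivated by this, the paper proposes a new generative framework, DiffuseVAE, which integrates a VAE into the diffusion model framework and uses it to design new conditional parameterizations for diffusion models. The paper shows that the resulting model equips diffusion models with a low-dimensional, VAE-inferred latent code that can be used for downstream tasks such as conditional generation.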
Abstract:
Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, standard Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework, and leverage this to design novel conditional parameterizations for diffusion models. We show that the resulting model equips diffusion models with a low-dimensional VAE inferred latent code which can be used for downstream tasks like controllable synthesis. The proposed method also improves upon the speed vs quality tradeoff exhibited in standard unconditional DDPM/DDIM models (for instance, FID of 16.47 vs 34.36 using a standard DDIM on the CelebA-HQ-128 benchmark using T=10 reverse process steps) without having explicitly trained for such an objective. Furthermore, the proposed model exhibits synthesis quality comparable to state-of-the-art models on standard image synthesis benchmarks like CIFAR-10 and CelebA-64 while outperforming most existing VAE-based methods. Lastly, we show that the proposed method exhibits inherent generalization to different types of noise in the conditioning signal. For reproducibility, our source code is publicly available at this https URL.
76.
张浩彬 (2023-05-30 11:48):
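#paper doi:10.48550/arXiv.2010.04515 Principal Component Analysis using Frequency Components of Multivariate Time Series. The paper proposes a new spectral decomposition method for multivariate time series (second-order stationary, i.e. wide-sense stationary). The series is decomposed into groups of subseries such that subseries within a group have non-zero spectral coherence while subseries in different groups have zero spectral coherence. In terms of writing, it follows the classic structure: problem introduction, method, proofs of asymptotic properties, numerical simulation, and an empirical study, with a large amount of derivation.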
Abstract:
Dimension reduction techniques for multivariate time series decompose the observed series into a few useful independent/orthogonal univariate components. We develop a spectral domain method for multivariate second-order stationary time series that linearly transforms the observed series into several groups of lower-dimensional multivariate subseries. These multivariate subseries have non-zero spectral coherence among components within a group but have zero spectral coherence among components across groups. The observed series is expressed as a sum of frequency components whose variances are proportional to the spectral matrices at the respective frequencies. The demixing matrix is then estimated using an eigendecomposition on the sum of the variance matrices of these frequency components and its asymptotic properties are derived. Finally, a consistent test on the cross-spectrum of pairs of components is used to find the desired segmentation into the lower-dimensional subseries. The numerical performance of the proposed method is illustrated through simulation examples and an application to modeling and forecasting wind data is presented.
77.
张德祥 (2023-05-16 08:14):
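#paper https://doi.org/10.48550/arXiv.2203.11740 We can picture the brain as the Earth: the formation of lava at the Earth's core is like the formation of short-term memory in the hippocampus, and the process is quantum; earthquakes at the surface release potential energy, which corresponds to strong short-term memories being selected as long-term memories stored in memory engram cells across different cortical areas. This is a combination of AI, brain science, and quantum mechanics. We propose PNN, but it is not just a simple time series model. In addition to the shared weights of the synaptic connections, we propose a new neural network in which synaptic effective range weights also take part in forward and backward computation, and many of the simulations cannot be achieved with an RNN. Brain plasticity for positive and negative memory is quantum and produces short-term memory, and its wave function shows exponential decay over a period of time, produced in the hippocampus. The exponential decay is due to barriers, and the barriers may be related to astrocytes. Brain plasticity in working memory flows through the brain from the hippocampus to different cortical areas via directional derivatives. Strong working-memory plasticity turns into long-term memory, which corresponds to the maximum directional derivative, and the maximum directional derivative is the gradient. Long-term memory is therefore the gradient of brain plasticity in working memory. The transition from short-term to long-term memory is thus the transition from non-classical to classical mechanics. The PNN simulations agree with the brain-science experiments and hypotheses of 6 papers in main CNS journals, 6 papers in CNS family journals, and 1 paper in a top physics journal. For more, see: https://mp.weixin.qq.com/s/k-KD1KcQo9FiYcQvSypBjQ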
Abstract:
In addition to the shared weights of the synaptic connections, we proposed a new neural network that includes the synaptic effective range weights for both the forward and back propagation. And lots of simulations were used which RNN cannot be achieved. The simulations of PNN fit very well in experiments and hypotheses of 6 papers CNS Journals, 6 papers of CNS family Journals and 1 paper top Physics Journal [14-26]. The brain plasticity in positive or negative memory may be quantum and produce short-term memory, and exhibits an exponential decay in the wave function over a period of time, produced in the hippocampus. And exponential decay occurs due to barriers, and barriers can refer to astrocytes. Brain plasticity in working memory flows through the brain, from the hippocampus to the cortex, through directional derivatives. The strong working memory brain plasticity turns to long-term memory means maximum of directional derivatives, and maximum of directional derivatives is gradient. Thus, long-term memory signifies the gradient of brain plasticity in working memory. The process of short-term memory turns to long-term memory is the process of non-classically turns to classically. Astrocytic cortex memory persistence factor also inhibits local synaptic accumulation, and the model inspires experiments. This could be the process of astrocytes phagocytose synapses is driven by both positive and negative memories of plasticity in the brain. In simulation, it is possible that thicker cortices and more diverse individuals within the brain could have high IQ, but thickest cortices and most diverse individuals may have low IQ in simulation. PSO considers global solution or best previous solution, but also considers relatively good and relatively inferior solution. And PNN modified ResNet to consider memory gradient. The simple PNN only considers astrocytes phagocytosed synapses.
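The abstract's step from "maximum of directional derivatives" to "gradient" rests on a standard calculus identity, written out here for reference. The scalar field f and the unit direction u are illustrative symbols, not notation from the paper; strictly, the maximum value of the directional derivative is the norm of the gradient, attained in the gradient's direction.

```latex
% Assumption: f is a differentiable scalar field and u is a unit vector.
D_{\mathbf{u}} f(\mathbf{x})
  = \nabla f(\mathbf{x}) \cdot \mathbf{u}
  = \lVert \nabla f(\mathbf{x}) \rVert \cos\theta
  \le \lVert \nabla f(\mathbf{x}) \rVert ,
\qquad \lVert \mathbf{u} \rVert = 1,
% with equality exactly when u points along the gradient:
\qquad
\mathbf{u}^{*} = \frac{\nabla f(\mathbf{x})}{\lVert \nabla f(\mathbf{x}) \rVert}
\quad (\nabla f(\mathbf{x}) \neq \mathbf{0}).
```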
78.
姗姗来迟 (2023-05-14 19:34):
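#paper Multimodal Graph Transformer for Multimodal Question Answering https://arxiv.org/abs/2305.00581 This work draws on the strengths of both Transformer models and structured learning approaches such as graph neural networks, and proposes a novel Multimodal Graph Transformer for question answering tasks that require reasoning across multiple modalities. It introduces a graph-involved, plug-and-play quasi-attention mechanism that incorporates multimodal graph information, obtained from text and visual data, into vanilla self-attention as an effective prior. Specifically, the paper constructs a text graph, a dense region graph, and a semantic graph to generate adjacency matrices, then combines them with the input vision and language features for downstream reasoning. Study notes: https://blog.csdn.net/weixin_44845357/article/details/130577459?csdn_share_tail=%7B%22type%22%3A%22blog%22%2C%22rType%22%3A%22article%22%2C%22rId%22%3A%22130577459%22%2C%22source%22%3A%22weixin_44845357%22%7D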
Abstract:
Despite the success of Transformer models in vision and language tasks, they often learn knowledge from enormous data implicitly and cannot utilize structured input data directly. On the other hand, structured learning approaches such as graph neural networks (GNNs) that integrate prior information can barely compete with Transformer models. In this work, we aim to benefit from both worlds and propose a novel Multimodal Graph Transformer for question answering tasks that requires performing reasoning across multiple modalities. We introduce a graph-involved plug-and-play quasi-attention mechanism to incorporate multimodal graph information, acquired from text and visual data, to the vanilla self-attention as effective prior. In particular, we construct the text graph, dense region graph, and semantic graph to generate adjacency matrices, and then compose them with input vision and language features to perform downstream reasoning. Such a way of regularizing self-attention with graph information significantly improves the inferring ability and helps align features from different modalities. We validate the effectiveness of Multimodal Graph Transformer over its Transformer baselines on GQA, VQAv2, and MultiModalQA datasets.
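To make the idea of using an adjacency matrix as a prior on self-attention concrete, here is a minimal sketch in plain Python. It is not the paper's quasi-attention mechanism: it swaps in a simpler, common technique, adding a large negative bias to the attention logits of token pairs that have no edge in the graph. The function name `graph_biased_attention` and the `penalty` parameter are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def graph_biased_attention(q, k, v, adjacency, penalty=-1e9):
    """Scaled dot-product attention whose logits are masked by a graph.

    q, k, v: lists of n vectors (lists of floats); q and k share dimension d.
    adjacency: n x n 0/1 matrix; adjacency[i][j] == 1 lets token i attend to j.
    penalty: additive bias for non-edges (an assumption of this sketch,
             not a value taken from the paper).
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        logits = []
        for j, kj in enumerate(k):
            score = sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            if not adjacency[i][j]:
                score += penalty  # suppress attention where the graph has no edge
            logits.append(score)
        weights = softmax(logits)
        # Weighted sum of the value vectors, one output component at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # Token 0 is connected only to itself, so its output is exactly v[0].
    adjacency = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(graph_biased_attention(q, k, v, adjacency))
```

A hard mask like this only restricts which tokens can interact. A softer variant would add a learned or weighted bias per edge type, which is closer in spirit to composing several graphs with the input features.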
79.
muton (2023-04-30 23:19):
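#paper Amygdala and cortical gamma-band responses to emotional faces depend on the attended to valence https://arxiv.org/pdf/2304.05700.pdf The amygdala is thought to contribute to a bottom-up attentional bias in the visual processing of emotional faces, but how its response to emotion interacts with top-down attention is unclear. How closely the amygdala's responses to emotion and attention resemble those seen in scalp EEG also remains to be explored. The authors therefore recorded gamma-band activity from intracranial electrodes in the amygdala and from scalp EEG to study the interaction of emotion and attention during face processing. In the emotion detection experiment, high-frequency gamma in the amygdala appeared when neutral faces were the detection target. When negative faces were the target, low-frequency gamma increased significantly for negative faces, not only in the amygdala but also in scalp EEG recordings over posterior regions, and in an earlier time window on the scalp than in the amygdala. These results are consistent with a multi-pathway model of emotion processing, and they show the role of gamma-band activity in processing emotional faces from the perspective of top-down attention.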
Abstract:
The amygdala is assumed to contribute to a bottom-up attentional bias during visual processing of emotional faces. Still, how its response to emotion interacts with top-down attention is not fully understood. It is also unclear if amygdala activity and scalp EEG respond to emotion and attention in a similar way. Therefore, we studied the interaction of emotion and attention during face processing in oscillatory gamma-band activity (GBA) in the amygdala and on the scalp. Amygdala signals were recorded via intracranial EEG (iEEG) in 9 patients with epilepsy. Scalp recordings were collected from 19 healthy participants. Three randomized blocks of angry, neutral, and happy faces were presented, and either negative, neutral, or positive expressions were denoted as targets. Both groups detected happy faces fastest and most accurately. In the amygdala, the earliest effect was observed around 170 ms in high GBA (105-117.5 Hz) when neutral faces served as targets. Here, GBA was higher for emotional than neutral faces. During attention to negative faces, low GBA (< 90 Hz) increased specifically for angry faces both in the amygdala and over posterior scalp regions, albeit earlier on the scalp (60 ms) than in the amygdala (210 ms). From 570 ms, amygdala high GBA (117.5-145 Hz) was also increased for both angry and neutral, compared to happy, faces. When positive faces were the targets, GBA did not differentiate between expressions. The present data reveal that attention-independent emotion detection in amygdala high GBA may only occur during a neutral focus of attention. Top-down threat vigilance coordinates widespread low GBA, biasing stimulus processing in favor of negative faces. These results are in line with a multi-pathway model of emotion processing and help specify the role of GBA in this process by revealing how attentional focus can tune timing and amplitude of emotional GBA responses.
80.
周周复始 (2023-04-30 23:12):
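#paper doi: https://doi.org/10.48550/arXiv.2112.05149 DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model. Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimization. Although deep-learning-based methods for fast image registration have been developed, obtaining realistic continuous deformations from the moving image to the fixed image with little topological folding remains challenging. To address this, the paper proposes a new diffusion-model-based image registration method called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through the reverse diffusion process but also performs image registration via deformation fields. Specifically, the deformation field is generated by the conditional score function of the deformation between the moving and fixed images, so registration along a continuous deformation can be performed by simply scaling the latent feature of the score. Experiments on 2D facial and 3D medical image registration tasks show that the method provides flexible deformations and topology preservation.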
Abstract:
Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimizations. Although deep-learning-based methods have been developed for fast image registration, it is still challenging to obtain realistic continuous deformations from a moving image to a fixed image with less topological folding problem. To address this, here we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also allows image registration by deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that the registration can be performed from continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology preservation capability.
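As a rough illustration of what a continuous deformation between a moving and a fixed image means, the sketch below warps a 1D signal with a displacement field multiplied by a scalar. This is a simplification and an assumption of the sketch: DiffuseMorph scales the latent feature of the score before the deformation field is produced, whereas here the displacement field itself is scaled. The names `warp_1d`, `has_folding`, and `scale` are hypothetical.

```python
def warp_1d(signal, displacement, scale=1.0):
    """Resample `signal` at x + scale * displacement[x] with linear interpolation.

    scale = 0.0 returns the input unchanged and scale = 1.0 applies the full
    deformation; values in between give intermediate deformations.
    """
    n = len(signal)
    warped = []
    for x in range(n):
        pos = x + scale * displacement[x]
        pos = min(max(pos, 0.0), n - 1.0)  # clamp sample position to the signal
        lo = int(pos)                      # floor, valid because pos >= 0
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return warped


def has_folding(displacement, scale=1.0):
    """Return True if the map x -> x + scale * displacement[x] is not strictly
    increasing, which is the 1D analogue of topological folding."""
    mapped = [x + scale * d for x, d in enumerate(displacement)]
    return any(b <= a for a, b in zip(mapped, mapped[1:]))


if __name__ == "__main__":
    signal = [0, 10, 20, 30]
    displacement = [1, 1, 1, 1]
    for s in (0.0, 0.5, 1.0):
        print(s, warp_1d(signal, displacement, scale=s))
```

The folding check is the 1D counterpart of requiring a positive Jacobian determinant of the deformation in 2D or 3D, which is how topology preservation is usually assessed.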