Papers shared by user 前进.
18 paper shares found in total.
1.
前进 (2024-06-30 22:29):
#paper Liu R, Li Z, Fan X, et al. Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond[J]. 2020. DOI:10.48550/arXiv.2004.14557. This paper proposes a new deep-learning framework that optimizes a diffeomorphic model via multi-scale propagation, aiming to combine the strengths of conventional deformable registration methods and deep-learning approaches while avoiding the limitations of both. Specifically, the authors introduce a generic optimization model for diffeomorphic registration and develop a series of learnable architectures that complete registration over coarse-to-fine image features. They also propose a novel bilevel self-tuned training strategy that enables efficient search for task-specific hyperparameters, increasing flexibility across data types while reducing computational and human effort. Registration experiments are run on several datasets, including image-to-atlas registration on brain MRI and image-to-image registration on liver CT; results show state-of-the-art performance while preserving diffeomorphism. The framework is further applied to multi-modal registration, and the authors study how the registration supports downstream medical image analysis tasks, including multi-modal fusion and image segmentation.
Abstract:
Conventional deformable registration methods aim at solving an optimization model carefully designed on image pairs and their computational costs are exceptionally high. In contrast, recent deep learning based approaches can provide fast deformation estimation. These heuristic network architectures are fully data-driven and thus lack explicit geometric constraints, e.g., topology-preserving, which are indispensable to generate plausible deformations. We design a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation in order to integrate advantages and avoid limitations of these two categories of approaches. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures to obtain propagative updating in the coarse-to-fine feature space. Moreover, we propose a novel bilevel self-tuned training strategy, allowing efficient search of task-specific hyper-parameters. This training strategy increases the flexibility to various types of data while reduces computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate the state-of-the-art performance of the proposed method with diffeomorphic guarantee and extreme efficiency. We also apply our framework to challenging multi-modal image registration, and investigate how our registration to support the down-streaming tasks for medical image analysis including multi-modal fusion and image segmentation.
2.
前进 (2024-05-30 13:53):
#paper Luo S, Xie Z, Chen G, et al. Hierarchical DNN with Heterogeneous Computing Enabled High-Performance DNA Sequencing[C]//2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). IEEE, 2022: 35-40. This paper applies deep learning to second-generation sequencing basecalling. Among existing basecalling algorithms, AYB is the most accurate, but it degrades as the fluorescence signal weakens over successive cycles and also struggles with DNA phasing effects; a deep-learning approach handles both problems well. The pipeline first detects cluster positions from the fluorescence images of the first five cycles, extracts the cluster intensities for subsequent cycles, corrects intensity chromatic aberration with a conventional channel-correction algorithm, and then feeds the corrected intensities into a DNN to classify the bases. Experiments show the deep-learning scheme detects 12.18% more reads than the conventional algorithm and reduces the base classification error rate from 0.1432% to 0.0175%.
3.
前进 (2024-04-30 11:44):
#paper Han D, Pan X, Han Y, et al. Flatten transformer: Vision transformer using focused linear attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 5961-5971. The main obstacle to applying self-attention in computer vision tasks is its quadratic computational complexity, which makes vision workloads expensive. As an alternative to softmax attention, linear attention approximates the softmax operation with a carefully designed mapping function, reducing the complexity from quadratic to linear. Although linear attention is theoretically more efficient, existing linear attention methods either suffer a significant performance drop or require extra computation, limiting their practical use. To overcome these limitations, the paper proposes the FLA (focused linear attention) module, which improves efficiency and expressiveness through two changes: (1) focus ability: a simple mapping function sharpens self-attention's focus on the most informative features; (2) feature diversity: an efficient rank-restoration module based on depthwise convolution (DWC) restores the rank of the attention matrix, increasing feature diversity. Extensive experiments on several advanced vision Transformer models show consistent gains across multiple benchmarks.
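As a reading note, the quadratic-to-linear trick is worth writing out once. A minimal numpy sketch, where the kernel `phi` is a generic non-negative map standing in for the paper's focused mapping function (this is my own illustration, not the FLA implementation):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the N x N score matrix makes this O(N^2).
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    A = np.exp(S - S.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized attention: phi(K).T @ V is only d x d, so the cost is O(N).
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                 # (d, d), shared by every query
    Z = Qp @ Kp.sum(axis=0)       # (N,) normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d = 8, 4
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
ref = softmax_attention(Q, K, V)
assert out.shape == ref.shape == (N, d)
# Weights are non-negative and sum to 1, so outputs stay inside V's range,
# just like softmax attention's convex combinations.
assert out.min() >= V.min() - 1e-9 and out.max() <= V.max() + 1e-9
```

The whole efficiency argument is visible in the shapes: `KV` is d x d and never depends on the sequence length N.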
4.
前进 (2024-03-31 12:44):
#paper [1] Hu X, Kang M, Huang W, et al. Dual-Stream Pyramid Registration Network[J]. Springer, Cham, 2019. DOI:10.1007/978-3-030-32245-8_43. This paper targets unsupervised 3D brain medical image registration. Unlike previous CNN-based registration methods such as VoxelMorph, Dual-PRNet uses a dual-stream architecture that sequentially estimates multi-level registration fields from a pair of 3D volumes. Main contributions: a dual-stream 3D encoder-decoder network that computes two convolutional feature pyramids, one from each input volume; a sequential pyramid registration scheme, with a series of pyramid registration (PR) modules that predict multi-level registration fields directly from the decoded feature pyramids, refining the fields coarse-to-fine through sequential warping and giving the model a strong ability to handle large deformations; a further-enhanced PR module that computes local 3D correlations between the feature pyramids, yielding the improved Dual-PRNet++, which aggregates rich, detailed anatomical structure; and integration of Dual-PRNet++ into a 3D segmentation framework for joint registration and segmentation by accurately warping voxel-level annotations. The paper also reviews related deep-learning registration work and evaluates the method: on the Mindboggle101 dataset, Dual-PRNet++ raises the Dice score from 0.511 to 0.748, substantially outperforming prior state-of-the-art methods, and the authors show how, in a joint-learning framework with limited annotations, the method greatly boosts the segmentation task.
5.
前进 (2024-02-28 10:57):
#paper Mckenzie E M, Santhanam A, Ruan D, et al. Multimodality image registration in the head-and-neck using a deep learning-derived synthetic CT as a bridge[J]. Medical Physics, 2020, 47(3). DOI:10.1002/mp.13976. This paper proposes and validates a head-and-neck multimodality image registration method based on deep-learning-derived cross-modality synthesis. A CycleGAN converts MRI into synthetic CT (sCT), turning head-and-neck MRI-CT multimodal registration into sCT-CT monomodal registration; the registration itself uses a conventional B-spline method. Results show that sCT→CT registration is more accurate than MRI→CT, with the average registration error dropping from 9.8 mm to 6.0 mm.
Medical physics, 2020-Mar. DOI: 10.1002/mp.13976 PMID: 31853975
Abstract:
PURPOSE: To develop and demonstrate the efficacy of a novel head-and-neck multimodality image registration technique using deep-learning-based cross-modality synthesis. METHODS AND MATERIALS: Twenty-five head-and-neck patients received magnetic resonance (MR) and computed tomography (CT) (CT_aligned) scans on the same day with the same immobilization. Fivefold cross validation was used with all of the MR-CT pairs to train a neural network to generate synthetic CTs from MR images. Twenty-four of 25 patients also had a separate CT without immobilization (CT_non-aligned) and were used for testing. CT_non-aligned's were deformed to the synthetic CT, and compared to CT_non-aligned registered to MR. The same registrations were performed from MR to CT_non-aligned and from synthetic CT to CT_non-aligned. All registrations used B-splines for modeling the deformation, and mutual information for the objective. Results were evaluated using the 95% Hausdorff distance among spinal cord contours, landmark error, inverse consistency, and Jacobian determinant of the estimated deformation fields. RESULTS: When large initial rigid misalignment is present, registering CT to MRI-derived synthetic CT aligns the cord better than a direct registration. The average landmark error decreased from 9.8 ± 3.1 mm in MR→CT_non-aligned to 6.0 ± 2.1 mm in CT_synth→CT_non-aligned deformable registrations. In the CT to MR direction, the landmark error decreased from 10.0 ± 4.3 mm in CT_non-aligned→MR deformable registrations to 6.6 ± 2.0 mm in CT_non-aligned→CT_synth deformable registrations. The Jacobian determinant had an average value of 0.98. The proposed method also demonstrated improved inverse consistency over the direct method. CONCLUSIONS: We showed that using a deep learning-derived synthetic CT in lieu of an MR for MR→CT and CT→MR deformable registration offers superior results to direct multimodal registration.
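All the B-spline registrations above optimize mutual information, which is what makes cross-modality alignment possible at all, so it is worth writing out once. A toy numpy version of the MI objective (my own illustration; production registration packages use smoothed and normalized variants):

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Mutual information from a joint intensity histogram:
    MI = sum p(a,b) * log( p(a,b) / (p(a) * p(b)) )."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)    # marginal of a
    py = p.sum(axis=0, keepdims=True)    # marginal of b
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))
remapped = np.tanh(img)                  # deterministic contrast remapping
unrelated = rng.normal(size=(64, 64))    # statistically independent image
# MI rewards any consistent intensity mapping, not identical intensities,
# which is why it works across modalities (e.g. MR vs. CT).
assert mutual_information(img, remapped) > mutual_information(img, unrelated)
```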
6.
前进 (2024-01-31 22:50):
#paper arxiv.org//pdf/2311.026 2023 Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection. The Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capability, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V for the recently popular visual anomaly detection (AD) task and is the first to run qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Since the task requires image- and pixel-level evaluation, the proposed GPT-4V-AD framework has three components: (1) granular region division, (2) prompt design, and (3) Text2Segmentation for easy quantitative evaluation; several different attempts were made for comparative analysis. Results show that GPT-4V can achieve reasonable zero-shot AD results via the VQA paradigm, e.g., image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROC on the MVTec AD and VisA datasets, respectively. However, a gap to state-of-the-art zero-shot methods such as WinCLIP and CLIP-AD remains, and further research is needed. The study provides a baseline reference for research on VQA-oriented LMMs in zero-shot AD.
Abstract:
Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular visual Anomaly Detection (AD) and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Considering that this task requires both image-/pixel-level evaluations, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, 3) Text2Segmentation for easy quantitative evaluation, and have made some different attempts for comparative analysis. The results show that GPT-4V can achieve certain results in the zero-shot AD task through a VQA paradigm, such as achieving image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROCs on MVTec AD and VisA datasets, respectively. However, its performance still has a certain gap compared to the state-of-the-art zero-shot method, e.g., WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for the research of VQA-oriented LMM in the zero-shot AD task, and we also post several possible future works. Code is available at \url{https://github.com/zhangzjn/GPT-4V-AD}.
7.
前进 (2023-12-27 15:11):
#paper arXiv:2312.11514v1, 2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory. Large language models (LLMs) are central to modern natural language processing, but their heavy compute and memory demands are a challenge for memory-limited devices. To run LLMs that exceed the available DRAM capacity efficiently, the paper stores model parameters in flash memory and brings them into DRAM on demand. The method builds an inference cost model aligned with flash behavior and optimizes in two key areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this framework, two main techniques are introduced: the "windowing" strategy reduces data transfer by reusing previously activated neurons, and "row-column bundling" exploits flash memory's sequential-access strength by increasing the size of the data chunks read from flash. Together these techniques allow running models up to twice the size of the available DRAM, with 4-5x and 20-25x faster inference than naive loading on CPU and GPU, respectively.
Abstract:
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.
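The "windowing" idea can be made concrete with a toy cache simulation. Everything here is my own invention for illustration (the class, the window bookkeeping, the neuron-id sets); it only models the accounting, not the actual system:

```python
from collections import deque

class NeuronWindowCache:
    """Toy model of the 'windowing' idea: only FFN neurons activated in the
    last `window` tokens stay resident in DRAM; everything else must be
    fetched from flash. Counting fetches shows the reuse savings."""
    def __init__(self, window):
        self.window = window
        self.history = deque()   # per-token sets of active neuron ids
        self.resident = set()    # neuron ids currently in DRAM
        self.flash_loads = 0

    def step(self, active):
        active = set(active)
        self.flash_loads += len(active - self.resident)  # only the misses
        self.history.append(active)
        if len(self.history) > self.window:
            self.history.popleft()
        self.resident = set().union(*self.history)

# Consecutive tokens activate heavily overlapping neuron sets.
cache = NeuronWindowCache(window=2)
cache.step({1, 2, 3, 4})
cache.step({3, 4, 5})            # 3 and 4 are already resident
assert cache.flash_loads == 5    # 4 + 1, vs. 7 when reloading everything
```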
8.
前进 (2023-11-30 10:22):
#paper GraformerDIR: Graph convolution transformer for deformable image registration. Computers in Biology and Medicine, 30 June 2022. https://doi.org/10.1016/j.compbiomed.2022.105799 This paper applies graph convolution to image registration: by stacking graph convolution Transformer (Graformer) layers at the bottom of the feature extraction network, it proposes a Graformer-based DIR framework named GraformerDIR. A Graformer layer consists of a Graformer module and a Chebyshev graph convolution module: the Graformer module is designed to capture high-quality long-range dependencies, and the Chebyshev graph convolution module further enlarges the receptive field. GraformerDIR has been evaluated on publicly available brain datasets, including OASIS, LPBA40, and MGH10. Compared with VoxelMorph, GraformerDIR gains 4.6% in DSC and 0.055 mm in average symmetric surface distance (ASD), with a lower folding rate.
Abstract:
PURPOSE: Deformable image registration (DIR) plays an important role in assisting disease diagnosis. The emergence of the Transformer enables the DIR framework to extract long-range dependencies, which relieves the limitations of intrinsic locality caused by convolution operation. However, suffering from the interference of missing or spurious connections, it is a challenging task for Transformer-based methods to capture the high-quality long-range dependencies. METHODS: In this paper, by stacking the graph convolution Transformer (Graformer) layer at the bottom of the feature extraction network, we propose a Graformer-based DIR framework, named GraformerDIR. The Graformer layer consists of the Graformer module and the Chebyshev graph convolution module. Among them, the Graformer module is designed to capture high-quality long-range dependencies. The Chebyshev graph convolution module is employed to further enlarge the receptive field. RESULTS: The performance and generalizability of GraformerDIR have been evaluated on publicly available brain datasets including the OASIS, LPBA40, and MGH10 datasets. Compared with VoxelMorph, the GraformerDIR has obtained performance improvements of 4.6% in Dice similarity coefficient (DSC) and 0.055 mm in the average symmetric surface distance (ASD) while reducing the non-positive rate of Jacobian determinant (Npr.Jac) index about 60 times on publicly available OASIS dataset. On unseen dataset MGH10, the GraformerDIR has obtained the performance improvements of 4.1% in DSC and 0.084 mm in ASD compared with VoxelMorph, which demonstrates the GraformerDIR with better generalizability. The promising performance on the clinical cardiac dataset ACDC indicates the GraformerDIR is practicable. CONCLUSION: With the advantage of Transformer and graph convolution, the GraformerDIR has obtained comparable performance with the state-of-the-art method VoxelMorph.
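The Chebyshev graph convolution enlarges the receptive field because a K-order Chebyshev filter mixes information from up to K-1 hops. A small numpy sketch of plain ChebNet-style filtering (my own illustration of the operator, not the paper's module):

```python
import numpy as np

def chebyshev_gconv(X, A, thetas):
    """ChebNet-style K-order graph convolution:
    y = sum_k thetas[k] * T_k(L_hat) @ X, with scaled Laplacian
    L_hat = 2L/lambda_max - I and recursion T_k = 2*L_hat*T_{k-1} - T_{k-2}."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A               # combinatorial Laplacian
    L_hat = 2.0 * L / np.linalg.eigvalsh(L).max() - np.eye(n)
    Tk_prev, Tk = np.eye(n), L_hat
    out = thetas[0] * (Tk_prev @ X)
    if len(thetas) > 1:
        out = out + thetas[1] * (Tk @ X)
    for theta in thetas[2:]:
        Tk_prev, Tk = Tk, 2.0 * L_hat @ Tk - Tk_prev
        out = out + theta * (Tk @ X)
    return out

# 4-node path graph, signal on node 0 only: a K=3 filter (orders 0..2)
# mixes features from up to 2 hops away -- a wider receptive field than
# a single 1-hop convolution.
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
X = np.eye(4)[:, :1]
y = chebyshev_gconv(X, A, thetas=[0.5, 0.3, 0.2])
assert abs(y[2, 0]) > 1e-12   # reaches node 2 (2 hops away)
assert abs(y[3, 0]) < 1e-12   # but not node 3 (3 hops away)
```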
9.
前进 (2023-10-30 13:57):
#paper https://doi.org/10.1088/1361-6560/ac5f70 Training low dose CT denoising network without high quality reference data. Low-dose CT (LDCT) denoising is dominated by supervised-learning methods that require perfectly registered pairs of LDCT and corresponding clean reference images (normal-dose CT). However, training without clean labels is more practically meaningful, since acquiring large numbers of such paired samples is clinically infeasible. This paper proposes a self-supervised denoising method for LDCT imaging that requires no clean images at all. In addition, a perceptual loss is used to enforce feature-domain data consistency during denoising, and attention blocks in the decoding stage help further improve image quality. The experiments compare against three other methods and include six ablation studies, validating the proposed self-supervised framework as well as the effectiveness of the self-attention modules and the perceptual loss.
Abstract:
Currently, the field of low-dose CT (LDCT) denoising is dominated by supervised learning based methods, which need perfectly registered pairs of LDCT and its corresponding clean reference image (normal-dose CT). However, training without clean labels is more practically feasible and significant, since it is clinically impossible to acquire a large amount of these paired samples. In this paper, a self-supervised denoising method is proposed for LDCT imaging. The proposed method does not require any clean images. In addition, the perceptual loss is used to achieve data consistency in feature domain during the denoising process. Attention blocks used in decoding phase can help further improve the image quality. In the experiments, we validate the effectiveness of our proposed self-supervised framework and compare our method with several state-of-the-art supervised and unsupervised methods. The results show that our proposed model achieves competitive performance in both qualitative and quantitative aspects to other methods. Our framework can be directly applied to most denoising scenarios without collecting pairs of training data, which is more flexible for real clinical scenario.
10.
前进 (2023-09-27 10:56):
#paper doi:10.1109/cvpr.2019.00223 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Noise2Void - Learning Denoising From Single Noisy Images. Deep-learning image denoising is usually trained on pairs of clean and noisy images. Some approaches (N2N, Noise2Noise) drop the clean image and train from multiple noisy observations instead. This paper goes further and proposes denoising from a single noisy image. In the patch-based view, each pixel of the output depends only on a limited region of the input because of the finite receptive field; many denoising methods, including Noise2Noise, build on this view and no longer need clean targets. The authors observe that if, for a single image, a patch is the network input and the pixel at the patch center is the target, the network will simply learn the identity map from the input patch center to the output. They therefore design a special receptive field in which the center pixel is "erased", and ask the network to predict the value at the center position. This rests on two assumptions: (1) noise at different pixel positions is statistically independent; (2) the noise has zero mean. Under these assumptions, the predicted center pixel is more likely to be signal than noise.
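The blind-spot trick is easy to make concrete. A toy numpy construction of the masked input/target pairs (my own sketch of the idea, not the authors' code; a real implementation masks per training patch and uses a CNN):

```python
import numpy as np

def n2v_masked_batch(noisy, n_masked=16, rng=np.random.default_rng(0)):
    """Noise2Void-style input/target construction from a SINGLE noisy image:
    random pixels become targets, and in the network INPUT each such pixel is
    replaced by a random 3x3 neighbor, 'erasing' the blind-spot center so
    the network cannot learn the identity mapping there."""
    h, w = noisy.shape
    ys = rng.integers(1, h - 1, n_masked)
    xs = rng.integers(1, w - 1, n_masked)
    inp = noisy.copy()
    for y, x in zip(ys, xs):
        dy, dx = rng.integers(-1, 2, 2)   # offset inside the 3x3 neighborhood
        inp[y, x] = noisy[y + dy, x + dx]
    targets = noisy[ys, xs]               # the loss is computed ONLY here
    return inp, (ys, xs), targets

img = np.random.default_rng(1).normal(size=(32, 32))
inp, (ys, xs), tgt = n2v_masked_batch(img)
assert np.array_equal(tgt, img[ys, xs])   # targets keep the original values
mask = np.zeros_like(img, bool)
mask[ys, xs] = True
assert np.array_equal(inp[~mask], img[~mask])   # input untouched elsewhere
```

Because the masked pixel's own value never reaches the network, predicting it well is only possible from the (signal-correlated) neighborhood, which is exactly the paper's argument under its two noise assumptions.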
11.
前进 (2023-01-31 23:30):
#paper Rethinking 1x1 Convolutions: Can we train CNNs with Frozen Random Filters? arXiv:2301.11360 This paper introduces a new convolution block that computes learnable linear combinations (LC) of frozen random filters, from which it builds LCResNets, and also proposes a new weight-sharing mechanism that drastically reduces the number of weights. Even in the extreme case where the spatial filters are only randomly initialized and never updated, certain CNN architectures can be trained to exceed standard-training accuracy. By reinterpreting pointwise (1x1) convolution as an operator that learns linear combinations of frozen (random) spatial filters, the approach not only reaches high test accuracy on CIFAR and ImageNet but also behaves well in terms of model robustness, generalization, sparsity, and the total number of required weights. In addition, the proposed weight-sharing mechanism allows a single weight tensor to be shared across all spatial convolution layers, massively reducing the weight count.
arXiv, 2023.
Abstract:
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise (1×1) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
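The reinterpretation at the heart of the paper, a 1x1 convolution as a learnable linear combination of frozen spatial filters, can be checked numerically. A numpy sketch (single channel, "valid" padding, my own illustration):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def lc_layer(x, frozen_filters, alphas):
    """LC view of 1x1 convolutions: the 3x3 filters are FROZEN random
    tensors; only the mixing weights `alphas` (the 1x1 conv) are learnable.
    Mixing the filter OUTPUTS equals convolving with the mixed filter."""
    patches = sliding_window_view(x, (3, 3))           # (H-2, W-2, 3, 3)
    outs = np.einsum('hwij,fij->fhw', patches, frozen_filters)
    return np.einsum('f,fhw->hw', alphas, outs)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
filters = rng.normal(size=(4, 3, 3))                   # never updated
alphas = rng.normal(size=4)                            # the learnable part
y = lc_layer(x, filters, alphas)

# Equivalence: combining outputs == convolving with sum_i alphas[i]*filters[i]
mixed = np.einsum('f,fij->ij', alphas, filters)
y2 = np.einsum('hwij,ij->hw', sliding_window_view(x, (3, 3)), mixed)
assert y.shape == (6, 6)
assert np.allclose(y, y2)
```

Since convolution is linear in the filter, gradient descent on `alphas` alone still searches a useful filter subspace even though the spatial filters themselves are random and frozen.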
12.
前进 (2022-12-31 11:39):
#paper Liu Y, Chen J, Wei S, et al. On Finite Difference Jacobian Computation in Deformable Image Registration[J]. arXiv preprint arXiv:2212.06060, 2022. Producing diffeomorphic spatial transformations has been a central problem in deformable image registration. A diffeomorphic transformation should have a positive Jacobian determinant |J| everywhere, so the number of voxels with |J|<0 is used both to test for diffeomorphism and to measure the irregularity of the transformation. For digital transformations, |J| is usually approximated with central differences, but this strategy can yield positive |J| for transformations that are clearly not diffeomorphic even at the voxel resolution level. To show this, the paper first examines the geometric meaning of different finite-difference approximations of |J| and establishes that no single finite-difference approximation is sufficient to determine diffeomorphism for digital images. For a 2D transformation, four unique finite-difference approximations of |J| must all be positive to ensure the entire domain is invertible and fold-free at the pixel level; in 3D, ten unique finite-difference approximations of |J| are required to be positive. The proposed digital diffeomorphism criterion resolves several errors inherent in the central-difference approximation of |J| and accurately detects non-diffeomorphic digital transformations.
Abstract:
Producing spatial transformations that are diffeomorphic has been a central problem in deformable image registration. As a diffeomorphic transformation should have positive Jacobian determinant |J| everywhere, the number of voxels with |J|<0 has been used to test for diffeomorphism and also to measure the irregularity of the transformation. For digital transformations, |J| is commonly approximated using central difference, but this strategy can yield positive |J|'s for transformations that are clearly not diffeomorphic -- even at the voxel resolution level. To show this, we first investigate the geometric meaning of different finite difference approximations of |J|. We show that to determine diffeomorphism for digital images, use of any individual finite difference approximations of |J| is insufficient. We show that for a 2D transformation, four unique finite difference approximations of |J|'s must be positive to ensure the entire domain is invertible and free of folding at the pixel level. We also show that in 3D, ten unique finite differences approximations of |J|'s are required to be positive. Our proposed digital diffeomorphism criteria solves several errors inherent in the central difference approximation of |J| and accurately detects non-diffeomorphic digital transformations.
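The paper's central observation is easy to reproduce in a toy 2D form: a transform that swaps adjacent rows is clearly folded, yet its central-difference |J| is positive everywhere, because central differences skip over the immediate neighbor. A numpy sketch (my own construction, not the paper's full criteria):

```python
import numpy as np

def jacdet(phi, di, dj):
    # |J| on the interior grid, with di/dj the chosen difference stencils.
    gi, gj = di(phi), dj(phi)
    return gi[..., 0] * gj[..., 1] - gi[..., 1] * gj[..., 0]

central_i = lambda p: (p[2:, 1:-1] - p[:-2, 1:-1]) / 2.0
central_j = lambda p: (p[1:-1, 2:] - p[1:-1, :-2]) / 2.0
forward_i = lambda p: p[2:, 1:-1] - p[1:-1, 1:-1]
forward_j = lambda p: p[1:-1, 2:] - p[1:-1, 1:-1]

# A clearly folded transform: it swaps adjacent rows (i -> i + (-1)^i).
n = 8
ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
phi = np.stack([ii + (-1.0) ** ii, jj.astype(float)], axis=-1)

jd_central = jacdet(phi, central_i, central_j)
jd_forward = jacdet(phi, forward_i, forward_j)
assert (jd_central > 0).all()   # central difference: looks diffeomorphic
assert (jd_forward < 0).any()   # one-sided difference: the folding is exposed
```

The paper's criteria go further (4 stencil combinations in 2D, 10 in 3D), but this already shows why a single central-difference check can pass an obviously folded field.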
13.
前进 (2022-11-28 10:25):
#paper Zhu Y, Lu S. Swin-VoxelMorph: A Symmetric Unsupervised Learning Model for Deformable Medical Image Registration Using Swin Transformer[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2022. Deformable medical image registration, with its invertible one-to-one mapping between images, is widely used in medical image processing. While state-of-the-art registration methods are based on convolutional neural networks, few attempts have used Transformers. Existing models neglect attention mechanisms for long-range cross-image relevance in embedding learning, limiting their ability to identify semantically meaningful correspondences between anatomical structures; and although they achieve fast registration, they ignore the topology preservation and invertibility of the transformation. This paper proposes a novel symmetric unsupervised learning network based on the Swin Transformer that minimizes the dissimilarity between images while simultaneously estimating both the forward and inverse transformations. Specifically, a 3D Swin-UNet applies a hierarchical Swin Transformer with shifted windows as the encoder to extract context features, and a symmetric Swin Transformer decoder based on patch expanding performs the up-sampling operation to estimate the registration fields. Moreover, the objective loss functions guarantee substantial diffeomorphic properties of the predicted transformations. The method is validated on two datasets, ADNI and PPMI, and achieves state-of-the-art registration accuracy while maintaining desirable diffeomorphic properties.
Abstract:
Deformable medical image registration is widely used in medical image processing with the invertible and one-to-one mapping between images. While state-of-the-art image registration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on computer vision tasks. Existing models neglect to employ attention mechanisms to handle the long-range cross-image relevance in embedding learning, limiting such approaches to identify the semantically meaningful correspondence of anatomical structures. These methods also ignore the topology preservation and invertibility of the transformation although they achieve fast image registration. In this paper, we propose a novel, symmetric unsupervised learning network Swin-VoxelMorph based on the Swin Transformer which minimizes the dissimilarity between images and estimates both forward and inverse transformations simultaneously. Specifically, we propose 3D Swin-UNet, which applies hierarchical Swin Transformer with shifted windows as the encoder to extract context features. And a symmetric Swin Transformer-based decoder with patch expanding layer is designed to perform the up-sampling operation to estimate the registration fields. Besides, our objective loss functions can guarantee substantial diffeomorphic properties of the predicted transformations. We verify our method on two datasets including ADNI and PPMI, and it achieves state-of-the-art registration accuracy while maintaining desirable diffeomorphic properties.
14.
前进 (2022-10-30 21:26):
#paper Shi J, He Y, Kong Y, et al. XMorpher: Full Transformer for Deformable Medical Image Registration via Cross Attention[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2022: 217-226. Existing deep networks focus on feature extraction from single images and are limited for registration, which operates on paired images. This paper therefore proposes a new backbone network, XMorpher, for effective representation of paired features in deformable registration. (1) It proposes a novel Transformer architecture with dual parallel feature extraction networks that exchange information through cross attention, discovering multi-level semantic correspondences while gradually extracting the respective features for effective final registration. (2) It proposes Cross Attention Transformer (CAT) blocks to establish an attention mechanism between the two images, which finds correspondences automatically and prompts features to fuse effectively in the network. (3) It constrains the attention computation to base windows and search windows of different sizes, focusing on the local transformations of deformable registration while improving computational efficiency. XMorpher lifts VoxelMorph's DSC by 2.8%, demonstrating its effective representation of paired-image features in deformable registration.
Abstract:
An effective backbone network is important to deep learning-based Deformable Medical Image Registration (DMIR), because it extracts and matches the features between two images to discover the mutual correspondence for fine registration. However, the existing deep networks focus on single image situation and are limited in registration task which is performed on paired images. Therefore, we advance a novel backbone network, XMorpher, for the effective corresponding feature representation in DMIR. 1) It proposes a novel full transformer architecture including dual parallel feature extraction networks which exchange information through cross attention, thus discovering multi-level semantic correspondence while extracting respective features gradually for final effective registration. 2) It advances the Cross Attention Transformer (CAT) blocks to establish the attention mechanism between images which is able to find the correspondence automatically and prompts the features to fuse efficiently in the network. 3) It constrains the attention computation between base windows and searching windows with different sizes, and thus focuses on the local transformation of deformable registration and enhances the computing efficiency at the same time. Without any bells and whistles, our XMorpher gives Voxelmorph 2.8% improvement on DSC, demonstrating its effective representation of the features from the paired images in DMIR. We believe that our XMorpher has great application potential in more paired medical images. Our XMorpher is open on https://github.com/Solemoon/XMorpher
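A single-head toy version of the cross-attention idea, queries from the moving image's tokens, keys/values from the fixed image's, shows how attention alone can recover correspondences between the pair (my own sketch, not the CAT block with its windowed computation):

```python
import numpy as np

def cross_attention(feat_m, feat_f):
    """Toy cross attention: queries come from the MOVING image's tokens,
    keys/values from the FIXED image's, so every moving token attends to
    candidate correspondences in the other image."""
    d = feat_m.shape[-1]
    S = feat_m @ feat_f.T / np.sqrt(d)
    A = np.exp(S - S.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ feat_f, A

rng = np.random.default_rng(0)
# Well-separated fixed-image tokens; moving tokens are a permutation + noise,
# mimicking the same anatomy seen at displaced positions.
fixed = 5.0 * np.eye(6, 8) + 0.1 * rng.normal(size=(6, 8))
perm = [3, 0, 5, 1, 2, 4]
moving = fixed[perm] + 0.1 * rng.normal(size=(6, 8))
out, A = cross_attention(moving, fixed)
# Each moving token attends most strongly to its true counterpart.
assert (A.argmax(axis=1) == np.array(perm)).all()
```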
15.
前进 (2022-09-29 12:12):
#paper Affine Medical Image Registration with Coarse-to-Fine Vision Transformer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 20835-20844. Affine registration is an indispensable part of a comprehensive medical image registration pipeline, yet there has been little work on fast, robust affine registration algorithms. Most existing studies train CNN models for joint affine and deformable registration, with little investigation of the standalone performance of the affine subnetwork. Moreover, existing CNN-based affine registration methods predict the affine transformation matrix either from the local misalignment of the inputs or from their global orientation and position, making them sensitive to spatial initialization and limiting their generalizability. This paper presents C2FViT, a fast, robust learning-based algorithm for 3D affine medical image registration. The method naturally combines the global connectivity of the Transformer, the locality of CNNs, and a multi-resolution strategy to learn global affine registration, and is evaluated on 3D brain atlas registration. Results show strong registration accuracy, robustness, speed, and generalizability.
Abstract:
Affine registration is indispensable in a comprehensive medical image registration pipeline. However, only a few studies focus on fast and robust affine registration algorithms. Most of these studies utilize convolutional neural networks (CNNs) to learn joint affine and non-parametric registration, while the standalone performance of the affine subnetwork is less explored. Moreover, existing CNN-based affine registration approaches focus either on the local misalignment or the global orientation and position of the input to predict the affine transformation matrix, which are sensitive to spatial initialization and exhibit limited generalizability apart from the training dataset. In this paper, we present a fast and robust learning-based algorithm, Coarse-to-Fine Vision Transformer (C2FViT), for 3D affine medical image registration. Our method naturally leverages the global connectivity and locality of the convolutional vision transformer and the multi-resolution strategy to learn the global affine registration. We evaluate our method on 3D brain atlas registration and template-matching normalization. Comprehensive results demonstrate that our method is superior to the existing CNNs-based affine registration methods in terms of registration accuracy, robustness and generalizability while preserving the runtime advantage of the learning-based methods. The source code is available at this https URL.
16.
前进 (2022-08-24 22:22):
#paper arXiv:2208.04939v1, 2022, U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration? Transformer-based networks have become increasingly popular in deformable image registration thanks to their long-range modeling capability. This paper argues, however, that the receptive field of a 5-layer convolutional U-Net is already sufficient to capture accurate deformations without relying on long-range dependencies, and asks whether U-Net is outdated for medical image registration compared with modern Transformer-based methods. To this end, the authors propose a large-kernel U-Net (LKU-Net), which enlarges the effective receptive field by embedding a parallel convolutional block into a vanilla U-Net. On atlas-based registration with the public 3D IXI brain dataset, LKU-Net matches or surpasses today's most advanced Transformer-based methods while using only 1.12% of TransMorph's parameters and 10.8% of its mult-adds. The authors further applied the algorithm in the MICCAI Learn2Reg 2021 challenge, again outperforming TransMorph and ranking first at the time of submission. With only modest modifications to the vanilla U-Net, U-Net-based registration still reaches state-of-the-art results, showing that U-Net-based registration networks are not outdated.
Abstract:
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration, our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at this https URL.
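The "a 5-layer U-Net's receptive field is sufficient" claim can be sanity-checked with the standard receptive-field recursion. A toy calculation under assumed layer configurations (one conv plus a stride-2 downsample per level; these configurations are my simplification, not the actual LKU-Net architecture):

```python
def receptive_field(layers):
    """Receptive field from (kernel, stride) pairs: r grows by (k-1)*j,
    where j is the cumulative stride ('jump') of the layer's input grid."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Assumed toy encoders: one conv then a stride-2 downsample per level.
unet5 = [(3, 1), (2, 2)] * 5          # 5 levels of 3x3 convs
large_kernel = [(5, 1), (2, 2)] * 5   # same depth with 5x5 kernels
assert receptive_field(unet5) == 94           # already spans ~94 voxels
assert receptive_field(large_kernel) == 156   # larger kernels widen it more
```

Even in this simplified form, the downsampling makes the receptive field grow geometrically with depth, which is the intuition behind both the sufficiency claim and the large-kernel modification.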
17.
前进 (2022-07-28 11:54):
#paper doi: 10.1109/TMI.2019.2953788 Transactions on Medical Imaging 2019 Progressively trained convolutional neural networks for deformable image registration. Existing deep-learning registration algorithms often perform poorly on tasks with large deformations. Two workarounds are common: (1) pre-align the images with a conventional method (affine/rigid) before registration; (2) cascade several networks that deform step by step to compose a large-deformation field. Both have drawbacks: conventional pre-alignment is slow, eroding the speed advantage of deep-learning registration, while cascaded networks interpolate the moving image repeatedly, and the accumulated interpolation error degrades the final deformation field. The authors therefore propose a single network with progressive training for large-deformation registration. Progressive training was originally used to grow GANs to higher image resolutions and is transferred here to registration: in short, once one stage of the network has converged, new layers are added and training continues, until the final deformation field is produced. The paper makes three contributions: (1) a progressive learning model that learns deformations at different image scales within one convolutional network; (2) evidence that no pre-registration is needed before registering two images with the network; (3) evidence that a network trained with supervision from synthetic deformation fields generalizes to real registration problems.
Abstract:
Deep learning-based methods for deformable image registration are attractive alternatives to conventional registration methods because of their short registration times. However, these methods often fail to estimate larger displacements in complex deformation fields, for which a multi-resolution strategy is required. In this article, we propose to train neural networks progressively to address this problem. Instead of training a large convolutional neural network on the registration task all at once, we initially train smaller versions of the network on lower resolution versions of the images and deformation fields. During training, we progressively expand the network with additional layers that are trained on higher resolution data. We show that this way of training allows a network to learn larger displacements without sacrificing registration accuracy and that the resulting network is less sensitive to large misregistrations compared to training the full network all at once. We generate a large number of ground truth example data by applying random synthetic transformations to a training set of images, and test the network on the problem of intrapatient lung CT registration. We analyze the learned representations in the progressively growing network to assess how the progressive learning strategy influences training. Finally, we show that a progressive training procedure leads to improved registration accuracy when learning large and complex deformations.
18.
前进 (2022-06-30 17:14):
#paper doi:10.1109/CVPR42600.2020.00470 CVPR 2020 Fast Symmetric Diffeomorphic Image Registration with Convolutional Neural Networks. This registration paper takes a fresh angle: instead of warping the moving image toward the fixed image, it warps both the moving and the fixed image toward a middle image simultaneously. Registration must keep the deformation field diffeomorphic, i.e., preserve the image topology and keep the field invertible (fold-free). Previous learning-based methods typically enforce this with a single global regularizer on the deformation field, which introduces a hyperparameter: too much smoothing flattens the field and hurts registration accuracy, while too little cannot keep a large deformation fold-free. Inspired by the classical symmetric image normalization method, this paper proposes a novel, efficient unsupervised symmetric registration method that maximizes the similarity between images within the space of diffeomorphic maps and estimates the forward and inverse transformations simultaneously, aligning the input images toward the middle from both directions; this secures registration accuracy and the diffeomorphism of the deformation field at the same time.
Abstract:
Diffeomorphic deformable image registration is crucial in many medical image studies, as it offers unique, special features including topology preservation and invertibility of the transformation. Recent deep learning-based deformable image registration methods achieve fast image registration by leveraging a convolutional neural network (CNN) to learn the spatial transformation from the synthetic ground truth or the similarity metric. However, these approaches often ignore the topology preservation of the transformation and the smoothness of the transformation which is enforced by a global smoothing energy function alone. Moreover, deep learning-based approaches often estimate the displacement field directly, which cannot guarantee the existence of the inverse transformation. In this paper, we present a novel, efficient unsupervised symmetric image registration method which maximizes the similarity between images within the space of diffeomorphic maps and estimates both forward and inverse transformations simultaneously. We evaluate our method on 3D image registration with a large scale brain image dataset. Our method achieves state-of-the-art registration accuracy and running time while maintaining desirable diffeomorphic properties.
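Diffeomorphic fields of this kind are usually obtained by integrating a stationary velocity field with scaling and squaring, which is what keeps the map fold-free even when the displacement is large. A 1D numpy sketch of the integration (my own illustration of the standard technique, not this paper's code):

```python
import numpy as np

def integrate_velocity_1d(v, steps=6):
    """Scaling and squaring of a stationary velocity field (1D sketch):
    phi = exp(v) is approximated by phi_0 = id + v / 2**steps followed by
    `steps` self-compositions phi <- phi o phi, using linear interpolation
    of the displacement for the composition."""
    n = len(v)
    grid = np.arange(n, dtype=float)
    disp = v / (2.0 ** steps)
    for _ in range(steps):
        # phi(phi(x)) = phi(x) + d(phi(x)), with d linearly interpolated
        disp = disp + np.interp(grid + disp, grid, disp)
    return grid + disp

x = np.linspace(0, 2 * np.pi, 64)
v = 15.0 * np.sin(x)                  # a large velocity field
naive = np.arange(64.0) + v           # one-step displacement warp
assert (np.diff(naive) < 0).any()     # folds: not invertible
phi = integrate_velocity_1d(v)
assert (np.diff(phi) > 0).all()       # exp(v) stays monotone: no folding
```

The contrast in the two assertions is the whole point: using `v` directly as a displacement folds the domain, while composing many small, individually invertible steps keeps the map strictly monotone.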