Papers shared by user 前进.
26 paper shares found in total; this page shows items 1 - 20.
前进 (2025-02-28 16:52):
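#paper DOI: 10.1109/TMI.2024.3362968. Haiqiao Wang, Dong Ni, and Yi Wang, "Recursive Deformable Pyramid Network for Unsupervised Medical Image Registration," IEEE Transactions on Medical Imaging, vol. 43, no. 6, pp. 2229-2240, Jun. 2024. This paper proposes a new unsupervised medical image registration method, the Recursive Deformable Pyramid (RDP) network. It uses a purely convolutional pyramid structure and a step-by-step recursive strategy to predict the deformation field from coarse to fine, while integrating high-level semantic information to keep the deformation field plausible. Its novelty is the recursive strategy: repeated feature fusion, deformation estimation, deformation fusion, and cross-level fusion let it handle large deformations without a separate affine pre-alignment step, which many existing deformable registration networks require. Experiments show that RDP outperforms several existing registration methods on three public brain magnetic resonance imaging (MRI) datasets, with clear advantages in both accuracy and efficiency.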
前进 (2025-01-31 22:31):
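#paper 10.48550/arxiv.2408.10234 The Unbearable Slowness of Being: Why do we live at 10 bits/s? arXiv:2408.10234v2 [q-bio.NC] Jieyu Zheng, Markus Meister. The paper examines the paradoxical slowness of information processing in human behavior. Although human sensory systems gather information at about 10⁹ bits/s, the overall information throughput of a human is only about 10 bits/s. This huge gap has not been adequately explained and touches on many fundamental aspects of brain function. Through a range of experiments and examples, the paper shows that the information rate of human behavior is about 10 bits/s and that this limit may be related to the serial nature of processing in the brain. Although the peripheral nervous system (e.g., cone photoreceptors and the optic nerve) can process information at very high rates, the central brain appears to work serially, attending to only one task at a time. This serial mode may have emerged during evolution, because the main function of early nervous systems was to control movement, and movement decisions are usually local and singular. The paper also proposes that the brain may operate in two modes, an "outer brain" and an "inner brain": the outer brain handles high-dimensional sensory input and motor output at very high information rates, while the inner brain handles the low-dimensional information stream used for decisions and behavioral control at a very low rate (about 10 bits/s). This division of labor may be an important reason for the limit on information-processing speed. The paper suggests that future research further explore the differences between inner-brain and outer-brain processing and how information-processing efficiency could be optimized.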
arXiv, 2024-08-03T22:56:45Z. DOI: 10.48550/arXiv.2408.10234
This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at ~10^9 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: What neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the "outer" brain handles fast high-dimensional sensory and motor signals, whereas the "inner" brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.
前进 (2024-12-31 20:09):
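#paper DOI 10.48550/arXiv.2111.06377 He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. B. (2021). Masked Autoencoders Are Scalable Vision Learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). This paper proposes a self-supervised learning framework, the masked autoencoder (MAE). Its core idea is a random masking strategy: the whole image is reconstructed from only the 25% of the image that is left unmasked, which forces the model to learn more effective visual features. MAE also uses an asymmetric encoder-decoder architecture, with an encoder that processes only the unmasked part of the image and a lightweight decoder that reconstructs the original image from the encoder output and the positions of the masked parts. This greatly reduces compute cost and improves training efficiency. Experiments show that MAE generalizes well as a self-supervised pre-training method, transfers to a variety of downstream tasks, and scales well.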
arXiv, 2021-11-11T18:46:40Z. DOI: 10.48550/arXiv.2111.06377
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
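The random masking step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `random_masking`, the fixed seed, and the 14 x 14 patch grid are assumptions made for the example; only the 75% mask ratio comes from the abstract.

```python
import random

def random_masking(num_patches, mask_ratio=0.75, seed=0):
    """Split patch indices into a visible set (encoder input) and a masked set."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)                       # random permutation of patch indices
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])       # the encoder sees only these patches
    masked = sorted(order[num_keep:])        # the decoder must reconstruct these
    return visible, masked

# Assumed sizes: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_masking(196, mask_ratio=0.75)
print(len(visible), len(masked))  # 49 147
```

Because the encoder runs on only a quarter of the patches, its cost per image drops accordingly, which is consistent with the training speed-up the abstract reports.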
前进 (2024-11-30 23:26):
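#paper DOI: [10.1093/bib/bbv088](https://doi.org/10.1093/bib/bbv088) A Comparison of Base-calling Algorithms for Illumina Sequencing Technology. This paper compares the performance of different base-calling algorithms for Illumina sequencing technology. It provides a comprehensive comparative analysis of several recently developed base-calling algorithms and proposes a unified statistical model that covers most existing base callers. The aim is to help researchers choose the base-calling tool best suited to their needs by comparing the accuracy and efficiency of these algorithms on sequencing data produced by the Illumina platform. The algorithms covered include Bustard, Srfim, AYB, Ibis, and freeIbis, and they are evaluated on experimental data in terms of alignment rate, error rate, and discriminative power. Through these comparisons, the paper offers guidance and recommendations for the base-calling step in high-throughput sequencing data analysis.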
前进 (2024-10-31 15:09):
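#paper arXiv:2408.05839v2 Deep Learning in Medical Image Registration: Magic or Mirage? 38th Conference on Neural Information Processing Systems (NeurIPS 2024). This paper examines how deep learning-based image registration (DLIR) compares with classical optimization methods in medical image registration. It compares the two paradigms on deformable image registration (DIR) and notes that classical methods have the advantage in cross-modality generalization and robustness, while learning-based methods reach higher performance through weak supervision. Through a series of experiments, the paper shows that in the unsupervised setting learning-based methods do not significantly outperform classical methods on label-matching performance. It hypothesizes that architectural design in learning-based methods is unlikely to affect the mutual information between the pixel intensity distribution and the labels, and is therefore unlikely to improve the performance of learning-based methods significantly. The paper also shows that with weak supervision, learning-based methods achieve higher registration accuracy than classical methods can. However, learning-based methods are sensitive to changes in the data distribution and do not show robustness to domain shift. The paper concludes that without a large labeled dataset, classical optimization methods remain the better choice.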
arXiv, 2024-08-11T18:20:08Z. DOI: 10.48550/arXiv.2408.05839
Classical optimization and learning-based methods are the two reigning paradigms in deformable image registration. While optimization-based methods boast generalizability across modalities and robust performance, learning-based methods promise peak performance, incorporating weak supervision and amortized optimization. However, the exact conditions for either paradigm to perform well over the other are shrouded and not explicitly outlined in the existing literature. In this paper, we make an explicit correspondence between the mutual information of the distribution of per-pixel intensity and labels, and the performance of classical registration methods. This strong correlation hints to the fact that architectural designs in learning-based methods are unlikely to affect this correlation, and therefore, the performance of learning-based methods. This hypothesis is thoroughly validated with state-of-the-art classical and learning-based methods. However, learning-based methods with weak supervision can perform high-fidelity intensity and label registration, which is not possible with classical methods. Next, we show that this high-fidelity feature learning does not translate to invariance to domain shift, and learning-based methods are sensitive to such changes in the data distribution. Finally, we propose a general recipe to choose the best paradigm for a given registration problem, based on these observations.
前进 (2024-09-30 16:31):
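#paper DOI 10.1186/1471-2105-12-451 Frazer Meacham, Dario Boffelli, Joseph , Identification and correction of systematic error in high-throughput sequence data. This paper studies systematic errors in high-throughput sequencing data. A systematic error is an accumulation of errors in the sequenced reads at a specific genome (or transcriptome) location that is statistically unlikely to arise by chance. The authors characterize systematic errors using overlapping paired reads from high-coverage data and find that such errors occur about once per 1000 base pairs and are highly replicable across experiments. They identify sequence motifs that are frequent at systematic error sites and design a classifier that distinguishes heterozygous sites from systematic errors. The classifier can handle experiments in which the allele frequency at heterozygous sites is not necessarily 0.5, and it can be used with single-end datasets. The paper concludes that systematic errors are easily mistaken for heterozygous sites in individuals or for SNPs in population analyses. Based on their characterization, the authors developed a program called SysCall to identify and correct such errors, and they conclude that correcting systematic errors is important to consider when designing and interpreting high-throughput sequencing experiments.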
Abstract. Background: A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results: We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions: Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.
前进 (2024-08-31 14:29):
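#paper https://doi.org/10.15326/jcopdf.2023.0399 Chen, J., Xu, Z., Sun, L., Yu, K., Hersh, C. P., Boueiz, A., ... Batmanghelich, K. (2023). Deep learning integration of chest computed tomography and gene expression identifies novel aspects of COPD. Chronic Obstructive Pulmonary Diseases: Journal of the COPD Foundation, 10(4), 355-368. This paper uses deep learning to jointly analyze chest CT scans and blood RNA sequencing data from patients with chronic obstructive pulmonary disease (COPD), in order to explore new relationships between lung structural changes and blood transcriptome patterns. The study identifies two image-expression axes (IEAs), associated with emphysema and airway disease respectively, and shows how they relate to different clinical measurements and health outcomes in COPD. Bioinformatic analysis further identifies the biological pathways associated with each IEA. The study offers a new perspective on the heterogeneity of COPD and may help in developing targeted treatments.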
Rationale: Chronic obstructive pulmonary disease (COPD) is characterized by pathologic changes in the airways, lung parenchyma, and persistent inflammation, but the links between lung structural changes and blood transcriptome patterns have not been fully described. Objectives: The objective of this study was to identify novel relationships between lung structural changes measured by chest computed tomography (CT) and blood transcriptome patterns measured by blood RNA sequencing (RNA-seq). Methods: CT scan images and blood RNA-seq gene expression from 1223 participants in the COPD Genetic Epidemiology (COPDGene®) study were jointly analyzed using deep learning to identify shared aspects of inflammation and lung structural changes that we labeled image-expression axes (IEAs). We related IEAs to COPD-related measurements and prospective health outcomes through regression and Cox proportional hazards models and tested them for biological pathway enrichment. Results: We identified 2 distinct IEAs: IEAemph which captures an emphysema-predominant process with a strong positive correlation to CT emphysema and a negative correlation to forced expiratory volume in 1 second and body mass index (BMI); and IEAairway which captures an airway-predominant process with a positive correlation to BMI and airway wall thickness and a negative correlation to emphysema. Pathway enrichment analysis identified 29 and 13 pathways significantly associated with IEAemph and IEAairway, respectively (adjusted p<0.001). Conclusions: Integration of CT scans and blood RNA-seq data identified 2 IEAs that capture distinct inflammatory processes associated with emphysema and airway-predominant COPD.
前进 (2024-07-31 11:35):
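#paper DOI:https://doi.org/10.48550/arXiv.2006.16236 Katharopoulos A, Vyas A, Pappas N, et al. Transformers are rnns: Fast autoregressive transformers with linear attention[C]//International conference on machine learning. PMLR, 2020: 5156-5165. This paper proposes a linear Transformer. By expressing self-attention as a linear dot product of kernel feature maps and exploiting the associativity of matrix multiplication, it reduces the complexity of a standard Transformer on long sequences from O(N^2) to O(N). The authors show that the new model matches the performance of a standard Transformer while being up to 4000x faster at autoregressive prediction of long sequences. The paper also explores the relationship between Transformers and recurrent neural networks (RNNs), showing that with a suitable reformulation a Transformer can perform autoregressive prediction as efficiently as an RNN.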
arXiv, 2020-06-29T17:55:38Z. DOI: 10.48550/arXiv.2006.16236
Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
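The associativity trick and the RNN view can be checked on a toy example. The sketch below is an illustration under stated assumptions rather than the paper's code: the feature map `phi(x) = elu(x) + 1` is assumed (the abstract only says "kernel feature maps"), and all function names are made up for this example. The first function weights every earlier position explicitly; the second carries two running sums, like an RNN state, and should give the same output.

```python
import math

def phi(x):
    """Assumed feature map elu(x) + 1, which keeps every feature positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def causal_attention_quadratic(Q, K, V):
    """Reference form: position i explicitly weights every j <= i, O(N^2) overall."""
    out = []
    for i in range(len(Q)):
        w = [dot(phi(Q[i]), phi(K[j])) for j in range(i + 1)]
        norm = sum(w)
        out.append([sum(w[j] * V[j][d] for j in range(i + 1)) / norm
                    for d in range(len(V[0]))])
    return out

def causal_attention_linear(Q, K, V):
    """RNN-style form: carry S = sum phi(k) v^T and z = sum phi(k), O(N) overall."""
    dk, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(dk)]   # running sum of outer products
    z = [0.0] * dk                        # running sum of key features
    out = []
    for q, k, v in zip(Q, K, V):
        fq, fk = phi(q), phi(k)
        for a in range(dk):
            z[a] += fk[a]
            for d in range(dv):
                S[a][d] += fk[a] * v[d]
        norm = dot(fq, z)
        out.append([sum(fq[a] * S[a][d] for a in range(dk)) / norm
                    for d in range(dv)])
    return out
```

The state `(S, z)` has a fixed size that does not depend on the sequence length, which is why each autoregressive step costs the same regardless of how many tokens came before.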
前进 (2024-06-30 22:29):
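#paper Liu R , Li Z , Fan X ,et al.Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond[J]. 2020.DOI:10.48550/arXiv.2004.14557. The paper proposes a new deep learning-based framework that optimizes a diffeomorphic model via multi-scale propagation, aiming to combine the strengths of conventional deformable registration and deep learning-based methods while avoiding their limitations. Specifically, the authors introduce a generic optimization model for diffeomorphic registration and develop a series of learnable architectures that perform registration on image features learned from coarse to fine. The paper also proposes a novel bilevel self-tuned training strategy that allows efficient search of task-specific hyper-parameters, which increases flexibility across data types while reducing computational and human burden. The authors run registration experiments on several datasets, including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. The results show that the proposed method reaches state-of-the-art performance while preserving diffeomorphism. The authors also apply the framework to multi-modal image registration and study how its registration supports downstream medical image analysis tasks, including multi-modal fusion and image segmentation.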
Conventional deformable registration methods aim at solving an optimization model carefully designed on image pairs and their computational costs are exceptionally high. In contrast, recent deep learning based approaches can provide fast deformation estimation. These heuristic network architectures are fully data-driven and thus lack explicit geometric constraints, e.g., topology-preserving, which are indispensable to generate plausible deformations. We design a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation in order to integrate advantages and avoid limitations of these two categories of approaches. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures to obtain propagative updating in the coarse-to-fine feature space. Moreover, we propose a novel bilevel self-tuned training strategy, allowing efficient search of task-specific hyper-parameters. This training strategy increases the flexibility to various types of data while reducing computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate the state-of-the-art performance of the proposed method with diffeomorphic guarantee and extreme efficiency. We also apply our framework to challenging multi-modal image registration, and investigate how our registration supports the downstream tasks for medical image analysis including multi-modal fusion and image segmentation.
前进 (2024-05-30 13:53):
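#paper Luo S, Xie Z, Chen G, et al. Hierarchical DNN with Heterogeneous Computing Enabled High-Performance DNA Sequencing[C]//2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). IEEE, 2022: 35-40. This paper applies deep learning to second-generation gene sequencing. The AYB algorithm is the most accurate of the sequencing algorithms, but it performs poorly as the fluorescence signal weakens over successive cycles, and it struggles with the phasing effect in DNA sequencing. Deep learning handles both problems well. The method first detects cluster positions from the fluorescence images acquired in the first 5 cycles, extracts the cluster intensities in subsequent cycles, corrects the intensity crosstalk with a conventional channel correction algorithm, and then feeds the corrected intensities to a DNN that classifies the base. Experiments show that, compared with conventional algorithms, the deep learning approach detects 12.18% more reads and lowers the base classification error rate from 0.1432% to 0.0175%.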
前进 (2024-04-30 11:44):
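#paper Han D, Pan X, Han Y, et al. Flatten transformer: Vision transformer using focused linear attention[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 5961-5971. The main challenge in applying self-attention to computer vision tasks is its quadratic computational complexity, which makes vision workloads very expensive. As an alternative to Softmax attention, linear attention approximates the Softmax operation with a carefully designed mapping function and thereby reduces the complexity from quadratic to linear. Although linear attention is more efficient in theory, existing linear attention methods either lose significant accuracy or need extra computation, which limits their practical use. To overcome these limits, the paper proposes the Focused Linear Attention (FLA) module, which improves efficiency and expressiveness through two changes. (1) Focus ability: a simple mapping function strengthens the ability of self-attention to focus on the most informative features. (2) Feature diversity: an efficient rank restoration module uses depthwise convolution (DWC) to restore the rank of the attention matrix and increase feature diversity. In extensive experiments on several advanced vision Transformer models, the FLA module gives consistent gains across multiple benchmarks.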
前进 (2024-03-31 12:44):
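#paper [1] Hu X , Kang M , Huang W ,et al.Dual-Stream Pyramid Registration Network[J].Springer, Cham, 2019.DOI:10.1007/978-3-030-32245-8_43. This paper addresses unsupervised registration of 3D brain medical images. Unlike earlier registration methods based on convolutional neural networks (CNNs), such as VoxelMorph, Dual-PRNet uses a dual-stream architecture that sequentially estimates multi-level registration fields from a pair of 3D volumes. The main contributions are: (1) a dual-stream 3D encoder-decoder network that computes two convolutional feature pyramids, one from each input volume; (2) a sequential pyramid registration method with a series of pyramid registration (PR) modules that predict multi-level registration fields directly from the decoded feature pyramids, refining the registration field from coarse to fine through sequential warping so that the model can handle large deformations; (3) an enhanced PR module that computes local 3D correlations between the feature pyramids, giving the improved Dual-PRNet++, which can aggregate rich, detailed anatomical structure; (4) integration of Dual-PRNet++ into a 3D segmentation framework, which performs joint registration and segmentation by accurately warping voxel-level annotations. The paper also reviews related work on deep learning-based medical image registration and evaluates the proposed method. On the Mindboggle101 dataset, Dual-PRNet++ raises the Dice score from 0.511 to 0.748, far ahead of the existing state-of-the-art methods. The paper further shows how, in a joint learning framework with limited annotations, the method can greatly help the segmentation task.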
前进 (2024-02-28 10:57):
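#paper Mckenzie E M , Santhanam A , Ruan D ,et al.Multimodality image registration in the head‐and‐neck using a deep learning‐derived synthetic CT as a bridge[J].Medical Physics, 2020, 47(3).DOI:10.1002/mp.13976. This paper proposes and validates a head-and-neck multimodality image registration method that uses deep learning-based cross-modality synthesis. A CycleGAN converts MRI into synthetic CT (sCT), which turns the multimodal MRI-CT registration of the head and neck into a monomodal sCT-CT registration. The registration itself uses the conventional B-spline method. The experiments show that sCT→CT registration is more accurate than MRI→CT registration: the average landmark error fell from 9.8 mm to 6.0 mm.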
IF: 3.200 (Q1). Medical physics, 2020-Mar. DOI: 10.1002/mp.13976 PMID: 31853975
PURPOSE: To develop and demonstrate the efficacy of a novel head-and-neck multimodality image registration technique using deep-learning-based cross-modality synthesis. METHODS AND MATERIALS: Twenty-five head-and-neck patients received magnetic resonance (MR) and computed tomography (CT) (CTaligned) scans on the same day with the same immobilization. Fivefold cross validation was used with all of the MR-CT pairs to train a neural network to generate synthetic CTs from MR images. Twenty-four of 25 patients also had a separate CT without immobilization (CTnon-aligned) and were used for testing. CTnon-aligned's were deformed to the synthetic CT, and compared to CTnon-aligned registered to MR. The same registrations were performed from MR to CTnon-aligned and from synthetic CT to CTnon-aligned. All registrations used B-splines for modeling the deformation, and mutual information for the objective. Results were evaluated using the 95% Hausdorff distance among spinal cord contours, landmark error, inverse consistency, and Jacobian determinant of the estimated deformation fields. RESULTS: When large initial rigid misalignment is present, registering CT to MRI-derived synthetic CT aligns the cord better than a direct registration. The average landmark error decreased from 9.8 ± 3.1 mm in MR→CTnon-aligned to 6.0 ± 2.1 mm in CTsynth→CTnon-aligned deformable registrations. In the CT to MR direction, the landmark error decreased from 10.0 ± 4.3 mm in CTnon-aligned→MR deformable registrations to 6.6 ± 2.0 mm in CTnon-aligned→CTsynth deformable registrations. The Jacobian determinant had an average value of 0.98. The proposed method also demonstrated improved inverse consistency over the direct method. CONCLUSIONS: We showed that using a deep learning-derived synthetic CT in lieu of an MR for MR→CT and CT→MR deformable registration offers superior results to direct multimodal registration.
前进 (2024-01-31 22:50):
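#paper arxiv.org//pdf/2311.026 2023 Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection. The Large Multimodal Model (LMM) GPT-4V(ision) gives GPT-4 visual grounding capabilities, which makes it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular task of visual anomaly detection (AD) and is the first to run qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Because the task needs both image-level and pixel-level evaluation, the proposed GPT-4V-AD framework has three components: 1) granular region division, 2) prompt design, and 3) Text2Segmentation for easy quantitative evaluation; the authors also try several variants for comparative analysis. The results show that GPT-4V can achieve reasonable results on the zero-shot AD task through the VQA paradigm, for example image-level AU-ROC of 77.1/88.0 and pixel-level AU-ROC of 68.0/76.6 on MVTec AD and VisA respectively. However, its performance still lags behind state-of-the-art zero-shot methods such as WinCLIP and CLIP-AD, and further research is needed. The study provides a baseline reference for research on VQA-oriented LMMs in the zero-shot AD task.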
Large Multimodal Model (LMM) GPT-4V(ision) endows GPT-4 with visual grounding capabilities, making it possible to handle certain tasks through the Visual Question Answering (VQA) paradigm. This paper explores the potential of VQA-oriented GPT-4V in the recently popular visual Anomaly Detection (AD) and is the first to conduct qualitative and quantitative evaluations on the popular MVTec AD and VisA datasets. Considering that this task requires both image-/pixel-level evaluations, the proposed GPT-4V-AD framework contains three components: 1) Granular Region Division, 2) Prompt Designing, 3) Text2Segmentation for easy quantitative evaluation, and have made some different attempts for comparative analysis. The results show that GPT-4V can achieve certain results in the zero-shot AD task through a VQA paradigm, such as achieving image-level 77.1/88.0 and pixel-level 68.0/76.6 AU-ROCs on MVTec AD and VisA datasets, respectively. However, its performance still has a certain gap compared to the state-of-the-art zero-shot method, e.g., WinCLIP and CLIP-AD, and further research is needed. This study provides a baseline reference for the research of VQA-oriented LMM in the zero-shot AD task, and we also post several possible future works. Code is available at https://github.com/zhangzjn/GPT-4V-AD.
前进 (2023-12-27 15:11):
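#paper arXiv:2312.11514v1 ,2023, LLM in a flash: Efficient Large Language Model Inference with Limited Memory. Large language models (LLMs) play a central role in modern natural language processing, but their heavy compute and memory requirements are a challenge for devices with limited memory. To run LLMs that exceed the available DRAM capacity efficiently, the paper stores the model parameters on flash memory and brings them into DRAM on demand. The method builds an inference cost model that matches the behavior of flash memory and optimizes two key areas: reducing the volume of data transferred from flash, and reading data in larger, more contiguous chunks. Within this framework the paper introduces two main techniques: "windowing" reduces data transfer by reusing previously activated neurons, and "row-column bundling" exploits the sequential access strengths of flash memory to increase the size of the chunks read from flash. Together these methods make it possible to run models up to twice the size of the available DRAM, with inference 4-5x faster on CPU and 20-25x faster on GPU than naive loading.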
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their intensive computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameters on flash memory but bringing them on demand to DRAM. Our method involves constructing an inference cost model that harmonizes with the flash memory behavior, guiding us to optimize in two critical areas: reducing the volume of data transferred from flash and reading data in larger, more contiguous chunks. Within this flash memory-informed framework, we introduce two principal techniques. First, "windowing" strategically reduces data transfer by reusing previously activated neurons, and second, "row-column bundling", tailored to the sequential data access strengths of flash memory, increases the size of data chunks read from flash memory. These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively. Our integration of sparsity awareness, context-adaptive loading, and a hardware-oriented design paves the way for effective inference of LLMs on devices with limited memory.
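The effect of "windowing" on flash traffic can be illustrated with a toy counter. This is a sketch under assumptions, not the paper's implementation: it assumes that neurons activated by the last `window` tokens stay resident in DRAM, and the function name and neuron ids are hypothetical. It counts how many neurons would have to be read from flash at each token.

```python
from collections import deque

def neurons_loaded_per_token(active_per_token, window):
    """Count neurons fetched from flash per token when recent neurons stay in DRAM."""
    recent = deque()   # active-neuron sets of the most recent tokens
    loads = []
    for active in active_per_token:
        cached = set().union(*recent)             # everything still resident in DRAM
        loads.append(len(set(active) - cached))   # only the missing neurons are read
        recent.append(set(active))
        if len(recent) > window:
            recent.popleft()                      # older neurons may now be evicted
    return loads

trace = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]         # hypothetical active neuron ids
print(neurons_loaded_per_token(trace, window=2))  # [3, 1, 1]
print([len(s) for s in trace])                    # naive reload per token: [3, 3, 3]
```

When consecutive tokens activate overlapping neurons, as in this trace, most of the required weights are already in DRAM and only the difference is transferred.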
前进 (2023-11-30 10:22):
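#paper GraformerDIR: Graph convolution transformer for deformable image registration. Computers in Biology and Medicine, 30 June 2022. https://doi.org/10.1016/j.compbiomed.2022.105799 This paper uses graph convolution for image registration. By placing a graph convolution Transformer (Graformer) layer in the feature extraction network, it proposes a Graformer-based DIR framework named GraformerDIR. The Graformer layer consists of a Graformer module and a Cheby-shev graph convolution module: the Graformer module is designed to capture high-quality long-range dependencies, and the Cheby-shev graph convolution module further enlarges the receptive field. GraformerDIR was evaluated on public brain datasets, including OASIS, LPBA40, and MGH10. Compared with VoxelMorph, GraformerDIR improves the DSC by 4.6% and the average symmetric surface distance by 0.055 mm, with a lower folding rate.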
PURPOSE: Deformable image registration (DIR) plays an important role in assisting disease diagnosis. The emergence of the Transformer enables the DIR framework to extract long-range dependencies, which relieves the limitations of intrinsic locality caused by convolution operation. However, suffering from the interference of missing or spurious connections, it is a challenging task for Transformer-based methods to capture the high-quality long-range dependencies. METHODS: In this paper, by stacking the graph convolution Transformer (Graformer) layer at the bottom of the feature extraction network, we propose a Graformer-based DIR framework, named GraformerDIR. The Graformer layer consists of the Graformer module and the Cheby-shev graph convolution module. Among them, the Graformer module is designed to capture high-quality long-range dependencies. Cheby-shev graph convolution module is employed to further enlarge the receptive field. RESULTS: The performance and generalizability of GraformerDIR have been evaluated on publicly available brain datasets including the OASIS, LPBA40, and MGH10 datasets. Compared with VoxelMorph, the GraformerDIR has obtained performance improvements of 4.6% in Dice similarity coefficient (DSC) and 0.055 mm in the average symmetric surface distance (ASD) while reducing the non-positive rate of Jacobian determinant (Npr.Jac) index about 60 times on publicly available OASIS dataset. On unseen dataset MGH10, the GraformerDIR has obtained the performance improvements of 4.1% in DSC and 0.084 mm in ASD compared with VoxelMorph, which demonstrates the GraformerDIR with better generalizability. The promising performance on the clinical cardiac dataset ACDC indicates the GraformerDIR is practicable. CONCLUSION: With the advantage of Transformer and graph convolution, the GraformerDIR has obtained comparable performance with the state-of-the-art method VoxelMorph.
前进 (2023-10-30 13:57):
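#paper https://doi.org/10.1088/1361-6560/ac5f70 Training low dose CT denoising network without high quality reference data. Low-dose CT (LDCT) denoising is dominated by supervised learning methods, which need perfectly registered pairs of LDCT images and their corresponding clean reference images (normal-dose CT). Training without clean labels is more practical, however, because it is clinically impossible to acquire large numbers of such paired samples. This paper proposes a self-supervised denoising method for LDCT imaging that does not require any clean images. During denoising, a perceptual loss enforces data consistency in the feature domain, and attention blocks in the decoding stage help to further improve image quality. The experiments compare against 3 other methods and include 6 ablation studies, which confirm the effectiveness of the proposed self-supervised framework as well as of the self-attention module and the perceptual loss.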
Currently, the field of low-dose CT (LDCT) denoising is dominated by supervised learning based methods, which need perfectly registered pairs of LDCT and its corresponding clean reference image (normal-dose CT). However, training without clean labels is more practically feasible and significant, since it is clinically impossible to acquire a large amount of these paired samples. In this paper, a self-supervised denoising method is proposed for LDCT imaging. The proposed method does not require any clean images. In addition, the perceptual loss is used to achieve data consistency in feature domain during the denoising process. Attention blocks used in decoding phase can help further improve the image quality. In the experiments, we validate the effectiveness of our proposed self-supervised framework and compare our method with several state-of-the-art supervised and unsupervised methods. The results show that our proposed model achieves competitive performance in both qualitative and quantitative aspects to other methods. Our framework can be directly applied to most denoising scenarios without collecting pairs of training data, which is more flexible for real clinical scenario.
前进 (2023-09-27 10:56):
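#paper doi:10.1109/cvpr.2019.00223 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Noise2Void - Learning Denoising From Single Noisy Images. Deep learning-based image denoising is usually trained on pairs of clean and noisy images. Some approaches, such as Noise2Noise (N2N), drop the clean target and train from multiple noisy images alone. This paper goes further and proposes a method that needs only single noisy images. From the patch-based view of denoising, each pixel of the output depends only on a region of the input image, bounded by the receptive field. The authors observe that if a patch of a single image is the network input and the pixel at the patch center is the target, the network simply learns the identity map that copies the center pixel of the input patch to the output. They therefore design a special receptive field that "erases" the center pixel and ask the network to predict the value at that position from its surroundings. The approach rests on two assumptions: (1) the noise at different pixel positions is mutually independent, and (2) the noise has zero mean. Under these assumptions the predicted center pixel is more likely to be signal than noise.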
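The blind-spot idea can be sketched as a data-preparation step. This is an illustration under assumptions, not the paper's code: "erasing" the center is done here by overwriting it with a randomly chosen other pixel of the patch, which is one possible choice, and the function name is hypothetical. The network input hides the center value, while the original noisy center value is kept as the training target.

```python
import random

def blind_spot_sample(patch, rng):
    """Return (masked_patch, target, center) for one training sample."""
    h, w = len(patch), len(patch[0])
    ci, cj = h // 2, w // 2
    target = patch[ci][cj]                 # noisy center value is the training target
    others = [(i, j) for i in range(h) for j in range(w) if (i, j) != (ci, cj)]
    ni, nj = rng.choice(others)            # pick a random non-center pixel
    masked = [row[:] for row in patch]     # copy so the input patch is untouched
    masked[ci][cj] = patch[ni][nj]         # hide the center from the network
    return masked, target, (ci, cj)

patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
masked, target, center = blind_spot_sample(patch, random.Random(0))
print(target, center)  # 5 (1, 1)
```

Because the input no longer contains the center value, the network cannot learn the identity map and has to predict the center from its surroundings, which only works for the signal if the noise is independent across pixels.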
前进 (2023-01-31 23:30):
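#paper Rethinking 1x1 Convolutions: Can we train CNNs with Frozen Random Filters? arXiv:2301.11360 This paper introduces a new convolution block that computes learnable linear combinations (LC) of frozen random filters, and builds LCResNets from it. It shows that even in the extreme case where the spatial filters are only randomly initialized and never updated, certain CNN architectures can be trained to exceed the accuracy of standard training. By reinterpreting pointwise (1x1) convolutions as an operator that learns linear combinations (LC) of frozen (random) spatial filters, the approach not only reaches high test accuracy on CIFAR and ImageNet but also has favorable properties in terms of model robustness, generalization, sparsity, and the total number of weights required. The paper also proposes a new weight sharing mechanism that shares a single weight tensor across all spatial convolution layers, which greatly reduces the number of weights.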
arXiv, 2023.
Modern CNNs are learning the weights of vast numbers of convolutional operators. In this paper, we raise the fundamental question if this is actually necessary. We show that even in the extreme case of only randomly initializing and never updating spatial filters, certain CNN architectures can be trained to surpass the accuracy of standard training. By reinterpreting the notion of pointwise (1×1) convolutions as an operator to learn linear combinations (LC) of frozen (random) spatial filters, we are able to analyze these effects and propose a generic LC convolution block that allows tuning of the linear combination rate. Empirically, we show that this approach not only allows us to reach high test accuracies on CIFAR and ImageNet but also has favorable properties regarding model robustness, generalization, sparsity, and the total number of necessary weights. Additionally, we propose a novel weight sharing mechanism, which allows sharing of a single weight tensor between all spatial convolution layers to massively reduce the number of weights.
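The reinterpretation of 1x1 convolutions as linear combinations of frozen filters follows from the linearity of convolution, which a small 1D example can check. This is a sketch under assumptions, not the paper's LC block: it uses 1D signals, integer values so the comparison is exact, and a fixed coefficient list `coeffs` standing in for the learnable 1x1 weights of one output channel.

```python
import random

def correlate1d(x, f):
    """Valid-mode 1D cross-correlation of signal x with filter f."""
    n = len(x) - len(f) + 1
    return [sum(x[i + k] * f[k] for k in range(len(f))) for i in range(n)]

rng = random.Random(0)
# Frozen filters: drawn once at random and never updated.
frozen_filters = [[rng.randint(-3, 3) for _ in range(3)] for _ in range(4)]
coeffs = [2, -1, 0, 3]   # stand-in for learnable 1x1 weights (hypothetical values)
x = [1, 4, 2, 8, 5, 7]

# View 1: apply every frozen filter, then mix the responses (what a 1x1 conv does).
responses = [correlate1d(x, f) for f in frozen_filters]
mixed = [sum(c * r[i] for c, r in zip(coeffs, responses))
         for i in range(len(responses[0]))]

# View 2: mix the filters first, then apply the single combined filter.
combined = [sum(c * f[k] for c, f in zip(coeffs, frozen_filters)) for k in range(3)]
direct = correlate1d(x, combined)

print(mixed == direct)  # True, by linearity of convolution
```

So training only the mixing coefficients still lets the network realize any filter in the span of the frozen random ones, which is one way to read why frozen spatial filters can suffice.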
前进 (2022-12-31 11:39):
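#paper Liu Y, Chen J, Wei S, et al. On Finite Difference Jacobian Computation in Deformable Image Registration[J]. arXiv preprint arXiv:2212.06060, 2022. Producing diffeomorphic spatial transformations has long been a central problem in deformable image registration. A diffeomorphic transformation should have a positive Jacobian determinant |J| everywhere, so the number of voxels with |J|<0 has been used both to test for diffeomorphism and to measure the irregularity of a transformation. For digital transformations, |J| is usually approximated with central differences, but this strategy can yield positive |J| for transformations that are clearly not diffeomorphic, even at the voxel resolution level. To show this, the paper first studies the geometric meaning of different finite difference approximations of |J| and concludes that no single finite difference approximation of |J| is sufficient to establish diffeomorphism for digital images. It proves that for a 2D transformation, four unique finite difference approximations of |J| must all be positive to ensure that the whole domain is invertible and free of folding at the pixel level; in 3D, ten unique approximations must be positive. The proposed digital diffeomorphism criteria resolve several errors inherent in the central difference approximation of |J| and accurately detect non-diffeomorphic digital transformations.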
Producing spatial transformations that are diffeomorphic has been a central problem in deformable image registration. As a diffeomorphic transformation should have positive Jacobian determinant |J| everywhere, the number of voxels with |J|<0 has been used to test for diffeomorphism and also to measure the irregularity of the transformation. For digital transformations, |J| is commonly approximated using central difference, but this strategy can yield positive |J|'s for transformations that are clearly not diffeomorphic -- even at the voxel resolution level. To show this, we first investigate the geometric meaning of different finite difference approximations of |J|. We show that to determine diffeomorphism for digital images, use of any individual finite difference approximations of |J| is insufficient. We show that for a 2D transformation, four unique finite difference approximations of |J|'s must be positive to ensure the entire domain is invertible and free of folding at the pixel level. We also show that in 3D, ten unique finite differences approximations of |J|'s are required to be positive. Our proposed digital diffeomorphism criteria solves several errors inherent in the central difference approximation of |J| and accurately detects non-diffeomorphic digital transformations.
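A small 2D example shows how a central difference can hide a fold. This is a sketch under assumptions: the abstract does not list the four approximations, so the code assumes they are the combinations of forward and backward differences along x and y, and the function name and the toy transformation are made up for the example.

```python
def jacobian_estimates_2d(phi_x, phi_y, i, j):
    """Central-difference |J| and four one-sided |J| estimates at interior point (i, j).

    phi_x[i][j] and phi_y[i][j] are the transformed x and y coordinates of the grid
    point in row i, column j; x runs along columns and y along rows.
    """
    def det(dxx, dxy, dyx, dyy):
        return dxx * dyy - dxy * dyx

    central = det(
        (phi_x[i][j + 1] - phi_x[i][j - 1]) / 2.0,
        (phi_x[i + 1][j] - phi_x[i - 1][j]) / 2.0,
        (phi_y[i][j + 1] - phi_y[i][j - 1]) / 2.0,
        (phi_y[i + 1][j] - phi_y[i - 1][j]) / 2.0,
    )
    one_sided = {}
    for sx in (+1, -1):        # +1 = forward difference in x, -1 = backward
        for sy in (+1, -1):    # same convention for y
            one_sided[(sx, sy)] = det(
                sx * (phi_x[i][j + sx] - phi_x[i][j]),
                sy * (phi_x[i + sy][j] - phi_x[i][j]),
                sx * (phi_y[i][j + sx] - phi_y[i][j]),
                sy * (phi_y[i + sy][j] - phi_y[i][j]),
            )
    return central, one_sided

# x-coordinates map 0 -> 0, 1 -> 2.5, 2 -> 2: the middle point overtakes its
# right neighbor, which is a fold. The y-coordinates are left unchanged.
phi_x = [[0.0, 2.5, 2.0] for _ in range(3)]
phi_y = [[float(r)] * 3 for r in range(3)]
central, one_sided = jacobian_estimates_2d(phi_x, phi_y, 1, 1)
print(central)                  # 1.0: the central difference misses the fold
print(min(one_sided.values()))  # -0.5: a one-sided estimate exposes it
```

The central difference skips over the middle point entirely, so it only sees the two outer neighbors and reports a healthy |J|, while the forward difference in x compares the middle point with its right neighbor and goes negative.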