1086 paper shares found in total; this page shows entries 841 - 860.
841.
沈么是快乐星球
(2022-07-29 08:53):
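#paper doi:10.1038/s41467-020-19681-1 Nature Communications, 2020, Genome-enabled discovery of anthraquinone biosynthesis in Senna tora. Senna tora is a Chinese medicinal herb whose main active compounds are its abundant anthraquinones, which accumulate mainly in the seeds. Using whole-genome sequencing and comparative genomics, the authors found that the CHS gene family has undergone a lineage-specific, rapid expansion in S. tora, with the expanded members concentrated on chromosome 7. By profiling metabolites and transcriptomes of seeds at different developmental stages, they shortlisted three candidate genes, settled on one of them based on expression pattern, phylogenetic relationship, and gene structure, and chose a more distantly related CHS family gene as a negative control. Finally, they validated the candidate with in vitro enzyme assays (protein expressed from the candidate gene, protein from an inactivated version of the candidate gene, and the negative-control protein; only the candidate protein converted the substrate to the downstream product). The approach is simple and clear. When screening candidates, the authors started from gene clusters whose expression patterns resembled the metabolite accumulation patterns, built a "metabolite library", and analyzed which metabolic pathways were mainly enriched. I have not yet studied the enzyme assay part in detail, since it requires a good deal of background knowledge in metabolism.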
Abstract:
Senna tora is a widely used medicinal plant. Its health benefits have been attributed to the large quantity of anthraquinones, but how they are made in plants remains a mystery. To identify the genes responsible for plant anthraquinone biosynthesis, we reveal the genome sequence of S. tora at the chromosome level with 526 Mb (96%) assembled into 13 chromosomes. Comparison among related plant species shows that a chalcone synthase-like (CHS-L) gene family has lineage-specifically and rapidly expanded in S. tora. Combining genomics, transcriptomics, metabolomics, and biochemistry, we identify a CHS-L gene contributing to the biosynthesis of anthraquinones. The S. tora reference genome will accelerate the discovery of biologically active anthraquinone biosynthesis pathways in medicinal plants.
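Note on entry 841: the candidate screening step (looking for genes whose expression across seed developmental stages tracks the metabolite accumulation pattern) can be illustrated with a small sketch. The gene names and all numbers below are made up for illustration, and Pearson correlation is an assumed similarity measure; the comment does not say which clustering method the paper actually used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical anthraquinone abundance at four seed developmental stages.
metabolite = [1.0, 2.0, 4.0, 8.0]

# Hypothetical expression profiles for three made-up CHS family genes.
expression = {
    "CHS_a": [2.0, 4.0, 8.0, 16.0],  # rises together with the metabolite
    "CHS_b": [5.0, 5.5, 4.5, 5.0],   # roughly flat
    "CHS_c": [9.0, 6.0, 3.0, 1.0],   # falls as the metabolite rises
}

def rank_candidates(expression, metabolite):
    """Return gene names sorted by descending correlation with the metabolite."""
    scores = {gene: pearson(values, metabolite) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_candidates(expression, metabolite)
print(ranking)
```

In this toy data the gene whose profile rises with the metabolite ranks first; a real analysis would also need replicates and multiple-testing control.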
842.
颜林林
(2022-07-29 08:21):
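#paper doi:10.1093/nar/gkac586 Nucleic Acids Research, 2022, De novo assembly of human genome at single-cell levels. The authors previously developed a technique called SMOOTH-seq. Roughly, it works as follows: Tn5 transposon insertion fragments the genomic DNA at random; barcoded primers then perform strand displacement and amplification of the fragments; a sequence is ligated to each end of the double-stranded product to circularize it; and rolling-circle amplification yields long fragments suitable for long-read sequencing. Each long fragment carries multiple copies of the original sequence, so base calls can be corrected accurately. This paper builds on that method, sequencing single cells from two cell lines, K562 and HG002, on two platforms, PacBio HiFi and Oxford Nanopore Technologies (ONT), and reports the first highly contiguous human genome assembly at the single-cell level. The results include: 95 K562 cells with a total sequencing depth of about 37x (if I understand correctly, about 37/95 = 0.4x per cell), NG50 about 2 Mb; and 30 HG002 cells with about 1 Gb of sequence per cell (equivalent to about 0.33x), NG50 about 1.3 Mb. In the words of the abstract, the study "opened a new chapter on the practice of single-cell genome de novo assembly". The topic looks highly novel, but on closer inspection it raises some questions. Single-cell genome sequencing can hardly distinguish different cell populations, so assembly should really be done separately for each cell; otherwise, assembling a mixture of many cells from different populations defeats the original purpose. Yet the genome coverage of a single cell cannot be very complete (the paper reports an average coverage of about 41.7%, and I suspect that more sequencing data would not improve this much), which greatly constrains the assembly itself, so in the end one can only focus on the structural variants identified. In addition, single-cell genome results are hard to validate: results from other cells can hardly be used to judge whether the result for the cell being measured is accurate, which seems to be a fundamental logical weakness. So the contribution of this paper, apart from the single-cell genome sequencing data for the two cell lines, probably lies mainly in optimizing the experimental protocol and establishing the technical method. The data analysis workflow is also worth consulting.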
IF:16.600Q1
Nucleic acids research,
2022-07-22.
DOI: 10.1093/nar/gkac586
PMID: 35819189
PMCID:PMC9303314
De novo assembly of the human genome at the single-cell level
Abstract:
Genome assembly has been benefited from long-read sequencing technologies with higher accuracy and higher continuity. However, most human genome assembly require large amount of DNAs from homogeneous cell lines without keeping cell heterogeneities, since cell heterogeneity could profoundly affect haplotype assembly results. Herein, using single-cell genome long-read sequencing technology (SMOOTH-seq), we have sequenced K562 and HG002 cells on PacBio HiFi and Oxford Nanopore Technologies (ONT) platforms and conducted de novo genome assembly. For the first time, we have completed the human genome assembly with high continuity (with NG50 of ∼2 Mb using 95 individual K562 cells) at single-cell levels, and explored the impact of different assemblers and sequencing strategies on genome assembly. With sequencing data from 30 diploid individual HG002 cells of relatively high genome coverage (average coverage ∼41.7%) on ONT platform, the NG50 can reach over 1.3 Mb. Furthermore, with the assembled genome from K562 single-cell dataset, more complete and accurate set of insertion events and complex structural variations could be identified. This study opened a new chapter on the practice of single-cell genome de novo assembly.
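Note on entry 842: the per-cell depth estimates in the comment are simple arithmetic and can be checked as below. The 3.0 Gb haploid genome size is an assumption (it is the value that reproduces the commenter's 0.33x figure); the function names are made up for this sketch.

```python
# Assumed haploid human genome size in base pairs (not stated in the source).
GENOME_SIZE_BP = 3.0e9

def per_cell_depth(total_depth, n_cells):
    """Average depth per cell when a total depth is spread evenly over n_cells."""
    return total_depth / n_cells

def depth_from_bases(bases_sequenced, genome_size=GENOME_SIZE_BP):
    """Sequencing depth implied by a number of sequenced bases."""
    return bases_sequenced / genome_size

k562_depth = per_cell_depth(37.0, 95)   # 95 K562 cells, about 37x in total
hg002_depth = depth_from_bases(1.0e9)   # about 1 Gb of sequence per HG002 cell

print(round(k562_depth, 2), round(hg002_depth, 2))  # prints: 0.39 0.33
```

Both values are well below 1x per cell, which is consistent with the commenter's point that single-cell coverage limits what a per-cell assembly can achieve.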
843.
李翛然
(2022-07-28 13:15):
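#paper DOI: 10.1126/science.aba2374 Preventing Engrailed-1 activation in fibroblasts yields wound regeneration without scarring. Published in Science in April 2021, this skin wound repair target is very interesting: it reportedly allows healing without scars. So far one old drug has been found to act on this target; no other drug has entered clinical trials, although other inhibitors exist. That old drug, verteporfin, is used with laser irradiation of the eye to treat ruptured ocular blood vessels. It is unclear whether its efficacy on the new skin-wound target will be limited by its original target (assuming the two targets differ). This target has successfully caught our company's attention.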
Abstract:
Skin scarring, the end result of adult wound healing, is detrimental to tissue form and function. Engrailed-1 lineage-positive fibroblasts (EPFs) are known to function in scarring, but Engrailed-1 lineage-negative fibroblasts (ENFs) remain poorly characterized. Using cell transplantation and transgenic mouse models, we identified a dermal ENF subpopulation that gives rise to postnatally derived EPFs by activating Engrailed-1 expression during adult wound healing. By studying ENF responses to substrate mechanics, we found that mechanical tension drives Engrailed-1 activation via canonical mechanotransduction signaling. Finally, we showed that blocking mechanotransduction signaling with either verteporfin, an inhibitor of Yes-associated protein (YAP), or fibroblast-specific transgenic YAP knockout prevents Engrailed-1 activation and promotes wound regeneration by ENFs, with recovery of skin appendages, ultrastructure, and mechanical strength. This finding suggests that there are two possible outcomes to postnatal wound healing: a fibrotic response (EPF-mediated) and a regenerative response (ENF-mediated).
844.
muton
(2022-07-28 11:58):
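#paper DOI: 10.1371/journal.pcbi.1009267 Unveiling functions of the visual cortex using task-specific deep neural networks. Human visual perception is a complex cognitive ability regulated by different cortical regions of the brain. However, the exact functions of these regions are still not fully understood, and so there is no definitive answer as to how they coordinate visual perception. The current view is that visual information is transformed through hierarchical computations across functionally distinct regions, which are usually grouped into the ventral and dorsal visual pathways. Discovering the exact function of each visual cortical region and reproducing that function with computational models are both challenging, yet they are the ultimate goal. Deep neural networks (DNNs) are a promising approach for modeling and predicting the responses of visual regions. By correlating fMRI datasets with a set of DNN models optimized for different visual tasks (the authors selected 18 DNNs trained on the Taskonomy dataset, each optimized for one of 18 indoor scene understanding tasks), the paper finds a structured mapping of visual information along the ventral and dorsal visual pathways: low-level visual tasks map onto early visual cortex, 3D scene perception tasks onto the dorsal stream, and semantic tasks onto the ventral stream. The highlight of the paper is probably the idea itself: comparing the similarity between models and real human brain data to infer which brain regions contribute to which tasks.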
Abstract:
The human visual cortex enables visual perception through a cascade of hierarchical computations in cortical regions with distinct functionalities. Here, we introduce an AI-driven approach to discover the functional mapping of the visual cortex. We related human brain responses to scene images measured with functional MRI (fMRI) systematically to a diverse set of deep neural networks (DNNs) optimized to perform different scene perception tasks. We found a structured mapping between DNN tasks and brain regions along the ventral and dorsal visual streams. Low-level visual tasks mapped onto early brain regions, 3-dimensional scene perception tasks mapped onto the dorsal stream, and semantic tasks mapped onto the ventral stream. This mapping was of high fidelity, with more than 60% of the explainable variance in nine key regions being explained. Together, our results provide a novel functional mapping of the human visual cortex and demonstrate the power of the computational approach.
845.
前进
(2022-07-28 11:54):
#paper doi: 10.1109/TMI.2019.2953788 Transactions on Medical Imaging 2019
Progressively trained convolutional neural networks for deformable image registration
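Existing deep-learning-based registration algorithms often perform poorly on tasks with large-scale deformations. Existing remedies fall into two groups: (1) pre-register the images with a conventional method (affine, rigid) before registration; (2) cascade multiple networks that deform the image step by step and finally produce a large-deformation registration field. Both have drawbacks: (1) conventional methods take too long, which weakens the speed advantage of using deep learning for the subsequent registration; (2) a cascade interpolates the moving image several times during registration, and the accumulated interpolation error degrades the final deformation field. The authors therefore propose using a single network combined with progressive training to handle large-deformation registration. Progressive training was first used to increase the resolution of GAN-generated images, and the authors transfer it to the registration problem. Put simply, once one layer of the network has been trained to convergence, new layers are added and training continues, until the final deformation field is produced. The paper makes three contributions:
1. It proposes a progressive learning model that learns deformations at different image scales within a single convolutional network.
2. It shows that no pre-registration is needed before registering two images with a neural network.
3. It shows that a neural network can be trained with supervision from synthetic deformation fields and still generalize to real registration problems.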
IF:8.900Q1
IEEE transactions on medical imaging,
2020-05.
DOI: 10.1109/TMI.2019.2953788
PMID: 31751269
Abstract:
Deep learning-based methods for deformable image registration are attractive alternatives to conventional registration methods because of their short registration times. However, these methods often fail to estimate larger displacements in complex deformation fields, for which a multi-resolution strategy is required. In this article, we propose to train neural networks progressively to address this problem. Instead of training a large convolutional neural network on the registration task all at once, we initially train smaller versions of the network on lower resolution versions of the images and deformation fields. During training, we progressively expand the network with additional layers that are trained on higher resolution data. We show that this way of training allows a network to learn larger displacements without sacrificing registration accuracy and that the resulting network is less sensitive to large misregistrations compared to training the full network all at once. We generate a large number of ground truth example data by applying random synthetic transformations to a training set of images, and test the network on the problem of intrapatient lung CT registration. We analyze the learned representations in the progressively growing network to assess how the progressive learning strategy influences training. Finally, we show that a progressive training procedure leads to improved registration accuracy when learning large and complex deformations.
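Note on entry 845: the comment's point that a cascade resamples the moving image several times, so interpolation error accumulates, can be seen in a one-dimensional toy example. This is only an illustration of that drawback under simple assumptions (linear interpolation, a constant shift, clamped edges); it is not the paper's network or training procedure.

```python
def warp_linear(signal, shift):
    """Resample signal at positions i + shift using linear interpolation (edges clamped)."""
    n = len(signal)
    out = []
    for i in range(n):
        x = min(max(i + shift, 0.0), n - 1.0)
        lo = int(x)
        hi = min(lo + 1, n - 1)
        t = x - lo
        out.append((1 - t) * signal[lo] + t * signal[hi])
    return out

signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # a single sharp peak

# Cascade: two half-pixel warps, each resampling the previous output.
cascaded = warp_linear(warp_linear(signal, 0.5), 0.5)

# Single pass: the two displacements are summed first and applied once.
single = warp_linear(signal, 1.0)

print(cascaded)  # [0.25, 0.5, 0.25, 0.0, 0.0, 0.0]
print(single)    # [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

The total displacement is the same in both cases, but the cascaded result has lost half of the peak height to blurring, while the single resampling preserves it.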
846.
芝麻
(2022-07-28 09:52):
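#paper doi: 10.1016/j.tranon.2021.101016. Epub 2021 Jan 16. PMID: 33465745; PMCID: PMC7815805. Machine learning analysis using 77,044 genomic and transcriptomic profiles to accurately predict tumor type. Transl Oncol. Metastasis is one of the main causes of death in cancer patients. For some patients with metastatic tumors, the primary site cannot be determined from morphology alone; such tumors are clinically termed cancer of unknown primary (CUP). Because CUP is highly metastatic and invasive and has no identifiable site of origin, clinicians struggle to choose a treatment plan, so precision treatment of CUP is a clinical challenge in oncology. In 2021, Jim Abraham and colleagues trained machine learning models on more than 20,000 cancer samples, combining two types of data (genomic mutations and transcriptomic expression features) and trying more than 300 different machine learning models in turn. On an independent validation set of 19,555 samples, the final model reached 97% accuracy (per the abstract, this figure counts the second-highest prediction as well).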
IF:4.500Q1
Translational oncology,
2021-Mar.
DOI: 10.1016/j.tranon.2021.101016
PMID: 33465745
PMCID:PMC7815805
Abstract:
Cancer of Unknown Primary (CUP) occurs in 3-5% of patients when standard histological diagnostic tests are unable to determine the origin of metastatic cancer. Typically, a CUP diagnosis is treated empirically and has very poor outcomes, with median overall survival less than one year. Gene expression profiling alone has been used to identify the tissue of origin but struggles with low neoplastic percentage in metastatic sites which is where identification is often most needed. MI GPSai, a Genomic Prevalence Score, uses DNA sequencing and whole transcriptome data coupled with machine learning to aid in the diagnosis of cancer. The algorithm trained on genomic data from 34,352 cases and genomic and transcriptomic data from 23,137 cases and was validated on 19,555 cases. MI GPSai predicted the tumor type in the labeled data set with an accuracy of over 94% on 93% of cases while deliberating amongst 21 possible categories of cancer. When also considering the second highest prediction, the accuracy increases to 97%. Additionally, MI GPSai rendered a prediction for 71.7% of CUP cases. Pathologist evaluation of discrepancies between submitted diagnosis and MI GPSai predictions resulted in change of diagnosis in 41.3% of the time. MI GPSai provides clinically meaningful information in a large proportion of CUP cases and inclusion of MI GPSai in clinical routine could improve diagnostic fidelity. Moreover, all genomic markers essential for therapy selection are assessed in this assay, maximizing the clinical utility for patients within a single test.
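Note on entry 846: the abstract reports one accuracy for the top prediction and a higher one when the second-highest prediction is also counted. The difference between these two metrics can be sketched as follows. The tumor types and predictions are invented for illustration and have nothing to do with MI GPSai's actual outputs.

```python
def top_k_accuracy(ranked_predictions, labels, k):
    """Fraction of cases whose true label appears among the k highest-ranked predictions."""
    hits = sum(1 for ranked, label in zip(ranked_predictions, labels) if label in ranked[:k])
    return hits / len(labels)

# Hypothetical ranked predictions (most likely first) for four cases.
ranked = [
    ["lung", "breast", "colon"],
    ["colon", "lung", "breast"],
    ["breast", "colon", "lung"],
    ["lung", "colon", "breast"],
]
labels = ["lung", "lung", "breast", "breast"]

top1 = top_k_accuracy(ranked, labels, k=1)
top2 = top_k_accuracy(ranked, labels, k=2)
print(top1, top2)  # prints: 0.5 0.75
```

Top-2 accuracy can never be lower than top-1 accuracy, so the two numbers should not be compared against each other as if they measured the same thing.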
847.
王昊
(2022-07-28 09:51):
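#paper doi:10.48550/arXiv.2207.04630 Yi Ma, Doris Tsao, and Heung-Yeung Shum. 2022. On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. Yi Ma, who has a strong mathematical background, wrote this paper with the neuroscientist Doris Tsao to lay out what they regard as two important basic principles of AI. The paper proposes a new framework for understanding deep neural networks, compressive closed-loop transcription, and answers two questions: what is the objective of learning from data and how can it be measured (information and coding theory), and how can that objective be achieved through efficient and effective computation (control)? It puts forward two basic principles for understanding AI: parsimony and self-consistency.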
arXiv,
2022.
DOI: 10.48550/arXiv.2207.04630
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
848.
李欣
(2022-07-28 09:45):
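#paper Fertil Steril. 2022 Apr;117(4):792-800. doi: 10.1016/j.fertnstert.2021.12.025. Epub 2022 Jan 31. PMID: 35109980
In IVF cycles, endometrial thickness is measured routinely. A thin endometrium is associated with an increased risk of miscarriage, ectopic pregnancy, placenta previa, low birth weight, and other obstetric complications. Previous studies have shown that, in fresh embryo transfer cycles, greater endometrial thickness helps improve pregnancy outcomes. In frozen embryo transfer (FET) cycles, findings on the relationship between endometrial thickness and IVF pregnancy outcomes are inconsistent, and some studies conclude that endometrial thickness in FET cycles does not predict live birth rate. It is therefore unclear whether pregnancy and live birth rates plateau at some point or continue to rise as the endometrium thickens. Whether the optimal endometrial thickness is the same for FET and fresh ET also remains to be determined. This study explores whether an optimal endometrial thickness exists in fresh and frozen cycles.
Objective: The primary aim was to determine whether, in fresh IVF-ET and FET cycles, there is an endometrial thickness at which the live birth rate peaks and one beyond which it declines. The study also examined whether patient age, embryo stage, and number of oocytes retrieved affect the relationship between endometrial thickness and live birth rate.
Data: From the Canadian Assisted Reproductive Technology Registry Plus (CARTR Plus) database, 96,760 autologous cycles between January 2013 and December 2019 were included, comprising 43,383 fresh cycles and 53,377 frozen cycles.
Design: Retrospective cohort study. Frozen and fresh cycles were analyzed separately to find the appropriate endometrial thickness for each. In fresh cycles, the recorded thickness is from the day of trigger; in frozen cycles, it is mainly from before the start of progesterone, or before the LH surge or hCG trigger.
This is the largest study to date in this area comparing the effect of endometrial thickness on live birth rates in fresh and frozen-thaw IVF cycles. In fresh cycles, increasing endometrial thickness was associated with significant increases in the mean number of oocytes retrieved, mean peak estradiol level, and mean number of usable embryos, so the effect of endometrial thickness on pregnancy outcomes may be confounded with that of simply being a good-prognosis patient. The "optimal" thickness appears to differ between fresh and frozen cycles, possibly because of the effect of controlled ovarian hyperstimulation (COH) on the endometrium.
Conclusion: In fresh embryo transfer cycles, live birth rates increase significantly until the endometrium reaches 10-12 mm, whereas in FET cycles live birth rates plateau after 7-10 mm.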
Abstract:
OBJECTIVE: To study the effect of increasing endometrial thickness on live birth rates in fresh and frozen-thaw embryo transfer (FET) cycles. DESIGN: Retrospective cohort study. SETTING: National data from Autologous in vitro fertilization (IVF) embryo transfer and FET cycles in Canada from the Canadian Assisted Reproductive Technology Registry Plus (CARTR Plus) database for records between January 2013 and December 2019. PATIENTS: Thirty-three Canadians clinics participated in voluntary reporting of IVF and pregnancy outcomes to the Canadian Assisted Reproductive Technology Registry Plus database, and a total of 43,383 fresh and 53,377 frozen transfers were included. INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Clinical pregnancy, pregnancy loss, and live birth rates. RESULTS: In fresh IVF-embryo transfer cycles, increasing endometrial thickness is associated with significant increases in the mean number of oocytes retrieved, peak estradiol levels, number of usable embryos, clinical pregnancy rates, live birth rates, and mean term singleton birth weights, and a decrease in pregnancy loss rates. However, live birth rates plateau after 10-12 mm. In contrast, in FET cycles live birth rates plateau after the endometrium measures 7-10 mm. The improvement in live birth rates with increasing endometrial thickness was independent of patient age, timing of embryo transfer (e.g., cleavage stage vs. blastocyst stage), or the number of oocytes at retrieval. CONCLUSIONS: In cycles with a fresh embryo transfer, live birth rates increase significantly until an endometrial thickness of 10-12 mm, while in FET cycles live birth rates plateau after 7-10 mm. However, an endometrial thickness <6 mm was associated clearly with a dramatic reduction in live birth rates in fresh and frozen embryo transfer cycles.
849.
颜林林
(2022-07-28 08:50):
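#paper doi:10.1093/bioinformatics/btac137 Bioinformatics, 2022, BWA-MEME: BWA-MEM emulated with a machine learning approach. I saw Heng Li retweet this paper on Twitter and assumed he had upgraded bwa-mem2 again, only to find that it is someone else's work that he had merely endorsed. A successor to a well-known tool has to improve substantially on it in some respect, and here the improvement is mainly in performance. Short-read alignment algorithms for high-throughput sequencing data usually start from exact-match seeds (almost always found by table lookup in constant time) and then extend the match. Choosing the seed length takes some skill: too short, and there may be too many repeated hits; too long, and many words that match nothing (the sequence does not occur in the genome) still occupy the dictionary, making it too large. Some earlier algorithms therefore used variable-length seeds (I once considered this strategy myself but, to my embarrassment, never put it into practice). Variable-length seeds, however, bring variable memory block sizes and frequent memory accesses, which create a performance bottleneck. This paper uses machine learning to preprocess during seed index construction, so that the index adapts to the genome sequence data and the number of memory accesses is fixed across seed lengths, which improves performance. In the final benchmark, bwa-meme produces output identical to bwa-mem2 while seeding up to 3.45 times faster. The algorithm deserves closer study.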
Abstract:
MOTIVATION: The growing use of next-generation sequencing and enlarged sequencing throughput require efficient short-read alignment, where seeding is one of the major performance bottlenecks. The key challenge in the seeding phase is searching for exact matches of substrings of short reads in the reference DNA sequence. Existing algorithms, however, present limitations in performance due to their frequent memory accesses. RESULTS: This article presents BWA-MEME, the first full-fledged short read alignment software that leverages learned indices for solving the exact match search problem for efficient seeding. BWA-MEME is a practical and efficient seeding algorithm based on a suffix array search algorithm that solves the challenges in utilizing learned indices for SMEM search which is extensively used in the seeding phase. Our evaluation shows that BWA-MEME achieves up to 3.45× speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60×, memory accesses by 8.77× and LLC misses by 2.21×, while ensuring the identical SAM output to BWA-MEM2. AVAILABILITY AND IMPLEMENTATION: The source code and test scripts are available for academic use at https://github.com/kaist-ina/BWA-MEME/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
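Note on entry 849: the general idea of a learned index (a model predicts where a key sits in a sorted array, and the known worst-case prediction error bounds the final search window) can be sketched in a few lines. This is a toy illustration only: it uses a single linear model over 2-bit-encoded k-mers, whereas BWA-MEME works on a suffix array with its own model design. All names below are made up for the sketch.

```python
from bisect import bisect_left

def encode(kmer):
    """Pack a DNA k-mer into an integer, two bits per base."""
    code = 0
    for base in kmer:
        code = code * 4 + "ACGT".index(base)
    return code

class LearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a sorted list."""

    def __init__(self, sorted_keys):
        self.keys = sorted_keys
        n = len(sorted_keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(sorted_keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in sorted_keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(sorted_keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over stored keys bounds the search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(sorted_keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def find(self, key):
        """Return the position of key in the sorted list, or -1 if it is absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        if lo >= hi:
            return -1
        pos = bisect_left(self.keys, key, lo, hi)
        return pos if pos < hi and self.keys[pos] == key else -1

reference = "ACGTTGCAAGCT"
K = 4
kmers = sorted({encode(reference[i:i + K]) for i in range(len(reference) - K + 1)})
index = LearnedIndex(kmers)
print(index.find(encode("TGCA")), index.find(encode("AAAA")))
```

Because every stored key lies within `max_err` of its predicted position, the lookup touches a bounded slice of the array, which is the property that makes the number of memory accesses predictable.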
850.
徐炳祥
(2022-07-27 21:51):
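#paper International Conference on Learning Representations, 2020, Hyper-SAGNN: a self-attention based graph neural network for hypergraphs. Representation learning on hypergraphs with higher-order connections is a necessary step for extracting useful patterns in many real-world problems, yet the hypergraph representation learning algorithms of the time (2020) could not handle hypergraphs with hyperedges of varying size well. Drawing on self-attention, the authors designed a graph neural network architecture called Hyper-SAGNN that handles learning on hypergraphs with variable hyperedge sizes well. The network first uses a single-layer neural network to map the input features to a "static embedding", then uses a multi-head attention structure to map the nodes within the same hyperedge to a "dynamic embedding", then uses a Hadamard product to characterize the similarity between the static and dynamic representations, and passes the result to a single-layer neural network that finally predicts the probability that the hyperedge exists. The model outperformed the prevailing models of the time on common benchmark datasets and also performed very well on representation and cell classification for single-cell Hi-C data. In 2022, the authors published Higashi, a single-cell Hi-C representation method based on this architecture, in Nature Biotechnology (doi: 10.1038/s41587-021-01034-y).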
IF:33.100Q1
Nature biotechnology,
2022-02.
DOI: 10.1038/s41587-021-01034-y
PMID: 34635838
PMCID:PMC8843812
Abstract:
Single-cell Hi-C (scHi-C) can identify cell-to-cell variability of three-dimensional (3D) chromatin organization, but the sparseness of measured interactions poses an analysis challenge. Here we report Higashi, an algorithm based on hypergraph representation learning that can incorporate the latent correlations among single cells to enhance overall imputation of contact maps. Higashi outperforms existing methods for embedding and imputation of scHi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability. Moreover, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, as compared to separate analysis of two modalities, leading to improved embeddings for single-nucleus methyl-3C data. In an scHi-C dataset from human prefrontal cortex, Higashi identifies connections between 3D genome features and cell-type-specific gene regulation. Higashi can also potentially be extended to analyze single-cell multiway chromatin interactions and other multimodal single-cell omics data.
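Note on entry 850: the Hyper-SAGNN pipeline described in the comment (static embedding, attention-based dynamic embedding, element-wise comparison, single-layer scorer) can be sketched roughly as below. This is a simplified reading, not the authors' implementation: it uses one attention head instead of several, random untrained weights, and assumes the Hadamard step is the element-wise squared difference between the dynamic and static embeddings. All function and parameter names are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_params(in_dim, emb_dim, rng):
    """Random (untrained) weights: static, query, key, value maps plus the output layer."""
    def mat():
        return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(emb_dim)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(emb_dim)]
    return mat(), mat(), mat(), mat(), w_out, 0.0

def hyperedge_probability(features, params):
    """Score one candidate hyperedge from its node feature vectors (needs >= 2 nodes)."""
    Ws, Wq, Wk, Wv, w_out, b_out = params
    # Static embedding: depends only on the node itself.
    static = [[math.tanh(v) for v in matvec(Ws, x)] for x in features]
    queries = [matvec(Wq, x) for x in features]
    keys = [matvec(Wk, x) for x in features]
    values = [matvec(Wv, x) for x in features]
    node_scores = []
    for i in range(len(features)):
        others = [j for j in range(len(features)) if j != i]
        # Attention of node i over the other nodes in the same hyperedge.
        att = softmax([sum(q * k for q, k in zip(queries[i], keys[j])) for j in others])
        dynamic = [math.tanh(sum(a * values[j][d] for a, j in zip(att, others)))
                   for d in range(len(w_out))]
        # Element-wise squared difference between dynamic and static embeddings.
        diff_sq = [(dv - sv) ** 2 for dv, sv in zip(dynamic, static[i])]
        logit = sum(w * v for w, v in zip(w_out, diff_sq)) + b_out
        node_scores.append(1.0 / (1.0 + math.exp(-logit)))
    # Averaging over nodes makes the score independent of hyperedge size.
    return sum(node_scores) / len(node_scores)

rng = random.Random(0)
params = make_params(in_dim=3, emb_dim=4, rng=rng)
p3 = hyperedge_probability([[rng.random() for _ in range(3)] for _ in range(3)], params)
p5 = hyperedge_probability([[rng.random() for _ in range(3)] for _ in range(5)], params)
```

The same function scores a 3-node and a 5-node hyperedge without any change, which is the variable-size property the comment highlights.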
851.
cellsarts
(2022-07-27 20:16):
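Using endogenous signal peptide sequences for rapid production and efficient secretion of foreign proteins in cells of the moss Physcomitrella patens
BMC Biotechnol . 2005 Nov 7;5:30. doi: 10.1186/1472-6750-5-30.
Andreas Schaaf†3, Stefanie Tintelnot†1, Armin Baur1,2, Ralf Reski1, Gilbert Gorr2 and Eva L Decker*1
1Department of Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104 Freiburg, Germany
Summary
Background: Efficiently delivering recombinant proteins to the appropriate cell organelles is one of the bottlenecks in producing recombinant proteins in plant systems. A common practice is to use the native secretory signal peptide of the foreign protein to be produced. Although the general features of secretion signals are conserved between plants and animals, the broad sequence variability among signal peptides suggests that they are recognized with differing efficiency.
Results: To improve secretion efficiency in the Physcomitrella patens protonema bioreactor, we quantitatively compared the efficiency of two human signal peptides and the signal peptides of six recently isolated moss (P. patens) proteins. To this end, we fused the different signals to heterologous reporter gene sequences, transiently transfected moss cells, and measured the extracellular and intracellular accumulation of the recombinant proteins rhVEGF and GST, respectively. Our data show that the endogenous moss signal peptides achieve up to fivefold higher secretion efficiency than the two human signal peptides used.
Conclusion: From the distribution of extracellular and intracellular recombinant protein, we consider translational inhibition during the signal recognition particle cycle (SRP cycle) the most probable of several possible explanations for the reduced extracellular accumulation with the human signals. In this work, we report that, in the P. patens bioreactor system, secretion with moss secretion signals is higher than with the recombinant proteins' own native signal peptides. Although the molecular details of this effect remain to be elucidated, our results will contribute to the improvement of molecular farming systems.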
Abstract:
Background: Efficient targeting to appropriate cell organelles is one of the bottlenecks for the production of recombinant proteins in plant systems. A common practice is to use the native secretory signal peptide of the heterologous protein to be produced. Though general features of secretion signals are conserved between plants and animals, the broad sequence variability among signal peptides suggests differing efficiency of signal peptide recognition. Results: Aiming to improve secretion in moss bioreactors, we quantitatively compared the efficiency of two human signal peptides and six signals from recently isolated moss (Physcomitrella patens) proteins. We therefore used fusions of the different signals to heterologous reporter sequences for transient transfection of moss cells and measured the extra- and intracellular accumulation of the recombinant proteins rhVEGF and GST, respectively. Our data demonstrates an up to fivefold higher secretion efficiency with endogenous moss signals compared to the two utilised human signal peptides. Conclusion: From the distribution of extra- and intracellular recombinant proteins, we suggest translational inhibition during the signal recognition particle-cycle (SRP-cycle) as the most probable of several possible explanations for the decreased extracellular accumulation with the human signals. In this work, we report on the supremacy of moss secretion signals over the utilised heterologous ones within the moss-bioreactor system. Though the molecular details of this effect remain to be elucidated, our results will contribute to the improvement of molecular farming systems.
852.
颜林林
(2022-07-26 23:37):
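#paper doi:10.1002/jbio.202100389 Journal of Biophotonics, 2022, Skin's green autofluorescence at dorsal centremetacarpus may become a novel biomarker for diagnosis of lung cancer. Early cancer screening is one of the hottest R&D directions right now, so overheated that layoffs seem to have begun, because everyone is following similar, homogeneous routes (such as methylation sequencing). This paper from Shanghai Jiao Tong University takes a different path: it measures skin autofluorescence and tries to use it for early screening and diagnosis of lung cancer. It is a truly noninvasive new detection method. The principle is that the stratum spinosum of the epidermis contains a keratin that fluoresces under blue light, and the intensity of this fluorescence is related to disease state. The study included real clinical cases and an allograft mouse tumor model. For distinguishing lung cancer from pulmonary infection and from healthy controls, the AUC reached 0.871 and 0.813 respectively, showing that this is a potential biomarker for early screening and diagnosis of lung cancer.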
Abstract:
It is critical to discover novel biomarkers of lung cancer for establishing economical technology for diagnosis of lung cancer. Our study has suggested that the autofluorescence (AF) of the skin may become a novel biomarker of this type: First, development of lung cancer led to a significant increase in the skin's green AF in a mouse model of lung cancer; second, lung cancer patients had significantly higher skin's green AF at certain positions compared with healthy volunteers and pulmonary infection patients; and third, using the skin's green AF intensity at dorsal centremetacarpus as the variable, the areas under curve (AUC) for differentiating lung cancer patients and pulmonary infection patients and for differentiating lung cancer patients and healthy volunteers was 0.871 and 0.813, respectively. Collectively, our study has indicated that the skin's green AF at dorsal centremetacarpus may become a novel biomarker for establishing a ground-breaking diagnostic strategy for lung cancer.
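Note on entry 852: the AUC values quoted above have a simple interpretation: the probability that a randomly chosen patient from the positive group has a higher measurement than a randomly chosen one from the comparison group. A minimal sketch of that pairwise definition follows. The intensity values are invented and are not data from the paper.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Made-up green autofluorescence intensities (arbitrary units).
lung_cancer = [5.1, 6.3, 4.8, 7.0]
controls = [3.9, 5.0, 4.8, 4.1]

value = auc(lung_cancer, controls)
print(value)  # prints: 0.90625
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, so the reported 0.871 and 0.813 indicate useful but imperfect separation on a single variable.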
853.
半面阳光
(2022-07-26 14:25):
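#paper DOI: 10.1073/pnas.2019768118, 2021 Feb 2;118(5):e2019768118. Genome-wide detection of cytosine methylation by single molecule real-time sequencing. This is not a newly published paper; it is a research article published in PNAS in 2021 by the team of Yuk Ming Dennis Lo (卢煜明) at The Chinese University of Hong Kong. I recently heard Professor Lo present results related to this paper at an academic conference, so I picked it up to read. The core of the paper is detecting DNA methylation using PacBio's SMRT third-generation sequencing together with a convolutional neural network. Cytosine methylation, 5-methylcytosine (5mC), is one of the most important types of epigenetic modification. A widely used sequencing method for detecting CpG methylation is bisulfite sequencing (BS-seq), but it has shortcomings: bisulfite degrades DNA, and it converts unmethylated cytosines (C) to uracil, which is read as thymine (T) and complicates downstream alignment; meanwhile, a genuine C->T point mutation in the original sequence is not something bisulfite modifies, so it cannot be told apart from a conversion. The authors therefore used single molecule real-time (SMRT) sequencing to develop a method that detects 5mC directly. The method takes two key kinds of information from SMRT sequencing as input and combines them with a convolutional neural network (CNN) to build a detection approach called the holistic kinetic (HK) model. The two key inputs are: first, the kinetic signals of the DNA polymerase during SMRT sequencing (the duration of the fluorescence pulse for a single base, and the interval between two consecutive bases); second, the "sequence context", that is, the sequence of a fixed-length stretch of DNA under examination, called a "measurement window". The authors first built an unmethylated dataset by whole-genome amplification (the negative dataset, in which almost no sequences are methylated) and a methylated dataset by treating DNA samples with the M.SssI methyltransferase (the positive dataset; M.SssI methylates all CpG sites on double-stranded DNA). They then took half of each dataset to train the CNN and used the remaining data to validate the HK model. The AUC for distinguishing methylation states with the HK model reached up to 0.97, and the sensitivity and specificity of genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The results also show that adjusting the measurement window size and the sequencing depth changes the HK model's performance. To balance downstream data analysis against accuracy, 21 nt was chosen as the default measurement window and 10× as the default sequencing depth. The authors then used human-mouse hybrid fragments to verify that the HK model can handle "mixed methylation" sequences (a single sequence containing both methylated and unmethylated CpGs). They also made a brief comparison between BS-seq and the HK model. My impressions of the paper: first, the sheer amount of work; second, the authors' thorough understanding and application of molecular biology theory and of the characteristics of sequencing technology. The overall research framework also feels of a piece with the team's earlier work: a deep grasp of basic theory and flexible use of sequencing technology to solve hard problems in clinical testing.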
IF:9.400Q1
Proceedings of the National Academy of Sciences of the United States of America,
2021-02-02.
DOI: 10.1073/pnas.2019768118
PMID: 33495335
Abstract:
5-Methylcytosine (5mC) is an important type of epigenetic modification. Bisulfite sequencing (BS-seq) has limitations, such as severe DNA degradation. Using single molecule real-time sequencing, we developed a methodology to directly examine 5mC. This approach holistically examined kinetic signals of a DNA polymerase (including interpulse duration and pulse width) and sequence context for every nucleotide within a measurement window, termed the holistic kinetic (HK) model. The measurement window of each analyzed double-stranded DNA molecule comprised 21 nucleotides with a cytosine in a CpG site in the center. We used amplified DNA (unmethylated) and M.SssI-treated DNA (methylated) (M.SssI being a CpG methyltransferase) to train a convolutional neural network. The area under the curve for differentiating methylation states using such samples was up to 0.97. The sensitivity and specificity for genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The HK model was then tested on human-mouse hybrid fragments in which each member of the hybrid had a different methylation status. The model was also tested on human genomic DNA molecules extracted from various biological samples, such as buffy coat, placental, and tumoral tissues. The overall methylation levels deduced by the HK model were well correlated with those by BS-seq (r = 0.99; P < 0.0001) and allowed the measurement of allele-specific methylation patterns in imprinted genes. Taken together, this methodology has provided a system for simultaneous genome-wide genetic and epigenetic analyses.
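Note on entry 853: the abstract describes the model input as a 21-nucleotide measurement window centered on the cytosine of a CpG site, carrying sequence context plus two kinetic signals (interpulse duration and pulse width) per nucleotide. One plausible way to assemble such a window is sketched below. The feature layout (one-hot base plus two kinetic values per position, one strand only) and all names are assumptions for illustration; the paper's actual input encoding is not given in the text here.

```python
WINDOW = 21          # window length from the abstract
HALF = WINDOW // 2   # nucleotides on each side of the central cytosine
BASES = "ACGT"

def cpg_windows(seq, ipd, pw):
    """Yield (center_index, rows) for each CpG cytosine that has a full window.

    Each row holds a one-hot base encoding followed by that position's
    interpulse duration (ipd) and pulse width (pw).
    """
    for i in range(HALF, len(seq) - HALF):
        if seq[i] == "C" and seq[i + 1] == "G":
            rows = []
            for j in range(i - HALF, i + HALF + 1):
                one_hot = [1.0 if seq[j] == b else 0.0 for b in BASES]
                rows.append(one_hot + [ipd[j], pw[j]])
            yield i, rows

# Made-up read: ten A's, one CpG, ten T's, with constant kinetic values.
seq = "A" * 10 + "CG" + "T" * 10
ipd = [0.5] * len(seq)
pw = [0.2] * len(seq)

windows = list(cpg_windows(seq, ipd, pw))
center, rows = windows[0]
print(center, len(rows), rows[HALF])
```

Each window is a 21 x 6 matrix in this layout, which is the kind of fixed-size input a convolutional network can consume; CpG sites too close to the read ends are skipped because they lack a full window.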
854.
颜林林
(2022-07-25 07:28):
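#paper doi:10.1038/s41380-022-01661-0 Molecular Psychiatry, 2022, The serotonin theory of depression: a systematic umbrella review of the evidence. This is a meta-analysis, and a report of negative results at that; in the eyes of many "insiders", such "filler papers" are beneath notice or embarrassing to mention. The paper examines whether serotonin (5-hydroxytryptamine) is related to the etiology of depression. The view that lowered serotonin is associated with depression is widespread among both the general public and professional researchers. The paper takes an umbrella review approach, bringing in a large body of research on the serotonin system from several different areas in order to support its conclusions with the highest level of evidence available. The six areas covered are: (1) whether serotonin and its metabolite 5-HIAA (5-hydroxyindoleacetic acid) are lower in the body fluids of patients with depression; (2) whether serotonin receptor levels are lower in patients with depression; (3) whether serotonin transporter (SERT) levels are higher in patients with depression; (4) whether depletion of tryptophan (the precursor of serotonin) causes depression; (5) whether the SERT gene is more highly expressed in patients with depression; (6) whether there is an interaction between the SERT gene and stress in patients with depression. The review was registered with PROSPERO (CRD42020207203) and included 17 studies: 12 systematic reviews and meta-analyses, 1 collaborative meta-analysis, 1 meta-analysis of large cohort studies, 1 systematic review and narrative synthesis, 1 genetic association study, and 1 umbrella review. On all six questions, each with the largest sample size available (from hundreds to more than a hundred thousand), it rejected an association between markers of serotonin activity and depression, and it suggests that "it is time to acknowledge that the serotonin theory of depression is not empirically substantiated". Clearly, being able to state a negative conclusion definitively is no small feat either.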
Abstract:
The serotonin hypothesis of depression is still influential. We aimed to synthesise and evaluate evidence on whether depression is associated with lowered serotonin concentration or activity in a systematic umbrella review of the principal relevant areas of research. PubMed, EMBASE and PsycINFO were searched using terms appropriate to each area of research, from their inception until December 2020. Systematic reviews, meta-analyses and large data-set analyses in the following areas were identified: serotonin and serotonin metabolite, 5-HIAA, concentrations in body fluids; serotonin 5-HT receptor binding; serotonin transporter (SERT) levels measured by imaging or at post-mortem; tryptophan depletion studies; SERT gene associations and SERT gene-environment interactions. Studies of depression associated with physical conditions and specific subtypes of depression (e.g. bipolar depression) were excluded. Two independent reviewers extracted the data and assessed the quality of included studies using the AMSTAR-2, an adapted AMSTAR-2, or the STREGA for a large genetic study. The certainty of study results was assessed using a modified version of the GRADE. We did not synthesise results of individual meta-analyses because they included overlapping studies. The review was registered with PROSPERO (CRD42020207203). 17 studies were included: 12 systematic reviews and meta-analyses, 1 collaborative meta-analysis, 1 meta-analysis of large cohort studies, 1 systematic review and narrative synthesis, 1 genetic association study and 1 umbrella review. Quality of reviews was variable with some genetic studies of high quality. Two meta-analyses of overlapping studies examining the serotonin metabolite, 5-HIAA, showed no association with depression (largest n = 1002). One meta-analysis of cohort studies of plasma serotonin showed no relationship with depression, and evidence that lowered serotonin concentration was associated with antidepressant use (n = 1869). Two meta-analyses of overlapping studies examining the 5-HT receptor (largest n = 561), and three meta-analyses of overlapping studies examining SERT binding (largest n = 1845) showed weak and inconsistent evidence of reduced binding in some areas, which would be consistent with increased synaptic availability of serotonin in people with depression, if this was the original, causal abnormality. However, effects of prior antidepressant use were not reliably excluded. One meta-analysis of tryptophan depletion studies found no effect in most healthy volunteers (n = 566), but weak evidence of an effect in those with a family history of depression (n = 75). Another systematic review (n = 342) and a sample of ten subsequent studies (n = 407) found no effect in volunteers. No systematic review of tryptophan depletion studies has been performed since 2007. The two largest and highest quality studies of the SERT gene, one genetic association study (n = 115,257) and one collaborative meta-analysis (n = 43,165), revealed no evidence of an association with depression, or of an interaction between genotype, stress and depression. The main areas of serotonin research provide no consistent evidence of there being an association between serotonin and depression, and no support for the hypothesis that depression is caused by lowered serotonin activity or concentrations. Some evidence was consistent with the possibility that long-term antidepressant use reduces serotonin concentration.
855.
白义民
(2022-07-24 17:54):
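#paper "A Study of Longchenpa of the Nyingma School" (《宁玛派龙钦巴研究》). Longchenpa was the great synthesizer of the Dzogchen (Great Perfection, ati-yoga) teachings of Tibetan Tantric Buddhism. This work gives an outline introduction to Longchenpa in two parts: a biography of his path of practice, and his writings. Unlike the usual vague accounts of Tantric Buddhism, this doctoral dissertation, written from the scholarly perspective of ethnic religious studies, gives a fairly accurate and plain overview of the Dzogchen teachings. For anyone interested in Dzogchen practice, the guidance of a scholarly text can help avoid detours.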
856.
颜林林
(2022-07-24 05:55):
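#paper doi:10.1186/s12864-022-08762-8 BMC Genomics, 2022, Poly(A) selection introduces bias and undue noise in direct RNA-sequencing. In whole-transcriptome sequencing experiments, poly(A) selection is often applied after the initial RNA extraction step to enrich for mRNA. This paper uses the ONT platform for direct RNA sequencing and processes the same sample in parallel, with and without poly(A) selection. The results indicate that omitting this step is appropriate: although doing so may slightly reduce library complexity, it more effectively avoids the other drawbacks of selection, such as requiring more input RNA, preferentially capturing mRNAs with longer poly(A) tails, and making differential expression results less stable.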
Abstract:
BACKGROUND: Genome-wide RNA-sequencing technologies are increasingly critical to a wide variety of diagnostic and research applications. RNA-seq users often first enrich for mRNA, with the most popular enrichment method being poly(A) selection. In many applications it is well-known that poly(A) selection biases the view of the transcriptome by selecting for longer tailed mRNA species. RESULTS: Here, we show that poly(A) selection biases Oxford Nanopore direct RNA sequencing. As expected, poly(A) selection skews sequenced mRNAs toward longer poly(A) tail lengths. Interestingly, we identify a population of mRNAs (> 10% of genes' mRNAs) that are inconsistently captured by poly(A) selection due to highly variable poly(A) tails, and demonstrate this phenomenon in our hands and in published data. Importantly, we show poly(A) selection is dispensable for Oxford Nanopore's direct RNA-seq technique, and demonstrate successful library construction without poly(A) selection, with decreased input, and without loss of quality. CONCLUSIONS: Our work expands the utility of direct RNA-seq by validating the use of total RNA as input, and demonstrates important technical artifacts from poly(A) selection that inconsistently skew mRNA expression and poly(A) tail length measurements.
857.
颜林林
(2022-07-23 22:05):
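#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents OmegaFold, a tool that predicts a protein's tertiary structure from the primary sequence of that single protein alone. Current mainstream methods all depend on evolutionary information, using multiple sequence alignments as auxiliary input for predicting the folded structure. The authors argue that, once translated, a protein folds on its own from primary sequence into tertiary structure, so this evolutionary information should not be necessary for structure prediction. The deep model relies on a set of pretrained models that help identify which amino acids in the primary sequence matter more (that is, assign them different attention weights), and uses BERT-based language-model techniques to help train the folding model. The resulting method effectively handles structure prediction for orphan proteins (those with no similar proteins in current structure databases to serve as references); according to the abstract, it outperforms RoseTTAFold and reaches accuracy similar to AlphaFold2.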
bioRxiv,
2022.
DOI: 10.1101/2022.07.21.500999
Abstract:
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins and fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-needed structure prediction gap and brings us a step closer to understanding protein folding in nature.
858.
颜林林
(2022-07-22 00:00):
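#paper doi:10.1056/NEJMe2207902 The New England Journal of Medicine, 2022, Setting the Benchmark for KRAS(G12C)-Mutated NSCLC. This is an editorial introducing the report, in the same issue, of results from the phase 2 KRYSTAL-1 clinical trial (doi:10.1056/NEJMoa2204619). The trial's subject is adagrasib, a KRAS G12C inhibitor, which performed well: in patients carrying the KRAS G12C mutation who had previously received chemotherapy and immunotherapy, the efficacy endpoints (ORR, PFS, OS, etc.) were very close to those of sotorasib, a previously approved drug. The editorial infers from this that the two drugs' mechanisms may overlap substantially. In addition, differences between the two drugs in metabolism and pharmacokinetics (such as blood-brain barrier penetration and in vivo half-life) suggest how the choice between them might be differentiated in the future.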
Abstract:
No abstract available.
859.
颜林林
(2022-07-21 00:29):
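#paper doi:10.1186/s13059-022-02726-7 Genome Biology, 2022, Integration of single-cell multi-omics data by regression analysis on unpaired observations. Because of technical limitations, the vast majority of single-cell multi-omics studies cannot actually measure several omics layers in the same cell. To address this, the paper starts from the intuitive assumption that "target genes with similar expression also have similar regulators" and uses regression analysis to relate and infer the relationship between scRNA-seq and ATAC-seq data. In unpaired scRNA-seq and ATAC-seq experiments (where the two assays were run on different cells rather than the same cell), one data type (for example, chromatin accessibility from ATAC-seq) can then be used to infer the expression of the corresponding regulated genes. Evaluated on simulated and real data, the method reaches high accuracy (its results are highly consistent with eQTL mapping). This provides methodological support for making better use of the large amount of unpaired single-cell data accumulated so far.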
IF: 10.100, Q1
Genome biology,
2022-07-19.
DOI: 10.1186/s13059-022-02726-7
PMID: 35854350
PMCID: PMC9295346
Abstract:
Despite recent developments, it is hard to profile all multi-omics single-cell data modalities on the same cell. Thus, huge amounts of single-cell genomics data of unpaired observations on different cells are generated. We propose a method named UnpairReg for the regression analysis on unpaired observations to integrate single-cell multi-omics data. On real and simulated data, UnpairReg provides an accurate estimation of cell gene expression where only chromatin accessibility data is available. The cis-regulatory network inferred from UnpairReg is highly consistent with eQTL mapping. UnpairReg improves cell type identification accuracy by joint analysis of single-cell gene expression and chromatin accessibility data.
860.
颜林林
(2022-07-20 07:49):
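#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of "doing a small thing seriously". In genomic analysis we often agonize over whether to use hg19 or hg38, and sometimes have to run two nearly identical analyses in parallel against both reference genomes to avoid the information loss caused by coordinate-conversion tools such as liftOver. Targeting this small (and not even especially common) pain point, the paper stays compatible with the existing VCF format while carrying two sets of genomic coordinates in the same result file, so existing tools keep working and either coordinate set can be extracted at any time. The idea is simple and the implementation is not difficult, yet it does effectively solve certain practical problems.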
bioRxiv,
2022.
DOI: 10.1101/2022.07.17.500374
Abstract:
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
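To make the dual-coordinate idea concrete, here is a minimal sketch of one way a VCF record could carry a second set of coordinates in its INFO column and be switched between the two views without losing information. This is not Genozip's implementation or the DVCF specification: the tag name `ALTCOORD`, its `chrom|pos` layout, and the function `swap_coordinates` are hypothetical, the positions are made up, and the sketch ignores cases a real tool must handle (strand flips, REF/ALT changes, variants that cannot be mapped).

```python
# Hypothetical INFO tag, for illustration only; NOT the tag defined by DVCF.
ALT_TAG = "ALTCOORD"


def swap_coordinates(vcf_line: str) -> str:
    """Return the same record expressed in its second coordinate system.

    The coordinates being left behind are written back into the INFO tag,
    so applying the function twice restores the original record.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], fields[1], fields[7]

    entries = [] if info == "." else info.split(";")
    other = None
    kept = []
    for entry in entries:
        key, _, value = entry.partition("=")
        if key == ALT_TAG:
            other = value.split("|")  # [chrom, pos] in the other system
        else:
            kept.append(entry)

    if other is None:
        raise ValueError("record carries no second coordinate")

    # Remember where we came from, then rewrite CHROM and POS.
    kept.append(f"{ALT_TAG}={chrom}|{pos}")
    fields[0], fields[1] = other[0], other[1]
    fields[7] = ";".join(kept)
    return "\t".join(fields)


# Made-up record: position 1000 in one reference, 1100 in the other.
record = "chr1\t1000\t.\tA\tG\t50\tPASS\tDP=30;ALTCOORD=chr1|1100"
swapped = swap_coordinates(record)
print(swapped)
```

Because the extra coordinates sit in an ordinary INFO key, a tool that knows nothing about the tag can still read the record as plain VCF, which is the compatibility property the post highlights.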