A total of 1063 paper shares were found; this page shows entries 841 - 860.
841.
钟鸣 (2022-07-18 02:00):
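#paper doi:10.1073/pnas.0404172101 Proc Natl Acad Sci U S A, 2004, Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Bacteroides fragilis (BF) and Bacteroides thetaiotaomicron (BT) are both human gut bacteria, but they differ in niche and pathogenicity: BF attaches to the mucosal surface and is the most virulent Bacteroides species, whereas BT lives within the colon and is non-pathogenic. Here the authors compare the two genomes with a focus on genomic inversions. They first compare the capsule loci, because the capsule is BF's main virulence factor: BF has 9 capsule loci and BT has 7. In Bacteroides, the promoter upstream of a capsule locus is flanked by inverted repeats (IRs), which allow the promoter to flip under certain conditions. A flipped promoter no longer drives transcription, so the downstream genes are not expressed; Bacteroides uses this mechanism to switch between capsule types (phase variation) and escape host immune killing. Promoter analysis shows that all 9 BF capsule loci are invertible and that every inversion is mediated by the serine-type recombinase Mpi, which suggests that inversion of the 9 loci is globally regulated. In BT, by contrast, only 4 of the 7 capsule loci are invertible, and each is mediated by a different tyrosine-type recombinase. BF capsular polysaccharides can induce abscess formation, and this virulence depends on the repeating unit carrying both a positively charged free amino group and a negatively charged group. The analysis finds that 4 BF capsule loci can produce polysaccharides carrying both groups, versus only 1 in BT, a difference that may relate to BF's higher virulence. The authors also predicted inversion events outside the capsule loci, identifying 31 invertible regions in BF and confirming them by PCR. Representative examples are the SusC/SusD homolog genes, whose products sit on the bacterial surface, break starch polysaccharides from the environment down into monosaccharides, and transport them into the cell as nutrients. Variable expression of the SusC/SusD homologs is controlled by 7 inversion mechanisms, which cover every type of inversion-based regulation identified so far. The authors explain these 7 mechanisms with concise, vivid diagrams; they are intricate and interesting. In summary, the DNA inversions that are widespread in BF regulate biological functions such as virulence and nutrient utilization, and the mechanisms show both similarities and differences between species of differing pathogenicity.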
Abstract:
Bacteroides are predominant human colonic commensals, but the principal pathogenic species, Bacteroides fragilis (BF), lives closely associated with the mucosal surface, whereas a second major species, Bacteroides thetaiotaomicron (BT), concentrates within the colon. We find corresponding differences in their genomes, based on determination of the genome sequence of BF and comparative analysis with BT. Both species have acquired two mechanisms that contribute to their dominance among the colonic microbiota: an exceptional capability to use a wide range of dietary polysaccharides by gene amplification and the capacity to create variable surface antigenicities by multiple DNA inversion systems. However, the gene amplification for polysaccharide assimilation is more developed in BT, in keeping with its internal localization. In contrast, external antigenic structures can be changed more systematically in BF. Thereby, at the mucosal surface, where microbes encounter continuous attack by host defenses, BF evasion of the immune system is favored, and its colonization and infectious potential are increased.
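The promoter flipping described in this entry can be illustrated with a toy sketch. It assumes a made-up sequence and hypothetical helper names (`reverse_complement`, `invert_between_repeats`), and it only shows what a segment flanked by inverted repeats looks like after inversion; real IR detection and recombinase site specificity are not modeled.

```python
# Toy model of an invertible promoter flanked by inverted repeats (IRs).
# All sequences and function names here are illustrative assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_repeats(seq: str, left_ir: str) -> str:
    """Flip the segment lying between left_ir and its reverse complement."""
    right_ir = reverse_complement(left_ir)
    start = seq.find(left_ir)
    if start == -1:
        raise ValueError("left inverted repeat not found")
    inner_start = start + len(left_ir)
    end = seq.find(right_ir, inner_start)
    if end == -1:
        raise ValueError("right inverted repeat not found")
    inner = seq[inner_start:end]
    # Inversion replaces the inner segment with its reverse complement.
    return seq[:inner_start] + reverse_complement(inner) + seq[end:]


LEFT_IR = "GTTCGA"    # hypothetical repeat; its partner is TCGAAC
PROMOTER = "TTGACA"   # toy promoter motif in the "ON" orientation
locus_on = "AA" + LEFT_IR + PROMOTER + reverse_complement(LEFT_IR) + "ATGCCC"
locus_off = invert_between_repeats(locus_on, LEFT_IR)
print(locus_on)
print(locus_off)
```

Applying the same function to `locus_off` restores `locus_on`, which mirrors the reversible ON/OFF switching that phase variation relies on.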
842.
钟鸣 (2022-07-17 02:20):
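#paper doi:10.1128/JB.00933-10 J Bacteriol, 2011, Adhesive Activity of the Haemophilus Cryptic Genospecies Cha Autotransporter Is Modulated by Variation in Tandem Peptide Repeats. The Haemophilus cryptic genospecies (then regarded as a type of Haemophilus influenzae, now named Haemophilus quentini) colonizes the female genital tract and can infect the infant during delivery and cause clinical disease. Cha is a conserved gene in this species that encodes a trimeric autotransporter adhesin (TAA) and promotes colonization. TAAs are highly modular, consisting of an N-terminal signal peptide, a central passenger domain, and a C-terminal outer-membrane domain. The signal peptide directs the TAA to the periplasm, where the C terminus trimerizes and inserts into the outer membrane, exposing the passenger domain on the cell surface to carry out adhesion. The authors had previously noticed that the length of Cha varies slightly between strains and that the gene contains a repeated 84-bp segment. They hypothesized that the number of repeats accounts for the length differences and confirmed this by Southern blot across strains. They also found that strains with different Cha lengths adhere with different efficiency, suggesting that Cha length affects adhesion. They therefore isolated mutants of different Cha lengths, carrying anywhere from 0 to 100 copies of the 84-bp repeat. Cell adhesion assays showed that the more repeats, the lower the adhesion efficiency. To check whether the change was due to Cha expression, they measured Cha mRNA by qPCR and found no difference between mutants, indicating that the effect is mediated by protein structure rather than protein abundance. Electron microscopy of strains with different Cha lengths then showed that more 84-bp repeats give longer Cha fibers, indirect evidence that structural change in Cha alters adhesion efficiency. Next, to locate the domain responsible for adhesion, the authors built a second set of mutants carrying Cha truncations of various lengths. Adhesion tests showed that the N-terminal region of about 400 aa is necessary and sufficient for Cha-mediated adhesion, whereas the 84-bp repeats are dispensable. (Note that adhesion here was measured with a tube-settling assay, i.e., observing how long bacteria take to sink in liquid culture, because adhesins are known to mediate bacterium-bacterium as well as bacterium-host adhesion.) Finally, the authors asked how Cha mediates bacterium-bacterium adhesion: do Cha proteins on different cells bind each other, or does Cha bind a different protein on the other cell? They compared aggregation of a low-adhesion strain alone with aggregation in co-cultures of high- and low-adhesion strains (aggregation being the paper's readout for bacterium-bacterium adhesion), and the results indicate that Cha is self-associating. Using a series of mutants with different Cha lengths, the study shows that the number of 84-bp repeats determines Cha length, which in turn sets adhesive capacity: the longer Cha is, the weaker the adhesion.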
IF: 2.700 (Q3). Journal of Bacteriology, 2011-Jan. DOI: 10.1128/JB.00933-10 PMID: 21037000
Abstract:
The Haemophilus cryptic genospecies is an important cause of maternal genital tract and neonatal systemic infections and initiates infection by colonizing the genital or respiratory epithelium. In recent work, we identified a unique Haemophilus cryptic genospecies protein called Cha, which mediates efficient adherence to genital and respiratory epithelia. The Cha adhesin belongs to the trimeric autotransporter family and contains an N-terminal signal peptide, an internal passenger domain that harbors adhesive activity, and a C-terminal membrane anchor domain. The passenger domain in Cha contains clusters of YadA-like head domains and neck motifs as well as a series of tandem 28-amino-acid peptide repeats. In the current study, we report that variation in peptide repeat number gradually modulates Cha adhesive activity, associated with a direct effect on the length of Cha fibers on the bacterial cell surface. The N-terminal 404 residues of the Cha passenger domain mediate binding to host cells and also facilitate bacterial aggregation through intermolecular Cha-Cha binding. As the tandem peptide repeats expand, the Cha fiber becomes longer and Cha adherence activity decreases. The expansion and contraction of peptide repeats represent a novel mechanism for modulating adhesive capacity, potentially balancing the need of the organism to colonize the genital and respiratory tracts with the ability to attach to alternative substrates, disperse within the host, or evade the host immune system.
843.
钟鸣 (2022-07-16 15:00):
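#paper doi:10.7717/peerj.12272 PeerJ, 2021, Helicobacter pylori virulence factors: relationship between genetic variability and phylogeographic origin. Helicobacter pylori is a household name: it infects more than half of the world's population and is associated with chronic gastritis and gastric cancer. Bacterial pathogenesis depends on virulence factors, and here the authors extract 87 H. pylori virulence factors (VFs) in 7 classes from 135 complete genomes and compare them. They compare the VF genes on four axes (copy number, gene size (length), synteny, and similarity) and sort the 87 VFs into three conservation tiers: high, medium, and low. Urease is the representative highly conserved VF; it neutralizes gastric acid by metabolizing urea to ammonia, which helps the bacterium survive in the stomach. The typical poorly conserved VFs are adhesins, which show high levels of reorganization, mainly gene inversions. Gene inversion can trigger a "position effect" that alters gene expression, but unlike for other genes, adhesin inversions show no correlation with pathogenic phenotype or geographic origin. Based on the similarity of the 87 VFs, the authors also split the 135 genomes into four monophyletic groups, a, b, c, and d: group a is more prone to cause gastritis and peptic ulcer, group d is more prone to cause gastric cancer and gastric lymphoma, group b comes mainly from the Middle East, and group c mainly from East Asia. The analysis is simple and clear, and its results offer insight into the evolution of H. pylori pathogenicity genes. The analysis further indicates that about 34% of the genes were mis-annotated by automatic genome annotation: early-submitted annotations were not accurate enough yet served as reference data for annotating later genomes, so the errors propagated. The authors therefore recommend semi-manual annotation for prokaryotic genes.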
IF: 2.300 (Q2). PeerJ, 2021. DOI: 10.7717/peerj.12272 PMID: 34900406
Abstract:
BACKGROUND: Helicobacter pylori is a pathogenic bacteria that colonize the gastrointestinal tract from human stomachs and causes diseases including gastritis, peptic ulcers, gastric lymphoma (MALT), and gastric cancer, with a higher prevalence in developing countries. Its high genetic diversity among strains is caused by a high mutation rate, observing virulence factors (VFs) variations in different geographic lineages. This study aimed to postulate the genetic variability associated with virulence factors present in the Helicobacter pylori strains, to identify the relationship of these genes with their phylogeographic origin. METHODS: The complete genomes of 135 strains available in NCBI, from different population origins, were analyzed using bioinformatics tools, identifying a high rate; as well as reorganization events in 87 virulence factor genes, divided into seven functional groups, to determine changes in position, number of copies, nucleotide identity and size, contrasting them with their geographical lineage and pathogenic phenotype. RESULTS: Bioinformatics analyses show a high rate of gene annotation errors in VF. Analysis of genetic variability of VFs shown that there is not a direct relationship between the reorganization and geographic lineage. However, regarding the pathogenic phenotype demonstrated in the analysis of many copies, size, and similarity when dividing the strains that possess and not the cag pathogenicity island (cagPAI), having a higher risk of developing gastritis and peptic ulcer was evidenced. Our data has shown that the analysis of the overall genetic variability of all VFs present in each strain of H. pylori is key information in understanding its pathogenic behavior.
844.
颜林林 (2022-07-15 00:05):
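#paper doi:10.3390/ijms23137446 International Journal of Molecular Sciences, 2022, Identification of Spliceogenic Variants beyond Canonical GT-AG Splice Sites in Hereditary Cancer Genes. Point mutations near exon boundaries can change how a gene is spliced, which is important information for diagnosing and counseling hereditary disease. In most cases, however, such variants can only be judged from prior reports and computational predictions, and under the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP) variant classification guidelines, a computational result usually leaves the variant classified as a variant of uncertain significance (VUS). This study screened the novel VUS detected in 732 consecutive patients and selected 12 that were predicted to affect RNA splicing, in the genes APC, ATM, FH, LZTR1, MSH6, PALB2, RAD51C, and TP53. The variants were tested at the RNA level with a multiplex PCR approach to verify their effect. The authors analyze and interpret the biological consequence of each result to decide whether the variant is pathogenic. In the end 50% of the VUS were reclassified: 25% were downgraded to likely benign and 25% were upgraded to likely pathogenic.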
Abstract:
Pathogenic/likely pathogenic variants in susceptibility genes that interrupt RNA splicing are a well-documented mechanism of hereditary cancer syndromes development. However, if RNA studies are not performed, most of the variants beyond the canonical GT-AG splice site are characterized as variants of uncertain significance (VUS). To decrease the VUS burden, we have bioinformatically evaluated all novel VUS detected in 732 consecutive patients tested in the routine genetic counseling process. Twelve VUS that were predicted to cause splicing defects were selected for mRNA analysis. Here, we report a functional characterization of 12 variants located beyond the first two intronic nucleotides using RNAseq in APC, ATM, FH, LZTR1, MSH6, PALB2, RAD51C, and TP53 genes. Based on the analysis of mRNA, we have successfully reclassified 50% of investigated variants. 25% of variants were downgraded to likely benign, whereas 25% were upgraded to likely pathogenic leading to improved clinical management of the patient and the family members.
845.
颜林林 (2022-07-14 21:57):
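#paper doi:10.1126/science.abl9283 Science, 2022, Substitution mutational signatures in whole-genome–sequenced cancers in the UK population. This paper, published in Science in April this year, was highlighted in the latest issue of Cancer Cell (doi:10.1016/j.ccell.2022.05.011). Large-cohort whole-genome sequencing (WGS) papers have been common in recent years, so when one still lands in a top journal, its novelty and significance are probably worth a look. The samples come from the Genomics England (GEL) 100,000 Genomes Project (100kGP): WGS of 12,222 tumor samples from 11,585 individuals. After deriving mutational signatures associated with tumorigenesis, the authors validated them in two other large independent cohorts (3001 primary cancers from the International Cancer Genome Consortium (ICGC) and 3417 metastatic cancers from the Hartwig Medical Foundation). The paper focuses on single-base substitution (SBS) and double-base substitution (DBS) signatures from WGS and introduces a computational framework called Signature Fit Multi-Step (FitMS). The method distinguishes signatures that are common in each cancer type from those that are rare and appear only in particular cancer types or organs. Tissue-specific signatures are clustered and combined into a set of reference signatures to support mechanistic and etiological interpretation. Judging by the problem addressed and the methods, there seems to be no especially major innovation, so my tentative guess is that its place in a top journal owes much to the very large cohort and data volume and the corresponding workload (see the 94-page supplement).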
Abstract:
Whole-genome sequencing (WGS) permits comprehensive cancer genome analyses, revealing mutational signatures, imprints of DNA damage and repair processes that have arisen in each patient's cancer. We performed mutational signature analyses on 12,222 WGS tumor-normal matched pairs, from patients recruited via the UK National Health Service. We contrasted our results to two independent cancer WGS datasets, the International Cancer Genome Consortium (ICGC) and Hartwig Foundation, involving 18,640 WGS cancers in total. Our analyses add 40 single and 18 double substitution signatures to the current mutational signature tally. Critically, we show for each organ, that cancers have a limited number of 'common' signatures and a long tail of 'rare' signatures. We provide a practical solution for utilizing this concept of common versus rare signatures in future analyses.
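For readers new to SBS signatures, the sketch below shows the conventional 96-channel encoding that signature catalogs are built on: each substitution is reported from the pyrimidine strand together with its 5' and 3' neighbors. The function name `sbs_channel` is an illustrative assumption, and this is not the paper's FitMS code, which additionally fits signatures to such counts.

```python
# Minimal sketch of the standard 96-channel SBS classification.
# Function and variable names are illustrative assumptions.
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_channel(context: str, alt: str) -> str:
    """Map a trinucleotide context and alt base to a label such as 'A[C>T]G'."""
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt allele")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("only A, C, G, T are supported")
    five, ref, three = context
    if ref == alt:
        raise ValueError("alt must differ from the reference base")
    if ref in "AG":
        # Purine reference: report the event on the opposite strand instead.
        five, ref, three = COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[five]
        alt = COMPLEMENT[alt]
    return f"{five}[{ref}>{alt}]{three}"


# Enumerating every possible substitution collapses 192 events to 96 channels.
all_channels = {
    sbs_channel(a + r + b, alt)
    for a, r, b in product("ACGT", repeat=3)
    for alt in "ACGT"
    if alt != r
}
print(len(all_channels))
print(sbs_channel("CGT", "A"))
```

A tumor's SBS profile is then just the count of mutations per channel, which is the vector that signature extraction and fitting operate on.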
846.
颜林林 (2022-07-13 00:46):
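#paper doi:10.1093/bib/bbac221 Briefings in Bioinformatics, 2022, A comprehensive benchmarking of WGS-based deletion structural variant callers. This is a tool-comparison methods paper on tools that call structural variants (SVs) from whole-genome sequencing data, restricted to deletion-type SVs. As gold standards it uses the genome-in-a-bottle SV set and a PCR-validated SV set from a mouse model, so that each tool's sensitivity, precision, and other performance metrics can be computed accurately. The evaluation echoes earlier work of this kind: performance varies widely between tools, and some tools strike a good balance between sensitivity and precision. The paper closes with recommendations on which tools to use for deletions of different lengths. It is a solid, fairly careful piece of work. The interesting part is the grumbling in the tool selection: after excluding tools that need paired samples, tools that only detect very small variants, and tools that only support long-read data, 61 suitable tools remained, yet only 15 or 14 were tested (on the mouse and human data respectively), simply because the rest could not be installed! I can relate. Leaving aside authors who will not release their source code for others to use, even among open-source tools many are hard to get running, and it is not rare to have to read the source and debug by hand before a tool works.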
IF: 6.800 (Q1). Briefings in Bioinformatics, 2022-07-18. DOI: 10.1093/bib/bbac221 PMID: 35753701 PMCID: PMC9294411
Abstract:
Advances in whole-genome sequencing (WGS) promise to enable the accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
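How precision and sensitivity fall out of a complete gold standard can be sketched in a few lines. The matching rule here (same chromosome, both breakpoints within 100 bp) and all coordinates are assumptions for illustration; the entry does not state the paper's actual matching criterion.

```python
# Hedged sketch: scoring deletion calls against a gold-standard set.
# The 100 bp breakpoint tolerance and the toy data are assumptions.
from typing import List, Tuple

Deletion = Tuple[str, int, int]  # (chromosome, start, end)


def is_match(call: Deletion, truth: Deletion, tolerance: int) -> bool:
    """A call matches when both breakpoints lie within the tolerance."""
    return (
        call[0] == truth[0]
        and abs(call[1] - truth[1]) <= tolerance
        and abs(call[2] - truth[2]) <= tolerance
    )


def benchmark(
    calls: List[Deletion], gold: List[Deletion], tolerance: int = 100
) -> Tuple[float, float]:
    """Return (precision, sensitivity); each gold deletion is matched once."""
    matched = set()
    true_positives = 0
    for call in calls:
        for index, truth in enumerate(gold):
            if index not in matched and is_match(call, truth, tolerance):
                matched.add(index)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(gold) if gold else 0.0
    return precision, sensitivity


gold_set = [("chr1", 1000, 1500), ("chr1", 5000, 9000), ("chr2", 200, 260)]
call_set = [
    ("chr1", 1010, 1490),    # matches the first gold deletion
    ("chr1", 20000, 20500),  # false positive
    ("chr2", 190, 270),      # matches the third gold deletion
    ("chr3", 1, 50),         # false positive
]
precision, sensitivity = benchmark(call_set, gold_set)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

Precision is only meaningful because the gold set is assumed complete; with a partial truth set, unmatched calls could be real deletions rather than false positives.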
847.
颜林林 (2022-07-12 00:03):
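#paper doi:10.1016/j.gpb.2022.04.009 Genomics, Proteomics & Bioinformatics, 2022, N6-methyladenosine and Its Implications in Viruses. This is a review of m6A. m6A is the most common base modification on mammalian mRNA, and the review concentrates on m6A research related to viruses. It first outlines m6A basics: the fraction and distribution of m6A-modified bases, the regulatory proteins that add or remove the mark, and the functions of m6A in organisms (e.g., effects on mRNA splicing, nuclear export, translation, and degradation). It then covers, from the technical side, the experimental methods for detecting m6A. After that it turns to its main subject, the m6A studies carried out in recent years on various viruses, including SV40, hepatitis B virus, herpesviruses, HIV, hepatitis C virus, Zika virus, dengue virus, and SARS-CoV-2. The surveyed results show m6A taking part in a wide range of biological activities, and in different viruses it sometimes performs exactly opposite functions. m6A thus looks like part of an underlying mechanistic layer, showing different functions depending on where and when it sits in the gene regulatory network, and nearly everything seems to be connected to it. m6A has been a research hotspot in recent years, and the steady stream of related data-mining studies probably reflects this "low-level" and "universal" character. Deeper study of m6A will help clarify its effect on viral replication and other life-cycle steps and provide basic-research support for developing antiviral drugs, which fits the needs of the current pandemic era.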
Abstract:
N6-methyladenine (m6A) is the most abundant RNA modification in mammalian messenger RNAs (mRNAs), which participates in and regulates many important biological activities, such as tissue development and stem cell differentiation. Due to an improved understanding of m6A, researchers have discovered that the biological function of m6A can be linked to many stages of mRNA metabolism and that m6A can regulate a variety of complex biological processes. In addition to its location on mammalian mRNAs, m6A has been identified on viral transcripts. m6A also plays important roles in the life cycle of many viruses and in viral replication in host cells. In this review, we briefly introduce the detection methods of m6A, the m6A-related proteins, and the functions of m6A. We also summarize the effects of m6A-related proteins on viral replication and infection. We hope that this review provides researchers with some insights for elucidating the complex mechanisms of the epitranscriptome related to viruses, and provides information for further study of the mechanisms of other modified nucleobases acting on processes such as viral replication. We also anticipate that this review can stimulate collaborative research from different fields, such as chemistry, biology, and medicine, and promote the development of antiviral drugs and vaccines.
848.
钟鸣 (2022-07-11 12:18):
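#paper doi:10.1128/IAI.00963-06 Infection and Immunity, 2007, Analysis of Bartonella adhesin A expression reveals differences between various B. henselae strains. The badA gene of Bartonella henselae encodes an adhesin with a molecular weight of 340 kDa and is an important virulence factor of the species. Oddly, the gene stops being expressed after repeated passage in vitro. To explore possible mechanisms, the authors analyzed the badA gene sequence and promoter region in 5 strains. They found that the N-terminal and C-terminal regions of BadA and the promoter region showed only minor differences that did not track with expression (apart from a frameshift in one strain). They conclude that loss of BadA expression is generally caused neither by a stop mutation nor by a promoter mutation, and that some other mode of regulation must exist.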
Abstract:
Bartonella henselae causes cat scratch disease and the vasculoproliferative disorders bacillary angiomatosis and peliosis hepatis in humans. One of the best known pathogenicity factors of B. henselae is Bartonella adhesin A (BadA), which is modularly constructed, consisting of head, neck/stalk, and membrane anchor domains. BadA is important for the adhesion of B. henselae to extracellular-matrix proteins and endothelial cells (ECs). In this study, we analyzed different B. henselae strains for BadA expression, autoagglutination, fibronectin (Fn) binding, and adhesion to ECs. We found that the B. henselae strains Marseille, ATCC 49882, Freiburg 96BK3 (FR96BK3), FR96BK38, and G-5436 express BadA. Remarkably, BadA expression was lacking in a B. henselae ATCC 49882 variant, in strains ATCC 49793 and Berlin-1, and in the majority of bacteria of strain Berlin-2. Adherence of B. henselae to ECs and Fn reliably correlated with BadA expression. badA was present in all tested strains, although the length of the gene varied significantly due to length variations of the stalk region. Sequencing of the promoter, head, and membrane anchor regions revealed only minor differences that did not correlate with BadA expression, apart from strain Berlin-1, in which a 1-bp deletion led to a frameshift in the head region of BadA. Our data suggest that, apart from the identified genetic modifications (frameshift deletion and recombination), other so-far-unknown regulatory mechanisms influence BadA expression. Because of variations between and within different B. henselae isolates, BadA expression should be analyzed before performing infection experiments with B. henselae.
849.
颜林林 (2022-07-11 00:41):
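#paper doi:10.1101/2022.07.09.499321 bioRxiv, 2022, A Draft Human Pangenome Reference. This looks like another landmark paper, posted ahead of publication as a bioRxiv preprint. More than thirty leading institutions collaborated, and the author list is still long even after being condensed under the "Human Pangenome Reference Consortium". It includes many familiar names that have appeared again and again in major genomics papers over the years, among them Heng Li, who is one of the corresponding authors. The main text runs to 97 pages (not counting another 39 pages of supplement), which reflects the scale of the work. As is well known, the human reference genome we have been using comes from only the earliest seven or eight individuals, whose genomes can hardly be considered representative of humanity's gene pool. So as genome data accumulated, the reference has been updated and patched again and again. The "pangenome reference" proposed here can be seen as another major improvement and release, perhaps even a key step close to a once-and-for-all fix. It integrates 47 individual genomes, each with phased, diploid assemblies. Drawing on earlier human population genomics projects such as HapMap and the 1000 Genomes Project, the 47 individuals were chosen to be genetically diverse; the assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. The long manuscript lays out the full construction of the new reference in detail, down to exact command lines and parameters, and is well worth careful study.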
Abstract:
The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.
850.
大象城南 (2022-07-10 09:35):
#paper doi:10.1016/j.ymeth.2022.06.001 Methods, 2022. Structural and functional connectivity abnormalities of the default mode network in patients with Alzheimer's disease and mild cognitive impairment within two independent. Alzheimer's disease (AD) is a chronic neurodegenerative disease marked by progressive dementia, and amnestic mild cognitive impairment (aMCI) is defined as the transitional stage between normal aging and AD. Growing evidence shows that altered functional connectivity (FC) and structural connectivity (SC) in the default mode network (DMN) are prominent hallmarks of AD, but how SC and FC changes in the DMN relate to each other is unclear. This study derives DMN FC and SC matrices from functional MRI (fMRI) and diffusion-weighted imaging (DWI) data and assesses FC and SC abnormalities in a discovery dataset of 120 participants (39 normal controls, 34 aMCI patients, 47 AD patients) and a replication dataset of 122 participants (43 normal controls, 37 aMCI patients, 42 AD patients). In the discovery dataset, disrupted SC and FC were found among DMN components (e.g., the posterior cingulate cortex (PCC), medial prefrontal cortex (mPFC), and hippocampus) in the aMCI and AD groups, and most of these disrupted connections were also found in the replication dataset. More importantly, some SC and FC elements correlated significantly with cognitive ability in aMCI and AD patients. The study also found structural-functional decoupling between the PCC and the right hippocampus in the aMCI and AD groups. These findings on altered DMN connectivity in neurodegenerative cohorts deepen our understanding of the pathophysiology of AD.
Abstract:
Alzheimer's disease (AD) is a chronic neurodegenerative disease characterized by progressive dementia, and amnestic mild cognitive impairment (aMCI) has been defined as a transitional stage between normal aging and AD. Accumulating evidence has shown that altered functional connectivity (FC) and structural connectivity (SC) in the default mode network (DMN) is the prominent hallmarks of AD. However, the relationship between the changes in SC and FC of the DMN is not yet clear. In the present study, we derived the FC and SC matrices of the DMN with functional magnetic resonance imaging (fMRI) and diffusion-weighted imaging (DWI) data and further assessed FC and SC abnormalities within a discovery dataset of 120 participants (39 normal controls, 34 patients with aMCI and 47 patients with AD), as well as a replication dataset of 122 participants (43 normal controls, 37 patients with aMCI and 42 patients with AD). Disrupted SC and FC were found among DMN components (e.g., the posterior cingulate cortex (PCC), medial prefrontal cortex (mPFC), and hippocampus) in patients in the aMCI and AD groups in the discovery dataset; most of the disrupted connections were also identified in the replication dataset. More importantly, some SC and FC elements were significantly correlated with the cognitive ability of patients with aMCI and AD. In addition, we found structural-functional decoupling between the PCC and the right hippocampus in patients in the aMCI and AD groups. These findings of the alteration of DMN connectivity in neurodegenerative cohorts deepen our understanding of the pathophysiological mechanisms of AD.
851.
大象城南 (2022-07-10 09:29):
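#paper doi:10.1111/epi.17320 Epilepsia, 2022. Development and validation of machine learning models for prediction of seizure outcome after pediatric epilepsy surgery. Reported seizure outcomes after pediatric epilepsy surgery vary widely, and there are no individualized tools for estimating the probability of postoperative seizure freedom. This study set out to develop and validate supervised machine learning (ML) models that predict seizure freedom after pediatric epilepsy surgery. It is a multicenter retrospective study of children who had epilepsy surgery at five pediatric epilepsy centers in North America. Clinical information, diagnostic investigations, and surgical characteristics were collected and used as features to predict seizure freedom 1 year after surgery. The dataset was split at random into 80% training and 20% testing data. For model development, 35 combinations of 5 feature sets and 7 ML classifiers were assessed on the training cohort with 10-fold cross-validation. The best classifier and feature-set combination was then evaluated on the testing cohort and compared with logistic regression, a classical statistical method. Of the 801 patients included, 61.3% were seizure-free 1 year after surgery. During model development the best combination was the XGBoost algorithm with five features from the univariate feature set (number of antiseizure medications, magnetic resonance imaging lesion, age at seizure onset, video-electroencephalography concordance, and surgery type), with a mean area under the curve (AUC) of .73 (95% confidence interval [CI] = .69–.77). On the testing cohort this combination reached an AUC of .74 (95% CI = .66–.82; sensitivity = .87, 95% CI = .81–.94; specificity = .58, 95% CI = .47–.71). The XGBoost model outperformed the logistic regression model (AUC = .72, 95% CI = .63–.80; sensitivity = .72, 95% CI = .63–.82; specificity = .66, 95% CI = .53–.77) on the testing cohort (p = .005). The study identifies important features and validates XGBoost for predicting the probability of seizure freedom after pediatric epilepsy surgery. Better prognostication of epilepsy surgery is critical for presurgical counseling and will inform treatment decisions.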
IF: 6.600 (Q1). Epilepsia, 2022-08. DOI: 10.1111/epi.17320 PMID: 35661152
Abstract:
OBJECTIVE: There is substantial variability in reported seizure outcome following pediatric epilepsy surgery, and lack of individualized predictive tools that could evaluate the probability of seizure freedom postsurgery. The aim of this study was to develop and validate a supervised machine learning (ML) model for predicting seizure freedom after pediatric epilepsy surgery. METHODS: This is a multicenter retrospective study of children who underwent epilepsy surgery at five pediatric epilepsy centers in North America. Clinical information, diagnostic investigations, and surgical characteristics were collected, and used as features to predict seizure-free outcome 1 year after surgery. The dataset was split randomly into 80% training and 20% testing data. Thirty-five combinations of five feature sets with seven ML classifiers were assessed on the training cohort using 10-fold cross-validation for model development. The performance of the optimal combination of ML classifier and feature set was evaluated in the testing cohort, and compared with logistic regression, a classical statistical approach. RESULTS: Of the 801 patients included, 61.3% were seizure-free 1 year postsurgery. During model development, the best combination was XGBoost ML algorithm with five features from the univariate feature set, including number of antiseizure medications, magnetic resonance imaging lesion, age at seizure onset, video-electroencephalography concordance, and surgery type, with a mean area under the curve (AUC) of .73 (95% confidence interval [CI] = .69-.77). The combination of XGBoost and univariate feature set was then evaluated on the testing cohort and achieved an AUC of .74 (95% CI = .66-.82; sensitivity = .87, 95% CI = .81-.94; specificity = .58, 95% CI = .47-.71). The XGBoost model outperformed the logistic regression model (AUC = .72, 95% CI = .63-.80; sensitivity = .72, 95% CI = .63-.82; specificity = .66, 95% CI = .53-.77) in the testing cohort (p = .005). SIGNIFICANCE: This study identified important features and validated an ML algorithm, XGBoost, for predicting the probability of seizure freedom after pediatric epilepsy surgery. Improved prognostication of epilepsy surgery is critical for presurgical counseling and will inform treatment decisions.
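The three metrics quoted in this entry (AUC, sensitivity, specificity) can be computed from predicted probabilities with a few lines of plain Python. The sketch below uses made-up labels and scores and an assumed 0.5 decision threshold; it is not the study's pipeline, which trained XGBoost on clinical features.

```python
# Hedged sketch of the evaluation metrics, on toy data only.
# Labels: 1 = seizure-free at 1 year, 0 = not seizure-free (assumed coding).
from typing import List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """Chance that a random positive outranks a random negative (ties = 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(
    labels: List[int], scores: List[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Classify score >= threshold as positive and return (sens, spec)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.2]
toy_auc = auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.3f} specificity={toy_spec:.3f}")
```

AUC is threshold-free, while sensitivity and specificity depend on the chosen cutoff, which is why a model can trade one for the other (as XGBoost and logistic regression do in the reported results) at similar AUC.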
852.
颜林林 (2022-07-10 09:00):
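#paper doi:10.1109/TR.2022.3171220 IEEE Transactions on Reliability, 2022, Detecting C++ Compiler Front-End Bugs via Grammar Mutation and Differential Testing. This paper from Dalian University of Technology presents a framework called Ccoft for automatically finding bugs in the front-end of C++ compilers. A compiler is usually divided by pipeline stage into a front-end and a back-end: the front-end parses and analyzes the C++ source and lowers it to an intermediate representation, and the back-end generates machine code from that representation. The paper targets only the front-end. The framework first converts the C++ grammar into a structured format and then uses "mutation" to generate large numbers of varied C++ programs, both grammatical and ungrammatical, aiming to cover as many code scenarios as possible and challenge the compiler to handle them as expected. The programs are then fed to the compilers, and the compilers' output is used to judge whether each was handled correctly, which surfaces a range of bugs: rejecting valid code, accepting invalid code, mishandling code semantics, crashing during compilation, and timing out on overly long compilations. Tested on the mainstream compilers GCC and Clang, the framework found 136 compiler bugs in three months, a large improvement over the mainstream tools available.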
Abstract:
C++ is a widely used programming language and the C++ front-end is a critical part of a C++ compiler. Although many techniques have been proposed to test compilers, few studies are devoted to detecting bugs in C++ compiler. In this study, we take the first step to detect bugs in C++ compiler front-ends. To do so, two main challenges need to be addressed, namely, the acquisition of test programs that are more likely to trigger bugs in compiler front-ends and the bug identification from complicated compiler outputs. In this article, we propose a novel framework named Ccoft to detect bugs in C++ compiler front-ends. To address the first challenge, Ccoft implements a practical program generator. The generator first transforms C++ grammars into a flexible structured format and then utilizes an equal-chance selection (ECS) strategy to conduct structure-aware grammar mutation to generate diverse C++ programs. Next, Ccoft employs a set of differential testing strategies to identify various kinds of bugs in C++ compiler front-ends by comparing complex outputs emitted by C++ compilers, thus tackling the second challenge. Empirical evaluation results over two mainstream compilers (i.e., GCC and Clang) show that Ccoft greatly improves two state-of-the-art approaches (i.e., Dharma and Grammarinator) by 135% and 111% in terms of the numbers of detected bugs, respectively. By running Ccoft for three months, we have successfully reported 136 bugs for two C++ compilers, of which 78 (57 confirmed, assigned, or fixed) for GCC and 58 (10 confirmed or fixed) for Clang.
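The idea behind differential testing can be sketched without invoking any compiler: the same program goes to several compilers, and a disagreement or abnormal exit marks it for review. The `Outcome` categories and `triage` rules below are assumptions made for illustration and are not Ccoft's actual strategies, which the abstract does not spell out.

```python
# Hedged sketch of differential-testing triage on simulated outcomes.
# No compiler is run here; the labels "gcc" and "clang" are just dict keys.
from enum import Enum
from typing import Dict


class Outcome(Enum):
    ACCEPT = "accept"    # program compiled without errors
    REJECT = "reject"    # compiler emitted an error diagnostic
    CRASH = "crash"      # compiler itself crashed
    TIMEOUT = "timeout"  # compilation exceeded the time budget


def triage(outcomes: Dict[str, Outcome]) -> str:
    """Flag a generated program based on how the compilers handled it."""
    seen = set(outcomes.values())
    if Outcome.CRASH in seen:
        return "crash"
    if Outcome.TIMEOUT in seen:
        return "timeout"
    if seen == {Outcome.ACCEPT, Outcome.REJECT}:
        # One compiler must be wrong: either it rejects valid code
        # or the other accepts invalid code.
        return "accept/reject disagreement"
    return "consistent"


print(triage({"gcc": Outcome.ACCEPT, "clang": Outcome.REJECT}))
print(triage({"gcc": Outcome.ACCEPT, "clang": Outcome.ACCEPT}))
```

A disagreement alone does not say which compiler is at fault; deciding that still needs the language standard or a human reviewer, which is why such findings are reported to the compiler developers for confirmation.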
853.
颜林林 (2022-07-09 07:36):
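#paper doi:10.1186/s13073-022-01079-x Genome Medicine, 2022, Identification of a cytokine-dominated immunosuppressive class in squamous cell lung carcinoma with implications for immunotherapy resistance. This is a pure data-mining paper that tries to explain resistance to immune checkpoint inhibitors in lung squamous cell carcinoma. The authors collected transcriptome data for 624 lung squamous cell carcinoma samples from TCGA and GEO and used unsupervised clustering to find expression patterns associated with T cell exhaustion signatures, immunosuppressive cells, clinical characteristics, and immunotherapy response, defining an immunosuppressive group of patients as the exhausted immune class (EIC). These patients make up 28% to 36% of cases. Although they show a high density of tumor-infiltrating lymphocytes, they appear resistant to immune checkpoint blockade (ICB), with significant enrichment of exhaustion signatures, a high fraction of immunosuppressive cells, and co-upregulation of multiple immune checkpoint genes. The corresponding expression signature was also borne out in melanoma patients resistant to ICB therapy. The paper also examines genomic and epigenomic data and finds that these patients have a lower chromosomal alteration burden and a distinctive methylation pattern. The authors built an online site that brings together the data and analysis methods so that researchers can explore potential associations with ICB resistance through multi-omics analysis. Methodologically the approach is fairly routine and not especially innovative, but integrating data for a single disease and exposing the analysis through a website so that others can reuse it is, I suppose, a workable kind of "original" bioinformatics work.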
IF: 10.400 (Q1). Genome Medicine, 2022-07-08. DOI: 10.1186/s13073-022-01079-x PMID: 35799269
Abstract:
BACKGROUND: Immune checkpoint blockade (ICB) therapy has revolutionized the treatment of lung squamous cell carcinoma (LUSC). However, a significant proportion of patients with high tumour PD-L1 expression remain resistant to immune checkpoint inhibitors. To understand the underlying resistance mechanisms, characterization of the immunosuppressive tumour microenvironment and identification of biomarkers to predict resistance in patients are urgently needed. METHODS: Our study retrospectively analysed RNA sequencing data of 624 LUSC samples. We analysed gene expression patterns from tumour microenvironment by unsupervised clustering. We correlated the expression patterns with a set of T cell exhaustion signatures, immunosuppressive cells, clinical characteristics, and immunotherapeutic responses. Internal and external testing datasets were used to validate the presence of exhausted immune status. RESULTS: Approximately 28 to 36% of LUSC patients were found to exhibit significant enrichments of T cell exhaustion signatures, high fraction of immunosuppressive cells (M2 macrophage and CD4 Treg), co-upregulation of 9 inhibitory checkpoints (CTLA4, PDCD1, LAG3, BTLA, TIGIT, HAVCR2, IDO1, SIGLEC7, and VISTA), and enhanced expression of anti-inflammatory cytokines (e.g. TGFβ and CCL18). We defined this immunosuppressive group of patients as exhausted immune class (EIC). Although EIC showed a high density of tumour-infiltrating lymphocytes, these were associated with poor prognosis. EIC had relatively elevated PD-L1 expression, but showed potential resistance to ICB therapy. The signature of 167 genes for EIC prediction was significantly enriched in melanoma patients with ICB therapy resistance. EIC was characterized by a lower chromosomal alteration burden and a unique methylation pattern. We developed a web application ( http://lilab2.sysu.edu.cn/tex & http://liwzlab.cn/tex ) for researchers to further investigate potential association of ICB resistance based on our multi-omics analysis data. CONCLUSIONS: We introduced a novel LUSC immunosuppressive class which expressed high PD-L1 but showed potential resistance to ICB therapy. This comprehensive characterization of immunosuppressive tumour microenvironment in LUSC provided new insights for further exploration of resistance mechanisms and optimization of immunotherapy strategies.
854.
颜林林 (2022-07-08 07:19):
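#paper doi:10.1038/s41540-022-00233-w npj Systems Biology and Applications, Adaptive coding for DNA storage with high storage density and low coverage. Large-scale data storage built on biological macromolecules such as DNA is one of the directions I personally find most interesting. Many good papers have suddenly appeared in this field over the past few years, probably thanks to advances in high-throughput sequencing and the related progress in synthetic biology. This paper from Dalian University of Technology is one such case. It proposes an adaptive coding DNA storage system that applies different coding schemes to different coding region locations, stores 698 KB of image, video, and PDF files in DNA, and then losslessly decodes them back into the original data. Compared with earlier work of this kind, the encoding process takes more careful account of DNA molecular properties and constraints, so that it keeps the bases balanced and avoids non-specific hybridization while tolerating sequencing-error noise at the lowest possible sequencing depth. The original content is split into fragments, each joined to an index segment, so that stored content can be read at random through specific amplification followed by sequencing. The pity is that the paper only carries out simulation and theoretical discussion, with no actual DNA synthesis or sequencing, which greatly weakens its persuasiveness.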
Abstract:
The rapid development of information technology has generated substantial data, which urgently requires new storage media and storage methods. DNA, as a storage medium with high density, high durability, and ultra-long storage time characteristics, is promising as a potential solution. However, DNA storage is still in its infancy and suffers from low space utilization of DNA strands, high read coverage, and poor coding coupling. Therefore, in this work, an adaptive coding DNA storage system is proposed to use different coding schemes for different coding region locations, and the method of adaptively generating coding constraint thresholds is used to optimize at the system level to ensure the efficient operation of each link. Images, videos, and PDF files of size 698 KB were stored in DNA using adaptive coding algorithms. The data were sequenced and losslessly decoded into raw data. Compared with previous work, the DNA storage system implemented by adaptive coding proposed in this paper has high storage density and low read coverage, which promotes the development of carbon-based storage systems.
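To make the sequence constraints mentioned in this post more concrete, here is a minimal Python sketch. It is not the paper's adaptive coding scheme: the fixed 2-bit-per-base mapping, the function names, and the thresholds (GC fraction between 0.4 and 0.6, homopolymer runs of at most 3) are all assumptions chosen for illustration.

```python
# Hypothetical illustration only: a fixed 2-bit mapping plus two common
# sequence constraints. This is NOT the adaptive coding from the paper.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def gc_fraction(strand: str) -> float:
    """Fraction of G and C bases (0.0 for an empty strand)."""
    return sum(base in "GC" for base in strand) / len(strand) if strand else 0.0


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    previous = ""
    for base in strand:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def satisfies_constraints(strand, gc_low=0.4, gc_high=0.6, max_run=3):
    """Check base balance and run length (thresholds are assumed values)."""
    return (
        gc_low <= gc_fraction(strand) <= gc_high
        and longest_homopolymer(strand) <= max_run
    )
```

A real encoder would reject or re-encode strands that fail `satisfies_constraints`; how the paper chooses and adapts its thresholds is not reproduced here.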
855.
颜林林 (2022-07-07 07:41):
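#paper doi:10.1186/s13059-022-02699-7 Genome Biology, 2022, Storing and analyzing a genome on a blockchain. Several years ago I heard many people talk about applying blockchain technology to genomics, but those efforts all ended badly and nothing seemed to come of them. Now, with the crypto market sliding and everyone wailing, it is remarkable and eye-catching that Genome Biology has published a paper like this. It comes from Yale, and both the full text and the source code are openly accessible, so it is worth a careful read for anyone interested in blockchain. The paper envisions a network made up of sequencers, data owners, clinicians, and researchers, each of whom takes part in synchronizing a VCFchain or SAMchain. This yields distributed data sharing, with data analysis interleaved into the process of extending the chain. Storing huge genomic data in the limited extra bytes a blockchain can hold does require some tricks, such as splitting the data and reassembling it at query time, and the paper does put real work into this. Overall, though, it still feels like "blockchain for the sake of blockchain". Access control and tamper resistance are probably its distinguishing strengths, but the paper does not demonstrate them fully. This differs from two other blockchain-related papers shared here earlier (DOIs 10.1038/s41591-022-01768-5 and 10.1038/s41586-021-03583-3, published in Nature Medicine and Nature respectively), which are more about AI algorithms and the value of data sharing, whereas this paper focuses on the implementation details of the blockchain software. With this one paving the way, perhaps a follow-up of the same kind really could make a run at NBT.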
IF: 10.100 (Q1). Genome biology, 2022-06-29. DOI: 10.1186/s13059-022-02699-7 PMID: 35765079 PMCID: PMC9241283
Abstract:
There are major efforts underway to make genome sequencing a routine part of clinical practice. A critical barrier to these is achieving practical solutions for data ownership and integrity. Blockchain provides solutions to these challenges in other realms, such as finance. However, its use in genomics is stymied due to the difficulty in storing large-scale data on-chain, slow transaction speeds, and limitations on querying. To overcome these roadblocks, we developed a private blockchain network to store genomic variants and reference-aligned reads on-chain. It uses nested database indexing with an accompanying tool suite to rapidly access and analyze the data.
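The tamper-resistance and "split, then reassemble at query time" ideas discussed in this post can be sketched in a few lines of Python. This is a toy hash chain, not the paper's VCFchain or SAMchain code: the record fields (`chrom`, `pos`, `ref`, `alt`) and all function names are hypothetical, and a real deployment would add consensus, permissions, and indexing.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, previous_hash, records):
    """Hash the block contents deterministically (keys sorted)."""
    payload = json.dumps(
        {"index": index, "previous_hash": previous_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append one small batch of variant records as a new block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "previous_hash": previous_hash,
        "records": records,
        "hash": block_hash(index, previous_hash, records),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash; any edited record breaks the chain."""
    previous_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(index, previous_hash, block["records"]):
            return False
        previous_hash = block["hash"]
    return True


def query(chain, chrom, start, end):
    """Reassemble matching variants from records spread across blocks."""
    return [
        record
        for block in chain
        for record in block["records"]
        if record["chrom"] == chrom and start <= record["pos"] <= end
    ]
```

The linear scan in `query` is the simplest possible choice; the abstract's "nested database indexing" addresses exactly the cost that this naive version ignores.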
856.
颜林林 (2022-07-06 00:02):
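#paper doi:10.1186/s12864-022-08717-z BMC Genomics, 2022, The effects of sequencing depth on the assembly of coding and noncoding transcripts in the human genome. It is well known that sequencing depth affects the results of data analysis. How large the effect is, and how it arises, usually depends on the research goal and the material under study, so it has to be examined case by case and is worth studying in its own right. This paper systematically investigates how sequencing depth affects transcript assembly from transcriptome data. It includes RNA-seq data from different cells and tissues across 150 human stem cell samples and, besides short-read platforms, four PacBio long-read datasets. Two of the samples were sequenced to as many as 200M NGS reads, so different proportions of their data could be drawn to simulate different sequencing amounts. The analysis shows that coding and noncoding transcripts behave differently: the former saturate quickly as depth increases, while the latter almost never reach saturation in the data analyzed. This may be related to how hard each class is to assemble. In addition, long-read information helps assemble transcripts that contain transposable elements. The more interesting part concerns single-cell RNA-seq (scRNA-seq): noncoding transcripts appear lowly expressed because few cells express them, yet within the cells that do express them, their levels are actually similar to those of coding transcripts. This observation was made possible by long-read sequencing, and the paper therefore concludes that long-read sequencing is better suited to scRNA-seq. Personally, I still suspect that these conclusions depend heavily on the analysis and evaluation methods, and it may be worth repeating the paper's analysis.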
IF: 3.500 (Q2). BMC genomics, 2022-07-04. DOI: 10.1186/s12864-022-08717-z PMID: 35787153
Abstract:
Investigating the functions and activities of genes requires proper annotation of the transcribed units. However, transcript assembly efforts have produced a surprisingly large variation in the number of transcripts, and especially so for noncoding transcripts. This heterogeneity in assembled transcript sets might be partially explained by sequencing depth. Here, we used real and simulated short-read sequencing data as well as long-read data to systematically investigate the impact of sequencing depths on the accuracy of assembled transcripts. We assembled and analyzed transcripts from 671 human short-read data sets and four long-read data sets. At the first level, there is a positive correlation between the number of reads and the number of recovered transcripts. However, the effect of the sequencing depth varied based on cell or tissue type, the type of read and the nature and expression levels of the transcripts. The detection of coding transcripts saturated rapidly with both short and long-reads, however, there was no sign of early saturation for noncoding transcripts at any sequencing depth. Increasing long-read sequencing depth specifically benefited transcripts containing transposable elements. Finally, we show how single-cell RNA-seq can be guided by transcripts assembled from bulk long-read samples, and demonstrate that noncoding transcripts are expressed at similar levels to coding transcripts but are expressed in fewer cells. This study highlights the impact of sequencing depth on transcript assembly.
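The downsampling idea described in this post, drawing different fractions of a deeply sequenced sample to mimic lower depths, can be sketched as below. This is a simplification under stated assumptions: each read is represented only by the ID of the transcript it was assigned to, "detected" means at least `min_reads` supporting reads, and no actual assembly is performed. The function names and the threshold are hypothetical.

```python
import random
from collections import Counter


def detected_transcripts(read_assignments, min_reads=2):
    """Transcripts supported by at least min_reads reads (assumed cutoff)."""
    counts = Counter(read_assignments)
    return {transcript for transcript, n in counts.items() if n >= min_reads}


def saturation_curve(read_assignments, fractions, min_reads=2, seed=0):
    """Subsample reads without replacement at each fraction and count
    how many transcripts remain detectable."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    curve = []
    for fraction in fractions:
        sample_size = round(len(read_assignments) * fraction)
        subsample = rng.sample(read_assignments, sample_size)
        curve.append((fraction, len(detected_transcripts(subsample, min_reads))))
    return curve
```

With a list in which one transcript has many reads and another has very few, the abundant one is detected even at small fractions while the rare one appears only near full depth, which is the qualitative contrast the post describes between coding and noncoding transcripts.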
857.
DeDe宝 (2022-07-05 22:49):
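#paper doi:10.1016/j.tics.2015.03.002 TRENDS IN COGNITIVE SCIENCES, 2015, A Bayesian perspective on magnitude estimation. This review reads well alongside the authors' 2011 paper, Iterative Bayesian Estimation as an Explanation for Range and Regression Effects: A Study on Human Path Integration (DOI:10.1523/JNEUROSCI.2028-11.2011). The review briefly describes the behavioral signatures that human participants show when estimating physical magnitudes (for example distance, angle, or duration), such as the regression effect, the range effect, and sequential effects, and uses Bayesian models to simulate and explain them. It also lists applications of Bayesian models in psychophysics, neuroimaging, and clinical research, which makes it a good entry point to Bayesian modeling. The 2011 paper contains detailed derivations of the classic Bayesian model (fixed prior) and the second-order Bayesian model (iteratively updated prior).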
Abstract:
Our representation of the physical world requires judgments of magnitudes, such as loudness, distance, or time. Interestingly, magnitude estimates are often not veridical but subject to characteristic biases. These biases are strikingly similar across different sensory modalities, suggesting common processing mechanisms that are shared by different sensory systems. However, the search for universal neurobiological principles of magnitude judgments requires guidance by formal theories. Here, we discuss a unifying Bayesian framework for understanding biases in magnitude estimation. This Bayesian perspective enables a re-interpretation of a range of established psychophysical findings, reconciles seemingly incompatible classical views on magnitude estimation, and can guide future investigations of magnitude estimation and its neurobiological mechanisms in health and in psychiatric diseases, such as schizophrenia.
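As a rough illustration of how a Bayesian observer produces the regression effect mentioned in this post, the sketch below combines a Gaussian prior with a Gaussian measurement. It is a textbook simplification, not the derivation from either paper: the linear (non-logarithmic) scale, the parameter names, and the simple running update of the prior mean are all assumptions.

```python
def posterior_mean(measurement, prior_mean, prior_var, noise_var):
    """Posterior mean for a Gaussian prior and Gaussian measurement noise.

    The estimate is a weighted average, so it is pulled from the noisy
    measurement toward the prior mean (the regression effect).
    """
    weight = prior_var / (prior_var + noise_var)
    return weight * measurement + (1.0 - weight) * prior_mean


def iterative_estimates(measurements, initial_mean, prior_var, noise_var,
                        learning_rate=0.5):
    """Fixed-variance prior whose mean drifts toward recent measurements.

    The running update is a hypothetical stand-in for an iteratively
    updated prior; it makes each estimate depend on earlier trials.
    """
    prior_mean = initial_mean
    estimates = []
    for measurement in measurements:
        estimates.append(
            posterior_mean(measurement, prior_mean, prior_var, noise_var)
        )
        prior_mean += learning_rate * (measurement - prior_mean)
    return estimates
```

With equal prior and noise variances and a prior mean of 6, a stimulus of 10 is estimated as 8 and a stimulus of 2 as 4: large magnitudes are underestimated and small ones overestimated, and a wider stimulus range or noisier measurement strengthens the pull toward the mean.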
858.
颜林林 (2022-07-05 00:03):
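#paper doi:10.1093/database/baac049 Database, 2022, dbBIP: a comprehensive bipolar disorder database for genetic research. As the journal name suggests, this paper presents a database. Its subject is bipolar disorder (BIP, also known as manic depression). It integrates earlier large-scale omics data on the disease, including two array-based GWAS cohorts (PGC2 with 20352 BIP cases and 31358 controls, and PGC3 with 41917 BIP cases and 371549 controls), WGS/WES sequencing data from several later studies, and large-scale transcriptome sequencing (expression profile) data from brain tissue. Using a range of omics analysis methods, it provides functional annotation, linkage and association, protein-protein interaction, spatiotemporal expression patterns, and other information for these data. All of this is offered through a website with query and online analysis functions. This is a typical bioinformatics-style project, and also an effective way to open up a research direction in depth.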
Abstract:
Bipolar disorder (BIP) is one of the most common hereditary psychiatric disorders worldwide. Elucidating the genetic basis of BIP will play a pivotal role in mechanistic delineation. Genome-wide association studies (GWAS) have successfully reported multiple susceptibility loci conferring BIP risk, thus providing insight into the effects of its underlying pathobiology. However, difficulties remain in the extrication of important and biologically relevant data from genetic discoveries related to psychiatric disorders such as BIP. There is an urgent need for an integrated and comprehensive online database with unified access to genetic and multi-omics data for in-depth data mining. Here, we developed the dbBIP, a database for BIP genetic research based on published data. The dbBIP consists of several modules, i.e.: (i) single nucleotide polymorphism (SNP) module, containing large-scale GWAS genetic summary statistics and functional annotation information relevant to risk variants; (ii) gene module, containing BIP-related candidate risk genes from various sources and (iii) analysis module, providing a simple and user-friendly interface to analyze one's own data. We also conducted extensive analyses, including functional SNP annotation, integration (including summary-data-based Mendelian randomization and transcriptome-wide association studies), co-expression, gene expression, tissue expression, protein-protein interaction and brain expression quantitative trait loci analyses, thus shedding light on the genetic causes of BIP. Finally, we developed a graphical browser with powerful search tools to facilitate data navigation and access. The dbBIP provides a comprehensive resource for BIP genetic research as well as an integrated analysis platform for researchers and can be accessed online at http://dbbip.xialab.info. Database URL: http://dbbip.xialab.info.
859.
颜林林 (2022-07-04 20:59):
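#paper doi:10.1038/s41467-022-31236-0 Nature Communications, 2022, A convolutional neural network highlights mutations relevant to antimicrobial resistance in Mycobacterium tuberculosis. This paper builds a set of CNN (convolutional neural network) models on sequencing data from more than 20,000 Mycobacterium tuberculosis isolates. It uses 18 loci chosen from prior knowledge as relevant to drug resistance and feeds the full sequence of each locus as input to predict resistance. The results show that the CNN models outperform existing approaches based on traditional machine learning and on conventional non-convolutional neural networks. Moreover, because the deep learning approach extracts latent features from the sequences, it can help predict how previously unseen mutations affect resistance.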
IF: 14.700 (Q1). Nature communications, 2022-07-02. DOI: 10.1038/s41467-022-31236-0 PMID: 35780211 PMCID: PMC9250494
Abstract:
Long diagnostic wait times hinder international efforts to address antibiotic resistance in M. tuberculosis. Pathogen whole genome sequencing, coupled with statistical and machine learning models, offers a promising solution. However, generalizability and clinical adoption have been limited by a lack of interpretability, especially in deep learning methods. Here, we present two deep convolutional neural networks that predict antibiotic resistance phenotypes of M. tuberculosis isolates: a multi-drug CNN (MD-CNN), that predicts resistance to 13 antibiotics based on 18 genomic loci, with AUCs 82.6-99.5% and higher sensitivity than state-of-the-art methods; and a set of 13 single-drug CNNs (SD-CNN) with AUCs 80.1-97.1% and higher specificity than the previous state-of-the-art. Using saliency methods to evaluate the contribution of input sequence features to the SD-CNN predictions, we identify 18 sites in the genome not previously associated with resistance. The CNN models permit functional variant discovery, biologically meaningful interpretation, and clinical applicability.
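Feeding a whole locus sequence into a CNN, as this post describes, usually starts from a one-hot encoding that a convolution filter then slides over. The pure-Python sketch below shows only that first step on a toy scale. It is not the paper's MD-CNN or SD-CNN architecture: the channel order, the all-zero encoding for ambiguous bases, and the hand-written motif kernel are assumptions for illustration.

```python
ALPHABET = "ACGT"  # assumed channel order


def one_hot(sequence):
    """Encode a DNA string as a length x 4 matrix.

    Bases outside ACGT (for example N or a gap) become all-zero rows.
    """
    return [
        [1.0 if base == letter else 0.0 for letter in ALPHABET]
        for base in sequence.upper()
    ]


def conv1d(encoded, kernel):
    """Slide one filter over the encoded sequence (no padding, stride 1).

    Each output value is the sum of elementwise products between the
    filter and the window it currently covers.
    """
    width = len(kernel)
    return [
        sum(
            encoded[start + offset][channel] * kernel[offset][channel]
            for offset in range(width)
            for channel in range(len(ALPHABET))
        )
        for start in range(len(encoded) - width + 1)
    ]
```

Using `one_hot("CG")` as the kernel turns `conv1d` into a simple motif detector that peaks where the sequence reads CG. A trained network learns many such filters from data instead of having them written by hand.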
860.
颜林林 (2022-07-03 00:04):
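#paper doi:10.1002/ajmg.c.31987 American Journal of Medical Genetics, 2022, Genetic testing and glomerular hematuria - A nephrologist's perspective. This review describes how the diagnosis and early treatment of Alport syndrome (a hereditary nephritis) have evolved. The disease presents with hematuria that is not caused by acute trauma but is associated with chronic inflammation, and it is heritable. It was first described in 1920, but a drug treatment was not reported until 2003 (before that, dialysis and kidney transplantation were the only options). Long-term accumulation of clinical cases and observational studies established that the disease is hereditary and mapped three associated genes: COL4A3, COL4A4, and COL4A5. Because hematuria has many causes, and because Alport syndrome itself spans a spectrum of severity, diagnosis also requires variant testing of these three genes. Genetic testing initially relied on Sanger (first-generation) sequencing and later moved to NGS (next-generation, or second-generation, sequencing). Either way, problems such as high cost remain. From the clinical nephrologist's point of view, microscopic examination of the shape and other features of blood cells in urine helps determine whether the hematuria originates in the glomerulus. Combined with the individual patient's circumstances, this guides the choice between genetic testing and kidney biopsy. No test is perfect, and they need to complement one another to confirm the diagnosis. For the three genes, for instance, whole-exome sequencing is possible in the NGS era. It may uncover previously unreported variants in these genes whose pathogenicity is hard to judge, as well as variants in other genes related to the disease, and interpreting them depends on support from genetic counselors. The clinical diagnosis and treatment pathway shown in this review, along with its evolution, reflects how all this information is used together, and in what order different tests should be combined for the benefit of the patient.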
Abstract:
Alport syndrome is an inherited disorder of the kidneys that results from variants in three collagen IV genes-COL4A3, COL4A4, and COL4A5. Early diagnosis and pharmacologic intervention can delay the progression of chronic kidney disease and the onset of kidney failure in patients with Alport syndrome. This article describes the evolution of approaches to the diagnosis and early treatment of Alport syndrome.