A total of 1194 literature shares were found; this page shows entries 961 - 980.
961.
半面阳光
(2022-07-26 14:25):
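#paper DOI: 10.1073/pnas.2019768118, 2021 Feb 2;118(5):e2019768118. Genome-wide detection of cytosine methylation by single molecule real-time sequencing. This is not a newly published paper: it is a research article that the team of Yuk Ming Dennis Lo (卢煜明) at The Chinese University of Hong Kong published in PNAS in 2021. I recently heard Professor Lo present the related results at an academic conference, so I picked it up for a close read. The core of the paper is detecting DNA methylation with PacBio SMRT third-generation sequencing and a convolutional neural network. Cytosine methylation, 5-methylcytosine (5mC), is one of the most important types of epigenetic modification. The most widely used sequencing method for CpG methylation is bisulfite sequencing (BS-seq), but BS-seq has drawbacks: bisulfite degrades DNA, and it converts unmethylated cytosine (C) to uracil, which is read as thymine (T), complicating downstream alignment; meanwhile, a genuine C->T point mutation in the original sequence is not something bisulfite acts on, so it cannot be told apart from a conversion. The authors therefore used single-molecule real-time (SMRT) sequencing to develop a method that detects 5mC directly. It takes two key kinds of information from SMRT sequencing as input and combines them with a convolutional neural network (CNN) into a detection method called the holistic kinetic (HK) model. The two inputs are: first, the kinetic signals of the DNA polymerase (how long the fluorescence pulse of a single base lasts, and the interval between two consecutive bases); second, the "sequence context", i.e. the sequence of a fixed-length stretch of DNA under examination, called the "measurement window". The authors first built an unmethylated dataset by whole-genome amplification (the negative dataset, in which almost no sequence is methylated) and a methylated dataset by treating DNA with the M.SssI methyltransferase (the positive dataset; M.SssI methylates all CpG sites on double-stranded DNA). They then took half of each dataset to train the CNN and used the rest to validate the HK model. The AUC for distinguishing methylation states reached up to 0.97, and the sensitivity and specificity of genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The results also show that window size and sequencing depth affect the HK model's performance. To balance downstream analysis against accuracy, 21 nt was chosen as the default measurement window and 10× as the default sequencing depth. The authors then used human-mouse hybrid fragments to verify that the HK model can handle "hybrid methylation" sequences (a single fragment containing both methylated and unmethylated CpGs). They also briefly compared the performance of BS-seq and the HK model. My impressions: first, the sheer workload; second, the authors' thorough understanding and application of molecular biology theory and of the characteristics of sequencing technology. The overall research framework also feels of a piece with the Lo team's earlier work: a thorough grasp of basic theory and flexible use of sequencing technology to solve hard problems in clinical testing.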
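The two inputs described above can be pictured with a small sketch. It is only an illustration, not the authors' code: the function name `cpg_windows`, the dictionary layout, and the per-base `ipd` (interpulse duration) and `pw` (pulse width) lists are assumptions made for this example. Only the 21-nt window centered on a CpG cytosine comes from the paper. The real HK model analyzes double-stranded molecules and feeds the windows to a CNN, both of which are left out here.

```python
from typing import Dict, List

WINDOW = 21          # measurement window length reported in the paper
HALF = WINDOW // 2   # 10 bases on each side of the central cytosine


def cpg_windows(seq: str, ipd: List[float], pw: List[float]) -> List[Dict]:
    """Collect one window per CpG cytosine that has full flanking context.

    `ipd` and `pw` are hypothetical per-base kinetic values aligned to `seq`.
    """
    if not (len(seq) == len(ipd) == len(pw)):
        raise ValueError("seq, ipd and pw must have the same length")
    seq = seq.upper()
    windows = []
    # Only positions with HALF bases on both sides can be window centers.
    for i in range(HALF, len(seq) - HALF):
        if seq[i] == "C" and seq[i + 1] == "G":
            lo, hi = i - HALF, i + HALF + 1
            windows.append({
                "center": i,
                "bases": seq[lo:hi],   # sequence context
                "ipd": ipd[lo:hi],     # interpulse durations in the window
                "pw": pw[lo:hi],       # pulse widths in the window
            })
    return windows
```

Each returned window holds 21 bases plus the matching kinetic values, which is roughly the shape of input a classifier would need; CpG sites too close to the read ends are skipped in this sketch.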
IF: 9.400, Q1
Proceedings of the National Academy of Sciences of the United States of America,
2021-02-02.
DOI: 10.1073/pnas.2019768118
PMID: 33495335
Abstract:
5-Methylcytosine (5mC) is an important type of epigenetic modification. Bisulfite sequencing (BS-seq) has limitations, such as severe DNA degradation. Using single molecule real-time sequencing, we developed a methodology to directly examine 5mC. This approach holistically examined kinetic signals of a DNA polymerase (including interpulse duration and pulse width) and sequence context for every nucleotide within a measurement window, termed the holistic kinetic (HK) model. The measurement window of each analyzed double-stranded DNA molecule comprised 21 nucleotides with a cytosine in a CpG site in the center. We used amplified DNA (unmethylated) and M.SssI-treated DNA (methylated) (M.SssI being a CpG methyltransferase) to train a convolutional neural network. The area under the curve for differentiating methylation states using such samples was up to 0.97. The sensitivity and specificity for genome-wide 5mC detection at single-base resolution reached 90% and 94%, respectively. The HK model was then tested on human-mouse hybrid fragments in which each member of the hybrid had a different methylation status. The model was also tested on human genomic DNA molecules extracted from various biological samples, such as buffy coat, placental, and tumoral tissues. The overall methylation levels deduced by the HK model were well correlated with those by BS-seq (r = 0.99; P < 0.0001) and allowed the measurement of allele-specific methylation patterns in imprinted genes. Taken together, this methodology has provided a system for simultaneous genome-wide genetic and epigenetic analyses.
962.
颜林林
(2022-07-25 07:28):
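#paper doi:10.1038/s41380-022-01661-0 Molecular Psychiatry, 2022, The serotonin theory of depression: a systematic umbrella review of the evidence. This is an umbrella review of meta-analyses, and a report of negative results at that; in the view of many "insiders", such "filler papers" are beneath notice or embarrassing to mention. The paper examines whether serotonin (5-hydroxytryptamine) is involved in the etiology of depression. The idea is popular among most of the public and professional researchers alike: lowered serotonin is widely believed to be linked to depression. The paper takes an umbrella review approach, pulling in a large body of research on the serotonin system from several areas so that its conclusions rest on the highest level of evidence available. The six areas are: (1) whether serotonin and its metabolite 5-HIAA (5-hydroxyindoleacetic acid) are lower in the body fluids of patients with depression; (2) whether serotonin receptor levels are lower in patients; (3) whether serotonin transporter (SERT) levels are higher in patients; (4) whether depletion of tryptophan (the precursor of serotonin) induces depression; (5) whether the SERT gene is expressed more highly in patients; (6) whether the SERT gene interacts with stress in patients. The review was registered with PROSPERO (CRD42020207203) and included 17 studies: 12 systematic reviews and meta-analyses, 1 collaborative meta-analysis, 1 meta-analysis of large cohort studies, 1 systematic review and narrative synthesis, 1 genetic association study and 1 umbrella review. In all six areas, each at the largest sample size available (from hundreds to tens of thousands), it rejects an association between markers of serotonin activity and depression, and suggests that "it is time to acknowledge that the serotonin theory of depression is not empirically substantiated". Evidently, reaching a firm negative conclusion is no easy feat either.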
Abstract:
The serotonin hypothesis of depression is still influential. We aimed to synthesise and evaluate evidence on whether depression is associated with lowered serotonin concentration or activity in a systematic umbrella review of the principal relevant areas of research. PubMed, EMBASE and PsycINFO were searched using terms appropriate to each area of research, from their inception until December 2020. Systematic reviews, meta-analyses and large data-set analyses in the following areas were identified: serotonin and serotonin metabolite, 5-HIAA, concentrations in body fluids; serotonin 5-HT receptor binding; serotonin transporter (SERT) levels measured by imaging or at post-mortem; tryptophan depletion studies; SERT gene associations and SERT gene-environment interactions. Studies of depression associated with physical conditions and specific subtypes of depression (e.g. bipolar depression) were excluded. Two independent reviewers extracted the data and assessed the quality of included studies using the AMSTAR-2, an adapted AMSTAR-2, or the STREGA for a large genetic study. The certainty of study results was assessed using a modified version of the GRADE. We did not synthesise results of individual meta-analyses because they included overlapping studies. The review was registered with PROSPERO (CRD42020207203). 17 studies were included: 12 systematic reviews and meta-analyses, 1 collaborative meta-analysis, 1 meta-analysis of large cohort studies, 1 systematic review and narrative synthesis, 1 genetic association study and 1 umbrella review. Quality of reviews was variable with some genetic studies of high quality. Two meta-analyses of overlapping studies examining the serotonin metabolite, 5-HIAA, showed no association with depression (largest n = 1002). One meta-analysis of cohort studies of plasma serotonin showed no relationship with depression, and evidence that lowered serotonin concentration was associated with antidepressant use (n = 1869). 
Two meta-analyses of overlapping studies examining the 5-HT receptor (largest n = 561), and three meta-analyses of overlapping studies examining SERT binding (largest n = 1845) showed weak and inconsistent evidence of reduced binding in some areas, which would be consistent with increased synaptic availability of serotonin in people with depression, if this was the original, causal abnormality. However, effects of prior antidepressant use were not reliably excluded. One meta-analysis of tryptophan depletion studies found no effect in most healthy volunteers (n = 566), but weak evidence of an effect in those with a family history of depression (n = 75). Another systematic review (n = 342) and a sample of ten subsequent studies (n = 407) found no effect in volunteers. No systematic review of tryptophan depletion studies has been performed since 2007. The two largest and highest quality studies of the SERT gene, one genetic association study (n = 115,257) and one collaborative meta-analysis (n = 43,165), revealed no evidence of an association with depression, or of an interaction between genotype, stress and depression. The main areas of serotonin research provide no consistent evidence of there being an association between serotonin and depression, and no support for the hypothesis that depression is caused by lowered serotonin activity or concentrations. Some evidence was consistent with the possibility that long-term antidepressant use reduces serotonin concentration.
963.
白义民
(2022-07-24 17:54):
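#paper 《宁玛派龙钦巴研究》 (A Study of Longchenpa of the Nyingma School). Longchenpa is the great synthesizer of the Dzogchen (ati-yoga) teachings of Tibetan tantric Buddhism. This work gives an overview of Longchenpa in two parts: a biography of his path of practice, and his writings. Unlike the usual vague accounts of tantra, this doctoral dissertation outlines the Dzogchen teachings fairly accurately and plainly from the academic standpoint of ethnic and religious studies. For those interested in Dzogchen practice, guidance from scholarly writing can help avoid detours.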
964.
颜林林
(2022-07-24 05:55):
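#paper doi:10.1186/s12864-022-08762-8 BMC Genomics, 2022, Poly(a) selection introduces bias and undue noise in direct RNA-sequencing. In whole-transcriptome sequencing experiments, poly(A) selection is often used after the initial RNA extraction to enrich for mRNA. This paper runs direct RNA sequencing on the ONT platform and processes the same samples in parallel with and without poly(A) selection. The results indicate that skipping the step is appropriate: although this may slightly reduce library complexity, it avoids other drawbacks of the selection step, such as needing more input RNA, preferentially capturing mRNAs with longer poly(A) tails, and making differential expression results less stable.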
Abstract:
BACKGROUND: Genome-wide RNA-sequencing technologies are increasingly critical to a wide variety of diagnostic and research applications. RNA-seq users often first enrich for mRNA, with the most popular enrichment method being poly(A) selection. In many applications it is well-known that poly(A) selection biases the view of the transcriptome by selecting for longer tailed mRNA species. RESULTS: Here, we show that poly(A) selection biases Oxford Nanopore direct RNA sequencing. As expected, poly(A) selection skews sequenced mRNAs toward longer poly(A) tail lengths. Interestingly, we identify a population of mRNAs (> 10% of genes' mRNAs) that are inconsistently captured by poly(A) selection due to highly variable poly(A) tails, and demonstrate this phenomenon in our hands and in published data. Importantly, we show poly(A) selection is dispensable for Oxford Nanopore's direct RNA-seq technique, and demonstrate successful library construction without poly(A) selection, with decreased input, and without loss of quality. CONCLUSIONS: Our work expands the utility of direct RNA-seq by validating the use of total RNA as input, and demonstrates important technical artifacts from poly(A) selection that inconsistently skew mRNA expression and poly(A) tail length measurements.
965.
颜林林
(2022-07-23 22:05):
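#paper doi:10.1101/2022.07.21.500999 bioRxiv, 2022, High-resolution de novo structure prediction from primary sequence. This preprint presents a tool, OmegaFold, that predicts tertiary structure from the primary sequence of a single protein. Current mainstream methods all rely on evolutionary information, using multiple sequence alignments to assist protein structure prediction. The paper argues that a protein, once translated, folds on its own from primary sequence into tertiary structure, so evolutionary information should not be necessary for structure prediction. The deep model relies on a set of pretrained models that help identify which residues in the primary sequence matter more (i.e. assign them different attention), and uses BERT-style language-model techniques to support training of the folding model. The resulting method handles structure prediction for orphan proteins (those with no similar proteins to draw on in current structure databases) and, according to the abstract, outperforms RoseTTAFold and reaches accuracy similar to AlphaFold2.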
bioRxiv,
2022.
DOI: 10.1101/2022.07.21.500999
Abstract:
Recent breakthroughs have used deep learning to exploit evolutionary information in multiple sequence alignments (MSAs) to accurately predict protein structures. However, MSAs of homologous proteins are not always available, such as with orphan proteins and fast-evolving proteins like antibodies, and a protein typically folds in a natural setting from its primary amino acid sequence into its three-dimensional structure, suggesting that evolutionary information and MSAs should not be necessary to predict a protein's folded form. Here, we introduce OmegaFold, the first computational method to successfully predict high-resolution protein structure from a single primary sequence alone. Using a new combination of a protein language model that allows us to make predictions from single sequences and a geometry-inspired transformer model trained on protein structures, OmegaFold outperforms RoseTTAFold and achieves similar prediction accuracy to AlphaFold2 on recently released structures. OmegaFold enables accurate predictions on orphan proteins that do not belong to any functionally characterized protein family and antibodies that tend to have noisy MSAs due to fast evolution. Our study fills a much-needed structure prediction gap and brings us a step closer to understanding protein folding in nature.
966.
颜林林
(2022-07-22 00:00):
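#paper doi:10.1056/NEJMe2207902 The New England Journal of Medicine, 2022, Setting the Benchmark for KRAS(G12C)-Mutated NSCLC. This is an editorial introducing the report, in the same issue, of results from the phase 2 KRYSTAL-1 clinical trial (doi:10.1056/NEJMoa2204619). The trial's protagonist is a KRAS G12C inhibitor, adagrasib, which performed well: in patients with KRAS G12C mutations previously treated with chemotherapy and immunotherapy, the efficacy and survival measures (ORR, PFS, OS, etc.) were very close to those of the previously approved drug sotorasib. The editorial infers that the two drugs may overlap substantially in mechanism. Their differences in metabolism and pharmacokinetics (such as crossing the blood-brain barrier and half-life in the body) in turn hint at how the choice between the two drugs might be differentiated in future.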
Abstract:
No abstract available.
967.
颜林林
(2022-07-21 00:29):
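#paper doi:10.1186/s13059-022-02726-7 Genome Biology, 2022, Integration of single-cell multi-omics data by regression analysis on unpaired observations. Because of technical limitations, the vast majority of single-cell multi-omics studies cannot actually measure several omics layers in the same cell. To address this, the paper starts from the intuitive assumption that "target genes with similar expression have similar regulators" and uses regression analysis to link and infer the relationship between scRNA-seq and ATAC-seq data. In unpaired scRNA-seq and ATAC-seq experiments (where the two assays were run on different cells rather than the same cell), one kind of data (e.g. chromatin accessibility from ATAC-seq) can then be used to infer the expression of the corresponding regulated genes. Evaluated on simulated and real data, the method reaches high accuracy (the results are highly consistent with eQTL mapping). This offers methodological support for making better use of the large body of unpaired single-cell data accumulated so far.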
IF: 10.100, Q1
Genome biology,
2022-07-19.
DOI: 10.1186/s13059-022-02726-7
PMID: 35854350
PMCID:PMC9295346
Abstract:
Despite recent developments, it is hard to profile all multi-omics single-cell data modalities on the same cell. Thus, huge amounts of single-cell genomics data of unpaired observations on different cells are generated. We propose a method named UnpairReg for the regression analysis on unpaired observations to integrate single-cell multi-omics data. On real and simulated data, UnpairReg provides an accurate estimation of cell gene expression where only chromatin accessibility data is available. The cis-regulatory network inferred from UnpairReg is highly consistent with eQTL mapping. UnpairReg improves cell type identification accuracy by joint analysis of single-cell gene expression and chromatin accessibility data.
968.
颜林林
(2022-07-20 07:49):
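#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of "doing a small thing seriously". In genomic analysis we often agonize over "hg19 or hg38?", and sometimes have to run two nearly identical analyses in parallel against both reference genomes to avoid the information loss caused by coordinate conversion tools such as liftOver. Targeting this small (perhaps not even that common) pain point, the paper stays compatible with the existing VCF format while letting a single result file carry two sets of genomic coordinates; existing tools keep working, and the coordinates for either genome can be extracted at any time. The idea is simple and the implementation is not hard, yet it does effectively solve some practical problems.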
bioRxiv,
2022.
DOI: 10.1101/2022.07.17.500374
Abstract:
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
969.
张德祥
(2022-07-19 18:49):
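#paper https://doi.org/10.48550/arXiv.2207.04630 On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. This paper by Yi Ma (马毅) has already been covered by WeChat public accounts. Building on two of his earlier lines of work, LDR data compression and deep networks for closed-loop generative models, Ma distills compression and closed-loop generation into the principles of parsimony and self-consistency for intelligence. The paper goes on to propose more general ideas, extends them to 3D vision and reinforcement learning, and predicts their implications for neuroscience and higher-level intelligence.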
arXiv,
2022.
DOI: 10.48550/arXiv.2207.04630
Abstract:
Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
970.
颜林林
(2022-07-19 00:21):
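#paper doi:10.1002/humu.24440 Human Mutation, 2022, Multi-omics analysis reveals multiple mechanisms causing Prader-Willi like syndrome in a family with a X;15 translocation. This paper reports a family affected by the genetic disorder PWS (Prader-Willi syndrome) and the process of discovering and confirming the causal gene. PWS is a neurodevelopmental disorder and a textbook genetic disease, because it is caused by variation in an imprinted region. Genomic imprinting means that an allele "remembers" whether it came from the father or the mother, and the gene is expressed only from the chromosome of one parental origin. PWS involves the 15q11.2 region; typically, loss of the paternal copy of genes in this region causes the disease. In the reported family, both daughters show disease-related symptoms (obesity, intellectual disability, etc.), and the mother is a carrier (of a translocation between chromosome 15 and the X chromosome). The study used karyotyping, FISH (fluorescence in situ hybridization), methylation-sensitive MLPA, short-read WGS, 10x linked-read WGS, transcriptome sequencing, ddPCR and other methods, each addressing a particular step of the genetic investigation. Together they confirmed the causal gene and explained different disease mechanisms in the two daughters: one shows uniparental disomy (UPD) of this region of chromosome 15, and the other has lost imprinting, so that the SNRPN gene is expressed from both chromosomes. For genetic disease researchers and genetic counselors, the study involves many techniques, is logically clear, and is well worth learning from.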
Abstract:
Prader-Willi syndrome (PWS; MIM# 176270) is a neurodevelopmental disorder caused by the loss of expression of paternally imprinted genes within the PWS region located on 15q11.2. It is usually caused by either maternal uniparental disomy of chromosome 15 (UPD15) or 15q11.2 recurrent deletion(s). Here, we report a healthy carrier of a balanced X;15 translocation and her two daughters, both with the karyotype 45,X,der(X)t(X;15)(p22;q11.2),-15. Both daughters display symptoms consistent with haploinsufficiency of the SHOX gene and PWS. We explored the architecture of the derivative chromosomes and investigated effects on gene expression in patient-derived neural cells. First, a multiplex ligation-dependent probe amplification methylation assay was used to determine the methylation status of the PWS-region revealing maternal UPD15 in daughter 2, explaining her clinical symptoms. Next, short read whole genome sequencing and 10X genomics linked read sequencing was used to pinpoint the exact breakpoints of the translocation. Finally, we performed transcriptome sequencing on neuroepithelial stem cells from the mother and from daughter 1 and observed biallelic expression of genes in the PWS region (including SNRPN) in daughter 1. In summary, our multi-omics analysis highlights two different PWS mechanisms in one family and provide an example of how structural variation can affect imprinting through long-range interactions.
971.
颜林林
(2022-07-18 06:00):
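#paper doi:10.1101/2022.07.14.500036 bioRxiv, 2022, Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics. In single-cell transcriptome data analysis, batch effects need to be removed. This is usually done by reducing the originally high-dimensional data to a low-dimensional space that better reflects the structure of the data, and correcting the data there according to batch information. The process can easily damage biologically meaningful features, and such biological differences are exactly what single-cell sequencing sets out to study. Removing batch effects and preserving biological variation are conflicting goals, and single-cell analysis tools, depending on their strategy, typically pick one of the two as the priority to optimize. Here the authors introduce a multi-objective optimization technique called Pareto multi-task learning (Pareto MTL) to jointly assess and trade off several metrics related to both goals and obtain a better overall result. They also propose a neural-network-based measure, Mutual Information Neural Estimation (MINE), to help choose the balance point. The paper evaluates the method on public datasets such as TM-MARROW and MACAQUE-RETINA and shows that MINE indeed works better than the commonly used MMD.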
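To make the notion of a Pareto front concrete, here is a minimal sketch. It is a plain non-dominated filter over score pairs that have already been computed, not Pareto MTL itself. The function name `pareto_front` and the convention that each pair is (biological conservation, batch removal) with higher being better on both axes are assumptions for this illustration.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (biological conservation, batch removal), higher is better


def pareto_front(points: List[Score]) -> List[Score]:
    """Return the points that no other point beats on both objectives."""
    front = []
    for p in points:
        # q dominates p if q is at least as good on both axes and strictly better on one.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    # Deduplicate and sort by the first objective for readability.
    return sorted(set(front))
```

Every point on the returned front represents a different trade-off: moving along it improves one objective only at the cost of the other, which is why the paper argues for looking at the whole curve rather than a single setting.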
bioRxiv,
2022.
DOI: 10.1101/2022.07.14.500036
Abstract:
Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low dimensional latent representations better to understand overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird eye view of the trade-offs between these two conflicting objectives. Specifically, using the well established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard Maximum Mean Discrepancy (MMD) measure. The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
972.
钟鸣
(2022-07-18 02:00):
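#paper doi:10.1073/pnas.0404172101 Proc Natl Acad Sci U S A, 2004, Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions regulating cell surface adaptation. Bacteroides fragilis (BF) and Bacteroides thetaiotaomicron (BT) are both bacteria of the human gut, but they differ in niche and pathogenicity: BF attaches to the mucosal surface and is pathogenic, the most virulent of the Bacteroides; BT lives within the colon and is non-pathogenic. Here the authors compare the two genomes, with a focus on genomic inversions.
The authors first compared the capsule loci, because the capsule is BF's main virulence factor. BF has 9 capsule loci and BT has 7. The promoter in front of a Bacteroides capsule locus is flanked by IRs (inverted repeats). The IRs allow the promoter to flip under certain conditions, and a flipped promoter no longer functions, so the downstream genes are not transcribed or expressed. Bacteroides use this mechanism to produce different capsule types (phase variation) and evade host immune killing. Promoter analysis of the BT and BF capsule loci shows that all 9 capsule loci in BF are invertible and that the inversions are all mediated by the serine recombinase mpi, which suggests that inversion of the 9 BF capsule loci is globally regulated. In BT, only 4 of the 7 capsule loci are invertible, each mediated by a different tyrosine recombinase, in sharp contrast to BF.
BF capsular polysaccharides can induce abscess formation, and this toxicity depends on the presence of both positively charged free amino groups and negatively charged groups in the polysaccharide repeating unit. The analysis found 4 capsule loci in BF that can produce capsular polysaccharides carrying both free amino groups and negatively charged groups, versus only 1 in BT; this difference may be related to BF's higher virulence.
The authors also predicted inversion events outside the capsule loci, identifying 31 invertible regions in BF and confirming them by PCR. A representative case is the SusC/SusD (homolog) genes, whose products localize to the bacterial surface, break starch polysaccharides in the environment down into monosaccharides, and transport them into the cell as nutrients. Variable expression of the SusC/SusD (homolog) genes is regulated by 7 inversion mechanisms, which covers every type of inversion-based regulation identified. The authors explain the 7 mechanisms with concise, vivid diagrams; they are quite intricate and interesting.
In short, the widespread DNA inversions in BF regulate biological functions such as virulence and nutrient-utilization systems, and the mechanisms show both similarities and differences between species of different pathogenicity.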
Abstract:
Bacteroides are predominant human colonic commensals, but the principal pathogenic species, Bacteroides fragilis (BF), lives closely associated with the mucosal surface, whereas a second major species, Bacteroides thetaiotaomicron (BT), concentrates within the colon. We find corresponding differences in their genomes, based on determination of the genome sequence of BF and comparative analysis with BT. Both species have acquired two mechanisms that contribute to their dominance among the colonic microbiota: an exceptional capability to use a wide range of dietary polysaccharides by gene amplification and the capacity to create variable surface antigenicities by multiple DNA inversion systems. However, the gene amplification for polysaccharide assimilation is more developed in BT, in keeping with its internal localization. In contrast, external antigenic structures can be changed more systematically in BF. Thereby, at the mucosal surface, where microbes encounter continuous attack by host defenses, BF evasion of the immune system is favored, and its colonization and infectious potential are increased.
973.
钟鸣
(2022-07-17 02:20):
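#paper doi:10.1128/JB.00933-10 J Bacteriol, 2011, Adhesive Activity of the Haemophilus Cryptic Genospecies Cha Autotransporter Is Modulated by Variation in Tandem Peptide Repeats. The Haemophilus cryptic genospecies (considered a type of Haemophilus influenzae at the time, and since named Haemophilus quentini) colonizes the female genital tract and can infect the newborn during delivery and cause clinical symptoms. Cha is a conserved gene in this species that encodes a trimeric autotransporter adhesin (TAA) and promotes colonization. The TAA family is highly modular, consisting of an N-terminal signal peptide, a central passenger domain, and a C-terminal outer-membrane domain. The N-terminal signal peptide helps transport the TAA to the periplasm, where the C terminus trimerizes and inserts into the outer membrane, exposing the passenger domain outside the membrane to carry out adhesion. The authors had previously found that the Cha gene differs slightly in length between strains and contains repeats of an 84 bp segment. They hypothesized that the number of repeats accounts for the length differences, and confirmed this across strains by Southern blot. They also found that strains with Cha of different lengths adhere with different efficiency, and hypothesized that Cha protein length is related to adhesion efficiency. They therefore isolated mutants with different Cha lengths, in which the 84 bp segment is repeated anywhere from 0 to 100 times. Cell adhesion assays showed that the more repeats, the lower the adhesion efficiency. To determine whether the change in adhesion was due to Cha expression levels, they measured Cha mRNA in the different mutants by qPCR and found no difference, indicating that the adhesion differences are mediated by protein structure rather than protein abundance.
The authors then examined Cha protein morphology in strains with different Cha lengths by electron microscopy and found that the more 84 bp repeats, the longer the Cha fiber, which indirectly supports the idea that changes in Cha structure affect adhesion efficiency.
Next, to find the domain through which Cha mediates adhesion, the authors truncated the Cha gene to different lengths and built another set of mutants. Testing the adhesion of these mutants showed that the N-terminal region of about 400 aa is necessary and sufficient for Cha-mediated adhesion, whereas the 84 bp repeats are not required. (Note that adhesion was measured here with a tube-settling assay, i.e. observing how long bacteria take to settle to the bottom in liquid medium, because studies have shown that bacterial adhesins mediate not only bacterium-host adhesion but also bacterium-bacterium adhesion.)
Finally, the authors asked how Cha mediates bacterium-bacterium adhesion: do Cha proteins on different bacteria bind each other, or does Cha bind some other protein on the other bacterium? They tested aggregation for a low-adhesion strain alone and for high- and low-adhesion strains in co-culture (aggregation is also the assay this paper uses to measure bacterium-bacterium adhesion), and the results indicate that Cha is self-associating.
Using a series of mutants with Cha genes of different lengths, this study shows that the number of 84 bp repeats determines Cha length, which in turn changes bacterial adhesion: the longer the Cha, the weaker the adhesion.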
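The repeat arithmetic can be checked with a short sketch: an 84 bp DNA repeat encodes 84 / 3 = 28 amino acids, which matches the 28-amino-acid peptide repeats named in the abstract. The helper `max_tandem_copies` below is a hypothetical illustration of counting back-to-back copies of a repeat unit in a sequence. It is not the method used in the paper, which relied on Southern blot, and the toy sequence in the usage note is made up.

```python
REPEAT_NT = 84                 # repeat length in the Cha gene, from the paper
REPEAT_AA = REPEAT_NT // 3     # three nucleotides per codon


def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    best = 0
    step = len(unit)
    for start in range(len(seq)):
        copies = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps matching.
        while seq.startswith(unit, pos):
            copies += 1
            pos += step
        best = max(best, copies)
    return best
```

For instance, `max_tandem_copies("GGACGTACGTACGTTT", "ACGT")` counts three consecutive copies of the toy unit; applied to real data, the unit would be the 84 bp repeat and the count would track Cha length.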
Abstract:
The Haemophilus cryptic genospecies is an important cause of maternal genital tract and neonatal systemic infections and initiates infection by colonizing the genital or respiratory epithelium. In recent work, we identified a unique Haemophilus cryptic genospecies protein called Cha, which mediates efficient adherence to genital and respiratory epithelia. The Cha adhesin belongs to the trimeric autotransporter family and contains an N-terminal signal peptide, an internal passenger domain that harbors adhesive activity, and a C-terminal membrane anchor domain. The passenger domain in Cha contains clusters of YadA-like head domains and neck motifs as well as a series of tandem 28-amino-acid peptide repeats. In the current study, we report that variation in peptide repeat number gradually modulates Cha adhesive activity, associated with a direct effect on the length of Cha fibers on the bacterial cell surface. The N-terminal 404 residues of the Cha passenger domain mediate binding to host cells and also facilitate bacterial aggregation through intermolecular Cha-Cha binding. As the tandem peptide repeats expand, the Cha fiber becomes longer and Cha adherence activity decreases. The expansion and contraction of peptide repeats represent a novel mechanism for modulating adhesive capacity, potentially balancing the need of the organism to colonize the genital and respiratory tracts with the ability to attach to alternative substrates, disperse within the host, or evade the host immune system.
974.
钟鸣
(2022-07-16 15:00):
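#paper doi:10.7717/peerj.12272 PeerJ, 2021, Helicobacter pylori virulence factors: relationship between genetic variability and phylogeographic origin. Helicobacter pylori is a household name: it infects more than half of the world's population and is associated with chronic gastritis and gastric cancer. Bacterial pathogenicity is well known to depend on virulence factors; here the authors extracted 87 virulence factors (VFs) in 7 classes from 135 complete H. pylori genomes and compared them.
They compared the virulence factor genes from 4 angles: copy number, gene size (length), synteny, and similarity, and sorted the 87 VFs by conservation into 3 classes: highly, moderately, and poorly conserved. Urease is representative of the highly conserved VFs; it neutralizes gastric acid by metabolizing urea to ammonia, helping the bacterium survive in the stomach. Typical poorly conserved VFs are the adhesins, which show high levels of rearrangement, mainly gene inversions. Gene inversion may cause a "position effect" that affects gene expression. Unlike other genes, however, adhesin gene inversions show no correlation with pathogenic phenotype or geographic origin.
Based on the similarity of the 87 VFs, the authors also divided the 135 genomes into 4 monophyletic groups, a, b, c and d: group a is more prone to cause gastritis and peptic ulcer, and group d is more prone to cause gastric cancer and gastric lymphoma. Group b comes mainly from the Middle East and group c mainly from East Asia. The analysis is simple and clear, and the results offer insight into the evolution of H. pylori pathogenicity genes.
The analysis also indicates that about 34% of the genes were misannotated by automated genome annotation. The likely cause is that early uploaded genome annotations were not accurate enough yet served as reference data for later annotations, so the errors propagated. The authors therefore recommend semi-manual annotation for prokaryotic genes.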
Abstract:
BACKGROUND: Helicobacter pylori is a pathogenic bacteria that colonize the gastrointestinal tract from human stomachs and causes diseases including gastritis, peptic ulcers, gastric lymphoma (MALT), and gastric cancer, with a higher prevalence in developing countries. Its high genetic diversity among strains is caused by a high mutation rate, observing virulence factors (VFs) variations in different geographic lineages. This study aimed to postulate the genetic variability associated with virulence factors present in the Helicobacter pylori strains, to identify the relationship of these genes with their phylogeographic origin. METHODS: The complete genomes of 135 strains available in NCBI, from different population origins, were analyzed using bioinformatics tools, identifying a high rate; as well as reorganization events in 87 virulence factor genes, divided into seven functional groups, to determine changes in position, number of copies, nucleotide identity and size, contrasting them with their geographical lineage and pathogenic phenotype. RESULTS: Bioinformatics analyses show a high rate of gene annotation errors in VF. Analysis of genetic variability of VFs shown that there is not a direct relationship between the reorganization and geographic lineage. However, regarding the pathogenic phenotype demonstrated in the analysis of many copies, size, and similarity when dividing the strains that possess and not the cag pathogenicity island (cagPAI), having a higher risk of developing gastritis and peptic ulcer was evidenced. Our data has shown that the analysis of the overall genetic variability of all VFs present in each strain of H. pylori is key information in understanding its pathogenic behavior.
975.
颜林林
(2022-07-15 00:05):
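#paper doi:10.3390/ijms23137446 International Journal of Molecular Sciences, 2022, Identification of Spliceogenic Variants beyond Canonical GT-AG Splice Sites in Hereditary Cancer Genes. Point mutations near exon boundaries may alter how a gene is spliced, which is important information in genetic diagnosis and counseling. In most cases, however, such variants can only be assessed from previous reports and computational predictions, and under the variant classification guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP), computational results alone usually leave them as variants of uncertain significance (VUS). The study drew on 732 patients, took VUS potentially affecting RNA splicing in the APC, ATM, FH, LZTR1, MSH6, PALB2, RAD51C and TP53 genes, and tested them at the RNA level with a multiplex PCR approach to verify their effects. For each result, the paper analyzes and interprets the biological function to decide whether the variant is pathogenic. In the end, 50% of the investigated VUS were reclassified: 25% were downgraded to likely benign and 25% were upgraded to likely pathogenic.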
IF: 4.900, Q2
International journal of molecular sciences,
2022-Jul-04.
DOI: 10.3390/ijms23137446
PMID: 35806449
Abstract:
Pathogenic/likely pathogenic variants in susceptibility genes that interrupt RNA splicing are a well-documented mechanism of hereditary cancer syndromes development. However, if RNA studies are not performed, most of the variants beyond the canonical GT-AG splice site are characterized as variants of uncertain significance (VUS). To decrease the VUS burden, we have bioinformatically evaluated all novel VUS detected in 732 consecutive patients tested in the routine genetic counseling process. Twelve VUS that were predicted to cause splicing defects were selected for mRNA analysis. Here, we report a functional characterization of 12 variants located beyond the first two intronic nucleotides using RNAseq in APC, ATM, FH, LZTR1, MSH6, PALB2, RAD51C, and TP53 genes. Based on the analysis of mRNA, we have successfully reclassified 50% of investigated variants. 25% of variants were downgraded to likely benign, whereas 25% were upgraded to likely pathogenic leading to improved clinical management of the patient and the family members.
976.
颜林林
(2022-07-14 21:57):
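#paper doi:10.1126/science.abl9283 Science, 2022, Substitution mutational signatures in whole-genome–sequenced cancers in the UK population. Published in Science this April, this article was highlighted in the latest issue of Cancer Cell (doi:10.1016/j.ccell.2022.05.011). Large-cohort whole-genome sequencing (WGS) papers have not been rare in recent years, so when one still lands in a top journal, its novelty and significance are worth a look. The cases come from the Genomics England (GEL) 100,000 Genomes Project (100kGP): WGS of 12,222 tumor samples from 11,585 individuals. After deriving the mutational signatures associated with tumorigenesis, the authors validated them in two other large independent cohorts (3,001 primary cancers from the International Cancer Genome Consortium (ICGC) and 3,417 metastatic cancers from the Hartwig Medical Foundation). The paper focuses on single-base substitution (SBS) and double-base substitution (DBS) signatures derived from WGS, and builds a computational framework called Signature Fit Multi-Step (FitMS). The method distinguishes signatures that are common across cancer types from those that are rare and appear only in specific cancer types or organs. Tissue-specific signatures are then clustered and combined into a set of reference signatures to help interpret mechanism and etiology. Judging by the problem addressed and the methods used, there does not seem to be any especially major innovation, so my first guess is that its place in a top journal owes a great deal to the very large cohort and data volume and to the corresponding amount of work (see the 94-page supplementary material).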
Abstract:
Whole-genome sequencing (WGS) permits comprehensive cancer genome analyses, revealing mutational signatures, imprints of DNA damage and repair processes that have arisen in each patient's cancer. We performed mutational signature analyses on 12,222 WGS tumor-normal matched pairs, from patients recruited via the UK National Health Service. We contrasted our results to two independent cancer WGS datasets, the International Cancer Genome Consortium (ICGC) and Hartwig Foundation, involving 18,640 WGS cancers in total. Our analyses add 40 single and 18 double substitution signatures to the current mutational signature tally. Critically, we show for each organ, that cancers have a limited number of 'common' signatures and a long tail of 'rare' signatures. We provide a practical solution for utilizing this concept of common versus rare signatures in future analyses.
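As a rough illustration of what an SBS catalogue is, the sketch below counts single-base substitutions in the widely used 96-channel scheme (6 pyrimidine-centered substitution types, each with 4 possible 5' and 4 possible 3' neighboring bases). The abstract does not spell this scheme out, so treat it as an assumption about the input representation; the function names `sbs_channel`, `all_channels`, and `catalogue` are hypothetical and are not part of FitMS.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def revcomp(seq):
    """Reverse-complement a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_channel(context, alt):
    """Map a substitution to its pyrimidine-centered channel label.

    context: reference trinucleotide centered on the mutated base.
    alt:     the mutant base replacing the middle base.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or context[1] == alt:
        raise ValueError("need a 3-base context and a real substitution")
    if context[1] not in PYRIMIDINES:
        # Purine reference base: describe the event on the opposite strand.
        context, alt = revcomp(context), COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_channels():
    """Enumerate the 96 channel labels (assumed standard scheme)."""
    channels = []
    for ref in "CT":
        for alt in "ACGT":
            if alt == ref:
                continue
            for five, three in product("ACGT", repeat=2):
                channels.append(f"{five}[{ref}>{alt}]{three}")
    return channels


def catalogue(mutations):
    """Count (context, alt) pairs per channel; unused channels stay at 0."""
    counts = dict.fromkeys(all_channels(), 0)
    for context, alt in mutations:
        counts[sbs_channel(context, alt)] += 1
    return counts
```

A per-sample count vector like this is the kind of input that signature-fitting methods typically decompose into contributions from reference signatures; how FitMS itself does that is described in the paper, not here.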
977.
颜林林
(2022-07-13 00:46):
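#paper doi:10.1093/bib/bbac221 Briefings in Bioinformatics, 2022, A comprehensive benchmarking of WGS-based deletion structural variant callers. This is a tool-comparison methods paper covering tools that call structural variants (SVs) from whole-genome sequencing data, restricted to deletion-type SVs. As gold standards it uses the Genome in a Bottle SV set and a PCR-validated SV set from a mouse model, so that sensitivity, precision, and similar performance metrics can be computed accurately for each tool. The results echo earlier work of this kind: performance does vary widely between tools, and a few tools strike a good balance between sensitivity and precision. The paper closes with recommendations on which tools to use for deletions of different lengths. It is a solid, conventional paper, and carefully done. The entertaining part is the grumbling in the tool-selection section: after excluding tools that require matched samples, tools that can only detect very small variants, and tools that support only long-read data, 61 suitable tools remained, yet only 15 and 14 were actually tested (on the mouse and human data, respectively), simply because the rest could not be installed. I sympathize. Leaving aside authors who are unwilling to release their source code at all, even many open-source tools are hard to get working, and it is not unusual to have to read the source and debug by hand before a tool will run.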
IF: 6.800 (Q1)
Briefings in bioinformatics,
2022-07-18.
DOI: 10.1093/bib/bbac221
PMID: 35753701
PMCID:PMC9294411
Abstract:
Advances in whole-genome sequencing (WGS) promise to enable the accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to the previous benchmarking studies, our gold standard dataset included a complete set of SVs allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
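To make the two reported metrics concrete, here is a minimal sketch of scoring a caller's deletions against a complete gold-standard set. The matching rule (at least 50% reciprocal overlap, with each gold-standard deletion matched at most once, greedily) is an assumption chosen for illustration; the paper's own matching criteria may differ. The function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap of two deletions given as (chrom, start, end).

    Coordinates are treated as half-open intervals. The result is the
    overlap length divided by the longer of the two deletions, i.e. the
    smaller of the two overlap fractions.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def benchmark(calls, truth, min_ro=0.5):
    """Return (precision, sensitivity) for a list of called deletions.

    A call is a true positive if it reaches min_ro reciprocal overlap
    with a gold-standard deletion that has not been matched yet.
    Matching is greedy (first hit wins), which is a simplification.
    """
    matched = set()
    true_positives = 0
    for call in calls:
        for i, gold in enumerate(truth):
            if i not in matched and reciprocal_overlap(call, gold) >= min_ro:
                matched.add(i)
                true_positives += 1
                break
    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(truth) if truth else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    truth = [("chr1", 100, 200), ("chr1", 1000, 2000), ("chr2", 50, 150)]
    calls = [("chr1", 110, 210), ("chr1", 1500, 3000),
             ("chr2", 40, 160), ("chr3", 1, 10)]
    print(benchmark(calls, truth))
```

Precision can only be reported when the gold standard is complete, because every unmatched call is then counted as a false positive; with an incomplete truth set an unmatched call might be a real deletion that simply was not catalogued.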
978.
颜林林
(2022-07-12 00:03):
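#paper doi:10.1016/j.gpb.2022.04.009 Genomics, Proteomics & Bioinformatics, 2022, N6-methyladenosine and Its Implications in Viruses. This is a review of m6A. m6A is the most common base modification on mammalian mRNA, and this review concentrates on m6A research related to viruses. It first outlines the basics: the proportion and distribution of m6A-modified bases, the regulatory proteins that add or remove the modification, and the functions m6A serves in organisms (such as effects on mRNA splicing, nuclear export, translation, and degradation). It then takes a technical turn and introduces the different experimental methods for detecting m6A. After that it gets to the main subject, describing the m6A studies carried out over the years in various viruses, including SV40, hepatitis B virus, herpesviruses, HIV, hepatitis C virus, Zika virus, dengue virus, and SARS-CoV-2. These results show that m6A takes part in a wide range of biological activities, and that in different viruses it can even play completely opposite roles. m6A therefore looks like a component of low-level machinery whose apparent function depends on where and when it sits in the gene regulatory network, and almost everything seems to be connected to it. m6A has been a research hotspot in recent years, with a steady stream of related data-mining studies, which probably also reflects how basic and pervasive it is. Deeper study of m6A will help clarify how it affects viral replication and other stages of the viral life cycle, and will provide basic-research support for developing drugs against viral diseases, which fits the needs of the current pandemic era well.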
Genomics, proteomics & bioinformatics,
2023-08.
DOI: 10.1016/j.gpb.2022.04.009
PMID: 35835441
PMCID:PMC10787122
Abstract:
N6-methyladenine (m6A) is the most abundant RNA modification in mammalian messenger RNAs (mRNAs), which participates in and regulates many important biological activities, such as tissue development and stem cell differentiation. Due to an improved understanding of m6A, researchers have discovered that the biological function of m6A can be linked to many stages of mRNA metabolism and that m6A can regulate a variety of complex biological processes. In addition to its location on mammalian mRNAs, m6A has been identified on viral transcripts. m6A also plays important roles in the life cycle of many viruses and in viral replication in host cells. In this review, we briefly introduce the detection methods of m6A, the m6A-related proteins, and the functions of m6A. We also summarize the effects of m6A-related proteins on viral replication and infection. We hope that this review provides researchers with some insights for elucidating the complex mechanisms of the epitranscriptome related to viruses, and provides information for further study of the mechanisms of other modified nucleobases acting on processes such as viral replication. We also anticipate that this review can stimulate collaborative research from different fields, such as chemistry, biology, and medicine, and promote the development of antiviral drugs and vaccines.
979.
钟鸣
(2022-07-11 12:18):
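#paper DOI: 10.1128/IAI.00963-06 Infection and Immunity, 2007, Analysis of Bartonella adhesin A expression reveals differences between various B. henselae strains. The BadA gene of Bartonella henselae encodes an adhesin with a molecular weight of 340 kDa and is an important virulence factor of this species. Oddly, the gene stops being expressed after repeated passage in vitro. To explore possible mechanisms, the authors analyzed the BadA gene sequence and promoter region in five strains. They found that the N-terminal and C-terminal regions of BadA were essentially the same across strains, as was the promoter region. They conclude that the loss of BadA expression is caused neither by stop mutations nor by mutations in the promoter region, and that some other mode of regulation must be involved.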
Abstract:
Bartonella henselae causes cat scratch disease and the vasculoproliferative disorders bacillary angiomatosis and peliosis hepatis in humans. One of the best known pathogenicity factors of B. henselae is Bartonella adhesin A (BadA), which is modularly constructed, consisting of head, neck/stalk, and membrane anchor domains. BadA is important for the adhesion of B. henselae to extracellular-matrix proteins and endothelial cells (ECs). In this study, we analyzed different B. henselae strains for BadA expression, autoagglutination, fibronectin (Fn) binding, and adhesion to ECs. We found that the B. henselae strains Marseille, ATCC 49882, Freiburg 96BK3 (FR96BK3), FR96BK38, and G-5436 express BadA. Remarkably, BadA expression was lacking in a B. henselae ATCC 49882 variant, in strains ATCC 49793 and Berlin-1, and in the majority of bacteria of strain Berlin-2. Adherence of B. henselae to ECs and Fn reliably correlated with BadA expression. badA was present in all tested strains, although the length of the gene varied significantly due to length variations of the stalk region. Sequencing of the promoter, head, and membrane anchor regions revealed only minor differences that did not correlate with BadA expression, apart from strain Berlin-1, in which a 1-bp deletion led to a frameshift in the head region of BadA. Our data suggest that, apart from the identified genetic modifications (frameshift deletion and recombination), other so-far-unknown regulatory mechanisms influence BadA expression. Because of variations between and within different B. henselae isolates, BadA expression should be analyzed before performing infection experiments with B. henselae.
980.
颜林林
(2022-07-11 00:41):
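#paper doi:10.1101/2022.07.09.499321 bioRxiv, 2022, A Draft Human Pangenome Reference. This looks like another blockbuster paper, posted ahead of publication as a bioRxiv preprint. More than thirty leading institutions collaborated on it, and the author list is still long even after being condensed under "Human Pangenome Reference Consortium". It includes many familiar names that have appeared again and again in major genomics papers over the years, among them Heng Li (李恒), who is listed as one of the corresponding authors. The main text runs to 97 pages (not counting another 39 pages of supplementary material), which also reflects the scale of the work. As is well known, the human reference genome we have been using actually derives from the first seven or eight individuals, and it is hard to believe their genomes are adequately representative of the gene pool of all humanity. So as genome data have accumulated over the years, the reference has been updated again and again, patch upon patch. The "pangenome reference" proposed here can be seen as another major improvement and a new release, possibly even the key improvement that comes close to settling the matter once and for all. It integrates as many as 47 individual genomes, each assembled as a phased, diploid assembly. Building on earlier human population genomics projects such as HapMap and the 1000 Genomes Project, the individuals were chosen to be genetically diverse; the assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. The very long text lays out the complete construction process for this new reference in detail, down to the exact command lines and parameters, and is well worth careful study.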
bioRxiv,
2022.
DOI: 10.1101/2022.07.09.499321
Abstract:
The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.