A total of 1063 paper shares were found; this page shows items 541 - 560.
541.
张德祥 (2023-01-03 19:36):
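#paper https://doi.org/10.24963/ijcai.2020/243 NeurASP: Embracing Neural Networks into Answer Set Programming. By treating the neural network output as a probability distribution over atomic facts in an answer set program, NeurASP provides a simple and effective way to integrate sub-symbolic and symbolic computation. Reasoning can help identify perception errors that violate semantic constraints, which in turn makes perception more robust. For example, a neural network for object detection may return a bounding box with the classification "car", yet it may be unclear whether the object is a real car or a toy car. The two can be told apart by reasoning about the object's relations to surrounding objects and by using commonsense knowledge. Likewise, when it is unclear whether a round object attached to a car is a wheel or a doughnut, the reasoner can conclude from common sense that it is more likely a wheel.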
Abstract:
We present NeurASP, a simple extension of answer set programs by embracing neural networks. By treating the neural network output as the probability distribution over atomic facts in answer set programs, NeurASP provides a simple and effective way to integrate sub-symbolic and symbolic computation. We demonstrate how NeurASP can make use of a pre-trained neural network in symbolic computation and how it can improve the neural network's perception result by applying symbolic reasoning in answer set programming. Also, NeurASP can make use of ASP rules to train a neural network better so that a neural network not only learns from implicit correlations from the data but also from the explicit complex semantic constraints expressed by the rules.
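The idea of treating network outputs as probabilities over atomic facts can be illustrated with a small sketch. The digit-addition setting, the function names, and the probability values below are assumptions chosen for illustration; this is plain Python enumeration, not the NeurASP system or its API. It shows how a symbolic constraint (a known sum) can override the network's individually most likely guess.

```python
from itertools import product


def sum_distribution(p_a, p_b):
    """P(a + b = n), treating each digit as an independent probabilistic fact."""
    dist = {}
    for (da, pa), (db, pb) in product(p_a.items(), p_b.items()):
        dist[da + db] = dist.get(da + db, 0.0) + pa * pb
    return dist


def most_likely_digits(p_a, p_b, observed_sum):
    """Most probable digit pair that satisfies the constraint a + b = observed_sum."""
    candidates = [
        (pa * pb, (da, db))
        for (da, pa), (db, pb) in product(p_a.items(), p_b.items())
        if da + db == observed_sum
    ]
    return max(candidates)[1] if candidates else None


# Hypothetical softmax outputs of a digit classifier for two images.
p_a = {1: 0.6, 7: 0.4}   # the network slightly prefers "1" for image A
p_b = {2: 0.9, 3: 0.1}

dist = sum_distribution(p_a, p_b)
# Knowing the sum is 9 rules out a = 1, so reasoning corrects the perception.
corrected = most_likely_digits(p_a, p_b, 9)
```

Here the perception-only answer for image A would be 1, while the constraint leaves (7, 2) as the only consistent pair.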
542.
洪媛媛 (2023-01-03 17:45):
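#paper https://doi.org/10.1186/s13073-022-01141-8. Genome Medicine (2022) 14:138. CRAG: de novo characterization of cell-free DNA fragmentation hotspots in plasma whole-genome sequencing. Based on low-depth whole-genome sequencing (~1X), this study uses IFS (which integrates cfDNA coverage and fragment size) and the CRAG algorithm (probabilistic modeling plus discrimination from background noise) to mine cfDNA fragmentation hotspots. The hotspots are concentrated in open chromatin regions and can be used for early cancer screening and tissue-of-origin identification. AUC performance was good on the training, validation, and independent test sets.
IF: 10.400 (Q1). Genome Medicine, 2022-12-08. DOI: 10.1186/s13073-022-01141-8 PMID: 36482487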
Abstract:
The fine-scale cell-free DNA fragmentation patterns in early-stage cancers are poorly understood. We developed a de novo approach to characterize the cell-free DNA fragmentation hotspots from plasma whole-genome sequencing. Hotspots are enriched in open chromatin regions, and, interestingly, 3'end of transposons. Hotspots showed global hypo-fragmentation in early-stage liver cancers and are associated with genes involved in the initiation of hepatocellular carcinoma and associated with cancer stem cells. The hotspots varied across multiple early-stage cancers and demonstrated high performance for the diagnosis and identification of tissue-of-origin in early-stage cancers. We further validated the performance with a small number of independent case-control-matched early-stage cancer samples.
543.
龙海晨 (2023-01-02 13:45):
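#paper Rosen D B, Murphy E A, Gejman R S, et al. Cytokine response over the course of COVID-19 infection in pregnant women[J]. Cytokine, 2022, 154: 155894. PMID: 35490452 PMCID: PMC9035355 DOI: 10.1016/j.cyto.2022.155894 This is a COVID-19 study published in 2022. The samples come from New York City hospitals and were collected in March and April 2020. Note that the virus at that time was not the current low-virulence Omicron but the earliest SARS-CoV-2 strain. The paper studies the serum of women infected with COVID-19 during pregnancy and also touches on treatment targeting the corresponding cytokines. Looking back at early 2020, much of the news and self-media coverage attacked the United States over COVID-19. In fact, public epidemic control depends on each country's circumstances; in the United States, the public would not accept being confined at home. As for cutting-edge research, in March 2020 our nucleic acid testing was not yet widespread, while treatment research in the United States had already reached the level of specific populations and specific cytokines. The paper compares cytokine levels between COVID-negative and COVID-positive pregnant women, and among positive women at different stages of infection. It finds that the cytokine profile of women in late pregnancy varies with the time course of infection and correlates with clinical severity.
IF: 3.700 (Q2). Cytokine, 2022-06. DOI: 10.1016/j.cyto.2022.155894 PMID: 35490452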
Abstract:
OBJECTIVE: To study how severity and progression of coronavirus disease (COVID-19) affect cytokine profiles in pregnant women. MATERIALS AND METHODS: 69 third-trimester, pregnant women were tested for COVID-19 infection and SARS-CoV-2 specific IgM and IgG antibodies. Patients were stratified according to SARS-CoV-2 Reverse Transcriptase-PCR (RT-PCR) status and serology (IgM and IgG) status. Cytokines G-CSF, HGF, IL-18, IL-1Ra, IL-2Ra, IL-8, and IP-10 were measured via ELISA. Retrospective chart review for COVID-19 symptoms and patient vitals was conducted, and cytokine levels were compared between SARS-CoV-2 positive and negative cohorts, by seronegative and seropositive infection, by time course since onset of infection, and according to NIH defined clinical severity. RESULTS: IL-18, IL-1Ra, and IP-10 increased in the 44 RT-PCR positive pregnant women compared to the 25 RT-PCR negative pregnant controls. Elevated cytokine levels were found in early infections, defined by positive RT-PCR and seronegative status, and higher cytokine levels were also associated with more severe disease. By IgM seroconversion, IL-8 and IP-10 returned to levels seen in uninfected patients, while IL-18 levels remained significantly elevated. CONCLUSION: Cytokine profiles of third-trimester pregnant women vary with the time course of infection and are correlated with clinical severity.
544.
颜林林 (2023-01-01 22:47):
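#paper doi:10.1186/s13059-022-02816-6 Genome Biology, 2022, Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies. Structural variant (SV) detection has long been a challenging task in genome research. This paper comes from the SEQC2 (Sequencing Quality Control Phase 2) consortium. Cell lines were established from breast cancer tissue and a control sample (peripheral blood leukocytes) from the same donor and used as the study material. Illumina short-read sequencing, 10x linked-read sequencing, PacBio and Nanopore long-read sequencing, and Hi-C sequencing were applied, and the results were integrated to identify a final set of 1788 SVs. A subset of these was then validated with independent methods, including PCR, arrays, Bionano optical mapping, and fusion breakpoints identified from RNA-seq, and the performance of each platform for SV identification was evaluated. The paper delivers an SV reference set that can be used to benchmark SV methods.
IF: 10.100 (Q1). Genome Biology, 2022. DOI: 10.1186/s13059-022-02816-6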
Abstract:
Background: The cancer genome is commonly altered with thousands of structural rearrangements including insertions, deletions, translocation, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and matched normal control derived from the same donor, which were used in our companion benchmarking studies as reference samples. Results: We systematically investigated somatic SVs in the reference cancer cell line by comparing to a matched normal cell line using multiple NGS platforms including Illumina short-read, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus SV call set of a total of 1788 SVs including 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends for the reference cancer cell line. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve cancer genome SV detection sensitivity and accuracy. Conclusions: A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.
545.
cellsarts (2023-01-01 00:01):
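#paper Archaeal signal peptidases. DOI 10.1099/mic.0.2006/003087-0; Microbiology (2007), 153, 305–314. Archaea constitute the third domain of life, distinct from bacteria and eukaryotes. Originally thought to live only in extreme environments, archaeal species have since been found in a wide variety of habitats, where they play important roles in ecosystems. Signal peptidases are vital enzymes in the protein secretion pathway. In archaea, two have been identified: type I signal peptidase, which cleaves secretory signal peptides from the majority of secreted proteins, and prepilin peptidase-like signal peptidase, which processes signal peptides from prepilin-like proteins such as the preflagellins and various sugar-binding proteins. In addition, there is the archaeal signal peptide peptidase, which is responsible for degrading signal peptides. These enzymes appear to have a mosaic of eukaryal and bacterial characteristics while also possessing unique archaeal traits. The review summarizes current knowledge of these enzymes, including their cellular function, catalytic mechanism, and distribution and conservation among archaeal species. The enzymes are compared with their bacterial and eukaryal counterparts, and unique archaeal features are highlighted.
IF: 1.300 (Q4). Microbiology, 2007. DOI: 10.1099/mic.0.2006/003087-0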
Abstract:
Signal peptidases are vital enzymes in the protein secretion pathway. In Archaea, type I signal peptidase, responsible for the cleavage of secretory signal peptides from the majority of secreted proteins, and prepilin peptidase-like signal peptidase, responsible for processing signal peptides from prepilin-like proteins like the preflagellins and various sugar-binding proteins, have been identified. In addition, the archaeal signal peptide peptidase, responsible for degradation of signal peptides after their removal from precursor proteins, has been characterized. These enzymes seem to have a mosaic of eukaryal and bacterial characteristics, and also possess unique archaeal traits. In this review, the most current knowledge with regard to these enzymes is summarized, including their cellular function, catalytic mechanism and distribution and conservation among archaeal species. Comparisons are drawn of these enzymes to their bacterial and eukaryal counterparts, and unique archaeal features highlighted.
546.
小W (2023-01-01 00:00):
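#paper doi: 10.1016/j.cell.2022.11.016. Epub 2022 Dec 13. Engineered cell entry links receptor biology with single-cell genomics. 1. This paper develops a modular viral display and delivery platform (ENTER) that decodes intercellular ligand-receptor interactions by delivering ligands into target cells and links those interactions to cell state. It can systematically display interactions including TCR-pMHC, antibody-antigen, co-stimulatory ligand-receptor, and BCR. The pMHC results show that the viral delivery platform is more sensitive than MHC tetramers for detecting antigen-specific T cells; with high-titer virus (40 ng p24), ENTER can detect TCR affinities as low as 10.8 mM. Through antigen-specific delivery of a suicide gene, ENTER can selectively deplete a single T or B lymphocyte clone from a pool of T or B cells, or, by delivering a payload that counteracts cell death receptors, let antigen-specific T cells selectively survive. It may be useful for screening immunogenic antigens or elite TCRs for the rational design of vaccines or cancer immunotherapy; screening BCRs that target viral antigens to aid therapeutic antibody development; restoring exhausted antitumor T cells; avoiding immune-related adverse events; and killing autoreactive T or B cells to treat autoimmune disease. 2. Combining the ENTER platform with single-cell RNA-seq yields ENTER-seq, which captures the MHC peptide information on the viral RNA in each droplet and maps interactions between the TCR repertoire and cognate HLA antigen peptides.
IF: 45.500 (Q1). Cell, 2022-12-22. DOI: 10.1016/j.cell.2022.11.016 PMID: 36516854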
Abstract:
Cells communicate with each other via receptor-ligand interactions. Here, we describe lentiviral-mediated cell entry by engineered receptor-ligand interaction (ENTER) to display ligand proteins, deliver payloads, and record receptor specificity. We optimize ENTER to decode interactions between T cell receptor (TCR)-MHC peptides, antibody-antigen, and other receptor-ligand pairs. A viral presentation strategy allows ENTER to capture interactions between B cell receptor and any antigen. We engineer ENTER to deliver genetic payloads to antigen-specific T or B cells to selectively modulate cellular behavior in mixed populations. Single-cell readout of ENTER by RNA sequencing (ENTER-seq) enables multiplexed enumeration of antigen specificities, TCR clonality, cell type, and states of individual T cells. ENTER-seq of CMV-seropositive patient blood samples reveals the viral epitopes that drive effector memory T cell differentiation and inter-clonal vs. intra-clonal phenotypic diversity targeting the same epitope. ENTER technology enables systematic discovery of receptor specificity, linkage to cell fates, and antigen-specific cargo delivery.
547.
王昊 (2022-12-31 23:57):
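#paper https://arxiv.org/abs/2111.08687v2 Jing Shao, Siyu Chen, Yangguang Li, et al. 2021. INTERN: A New Learning Paradigm Towards General Vision. A paper on vision foundation models. INTERN ("书生") aims to systematically address a series of bottlenecks in current AI vision, including task generality, scene generalization, and data efficiency. INTERN consists of seven modules: three infrastructure modules (a general vision data system, a general vision network architecture, and a general vision benchmark) and four training-stage modules that separate upstream from downstream. Strong generalization is learned across the multiple stages. The model covers four categories of CV tasks on 26 datasets, and in most cases, when fine-tuned with only 10% of the training data, it outperforms the counterparts trained on the full data.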
Abstract:
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society. However, down the road, a key challenge awaits us, that is, our capability of meeting rapidly-growing scenario-specific demands is severely limited by the cost of acquiring a commensurate amount of training data. This difficult situation is in essence due to limitations of the mainstream learning paradigm: we need to train a new model for each new scenario, based on a large quantity of well-annotated data and commonly from scratch. In tackling this fundamental problem, we move beyond and develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability. We evaluate our model on 26 well-known datasets that cover four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect where such a model with general vision capability can dramatically reduce our reliance on data, thus expediting the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which, together, form a general vision ecosystem to support its future development in an open and inclusive manner. See project website at this https URL.
548.
Ricardo (2022-12-31 23:50):
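#paper http://dx.doi.org/10.1016/j.media.2015.04.005 Construction of 4D high-definition cortical surface atlases of infants: Methods and applications. In neuroimaging, cortical surface atlases play an important role in spatial normalization, analysis, visualization, and comparison of results across individuals and studies. However, existing atlases built for adults are not suitable for infant brains during the first two postnatal years, the most dynamic period of postnatal structural and functional development of the highly folded cerebral cortex. Spatiotemporal cortical surface atlases for infants are therefore much needed, yet fine-grained atlases of early dynamic brain development are still lacking. To fill this gap, the authors use their infant cortical surface analysis pipeline and their own longitudinal MRI dataset (202 serial MRI scans from 35 healthy infants) to construct the first spatiotemporal (4D) high-definition cortical surface atlases for studying dynamic development at seven time points: 1, 3, 6, 9, 12, 18, and 24 months of age.
IF: 10.700 (Q1). Medical Image Analysis, 2015-Oct. DOI: 10.1016/j.media.2015.04.005 PMID: 25980388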
Abstract:
In neuroimaging, cortical surface atlases play a fundamental role for spatial normalization, analysis, visualization, and comparison of results across individuals and different studies. However, existing cortical surface atlases created for adults are not suitable for infant brains during the first two postnatal years, which is the most dynamic period of postnatal structural and functional development of the highly-folded cerebral cortex. Therefore, spatiotemporal cortical surface atlases for infant brains are highly desired yet still lacking for accurate mapping of early dynamic brain development. To bridge this significant gap, leveraging our infant-dedicated computational pipeline for cortical surface-based analysis and the unique longitudinal infant MRI dataset acquired in our research center, in this paper, we construct the first spatiotemporal (4D) high-definition cortical surface atlases for the dynamic developing infant cortical structures at seven time points, including 1, 3, 6, 9, 12, 18, and 24 months of age, based on 202 serial MRI scans from 35 healthy infants. For this purpose, we develop a novel method to ensure the longitudinal consistency and unbiasedness to any specific subject and age in our 4D infant cortical surface atlases. Specifically, we first compute the within-subject mean cortical folding by unbiased groupwise registration of longitudinal cortical surfaces of each infant. Then we establish longitudinally-consistent and unbiased inter-subject cortical correspondences by groupwise registration of the geometric features of within-subject mean cortical folding across all infants. Our 4D surface atlases capture both longitudinally-consistent dynamic mean shape changes and the individual variability of cortical folding during early brain development.
Experimental results on two independent infant MRI datasets show that using our 4D infant cortical surface atlases as templates leads to significantly improved accuracy for spatial normalization of cortical surfaces across infant individuals, in comparison to the infant surface atlases constructed without longitudinal consistency and also the FreeSurfer adult surface atlas. Moreover, based on our 4D infant surface atlases, for the first time, we reveal the spatially-detailed, region-specific correlation patterns of the dynamic cortical developmental trajectories between different cortical regions during early brain development.
549.
na na na (2022-12-31 23:50):
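#paper, Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data (2018), DOI: 10.1093/bioinformatics/bty026. Sharing an algorithm/tool paper: FSQN (feature specific quantile normalization). The method addresses normalization between transcriptome data from RNA-seq platforms and transcriptome data from microarray platforms. This problem matters especially when analyzing public data. Common approaches such as log2 transformation, z-scores, and median-based correction can pull the data into a common range to some extent, but the distributions still differ, so machine learning models tend to perform poorly across platforms. The paper discusses where the batch differences between platforms come from and, taking an applied view, both compares the shortcomings of existing methods and introduces FSQN. On the test datasets, using common classifier models trained on microarray data, the method assigns subtypes to RNA-seq data with up to 98% accuracy on the BRCA data and 97% on the CRC data. The authors provide an R package: https://github.com/jenniferfranks/FSQN. I tested it: PCA shows that batch removal works well, but I could not reproduce the high machine-learning accuracy reported in the paper. Batch removal between platforms and cross-platform use of machine learning therefore remain open research directions. More broadly, batch-removal tools could be developed from the same data-correction perspective for RNA-seq vs. NanoString, RNA-seq vs. single-cell sequencing, and microarray vs. NanoString.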
Abstract:
Motivation: Molecular subtypes of cancers and autoimmune disease, defined by transcriptomic profiling, have provided insight into disease pathogenesis, molecular heterogeneity and therapeutic responses. However, technical biases inherent to different gene expression profiling platforms present a unique problem when analyzing data generated from different studies. Currently, there is a lack of effective methods designed to eliminate platform-based bias. We present a method to normalize and classify RNA-seq data using machine learning classifiers trained on DNA microarray data and molecular subtypes in two datasets: breast invasive carcinoma (BRCA) and colorectal cancer (CRC). Results: Multiple analyses show that feature specific quantile normalization (FSQN) successfully removes platform-based bias from RNA-seq data, regardless of feature scaling or machine learning algorithm. We achieve up to 98% accuracy for BRCA data and 97% accuracy for CRC data in assigning molecular subtypes to RNA-seq data normalized using FSQN and a support vector machine trained exclusively on DNA microarray data. We find that maximum accuracy was achieved when normalizing RNA-seq datasets that contain at least 25 samples. FSQN allows comparison of RNA-seq data to existing DNA microarray datasets. Using these techniques, we can successfully leverage information from existing gene expression data in new analyses despite different platforms used for gene expression profiling. Availability and implementation: FSQN has been submitted as an R package to CRAN. All code used for this study is available on Github (https://github.com/jenniferfranks/FSQN). Contact: michael.l.whitfield@dartmouth.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
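To make the per-feature idea in this entry concrete, here is a minimal sketch of quantile mapping applied one gene at a time. It is an assumption-laden illustration in plain Python, not the code of the FSQN R package: the function names, the rank-based interpolation, and the arbitrary handling of ties are all choices made here. For each gene, the RNA-seq values are replaced by values from that gene's microarray (target) distribution at matching ranks.

```python
def quantile_map(values, target):
    """Replace each value with the target quantile that matches its rank."""
    n = len(values)
    sorted_target = sorted(target)
    m = len(sorted_target)
    # Indices of `values` in ascending order; ties keep their input order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        q = rank / (n - 1) if n > 1 else 0.5      # quantile in [0, 1]
        pos = q * (m - 1)                          # position in sorted target
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        # Linear interpolation between neighbouring target values.
        out[i] = sorted_target[lo] * (1 - frac) + sorted_target[hi] * frac
    return out


def feature_specific_qn(test, target):
    """Normalize each gene separately.

    test, target: dict mapping gene name -> list of expression values,
    one value per sample (the two datasets may have different sample counts).
    """
    return {gene: quantile_map(vals, target[gene]) for gene, vals in test.items()}


# Hypothetical toy data: RNA-seq counts vs. microarray log-intensities.
rnaseq = {"GENE_A": [100, 300, 200]}
microarray = {"GENE_A": [1.0, 2.0, 3.0]}
normalized = feature_specific_qn(rnaseq, microarray)
```

After this step each gene in the RNA-seq data follows the microarray distribution for that gene, while the ordering of samples within the gene is preserved.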
550.
Arwen (2022-12-31 23:45):
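#paper https://doi.org/10.1038/s41380-022-01924-w Inflammation and cognition in severe mental illness: patterns of covariation and subgroups. A potential relationship between dysregulated immune/inflammatory pathways and cognitive impairment has been proposed in severe mental illness (SMI), such as schizophrenia and bipolar spectrum disorders. However, multivariate relationships between peripheral inflammatory/immune-related markers and cognitive domains remain unclear, and many studies do not account for individual differences in cognitive functioning and inflammatory/immune status. This study investigates covariance patterns between inflammatory/immune-related markers and cognitive domains and further clarifies heterogeneity in a large SMI and healthy control (HC) cohort (SZ = 343, BD = 289, HC = 770). Canonical correlation analysis (CCA) was applied to identify modes of maximum covariation between a comprehensive selection of cognitive domains and inflammatory/immune markers. Poorer verbal learning and psychomotor processing speed were associated with higher levels of interleukin-18 system cytokines and beta defensin 2, reflecting enhanced innate immune activation, and this pattern was stronger in SMI than in HC. Hierarchical clustering of the covariance patterns identified by CCA revealed a high cognition-low immune dysregulation subgroup that was predominantly HC (24% SZ, 45% BD, 74% HC) and a low cognition-high immune dysregulation subgroup consisting mainly of SMI patients (76% SZ, 55% BD, 26% HC). The subgroups differed in IQ, years of education, age, CRP, and BMI (all groups), and in level of functioning, symptoms, and defined daily dose (DDD) of antipsychotics (SMI cohort). Conclusion: in a subset of individuals with severe mental illness, cognitive impairment is linked to innate immune dysregulation. Takeaway: the combination of multivariate methods with a heterogeneity perspective is worth borrowing.
IF: 9.600 (Q1). Molecular Psychiatry, 2023-03. DOI: 10.1038/s41380-022-01924-w PMID: 36577840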
Abstract:
A potential relationship between dysregulation of immune/inflammatory pathways and cognitive impairment has been suggested in severe mental illnesses (SMI), such as schizophrenia (SZ) and bipolar (BD) spectrum disorders. However, multivariate relationships between peripheral inflammatory/immune-related markers and cognitive domains are unclear, and many studies do not account for inter-individual variance in both cognitive functioning and inflammatory/immune status. This study aimed to investigate covariance patterns between inflammatory/immune-related markers and cognitive domains and further elucidate heterogeneity in a large SMI and healthy control (HC) cohort (SZ = 343, BD = 289, HC = 770). We applied canonical correlation analysis (CCA) to identify modes of maximum covariation between a comprehensive selection of cognitive domains and inflammatory/immune markers. We found that poor verbal learning and psychomotor processing speed was associated with higher levels of interleukin-18 system cytokines and beta defensin 2, reflecting enhanced activation of innate immunity, a pattern augmented in SMI compared to HC. Applying hierarchical clustering on covariance patterns identified by the CCA revealed a high cognition-low immune dysregulation subgroup with predominantly HC (24% SZ, 45% BD, 74% HC) and a low cognition-high immune dysregulation subgroup predominantly consisting of SMI patients (76% SZ, 55% BD, 26% HC). These subgroups differed in IQ, years of education, age, CRP, BMI (all groups), level of functioning, symptoms and defined daily dose (DDD) of antipsychotics (SMI cohort). Our findings suggest a link between cognitive impairment and innate immune dysregulation in a subset of individuals with severe mental illness.
551.
林海onrush (2022-12-31 23:26):
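#paper, A Data-driven Sequential Localization Framework for Big Telco Data, IEEE Transactions on Knowledge and Data Engineering (2021), DOI: 10.1109/TKDE.2019.2961657 The rapid growth of telecom infrastructure has led to the accumulation of huge amounts of MR (measure report) data. The data are generated by mobile objects and stored while they are connected to data services. Geo-tagging or localizing such MR data is believed to have a large effect on the optimization of telecom and traffic networks. To handle data-intensive workloads during learning, the Huawei Noah's Ark team uses materialized views for efficient online localization and lightweight indexing for periodic parameter tuning, improving efficiency and scalability. Results on real data show a 58.8% improvement in median localization error over state-of-the-art solutions. Highlights: The paper briefly introduces the hidden Markov model (HMM), which captures the link between two stochastic processes: an unobserved state-transition process and an observation process made up of the observable variables of each unobserved state. Several experiments first verify the effectiveness of the machine-learning single-point localization model, the effectiveness of the emission and transition probability solutions, and the performance of the sequential localization system against recent baselines. Further experiments show the efficiency of the proposed indexing techniques and the effect of parameter tuning on system performance. The paper proposes a data-driven framework for sequential localization of telco data, equipped with a comprehensive set of machine learning and data management techniques. Compared with the latest sequential localization methods, the proposed framework achieves a 58.8% improvement in median error, giving it an advantage in accuracy and adoptability. It also proposes efficient data access and indexing methods to support the data-intensive computation involved in learning.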
Abstract:
The proliferation of telco networks and mobile terminals brings the accumulation of tremendous amounts of measure report (MR) data at a rapid pace. The MR data is generated by mobile objects while connecting to data services and is stored in backend data centers. To geo-tag or localize such MR data is believed to have a profound effect on the analytics and optimizations of telco and traffic networks. However, MR records are of noisy and partial observations regarding to mobile objects' geo-locations and hence pose challenges to accurate telco data localization. There have been quite a few attempts. Single-point localization methods map a MR record to a location, but come out with limited accuracies due to the ignorance of spatiotemporal coherence of successive MR records. Recent efforts on sequential localization techniques alleviate this by mapping a sequence of MR records to a trajectory. However, existing solutions are often with assumptions on specific models, e.g., mobility and signal strength distributions, or priori knowledge on topology space, e.g., road networks, limiting the deployment in practice. To this end, we propose a data-driven framework to tackle the challenges in sequential telco localization. We solely use raw MR records and a public third-party GPS dataset for the learning of the correlations between mobile objects' locations and MR records, requiring no model assumptions and priori knowledge. To handle the data-intensive workloads during the learning process, we use materialized views for efficient online localization and light-weighted indexing techniques for periodical parameters tuning, in order to improve the efficiency and scalability. Results on real data show that our solution achieves 58.8 percent improvement in median localization errors compared with state-of-art sequential localization techniques that require hypothesis models and priori knowledge, making our solution superior in terms of effectiveness, efficiency, and employability.
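The HMM view mentioned in this entry, hidden locations with observed MR records, can be sketched with standard Viterbi decoding. Everything below is a toy assumption for illustration: the two grid cells, the tower observations, and all probabilities are invented here, whereas the paper learns its emission and transition probabilities from data. The sketch shows why decoding a sequence can smooth out a single noisy record that single-point localization would misplace.

```python
import math


def _log(p):
    """Log-probability that tolerates zero probabilities."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a sequence of observations."""
    score = {s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor for state s at this step.
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            score[s] = prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][obs])
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical setup: two grid cells, each mostly served by one tower.
states = ["cell_A", "cell_B"]
start_p = {"cell_A": 0.5, "cell_B": 0.5}
trans_p = {
    "cell_A": {"cell_A": 0.8, "cell_B": 0.2},
    "cell_B": {"cell_A": 0.2, "cell_B": 0.8},
}
emit_p = {
    "cell_A": {"tower_1": 0.9, "tower_2": 0.1},
    "cell_B": {"tower_1": 0.2, "tower_2": 0.8},
}

records = ["tower_1", "tower_1", "tower_2", "tower_1", "tower_1"]
trajectory = viterbi(records, states, start_p, trans_p, emit_p)
```

With these toy numbers the isolated `tower_2` record is absorbed into a trajectory that stays in `cell_A`, whereas mapping that record on its own would pick `cell_B`.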
552.
白鸟 (2022-12-31 23:22):
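#paper https://doi.org/10.1016/j.csbj.2020.06.012 Computational and Structural Biotechnology Journal 2020. Single-cell ATAC sequencing analysis: From data preprocessing to hypothesis generation.
Focus: This is a review of single-cell ATAC-seq analysis. It systematically describes and benchmarks the methodology from data preprocessing through to generating scientific hypotheses, and offers useful guidance on analysis methods using appropriate software tools and databases.
Background: Most genetic variants associated with complex human traits lie in non-coding regions of the genome. Research into the biological mechanisms linking genotype to phenotype therefore mostly involves epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements through their association with trait-associated sequence variants. ATAC-seq (assay for transposase-accessible chromatin with sequencing) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility.
Gap: Single-cell ATAC-seq (scATAC-seq) has also been developed to study cell type-specific differences in chromatin accessibility in tissue samples with heterogeneous cell populations. However, because scATAC-seq data are inherently noisy and sparse, it is hard to extract biological signals accurately and to devise effective biological hypotheses. To overcome these limitations, researchers have developed new methods and software tools over the past few years. Even so, there is no consensus yet on a best-practice, standard pipeline for scATAC-seq data analysis.
Outline: 1. The scATAC-seq analysis workflow: data preprocessing (preprocessing of sequencing reads) -> filtering out low-quality cells or doublets -> generating the cell-by-feature matrix -> batch correction and data integration across samples -> data transformation, including normalization -> dimensionality reduction, visualization, and clustering. These steps closely resemble those of scRNA-seq but have their own particularities. 2. Downstream scATAC-seq analysis for hypothesis generation: cell type annotation, chromatin accessibility dynamics, and analyses based on TF motifs, genes, enhancers, and gene- and disease-associated genetic variants, all of which help generate hypotheses and clarify the network between cis-regulatory elements (e.g. promoters and enhancers) and trans-regulatory elements (e.g. transcription factors (TFs)). scATAC-seq data can also be used to analyze gene activity and the accessibility of genetic variants. 3. Multimodal analysis: scATAC-seq can be combined with single-cell RNA sequencing (scRNA-seq) data and other omics data for multi-omics studies. Such integrated multimodal analysis will help identify key regulators involved in disease progression, which are often potential therapeutic targets and diagnostic biomarkers.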
Abstract:
Most genetic variations associated with human complex traits are located in non-coding genomic regions. Therefore, understanding the genotype-to-phenotype axis requires a comprehensive catalog of functional non-coding genomic elements, most of which are involved in epigenetic regulation of gene expression. Genome-wide maps of open chromatin regions can facilitate functional analysis of cis- and trans-regulatory elements via their connections with trait-associated sequence variants. Currently, Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq) is considered the most accessible and cost-effective strategy for genome-wide profiling of chromatin accessibility. Single-cell ATAC-seq (scATAC-seq) technology has also been developed to study cell type-specific chromatin accessibility in tissue samples containing a heterogeneous cellular population. However, due to the intrinsic nature of scATAC-seq data, which are highly noisy and sparse, accurate extraction of biological signals and devising effective biological hypothesis are difficult. To overcome such limitations in scATAC-seq data analysis, new methods and software tools have been developed over the past few years. Nevertheless, there is no consensus for the best practice of scATAC-seq data analysis yet. In this review, we discuss scATAC-seq technology and data analysis methods, ranging from preprocessing to downstream analysis, along with an up-to-date list of published studies that involved the application of this method. We expect this review will provide a guideline for successful data generation and analysis methods using appropriate software tools and databases for the study of chromatin accessibility at single-cell resolution.
553.
洪媛媛 (2022-12-31 23:21):
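#paper doi: 10.1158/2159-8290.CD-22-0659. Cancer Discov 2022. Detecting liver cancer using cell-free DNA fragmentomes. The study uses cfDNA fragment size (fragments in 473 genomic regions of 5 Mb each), genomic instability, and coverage at transcription factor binding regions for early HCC screening. The method is low-depth whole-genome sequencing (~2.6X) analyzed with machine learning. Healthy individuals vs. HCC patients: 98% specificity, 88% sensitivity. High-risk individuals vs. HCC patients: 80% specificity, 85% sensitivity. The results were validated in an independent cohort.
IF: 29.700 (Q1). Cancer Discovery, 2023-03-01. DOI: 10.1158/2159-8290.CD-22-0659 PMID: 36399356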
Abstract:
Liver cancer is a major cause of cancer mortality worldwide. Screening individuals at high risk, including those with cirrhosis and viral hepatitis, provides an avenue for improved survival, but current screening methods are inadequate. In this study, we used whole-genome cell-free DNA (cfDNA) fragmentome analyses to evaluate 724 individuals from the United States, the European Union, or Hong Kong with hepatocellular carcinoma (HCC) or who were at average or high-risk for HCC. Using a machine learning model that incorporated multifeature fragmentome data, the sensitivity for detecting cancer was 88% in an average-risk population at 98% specificity and 85% among high-risk individuals at 80% specificity. We validated these results in an independent population. cfDNA fragmentation changes reflected genomic and chromatin changes in liver cancer, including from transcription factor binding sites. These findings provide a biological basis for changes in cfDNA fragmentation in patients with liver cancer and provide an accessible approach for noninvasive cancer detection.
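Figures such as "88% sensitivity at 98% specificity" come from fixing a score threshold on the non-cancer group and then counting the cancers detected above it. The sketch below shows that arithmetic under stated assumptions: the function name, the thresholding rule (score strictly above the threshold counts as positive), and the toy scores are illustrative and are not taken from the study.

```python
import math


def sensitivity_at_specificity(control_scores, case_scores, target_specificity):
    """Pick the lowest threshold meeting the target specificity on controls,
    then report the sensitivity on cases at that threshold.

    A sample is called positive when its score is strictly above the threshold.
    Returns (threshold, achieved_specificity, sensitivity).
    """
    controls = sorted(control_scores)
    n = len(controls)
    # Number of controls that must fall at or below the threshold.
    needed = math.ceil(target_specificity * n)
    threshold = controls[needed - 1] if needed > 0 else float("-inf")
    specificity = sum(s <= threshold for s in controls) / n
    sensitivity = sum(s > threshold for s in case_scores) / len(case_scores)
    return threshold, specificity, sensitivity


# Hypothetical model scores (higher means more cancer-like).
controls = [0.1, 0.2, 0.3, 0.9]
cases = [0.25, 0.5, 0.8, 0.95]
result = sensitivity_at_specificity(controls, cases, 0.75)
```

Raising the target specificity moves the threshold up, which can only keep or lower the sensitivity; this is why the two numbers are always quoted together.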
554.
大勇 (2022-12-31 23:13):
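#paper Aversive memory formation in humans involves an amygdala-hippocampus phase code, 2022, Nature Communications, https://doi.org/10.1038/s41467-022-33828-2 We generally form deeper memories of emotional events, which is thought to result from amygdala modulation of hippocampal activity. However, how the two regions interact, and through what neural dynamics this affects memory, has been unclear. Using intracranial recordings, the authors find that successfully encoded emotional memories are accompanied by coupling of hippocampal gamma oscillations and neuronal firing to the amygdala theta phase. The phase difference between subsequently remembered and not-remembered emotional stimuli translates to a time period that produces a consistent lag between the amygdala and downstream hippocampal gamma. These results reveal a mechanism in which amygdala theta phase coordinates transient amygdala-hippocampal gamma coherence to facilitate aversive memory encoding. The amygdala may relay emotional content to other brain regions and thereby modulate other cognitive functions.
IF: 14.700 (Q1). Nature Communications, 2022-10-27. DOI: 10.1038/s41467-022-33828-2 PMID: 36302909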
Abstract:
Memory for aversive events is central to survival but can become maladaptive in psychiatric disorders. Memory enhancement for emotional events is thought to depend on amygdala modulation of hippocampal activity. … >>>
Memory for aversive events is central to survival but can become maladaptive in psychiatric disorders. Memory enhancement for emotional events is thought to depend on amygdala modulation of hippocampal activity. However, the neural dynamics of amygdala-hippocampal communication during emotional memory encoding remain unknown. Using simultaneous intracranial recordings from both structures in human patients, here we show that successful emotional memory encoding depends on the amygdala theta phase to which hippocampal gamma activity and neuronal firing couple. The phase difference between subsequently remembered vs. not-remembered emotional stimuli translates to a time period that enables lagged coherence between amygdala and downstream hippocampal gamma. These results reveal a mechanism whereby amygdala theta phase coordinates transient amygdala -hippocampal gamma coherence to facilitate aversive memory encoding. Pacing of lagged gamma coherence via amygdala theta phase may represent a general mechanism through which the amygdala relays emotional content to distant brain regions to modulate other aspects of cognition, such as attention and decision-making. <<<
555.
张浩彬 (2022-12-31 23:07):
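#paper doi:10.1145/3447548.3467401 A transformer-based framework for multivariate time series representation learning
1. The multiple heads of the transformer can be mapped onto multiple periodicities in a time series.
2. In the general framework, the raw data are first projected and combined with positional information, giving the first position-aware encoding.
3. Only the transformer encoder is used to extract features, with no decoder, which makes the model easier to adapt to a variety of downstream tasks.
4. Because the transformer is insensitive to order, the model also encodes position into the input vectors.
5. For variable-length data, the paper pads with arbitrary values and adds a large negative value to the attention scores at padded positions, forcing the model to ignore them. (This mask is an initial value; could it later be updated to a non-negative one?)
6. Some tricks go into how masking is actually applied. Predicting the masked values also effectively turns the task from a time series problem into an NLP-style fill-in-the-blank problem.
7. Pre-training: for a multivariate time series, a subsequence of each variable is masked independently at random, and the loss function considers only the masked segments.
8. The final tasks are regression and classification. Regression here is not forecasting of future time steps; it is more like predicting a house's energy consumption for the day from its air pressure, humidity, and wind speed readings, trained with MSE. Classification uses cross-entropy.
9. The downstream task head appears to be just a simple fully connected layer.
10. The baselines are ROCKET, LSTM, and XGBoost, which makes for a somewhat underwhelming comparison.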
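Points 5 and 7 above can be illustrated with a small sketch. It is not the paper's implementation: the function names, the constant `NEG`, and the toy numbers are all assumptions made for illustration, and it uses plain Python lists for a single attention row instead of batched tensors. It shows how adding a large negative value to padded positions removes them from the softmax, and how a loss can be restricted to the masked (hidden) entries.

```python
import math

NEG = -1e9  # assumed stand-in for the "large negative value" added at padded positions

def masked_softmax(scores, is_pad):
    """Softmax over one row of attention scores, ignoring padded positions."""
    # Padded positions get a huge negative offset, so exp() underflows to 0.
    shifted = [s + (NEG if pad else 0.0) for s, pad in zip(scores, is_pad)]
    top = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def masked_mse(pred, target, hidden):
    """Mean squared error over only the positions hidden from the model."""
    errs = [(p - t) ** 2 for p, t, h in zip(pred, target, hidden) if h]
    return sum(errs) / len(errs)  # assumes at least one hidden position

# Toy example: the last of four time steps is padding.
weights = masked_softmax([0.5, 1.5, -0.3, 2.0], [False, False, False, True])

# Toy example: only the two hidden positions contribute to the loss.
loss = masked_mse(pred=[1.0, 0.0, 5.0, 4.0],
                  target=[1.0, 2.0, 3.0, 4.0],
                  hidden=[False, True, True, False])
```

In this sketch the offset is a fixed constant applied before the softmax and is not a trainable parameter, so nothing in it would ever be updated toward a non-negative value. Whether the paper's code behaves the same way would need to be checked against its source.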
Abstract:
We present a novel framework for multivariate time series representation learning based on the transformer encoder architecture. The framework includes an unsupervised pre-training scheme, which can offer substantial performance benefits over fully supervised learning on downstream tasks, both with but even without leveraging additional unlabeled data, i.e., by reusing the existing data samples. Evaluating our framework on several public multivariate time series datasets from various domains and with diverse characteristics, we demonstrate that it performs significantly better than the best currently available methods for regression and classification, even for datasets which consist of only a few hundred training samples. Given the pronounced interest in unsupervised learning for nearly all domains in the sciences and in industry, these findings represent an important landmark, presenting the first unsupervised method shown to push the limits of state-of-the-art performance for multivariate time series regression and classification.
556.
负负 (2022-12-31 23:02):
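#paper doi: 10.1109/ICCV.2019.00452. Dmytro Kotovenko et al., 2019, Content and Style Disentanglement for Artistic Style Transfer. This work uses a generative adversarial network (GAN) framework to extract content features and style features from artistic oil paintings and applies them to artistic style transfer. Besides the loss functions commonly used with GANs (for example, MSE for the Generator and log(p)+log(1-q) for the Discriminator), the team adds a triplet loss during training. Put simply: given two works A and B by Van Gogh and one work C by Monet, in the latent space produced by the style encoder A should lie closer to B and farther from C. In other words, A serves as an "anchor", and the encoder tries to pull B toward A while pushing C away from A. The content encoder is trained with a triplet loss in the same way. Although artistic style transfer has been studied for a long time, the novelty of this paper is that its model not only generates higher-quality, more vivid images but also, in the process, learns the creative ideas and styles of different artists. The abstract notion of "style" learned by the encoder is smooth in the latent space, which lets it handle style transfer between works of different artists well.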
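The anchor/positive/negative idea described above can be sketched with the standard margin-based triplet loss. This is the generic formulation and not the paper's "fixpoint triplet style loss", which is a variant of it. The 2-D embeddings, the variable names, and the margin value are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Hypothetical style embeddings: A and B by the same artist, C by another.
van_gogh_a = [0.0, 0.0]
van_gogh_b = [0.0, 1.0]
monet_c = [3.0, 4.0]

# A is already close to B and far from C, so there is nothing to penalize.
loss_good = triplet_loss(van_gogh_a, van_gogh_b, monet_c)

# Swapping the roles of B and C violates the desired ordering and is penalized.
loss_bad = triplet_loss(van_gogh_a, monet_c, van_gogh_b)
```

The loss is zero once the positive is closer to the anchor than the negative by at least the margin, so training effort goes only to triplets that still violate the ordering.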
Abstract:
Artists rarely paint in a single style throughout their career. More often they change styles or develop variations of it. In addition, artworks in different styles and even within one style depict real content differently: while Picasso's Blue Period displays a vase in a blueish tone but as a whole, his Cubist works deconstruct the object. To produce artistically convincing stylizations, style transfer models must be able to reflect these changes and variations. Recently many works have aimed to improve the style transfer task, but neglected to address the described observations. We present a novel approach which captures particularities of style and the variations within and separates style and content. This is achieved by introducing two novel losses: a fixpoint triplet style loss to learn subtle variations within one style or between different styles and a disentanglement loss to ensure that the stylization is not conditioned on the real input photo. In addition the paper proposes various evaluation methods to measure the importance of both losses on the validity, quality and variability of final stylizations. We provide qualitative results to demonstrate the performance of our approach.
557.
半面阳光 (2022-12-31 22:55):
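#paper doi: 10.1097/GIM.0b013e3182217a3a, Genetics in Medicine, 2011, American College of Medical Genetics standards and guidelines for interpretation and reporting of postnatal constitutional copy number variants. An early guideline for clinical CNV interpretation, and the "prequel" to the now widely used 2019 CNV interpretation guideline. This version clearly defines which CNVs fall within the scope of clinical interpretation and proposes that interpretation should distinguish two dimensions: the "pathogenicity" of a CNV and its "clinical significance" (that is, its correspondence and association with phenotype). Although a major update in 2019, nearly 10 years later, introduced semi-quantitative scoring of CNV pathogenicity, this guideline remains just as important. It gives a clear discussion and analysis of key issues such as the workflow for CNV interpretation in clinical settings, the collection, curation, and evaluation of evidence, the writing and release of reports, and the limitations of technology platforms. It is a guideline that bridges what came before and what followed.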
Abstract:
Genomic microarrays used to assess DNA copy number are now recommended as first-tier tests for the postnatal evaluation of individuals with intellectual disability, autism spectrum disorders, and/or multiple congenital anomalies. Application of this technology has resulted in the discovery of widespread copy number variation in the human genome, both polymorphic variation in healthy individuals and novel pathogenic copy number imbalances. To assist clinical laboratories in the evaluation of copy number variants and to promote consistency in interpretation and reporting of genomic microarray results, the American College of Medical Genetics has developed the following professional guidelines for the interpretation and reporting of copy number variation. These guidelines apply primarily to evaluation of constitutional copy number variants detected in the postnatal setting.
558.
muton (2022-12-31 22:43):
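#paper doi: https://doi.org/10.1101/2022.10.03.510672 Human hippocampal ripples signal encoding of episodic memories, bioRxiv 2022. Hippocampal sharp-wave ripples are a distinctive and characteristic component of mammalian electrophysiology. They were first discovered in mouse studies; with advances in human electrophysiological recording, intracranial recordings have made it possible to study sharp-wave ripples in humans. Previous human studies focused mostly on the relationship between ripples and memory retrieval, and few examined the role of ripples during encoding, especially of individual items. This paper fills that gap. Using episodic memory task performance from 124 participants, the authors find that although subsequent memory effects in high-frequency activity are present in key regions such as the MTL, ripples show no such difference. Surprisingly, however, ripples occur more often during encoding of items that lead to recall of items close in encoding time or semantically related, which is known as a clustering effect. The phenomenon is seen in both the encoding and retrieval phases, and it may reflect a form of memory retention that helps predict and retrieve memories. The paper offers important clues to the function of ripples in human episodic memory.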
Abstract:
Recent human electrophysiology work has uncovered the presence of high frequency oscillatory events, termed ripples, during awake behavior. This prior work focuses on ripples in the medial temporal lobe (MTL) during memory retrieval. Few studies, however, investigate ripples during item encoding. Many studies have found neural activity during encoding that predicts later recall, termed subsequent memory effects (SMEs), but it is unclear if ripples during encoding also predict subsequent recall. Detecting ripples in 124 neurosurgical participants performing an episodic memory task, we find insignificant ripple SMEs in any MTL region, even as these regions exhibit robust high frequency activity (HFA) SMEs. Instead, hippocampal ripples increase during encoding of items leading to recall of temporally or semantically associated items, a phenomenon known as clustering. This subsequent clustering effect (SCE) arises specifically when hippocampal ripples occur during both encoding and retrieval, suggesting that ripples mediate the encoding and future reinstatement of episodic memories.
559.
(2022-12-31 22:41):
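#paper Differential gene expression in dairy cows under negative energy balance and ketosis: A systematic review and meta-analysis. J Dairy Sci. 2021 Jan;104(1):602-615. doi: 10.3168/jds.2020-18883. To evaluate the pattern of differential gene expression in the liver of dairy cows under negative energy balance (NEB) and subclinical and clinical ketosis, the authors screened peer-reviewed, pertinent articles on gene expression during NEB and clinical and subclinical ketosis (taking plasma β-hydroxybutyrate levels into account), created Venn diagrams to integrate the data obtained in the systematic review, and ran Gene Ontology enrichment analysis using official gene names. Three significant metabolic pathways were found to be associated with NEB and subclinical and clinical ketosis. Gene network analysis revealed co-expression interactions among 34 genes associated with fatty acid transport and fatty acid metabolism. Among the annotated QTL, 9 were identified for ketosis. Integrating gene expression and GWAS data provides additional understanding of the genetic background of NEB and subclinical and clinical ketosis in dairy cows.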
IF: 3.700, Q2. Journal of Dairy Science, 2021-Jan. DOI: 10.3168/jds.2020-18883 PMID: 33189279
Abstract:
Development of ketosis in high-producing dairy cows contributes to several animal health issues and highlights the need for a better understanding of the genetic basis of metabolic diseases. To evaluate the pattern of differential gene expression in the liver of cows under negative energy balance (NEB), and under subclinical and clinical ketosis, a meta-analysis of gene expression and genome-wide association studies results was performed. An initial systematic review identified 118 articles based on the key words "cow," "liver," "negative energy balance," "ketosis," "expression," "qPCR," "microarray," "proteomic," "RNA-Seq," and "GWAS." After further screening for only peer-reviewed and pertinent articles for gene expression during NEB and clinical and subclinical ketosis (considering plasma levels of β-hydroxybutyrate), 20 articles were included in the analysis. From the systematic review, 430 significant SNPs identified by genome-wide association studies (GWAS) were assigned to genes reported in gene expression studies by considering chromosome and base pair positions in the ARS-UCD 1.2 bovine assembly. Venn diagrams were created to integrate the data obtained in the systematic review, and Gene Ontology enrichment analysis was carried out using official gene names. A QTL enrichment analysis was also performed to identify potential positional candidate loci. Twenty-four significant SNPs were located within the coordinates of differentially expressed genes located on chromosomes 2, 3, 6, 9, 11, 14, 27, and 29. Three significant metabolic pathways were associated with NEB and subclinical and clinical ketosis. In addition, 2 important genes, PPARA (peroxisome proliferator activated receptor alpha) and ACACA (acetyl-coenzyme A carboxylase α), were identified, which were differentially expressed in the 3 metabolic conditions. 
The PPARA gene is involved in the regulation of lipid metabolism and fatty liver disease and the ACACA gene encodes an enzyme that catalyzes the carboxylation of acetyl-coenzyme A to malonyl-coenzyme A, which is a rate-limiting step in fatty acid synthesis. Gene network analysis revealed co-expression interactions among 34 genes associated with functions involving fatty acid transport and fatty acid metabolism. For the annotated QTL, 9 QTL were identified for ketosis. The genes FN1 (fibronectin 1) and PTK2 (protein tyrosine kinase 2), which are mainly involved in cell adhesion and formation of extracellular matrix constituents, were enriched for QTL previously associated with the trait "ketosis" on chromosome 2 and for the trait "milk iron content" on chromosome 14, respectively. This integration of gene expression and GWAS data provides an additional understanding of the genetic background of NEB and subclinical and clinical ketosis in dairy cattle. Thus, it is a useful approach to identify biological mechanisms underlying these metabolic conditions in dairy cattle.
560.
小擎子 (2022-12-31 22:08):
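#paper doi: 10.1016/j.xgen.2022.100179. Cell Genom, 2022, Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. A tool for detecting tumor mutational signatures that is recommended on the COSMIC website. The paper primarily presents SigProfilerExtractor and also tests 13 other signature-extraction tools. It runs extensive data simulations with different levels of added noise, compares how the tools' algorithms perform, and provides a fairly reliable benchmark for mutational signature extraction.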
IF: 11.100, Q1. Cell Genomics, 2022-Nov-09. DOI: 10.1016/j.xgen.2022.100179 PMID: 36388765
Abstract:
Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for extraction of mutational signatures, and benchmark it against another 13 bioinformatics tools by using 34 scenarios encompassing 2,500 simulated signatures found in 60,000 synthetic genomes and 20,000 synthetic exomes. For simulations with 5% noise, reflecting high-quality datasets, SigProfilerExtractor outperforms other approaches by elucidating between 20% and 50% more true-positive signatures while yielding 5-fold less false-positive signatures. Applying SigProfilerExtractor to 4,643 whole-genome- and 19,184 whole-exome-sequenced cancers reveals four novel signatures. Two of the signatures are confirmed in independent cohorts, and one of these signatures is associated with tobacco smoking. In summary, this report provides a reference tool for analysis of mutational signatures, a comprehensive benchmarking of bioinformatics tools for extracting signatures, and several novel mutational signatures, including one putatively attributed to direct tobacco smoking mutagenesis in bladder tissues.