1,086 paper shares found in total; this page shows entries 721 - 740.
721.
颜林林
(2022-09-25 15:32):
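#paper doi:10.1101/2022.09.20.22280143 medRxiv, 2022, Whole-Genome Promoter Profiling of Plasma Cell-Free DNA Exhibits Predictive Value for Preterm Birth. This paper tries to find preterm-birth biomarkers in cfDNA from maternal peripheral blood during pregnancy. The authors performed whole-genome sequencing on 20 term and 20 preterm pregnancies, together with whole-transcriptome sequencing of the corresponding placentas and peripheral blood, identified differentially expressed genes, and related them to the coverage depth of those genes' upstream regulatory (promoter) sequences in plasma cfDNA. The resulting features were then evaluated on NIPT data from 2,590 pregnancies (2,072 term vs. 518 preterm), and the authors expect the test to add value to current NIPT services. This is a preprint: its abstract mentions only the final model built on the ~2,600 pregnancies and its performance, which departs somewhat from the overall logic of the main text, so the argument clearly still needs polishing. Still, the dataset and results are worth keeping an eye on.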
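The abstract defines promoter profiling as read depth from -1 kb to +1 kb around the transcription start site (TSS). A minimal sketch of that single feature, assuming reads are given as 0-based, half-open `(start, end)` alignment intervals on one chromosome; `promoter_mean_depth` is a hypothetical helper, not the authors' code, and a real pipeline would typically also normalize for library size before passing per-gene features to a classifier (the paper reports an SVM).

```python
import numpy as np

def promoter_mean_depth(reads, tss, flank=1000):
    """Mean read depth over [tss - flank, tss + flank) on one chromosome.

    reads: iterable of (start, end) alignments, 0-based half-open (assumed format).
    """
    lo, hi = tss - flank, tss + flank
    diff = np.zeros(hi - lo + 1, dtype=np.int64)  # difference array over the window
    for start, end in reads:
        s, e = max(start, lo), min(end, hi)       # clip the read to the window
        if s < e:                                 # the read overlaps the window
            diff[s - lo] += 1
            diff[e - lo] -= 1
    depth = np.cumsum(diff[:-1])                  # per-base depth, length 2 * flank
    return float(depth.mean())
```

Computing one such value per gene and sample yields the feature matrix a classifier would be trained on.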
medRxiv,
2022.
DOI: 10.1101/2022.09.20.22280143
Abstract:
>>>
Preterm birth (PTB) occurs in around 11% of all births worldwide, resulting in significant morbidity and mortality for both mothers and offspring. Identification of pregnancies at risk of preterm birth in early pregnancy may help improve intervention and reduce its incidence. However, there exist few methods for PTB prediction developed with large sample size, high throughput screening and validation in independent cohorts. Here, we established a large scale, multi center, and case control study that included 2,590 pregnancies (2,072 full term and 518 preterm pregnancies) from three independent hospitals to develop a preterm birth classifier. We implemented whole genome sequencing on their plasma cfDNA and then their promoter profiling (read depth spanning from -1 KB to +1 KB around the transcriptional start site) was analyzed. Using three machine learning models and two feature selection algorithms, classifiers for predicting preterm delivery were developed. Among them, a classifier based on the support vector machine model and backward algorithm, named PTerm (Promoter profiling classifier for preterm prediction), exhibited the largest AUC value of 0.878 (0.852-0.904) following LOOCV cross validation. More importantly, PTerm exhibited good performance in three independent validation cohorts and achieved an overall AUC of 0.849 (0.831-0.866). Taken together, PTerm could be based on current noninvasive prenatal test (NIPT) data without changing its procedure or adding detection cost, which can be easily adapted for preclinical tests.
<<<
722.
颜林林
(2022-09-23 22:56):
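#paper doi:10.1371/journal.pgen.1010404 PLOS Genetics, 2022, Analysis of low-level somatic mosaicism reveals stage and tissue-specific mutational features in human development. This paper includes 498 samples from 190 individuals, covering patients with neurological disorders, patients with brain tumors, and healthy controls; sample types include peripheral blood and tissues such as brain, heart, and liver. Matched deep whole-exome sequencing (mean depth ~500x) was used to study the somatic mutations in each sample, their distribution across tissue types and developmental stages, and differences in their mutational signatures. The mutations were validated with Sanger sequencing and ultra-deep targeted amplicon sequencing, and their distribution across cell types was further validated with flow cytometry. The analysis methods are fairly standard, but as a deep whole-exome dataset of several hundred samples from different tissues, together with the somatic-mutation distribution it describes, it is well worth re-analyzing and mining further.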
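Why ~500x depth matters for low-level mosaicism can be checked with simple binomial arithmetic. A sketch, assuming a hypothetical uniform per-base error rate of 0.1% (not a figure from the paper): at 500x, a 1.5% VAF variant is expected on only ~7.5 reads, yet observing 8 or more error reads at the same site is very unlikely.

```python
from math import comb

def expected_alt_reads(depth, vaf):
    """Expected number of reads carrying the variant at a given depth and VAF."""
    return depth * vaf

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly.

    Fine for depths up to ~1,000; for larger n use scipy.stats.binom.sf instead.
    """
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

depth, vaf, error_rate = 500, 0.015, 0.001                 # error_rate is an assumption
print(expected_alt_reads(depth, vaf))                       # about 7.5 alt reads
print(binom_sf(8, depth, error_rate))                       # chance of >= 8 error reads
```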
Abstract:
>>>
Most somatic mutations that arise during normal development are present at low levels in single or multiple tissues depending on the developmental stage and affected organs. However, the effect of human developmental stages or mutations of different organs on the features of somatic mutations is still unclear. Here, we performed a systemic and comprehensive analysis of low-level somatic mutations using deep whole-exome sequencing (average read depth ~500×) of 498 multiple organ tissues with matched controls from 190 individuals. Our results showed that early clone-forming mutations shared between multiple organs were lower in number but showed higher allele frequencies than late clone-forming mutations [0.54 vs. 5.83 variants per individual; 6.17% vs. 1.5% variant allele frequency (VAF)] along with less nonsynonymous mutations and lower functional impacts. Additionally, early and late clone-forming mutations had unique mutational signatures that were distinct from mutations that originated from tumors. Compared with early clone-forming mutations that showed a clock-like signature across all organs or tissues studied, late clone-forming mutations showed organ, tissue, and cell-type specificity in the mutation counts, VAFs, and mutational signatures. In particular, analysis of brain somatic mutations showed a bimodal occurrence and temporal-lobe-specific signature. These findings provide new insights into the features of somatic mosaicism that are dependent on developmental stage and brain regions.
<<<
723.
徐炳祥
(2022-09-22 22:58):
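#paper doi: 10.1186/s13059-022-02757-0 Genome Biology, 2022, Genetic regulation of RNA splicing in human pancreatic islets. Non-coding variants in pancreatic islet cells affect the cellular transcriptome and may therefore play an important role in the pathogenesis of type 1 and type 2 diabetes. This paper analyzes a particular class of common genetic variants, sQTLs (splicing QTLs, i.e. QTLs that affect alternative splicing events), in islets from a cohort of 399 human donors. The target genes of sQTLs differ from those of eQTLs, suggesting that the two classes of QTLs may act independently. The authors identify a set of new sQTL-associated risk genes for type 1 and type 2 diabetes, and conclude that alternative splicing in islet cells is an important diabetes risk factor.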
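At its simplest, a single sQTL test regresses a splicing measure on genotype. A minimal sketch, assuming splicing is summarized per donor as a percent-spliced-in (PSI) ratio and the genotype as alternate-allele dosage (0/1/2); `sqtl_test` is a hypothetical name, and the covariates, normalization and multiple-testing control that real QTL pipelines use are omitted.

```python
import numpy as np

def sqtl_test(dosage, psi):
    """OLS fit psi ~ a + b * dosage; returns (slope b, t statistic of b)."""
    g = np.asarray(dosage, dtype=float)
    y = np.asarray(psi, dtype=float)
    X = np.column_stack([np.ones_like(g), g])          # intercept + genotype dosage
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                        # residual variance
    se_b = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(coef[1]), float(coef[1] / se_b)

# Toy data: PSI rises with each copy of the alternate allele.
slope, t_stat = sqtl_test([0, 0, 1, 1, 2, 2], [0.1, 0.3, 0.2, 0.4, 0.3, 0.5])
```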
IF: 10.100, Q1
Genome biology,
2022-09-15.
DOI: 10.1186/s13059-022-02757-0
PMID: 36109769
PMCID:PMC9479353
Abstract:
>>>
BACKGROUND: Non-coding genetic variants that influence gene transcription in pancreatic islets play a major role in the susceptibility to type 2 diabetes (T2D), and likely also contribute to type 1 diabetes (T1D) risk. For many loci, however, the mechanisms through which non-coding variants influence diabetes susceptibility are unknown. RESULTS: We examine splicing QTLs (sQTLs) in pancreatic islets from 399 human donors and observe that common genetic variation has a widespread influence on the splicing of genes with established roles in islet biology and diabetes. In parallel, we profile expression QTLs (eQTLs) and use transcriptome-wide association as well as genetic co-localization studies to assign islet sQTLs or eQTLs to T2D and T1D susceptibility signals, many of which lack candidate effector genes. This analysis reveals biologically plausible mechanisms, including the association of T2D with an sQTL that creates a nonsense isoform in ERO1B, a regulator of ER-stress and proinsulin biosynthesis. The expanded list of T2D risk effector genes reveals overrepresented pathways, including regulators of G-protein-mediated cAMP production. The analysis of sQTLs also reveals candidate effector genes for T1D susceptibility such as DCLRE1B, a senescence regulator, and lncRNA MEG3. CONCLUSIONS: These data expose widespread effects of common genetic variants on RNA splicing in pancreatic islets. The results support a role for splicing variation in diabetes susceptibility, and offer a new set of genetic targets with potential therapeutic benefit.
<<<
724.
张浩彬
(2022-09-21 11:01):
#paper https://doi.org/10.48550/arXiv.2106.00750
Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding
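ICLR 2021 paper on contrastive learning for time series.
Code: https://github.com/sanatonek/TNC_representation_learning
Sampling idea: signals within a temporal neighborhood are assumed to be similar, while signals outside the neighborhood should be distinguishable from them.
Positive samples: the centers of neighboring windows are drawn from a Gaussian centered on the reference window's center t, with a spread set by the window size and the neighborhood range; windows inside the neighborhood are positives. How is the neighborhood range determined? With the ADF (augmented Dickey-Fuller) stationarity test.
Negative samples: windows outside the neighborhood are treated as negatives, but the authors refine this point further in the loss function.
Loss function: the authors argue that windows outside the neighborhood cannot all be treated as negatives, because time series are often periodic; they should instead be treated as unlabeled (a mixture of positives and negatives). Following practice from PU (positive-unlabeled) learning, the loss puts a weight on these "negative" samples.
Data: three datasets in total: one simulated dataset (4 classes, generated by an HMM), one clinical atrial fibrillation dataset (MIT-BIH; classes alternate over time, are highly imbalanced, and a small number of individuals each contribute very long recordings), and one human activity dataset (UCI-HAR).
Downstream tasks: clustering and classification. Since the main goal is to compare representation learning methods, all models share the same simple encoder architecture for a given task; because the datasets differ in character, the encoder differs between datasets.
Clustering uses plain k-means and classification uses plain k-NN; TNC achieves the best results throughout.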
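The PU-weighted loss described in the notes can be sketched as follows. This is our reading of the debiased objective, not the authors' code: `d_neighbors` and `d_non_neighbors` are hypothetical discriminator outputs in (0, 1) that a window pair is "similar", and `w` is the assumed probability that a non-neighboring window is actually similar (with `w = 0` it reduces to plain binary cross-entropy with non-neighbors as negatives).

```python
import numpy as np

def tnc_loss(d_neighbors, d_non_neighbors, w, eps=1e-7):
    """Sketch of TNC's debiased contrastive objective (to be minimized).

    d_neighbors:     D(z_t, z_l) for windows inside the neighborhood (positives)
    d_non_neighbors: D(z_t, z_k) for windows outside it (unlabeled)
    w:               assumed weight: chance an unlabeled window is really a positive
    """
    dp = np.clip(np.asarray(d_neighbors, dtype=float), eps, 1 - eps)
    du = np.clip(np.asarray(d_non_neighbors, dtype=float), eps, 1 - eps)
    pos_term = np.log(dp).mean()                                   # neighbors should score high
    unl_term = (w * np.log(du) + (1 - w) * np.log(1 - du)).mean()  # PU-style weighting
    return float(-(pos_term + unl_term))
```

The encoder and discriminator networks are omitted; in training they would produce these probabilities for each sampled window pair.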
arXiv,
2021.
DOI: 10.48550/arXiv.2106.00750
Abstract:
>>>
Time series are often complex and rich in information but sparsely labeled and therefore challenging to model. In this paper, we propose a self-supervised framework for learning generalizable representations for non-stationary time series. Our approach, called Temporal Neighborhood Coding (TNC), takes advantage of the local smoothness of a signal's generative process to define neighborhoods in time with stationary properties. Using a debiased contrastive objective, our framework learns time series representations by ensuring that in the encoding space, the distribution of signals from within a neighborhood is distinguishable from the distribution of non-neighboring signals. Our motivation stems from the medical field, where the ability to model the dynamic nature of time series data is especially valuable for identifying, tracking, and predicting the underlying patients' latent states in settings where labeling data is practically impossible. We compare our method to recently developed unsupervised representation learning approaches and demonstrate superior performance on clustering and classification tasks for multiple datasets.
<<<
725.
颜林林
(2022-09-21 07:48):
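#paper doi:10.1002/humu.24458 Human Mutation, 2022, A survey of current methods to detect and genotype inversions. Inversions are a special class of genomic variants. As more and more techniques become able to discover and characterize them, inversions have been found to be widespread in the genomes of many species. From a technical perspective, this review covers six classes of methods for identifying inversions: PCR, NGS read alignment, haplotype-based inference, template-strand sequencing (Strand-seq), optical mapping (Bionano), and genome assembly, together with the research progress made with each.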
Abstract:
>>>
Polymorphic inversions are ubiquitous in humans and they have been linked to both adaptation and disease. Following their discovery in Drosophila more than a century ago, inversions have proved to be more elusive than other structural variants. A wide variety of methods for the detection and genotyping of inversions have recently been developed: multiple techniques based on selective amplification by PCR, short- and long-read sequencing approaches, principal component analysis of small variant haplotypes, template strand sequencing, optical mapping, and various genome assembly methods. Many methods apply complex wet lab protocols or increasingly refined bioinformatic analyses. This review is an attempt to provide a practical summary and comparison of the methods that are in current use, with a focus on metrics such as the maximum size of segmental duplications at inversion breakpoints that each method can tolerate, the size range of inversions that they recover, their throughput, and whether the locations of putative inversions must be known beforehand.
<<<
726.
颜林林
(2022-09-20 06:54):
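#paper doi:10.1002/humu.24465 Human Mutation, 2022, Long-read sequencing for molecular diagnostics in constitutional genetic disorders. This is a review from the Children's Hospital of Philadelphia on using third-generation long-read sequencing for genetic testing of inherited disorders. Using the hospital's hearing-loss gene testing as an example, it shows how several testing technologies are combined in practice to detect different types of disease-related variants across more than a hundred genes. It also uses concrete cases to systematically analyze problems that can lead to misinterpreted results, such as repeated (duplicated) segments, pseudogenes, and multiple distant variants in the same gene (which require phasing), and shows how long-read sequencing resolves each of them. The biggest current bottleneck for using long-read sequencing in genetic testing is the limited body of accumulated evidence and population data, but that is exactly what time will gradually build up. Given these cases, which are almost only solvable with long-read technologies, we can expect a wave of long-read genetic tests to reach clinical use in the near future.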
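Phasing two distant heterozygous variants needs reads that span both sites, which short reads rarely provide. A minimal sketch of the read-counting idea, assuming each spanning read has already been reduced to its allele at each site; `phase_two_variants` is a hypothetical helper (genome-wide read-based phasers such as WhatsHap solve a much larger version of this problem). In a recessive gene, "trans" means both copies carry a variant.

```python
from collections import Counter

def phase_two_variants(read_alleles):
    """Decide cis vs trans for two heterozygous variants from spanning reads.

    read_alleles: list of (allele_at_site1, allele_at_site2) tuples, each 'ref' or 'alt',
    one tuple per read that covers both sites.
    """
    counts = Counter(read_alleles)
    cis = counts[("alt", "alt")] + counts[("ref", "ref")]    # variants on the same haplotype
    trans = counts[("alt", "ref")] + counts[("ref", "alt")]  # variants on different haplotypes
    if cis == trans:
        return "undetermined", cis, trans
    return ("cis" if cis > trans else "trans"), cis, trans

reads = [("alt", "ref")] * 6 + [("ref", "alt")] * 5 + [("alt", "alt")]
print(phase_two_variants(reads))   # mostly opposite-haplotype reads
```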
Abstract:
>>>
Long-read sequencing (LRS) has been around for more than a decade, but widespread adoption of the technology has been slow due to the perceived high error rates and high sequencing cost. This is changing due to the recent advancements to produce highly accurate sequences and the reducing costs. LRS promises significant improvement over short read sequencing in four major areas: (1) better detection of structural variation (2) better resolution of highly repetitive or nonunique regions (3) accurate long-range haplotype phasing and (4) the detection of base modifications natively from the sequencing data. Several successful applications of LRS have demonstrated its ability to resolve molecular diagnoses where short-read sequencing fails to identify a cause. However, the argument for increased diagnostic yield from LRS remains to be validated. Larger cohort studies may be required to establish the realistic boundaries of LRS's clinical utility and analytical validity, as well as the development of standards for clinical applications. We discuss the limitations of the current standard of care, and contrast with the applications and advantages of two major LRS platforms, PacBio and Oxford Nanopore, for molecular diagnostics of constitutional disorders, and present a critical argument about the potential of LRS in diagnostic settings.
<<<
727.
颜林林
(2022-09-19 22:00):
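#paper doi:10.1038/s41598-022-17585-2 Scientific Reports, 2022, Recursive integration of synergised graph representations of multi‑omics data for cancer subtypes identification. With high-throughput sequencing applied at many omics levels, cancer research has long since entered the multi-omics era. Integrating high-dimensional multi-omics data effectively remains challenging, and most related methodological work focuses on dimensionality reduction and feature extraction within a single omics layer. This paper develops RISynG (Recursive Integration of Synergised Graph-representations), which extracts two representation matrices, a Gramian and a Laplacian, from each raw omics dataset so that different omics layers can be integrated more effectively. Compared with most previous approaches, which simply concatenate and stack the omics data, it clusters samples better, enabling cancer subtyping from multi-omics tumor data such as TCGA.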
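To make the two "representation matrices" concrete, the sketch below builds, for one omics view (samples x features), a cosine-normalized Gram matrix and the symmetric normalized Laplacian of a Gaussian-affinity sample graph. The `synergy` combination is a hypothetical convex mix chosen for illustration, not RISynG's actual parameterised function, and the recursive multi-kernel integration and final clustering are omitted.

```python
import numpy as np

def gramian(X):
    """Linear-kernel Gram matrix (samples x samples), scaled to unit diagonal."""
    K = X @ X.T
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    return K / np.outer(d, d)                       # cosine-similarity Gram matrix

def normalized_laplacian(X, sigma=1.0):
    """Symmetric normalized Laplacian I - D^-1/2 W D^-1/2 of a Gaussian-affinity graph."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                        # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def synergy(X, alpha=0.5, sigma=1.0):
    """Hypothetical mix of the two views; (I - L) turns the Laplacian into an affinity."""
    L = normalized_laplacian(X, sigma)
    return alpha * gramian(X) + (1 - alpha) * (np.eye(len(X)) - L)

X = np.random.default_rng(0).normal(size=(6, 4))    # toy view: 6 samples, 4 features
S = synergy(X)
```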
Abstract:
>>>
Cancer subtypes identification is one of the critical steps toward advancing personalized anti-cancerous therapies. Accumulation of a massive amount of multi-platform omics data measured across the same set of samples provides an opportunity to look into this deadly disease from several views simultaneously. Few integrative clustering approaches are developed to capture shared information from all the views to identify cancer subtypes. However, they have certain limitations. The challenge here is identifying the most relevant feature space from each omic view and systematically integrating them. Both the steps should lead toward a global clustering solution with biological significance. In this respect, a novel multi-omics clustering algorithm named RISynG (Recursive Integration of Synergised Graph-representations) is presented in this study. RISynG represents each omic view as two representation matrices that are Gramian and Laplacian. A parameterised combination function is defined to obtain a synergy matrix from these representation matrices. Then a recursive multi-kernel approach is applied to integrate the most relevant, shared, and complementary information captured via the respective synergy matrices. At last, clustering is applied to the integrated subspace. RISynG is benchmarked on five multi-omics cancer datasets taken from The Cancer Genome Atlas. The experimental results demonstrate RISynG's efficiency over the other approaches in this domain.
<<<
728.
张德祥
(2022-09-19 19:40):
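#paper https://doi.org/10.48550/arXiv.2206.00426 Semantic Probabilistic Layers for Neuro-Symbolic Learning. The paper designs a prediction layer for structured-output prediction that can be plugged into a neural network and guarantees that its predictions are consistent with label constraints; by modeling complex correlations and constraints, it combines probabilistic reasoning with logical reasoning. According to the authors, it is currently the only approach that satisfies six desiderata: it is probabilistic, highly expressive, guarantees consistency with the logical constraints, is general (supports constraints written in a formal language), is modular (can be embedded in a neural network and trained end to end), and is efficient (linear time). At its core, it is implemented with constrained probabilistic circuits. Applications include pathfinding (with constraints such as obstacles or water) and hierarchical multi-label classification.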
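The core guarantee, that probability mass goes only to outputs satisfying the constraint, can be illustrated on a tiny output space. This sketch swaps in brute-force enumeration over a fully factorized model; the actual SPL uses probabilistic circuits precisely to avoid this exponential enumeration and to model label correlations. `constrained_distribution` and the hierarchy constraint ("dog implies animal") are hypothetical examples.

```python
import itertools
import numpy as np

def constrained_distribution(logits, constraint):
    """p(y | x) over binary label vectors, renormalized on the constraint's solutions.

    logits: per-label scores of independent Bernoulli labels (p(y_i = 1) ∝ exp(logit_i)).
    constraint: function mapping a 0/1 tuple to True/False.
    """
    n = len(logits)
    ys = [y for y in itertools.product([0, 1], repeat=n) if constraint(y)]
    scores = np.array([np.dot(logits, y) for y in ys], dtype=float)  # unnormalized log-probs
    probs = np.exp(scores - scores.max())
    return ys, probs / probs.sum()

# Labels (animal, dog): predicting "dog" without "animal" is forbidden.
hierarchy = lambda y: y[0] >= y[1]
ys, p = constrained_distribution([2.0, 1.0], hierarchy)
prediction = ys[int(np.argmax(p))]   # most probable consistent labeling
```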
arXiv,
2022.
DOI: 10.48550/arXiv.2206.00426
Abstract:
>>>
We design a predictive layer for structured-output prediction (SOP) that can be plugged into any neural network guaranteeing its predictions are consistent with a set of predefined symbolic constraints. Our Semantic Probabilistic Layer (SPL) can model intricate correlations, and hard constraints, over a structured output space all while being amenable to end-to-end learning via maximum likelihood. SPLs combine exact probabilistic inference with logical reasoning in a clean and modular way, learning complex distributions and restricting their support to solutions of the constraint. As such, they can faithfully, and efficiently, model complex SOP tasks beyond the reach of alternative neuro-symbolic approaches. We empirically demonstrate that SPLs outperform these competitors in terms of accuracy on challenging SOP tasks including hierarchical multi-label classification, pathfinding and preference learning, while retaining perfect constraint satisfaction.
<<<
729.
哪有情可长
(2022-09-18 20:36):
#paper 'Green revolution' genes encode mutant gibberellin response modulators, Nature 1999 Jul 15;400(6741):256-61. doi: 10.1038/22307.
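The "green revolution" introduced dwarfing genes into crops, lowering the plant height of rice, wheat and other crops. Together with improved irrigation and fertilization, this raised yields and also reduced losses from lodging caused by wind and rain late in the season, which is how dwarfing genes came to play their role in the green revolution. Dwarf mutations in crops mainly arise because the mutated genes make the plant insensitive to GA (gibberellin), so that it responds abnormally to the hormone. This paper shows that the wheat Rht-B1/Rht-D1 genes and maize d8 are orthologues of the Arabidopsis GAI gene, compares the gene structure, protein function and mutation sites of these dwarfing genes and how the differences lead to different phenotypes, and analyzes their conservation (collinearity) across species and the SH2-like domain found in them. Gibberellin signal transduction is very similar in monocots and dicots and may involve an interaction between the SH2-like domain and phosphorylated tyrosine residues; a mutant GAI allele also dwarfs transgenic rice.
The authors first described the Arabidopsis GAI gene, a negative regulator of the gibberellin (GA) signaling pathway, in 1993. After obtaining Arabidopsis mutants of this gene, they published another paper on Arabidopsis GAI in 1997, and then moved on to wheat, finding that this class of genes is highly homologous and functionally conserved in both dicots and monocots.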
Abstract:
>>>
World wheat grain yields increased substantially in the 1960s and 1970s because farmers rapidly adopted the new varieties and cultivation methods of the so-called ‘green revolution’ [1-4]. The new varieties are shorter, increase grain yield at the expense of straw biomass, and are more resistant to damage by wind and rain [3,4]. These wheats are short because they respond abnormally to the plant growth hormone gibberellin. This reduced response to gibberellin is conferred by mutant dwarfing alleles at one of two Reduced height-1 (Rht-B1 and Rht-D1) loci [4,5]. Here we show that Rht-B1/Rht-D1 and maize dwarf-8 (d8) [6,7] are orthologues of the Arabidopsis Gibberellin Insensitive (GAI) gene [8,9]. These genes encode proteins that resemble nuclear transcription factors and contain an SH2-like domain [10], indicating that phosphotyrosine may participate in gibberellin signalling. Six different orthologous dwarfing mutant alleles encode proteins that are altered in a conserved amino-terminal gibberellin signalling domain. Transgenic rice plants containing a mutant GAI allele give reduced responses to gibberellin and are dwarfed, indicating that mutant GAI orthologues could be used to increase yield in a wide range of crop species.
<<<
730.
颜林林
(2022-09-16 23:18):
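#paper doi:10.1016/j.molcel.2022.08.019 Molecular Cell, 2022, Developmental and housekeeping transcriptional programs in Drosophila require distinct chromatin remodelers. What drew me to this paper was that, while skimming it, I noticed two words, "Drosophila" and "auxin" (a plant growth hormone), and I was curious how the two were connected. Back in my biology courses I had heard of auxin's pre-eminent standing in plant research. The paper uses a technique called auxin-inducible degradation (AID), which comes from a 2009 Nature Methods paper (doi:10.1038/nmeth.1401): a specific degron sequence is fused to the target protein so that, in the presence of auxin, the protein is ubiquitinated and degraded, which lets researchers control protein degradation at will. Because ubiquitin-mediated degradation is conserved across species, the technique can be applied in many non-plant systems (provided the plant auxin receptor TIR1 is also expressed). With it, this paper studies housekeeping and developmental genes in Drosophila: the former are expressed broadly in all cell types, whereas the latter are expressed only in cells of specific tissues and organs. By rapidly degrading core subunits of chromatin remodelers, the authors reveal differences between the two gene classes in their dependence on chromatin remodelling and in related features.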
Abstract:
>>>
Gene transcription is a highly regulated process in all animals. In Drosophila, two major transcriptional programs, housekeeping and developmental, have promoters with distinct regulatory compatibilities and nucleosome organization. However, it remains unclear how the differences in chromatin structure relate to the distinct regulatory properties and which chromatin remodelers are required for these programs. Using rapid degradation of core remodeler subunits in Drosophila melanogaster S2 cells, we demonstrate that developmental gene transcription requires SWI/SNF-type complexes, primarily to maintain distal enhancer accessibility. In contrast, wild-type-level housekeeping gene transcription requires the Iswi and Ino80 remodelers to maintain nucleosome positioning and phasing at promoters. These differential remodeler dependencies relate to different DNA-sequence-intrinsic nucleosome affinities, which favor a default ON state for housekeeping but a default OFF state for developmental gene transcription. Overall, our results demonstrate how different transcription-regulatory strategies are implemented by DNA sequence, chromatin structure, and remodeler activity.
<<<
731.
张德祥
(2022-09-16 09:57):
#paper DOI:https://doi.org/10.1016/j.ijar.2021.09.012 Strudel: A fast and accurate learner of structured-decomposable probabilistic circuits
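Probabilistic circuits (PCs) represent a probability distribution as a computational graph and impose structural properties on that graph to guarantee efficient inference.
Structured decomposability is an especially attractive property:
it enables efficient and exact computation of the probability of complex logical formulas, and can be used to reason about the expected output of certain predictive models under missing data.
This paper proposes Strudel (STRUctured-DEcomposable Learner), a simple, fast and accurate algorithm that learns structured-decomposable PCs, including the computational graph itself, directly from data.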
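A toy example of why these structural properties make inference tractable. The hypothetical circuit below is a mixture (sum node) of two products over the disjoint variables {X1} and {X2}; both products split the variables the same way, so it is trivially structured-decomposable. A marginal such as P(X1 = 1) needs only one bottom-up pass: leaves of unobserved variables simply return 1. This is a hand-built illustration, not a circuit learned by Strudel.

```python
def bern(var, p):
    """Bernoulli leaf; an unobserved variable is marginalized by returning 1."""
    def f(x):
        v = x.get(var)               # None means "marginalize this variable"
        if v is None:
            return 1.0               # a normalized leaf sums to 1 over its values
        return p if v == 1 else 1.0 - p
    return f

def product(*children):
    """Product node over children with disjoint variables (decomposability)."""
    def f(x):
        out = 1.0
        for c in children:
            out *= c(x)
        return out
    return f

def mixture(weights, children):
    """Sum node over children with the same variables (smoothness)."""
    def f(x):
        return sum(w * c(x) for w, c in zip(weights, children))
    return f

pc = mixture([0.3, 0.7], [
    product(bern("X1", 0.9), bern("X2", 0.2)),
    product(bern("X1", 0.1), bern("X2", 0.6)),
])
p_x1 = pc({"X1": 1})                 # marginal: 0.3 * 0.9 + 0.7 * 0.1
```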
Abstract:
>>>
Probabilistic circuits (PCs) represent a probability distribution as a computational graph. Enforcing structural properties on these graphs guarantees that several inference scenarios become tractable. Among these properties, structured decomposability is a particularly appealing one: it enables the efficient and exact computations of the probability of complex logical formulas, and can be used to reason about the expected output of certain predictive models under missing data. This paper proposes Strudel, a simple, fast and accurate learning algorithm for structured-decomposable PCs. Compared to prior work for learning structured-decomposable PCs, Strudel delivers more accurate single PC models in fewer iterations, and dramatically scales learning when building ensembles of PCs. It achieves this scalability by exploiting another structural property of PCs, called determinism, and by sharing the same computational graph across mixture components. We show these advantages on standard density estimation benchmarks and challenging inference scenarios.
<<<
732.
张德祥
(2022-09-16 09:36):
#paper
URL: http://starai.cs.ucla.edu/papers/ProbCirc20.pdf
Probabilistic circuits: A unifying framework for tractable probabilistic models
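Probabilistic models are at the core of modern machine learning (ML) and artificial intelligence (AI).
Indeed, probability theory provides a principled and almost universally adopted mechanism for making decisions under uncertainty. In machine learning, for example, we assume that our data come from an unknown probability distribution;
many ML tasks reduce to simply performing probabilistic inference. Similarly, many forms of model-based AI seek to represent the mechanisms governing the world around us directly as some form of probability distribution.
No wonder so much attention in ML goes to learning distributions from data: we use ever more expressive probabilistic models as density estimators, getting ever closer to the distribution that generated the data.
However, earlier models, including deep learning models, are either inefficient at inference or inaccurate. The goal is models that are theoretically sound, expressive, and have controllable (tractable) inference time, and these can now be handled within one unifying framework, probabilistic circuits.
Key characteristic of probabilistic circuits: they are a kind of neural network, specifically hierarchical mixture models.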
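The "hierarchical mixture" reading of a PC is easiest to see in ancestral sampling: at each sum node, pick one child according to the mixture weights; at each product node, sample every child, since they cover disjoint variables. The tuple encoding of nodes below is a hypothetical choice for this sketch, not the data structure used in the paper.

```python
import random

# Hypothetical node encoding (not the paper's data structure):
#   ("leaf", var, p)                  Bernoulli leaf with P(var = 1) = p
#   ("prod", [children])              product over disjoint sets of variables
#   ("sum", [weights], [children])    mixture of children over the same variables
def sample(node, rng, out=None):
    """Ancestral sampling: walk the circuit top-down, choosing one child per sum node."""
    out = {} if out is None else out
    kind = node[0]
    if kind == "leaf":
        _, var, p = node
        out[var] = 1 if rng.random() < p else 0
    elif kind == "prod":
        for child in node[1]:
            sample(child, rng, out)                         # each child fills its own variables
    elif kind == "sum":
        _, weights, children = node
        chosen = rng.choices(children, weights=weights)[0]  # choose a mixture component
        sample(chosen, rng, out)
    else:
        raise ValueError(f"unknown node type: {kind}")
    return out

example_pc = ("sum", [0.3, 0.7], [
    ("prod", [("leaf", "X1", 0.9), ("leaf", "X2", 0.2)]),
    ("prod", [("leaf", "X1", 0.1), ("leaf", "X2", 0.6)]),
])
rng = random.Random(0)
draws = [sample(example_pc, rng) for _ in range(20000)]
```

With these weights, the fraction of draws with X1 = 1 should approach 0.3 * 0.9 + 0.7 * 0.1 = 0.34.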
2020.
Abstract:
No abstract available.
733.
颜林林
(2022-09-15 22:35):
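#paper doi:10.1002/humu.24455 Human Mutation, 2022, de novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project. This paper develops a GPU-accelerated tool that detects de novo variants from whole-genome sequencing data of trios (both parents and one child). The authors used it to re-analyze three large trio cohorts: the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G), whose DNA came from peripheral blood, saliva, and cell lines, respectively. The number and features of de novo variants in the cell-line samples were clearly out of line with expectations. Signature analysis of the 1000G de novo variants showed similarity to B-cell lymphoma, suggesting that many of them are artifacts introduced during cell-line preparation (i.e. EBV transformation).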
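The basic trio logic behind de novo calling can be sketched as a per-site filter: the child looks heterozygous, and neither parent shows the alternate allele. This is not the paper's GPU workflow; `Call`, `is_candidate_dnv` and all thresholds are hypothetical. The allele-balance check corresponds to one of the QC expectations the abstract lists (average allele balance).

```python
from dataclasses import dataclass

@dataclass
class Call:
    ref_reads: int
    alt_reads: int

    @property
    def depth(self):
        return self.ref_reads + self.alt_reads

def is_candidate_dnv(child, mother, father, min_depth=10, ab_range=(0.3, 0.7), max_parent_alt=0):
    """Simple trio filter: child heterozygous, both parents homozygous reference."""
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False                        # too little coverage in someone to trust the call
    ab = child.alt_reads / child.depth      # child allele balance
    if not (ab_range[0] <= ab <= ab_range[1]):
        return False                        # not consistent with a germline heterozygote
    # both parents must show (almost) no alt reads; max_parent_alt tolerates sequencing error
    return mother.alt_reads <= max_parent_alt and father.alt_reads <= max_parent_alt
```

A low child allele balance is the pattern one might expect from variants present in only a fraction of cells, such as some cell-line artifacts.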
Abstract:
>>>
Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing units-based workflow. We applied our workflow to whole-genome sequencing data from three parent-child sequenced cohorts including the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G) that were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for number of DNVs, percent at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained excessive DNVs that are likely cell line artifacts. Mutation signature analysis revealed 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at Clinvar pathogenic or likely-pathogenic sites and significant excess of protein-coding DNVs in IGLL5; a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and elucidates important implications of using sequencing data from LCLs for reference building and disease-related projects.
<<<
734.
颜林林
(2022-09-14 05:52):
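#paper doi:10.1002/humu.24460 Human Mutation, 2022, CIC missense variants contribute to susceptibility for spina bifida. Previous studies have shown that folate intake is important for nervous system development, and folate deficiency may cause severe congenital malformations such as neural tube defects (NTDs). This paper appears to build on another study: in whole-genome sequencing data from 140 enrolled isolated (sporadic) spina bifida cases, the authors found eight rare missense variants in the CIC gene. Sequence conservation across closely related species suggested that these variants may be functionally important. In cell lines, they transfected plasmids carrying wild-type CIC or CIC with these variants, cultured cells with and without folate deficiency, and used immunofluorescence to observe the variants' effects on expression level and subcellular localization. They also measured the expression of CIC-regulated genes by Western blotting, qPCR and other methods, confirming that the CIC variants do affect the related pathways. This is a typical study that validates the function of newly found variants with wet-lab experiments.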
Abstract:
>>>
Neural tube defects (NTDs) are congenital malformations resulting from abnormal embryonic development of the brain, spine, or spinal column. The genetic etiology of human NTDs remains poorly understood despite intensive investigation. CIC, homolog of the Capicua transcription repressor, has been reported to interact with ataxin-1 (ATXN1) and participate in the pathogenesis of spinocerebellar ataxia type 1. Our previous study demonstrated that CIC loss of function (LoF) variants contributed to the cerebral folate deficiency syndrome by downregulating folate receptor 1 (FOLR1) expression. Given the importance of folate transport in neural tube formation, we hypothesized that CIC variants could contribute to increased risk for NTDs by depressing embryonic folate concentrations. In this study, we examined CIC variants from whole-genome sequencing (WGS) data of 140 isolated spina bifida cases and identified eight missense variants of CIC gene. We tested the pathogenicity of the observed variants through multiple in vitro experiments. We determined that CIC variants decreased the FOLR1 protein level and planar cell polarity (PCP) pathway signaling in a human cell line (HeLa). In a murine cell line (NIH3T3), CIC loss of function variants downregulated PCP signaling. Taken together, this study provides evidence supporting CIC as a risk gene for human NTD.
<<<
735.
颜林林
(2022-09-13 07:21):
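#paper doi:10.1016/j.vaccine.2022.08.036 Vaccine, 2022, Serious adverse events of special interest following mRNA COVID-19 vaccination in randomized trials in adults. This paper revisits the phase III clinical trials of the Pfizer and Moderna mRNA COVID-19 vaccines and re-analyzes the serious adverse events they reported, estimating the excess risk of each vaccine relative to placebo (risk differences and risk ratios). The results suggest that more detailed, formal harm-benefit analyses are needed. The paper closes by once again calling for public release of participant-level data, so that clinical trials are transparent and all kinds of evaluations and analyses can be carried out properly.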
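The two effect measures the paper reports, excess risk per 10,000 (risk difference) and risk ratio, can be computed from event counts as below. The counts in the usage example are hypothetical, not trial data, and the Wald-type intervals used here may differ from the exact confidence-interval methods the authors used.

```python
import math

def risk_measures(events_vax, n_vax, events_pbo, n_pbo, z=1.96):
    """Risk difference (per 10,000) and risk ratio with Wald-type 95% CIs."""
    p1, p0 = events_vax / n_vax, events_pbo / n_pbo
    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n_vax + p0 * (1 - p0) / n_pbo)
    rr = p1 / p0
    # standard error of log(RR), the usual large-sample approximation
    se_log_rr = math.sqrt(1 / events_vax - 1 / n_vax + 1 / events_pbo - 1 / n_pbo)
    return {
        "rd_per_10k": rd * 1e4,
        "rd_ci_per_10k": ((rd - z * se_rd) * 1e4, (rd + z * se_rd) * 1e4),
        "rr": rr,
        "rr_ci": (math.exp(math.log(rr) - z * se_log_rr), math.exp(math.log(rr) + z * se_log_rr)),
    }

# Hypothetical counts: 20 events among 10,000 vaccinated vs 10 among 10,000 on placebo.
m = risk_measures(20, 10_000, 10, 10_000)
```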
Abstract:
>>>
INTRODUCTION: In 2020, prior to COVID-19 vaccine rollout, the Brighton Collaboration created a priority list, endorsed by the World Health Organization, of potential adverse events relevant to COVID-19 vaccines. We adapted the Brighton Collaboration list to evaluate serious adverse events of special interest observed in mRNA COVID-19 vaccine trials. METHODS: Secondary analysis of serious adverse events reported in the placebo-controlled, phase III randomized clinical trials of Pfizer and Moderna mRNA COVID-19 vaccines in adults (NCT04368728 and NCT04470427), focusing analysis on Brighton Collaboration adverse events of special interest. RESULTS: Pfizer and Moderna mRNA COVID-19 vaccines were associated with an excess risk of serious adverse events of special interest of 10.1 and 15.1 per 10,000 vaccinated over placebo baselines of 17.6 and 42.2 (95 % CI -0.4 to 20.6 and -3.6 to 33.8), respectively. Combined, the mRNA vaccines were associated with an excess risk of serious adverse events of special interest of 12.5 per 10,000 vaccinated (95 % CI 2.1 to 22.9); risk ratio 1.43 (95 % CI 1.07 to 1.92). The Pfizer trial exhibited a 36 % higher risk of serious adverse events in the vaccine group; risk difference 18.0 per 10,000 vaccinated (95 % CI 1.2 to 34.9); risk ratio 1.36 (95 % CI 1.02 to 1.83). The Moderna trial exhibited a 6 % higher risk of serious adverse events in the vaccine group: risk difference 7.1 per 10,000 (95 % CI -23.2 to 37.4); risk ratio 1.06 (95 % CI 0.84 to 1.33). Combined, there was a 16 % higher risk of serious adverse events in mRNA vaccine recipients: risk difference 13.2 (95 % CI -3.2 to 29.6); risk ratio 1.16 (95 % CI 0.97 to 1.39). DISCUSSION: The excess risk of serious adverse events found in our study points to the need for formal harm-benefit analyses, particularly those that are stratified according to risk of serious COVID-19 outcomes. These analyses will require public release of participant level datasets.
<<<
736.
颜林林
(2022-09-11 23:59):
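#paper doi:10.1101/2022.09.09.453067 bioRxiv, 2022, HexSE: Simulating evolution in overlapping reading frames. Overlapping genes are an interesting phenomenon found in viruses (and plasmids): the same nucleic-acid sequence yields different proteins because translation starts at different positions (i.e. in different reading frames). Studies so far have found the phenomenon in many species, and it is particularly common in viruses. Because a site shared by overlapping genes is subject to the selective pressures of every frame that covers it, a substitution that is synonymous in one frame may be non-synonymous in another, so such sequences show different evolutionary rates and can skew dN/dS-based estimates of selection. To study this effect, the paper implements a simulation model of nucleotide sequence evolution along a phylogeny with an arbitrary layout of reading frames. It is a very interesting idea and research topic.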
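The key point, that the same substitution can be synonymous in one frame and non-synonymous in an overlapping one, can be checked directly with the standard genetic code. A sketch assuming forward-strand frames only; `substitution_effect` is a hypothetical helper, not part of HexSE (which is also written in Python).

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def substitution_effect(seq, pos, new_base, frame):
    """Classify a point substitution within one forward reading frame.

    frame: offset (0, 1 or 2) of the first codon. Returns "synonymous",
    "non-synonymous", or None if pos is not inside a complete codon of that frame.
    """
    if pos < frame:
        return None
    start = frame + 3 * ((pos - frame) // 3)   # first base of the codon containing pos
    if start + 3 > len(seq):
        return None
    codon = seq[start:start + 3]
    mutant = codon[:pos - start] + new_base + codon[pos - start + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "non-synonymous"

# G->A at position 2: CTG->CTA (Leu->Leu) in frame 0, but TGC->TAC (Cys->Tyr) in frame 1.
effects = [substitution_effect("CTGCTG", 2, "A", f) for f in (0, 1)]
```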
bioRxiv,
2022.
DOI: 10.1101/2022.09.09.453067
Abstract:
>>>
Motivation: Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another, and vice versa. Results: To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with an arbitrary distribution of reading frames. We use a custom data structure to track the substitution rates at every nucleotide site, which is determined by the stationary nucleotide frequencies, transition bias, and the distribution of selection biases (dN/dS) in the respective reading frames. Availability and implementation: Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License (GPL) version 3, and is available at https://github.com/PoonLab/HexSE.
<<<
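The abstract's key point is that a substitution can be synonymous in one reading frame and non-synonymous in an overlapping one, which skews dN/dS estimates. A minimal Python sketch of that effect, assuming the standard genetic code and a short hypothetical sequence; it is not HexSE's code or API.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq, frame):
    """Translate the complete codons that start at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

def substitution_effect(seq, pos, new_base, frame):
    """Classify a point substitution at 0-based `pos` within one reading frame.

    Returns None if `pos` does not fall inside a complete codon of that frame.
    """
    if pos < frame:
        return None
    start = frame + (pos - frame) // 3 * 3  # first base of the codon containing pos
    if start + 3 > len(seq):
        return None
    old = seq[start:start + 3]
    new = old[:pos - start] + new_base + old[pos - start + 1:]
    return "synonymous" if CODON_TABLE[old] == CODON_TABLE[new] else "non-synonymous"

seq = "ATGCTGCAAGGC"  # hypothetical sequence read in two overlapping frames
print(translate(seq, 0), translate(seq, 1))  # MLQG CCK
for frame in (0, 1):
    print(frame, substitution_effect(seq, 5, "A", frame))
```

Changing position 5 (0-based) from G to A turns CTG into CTA (Leu to Leu) in frame 0, but TGC into TAC (Cys to Tyr) in frame 1, so the same event counts as synonymous in one gene and non-synonymous in the other.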
737.
song
(2022-09-09 09:04):
#paper https://doi.org/10.48550/arXiv.2206.13236 Pruned RNN-T for fast, memory-efficient ASR training
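From Xiaomi's next-generation Kaldi team. RNN-T is currently one of the mainstream paradigms for end-to-end speech recognition; among streaming models it offers the best accuracy and is the easiest to deploy in industry. Its drawback is that training uses at least an order of magnitude more memory than other mainstream models. The reason is that, compared with models such as CTC and attention-based models, its loss carries an extra dimension of size U, the length of the decoder's output (label) sequence; U typically ranges from tens to hundreds. This paper proposes a way to prune the computation so that the effective U is reduced without degrading model accuracy. The team first observed that not every node in the RNN-T loss computation contributes meaningfully. Because the number of nodes is proportional to U, selecting and keeping only the nodes that matter for training reduces memory and speeds up training. When computing gradients, only a contiguous band of nodes per frame effectively takes part in training; depending on the scenario, the width S of this band is 4 or 5. In the experiments, training took about one sixteenth of the time of the previous state of the art and about one fifth of its memory, with only a 0.05% drop in model performance. In my own attempt, I could fully reproduce and deploy it with only four V100 GPUs and relatively little hyperparameter tuning. This should greatly reduce the cost and manpower small and medium-sized companies need to bring state-of-the-art models into their products.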
arXiv,
2022.
DOI: 10.48550/arXiv.2206.13236
Abstract:
>>>
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
<<<
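The comment describes the RNN-T loss as a sum over a T x (U+1) lattice of which only a narrow band of S token positions per frame really matters. A minimal NumPy sketch of that idea, assuming a random toy joiner output and hand-picked, hypothetical pruning bounds (`starts`): it runs the standard forward (alpha) recursion on the full lattice, then again with only S cells kept per frame. It is not the team's k2 implementation; the paper derives its bounds from a cheap linear joiner, and for clarity this sketch masks a dense lattice instead of gathering only the kept cells, which is where the real memory saving comes from.

```python
import numpy as np

NEG_INF = -np.inf

def rnnt_log_likelihood(log_probs, labels, blank=0, mask=None):
    """Forward (alpha) recursion over the T x (U+1) RNN-T lattice.

    log_probs: (T, U + 1, V) log-softmax outputs of the joiner.
    labels:    U target token ids (none equal to `blank`).
    mask:      optional boolean (T, U + 1) array; False cells are pruned away.
    Returns log p(labels | x); the RNN-T loss is its negation.
    """
    T, U1, _ = log_probs.shape
    U = U1 - 1
    assert len(labels) == U
    if mask is None:
        mask = np.ones((T, U1), dtype=bool)
    alpha = np.full((T, U1), NEG_INF)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if (t == 0 and u == 0) or not mask[t, u]:
                continue
            # Reach (t, u) by emitting blank at (t - 1, u): advance one frame.
            from_blank = alpha[t - 1, u] + log_probs[t - 1, u, blank] if t > 0 else NEG_INF
            # Reach (t, u) by emitting label u - 1 at (t, u - 1): advance one token.
            from_token = alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]] if u > 0 else NEG_INF
            alpha[t, u] = np.logaddexp(from_blank, from_token)
    # A final blank at the last frame terminates every alignment.
    return alpha[T - 1, U] + log_probs[T - 1, U, blank]

def pruned_mask(starts, S, U):
    """Keep S consecutive token positions per frame, beginning at starts[t]."""
    mask = np.zeros((len(starts), U + 1), dtype=bool)
    for t, s in enumerate(starts):
        mask[t, s:s + S] = True
    return mask

rng = np.random.default_rng(0)
T, U, V = 6, 4, 5
logits = rng.normal(size=(T, U + 1, V))
log_probs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
labels = [1, 3, 2, 4]

full = rnnt_log_likelihood(log_probs, labels)
starts = [0, 0, 1, 1, 2, 2]  # hypothetical monotonic pruning bounds
pruned = rnnt_log_likelihood(log_probs, labels, mask=pruned_mask(starts, S=3, U=U))
print(f"full {full:.4f}  pruned {pruned:.4f}")
```

Because the pruned run sums over a subset of alignment paths, its log-likelihood is never higher than the full one; with these bounds the joiner would only need to be evaluated on 6 x 3 = 18 cells instead of 6 x 5 = 30.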
738.
魏魏魏
(2022-09-07 11:47):
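#paper doi:10.1007/s10802-010-9396-z. Journal of Abnormal Child Psychology, (2010), Mother and Adolescent Reports of Associations Between Child Behavior Problems and Mother-Child Relationship Qualities: Separating Shared Variance from Individual Variance. Studies based on the Common Fate Model (CFM) are scarce, so I read this 2010 paper simply to learn the method better. The CFM is well suited to relationships whose members share a living environment, such as spouses, mothers and children, or fathers and children: both members are influenced by common environmental variables and resemble each other on some of them. Each member reports on the relevant variables, which yields dyadic data, and the two members' data are interdependent. This violates the assumption of independent observations underlying conventional correlational analysis, and the CFM can handle this problem. Moreover, traditional studies examine a single participant's reports on both the predictor and the outcome, which can introduce common method variance arising from the shared data source; this can inflate or deflate the observed associations and distort our understanding of the actual situation. The CFM also helps here, because it brings in the reports of the other member of the relationship and so diversifies the data sources. Besides correlations among variables within a single reporter, a CFM-based analysis also examines correlations between the two reporters on the same variable, as well as correlations between predictor and outcome at the relationship (dyad) level. In this process, the variance due to factors the two members share is partitioned out, giving a clearer picture of the true relationship between predictor and outcome. The present study examined the association between adolescent behavior problems and mother-child relationship quality; on both variables, mothers and adolescents share common perceptions but also differ from each other. Using the CFM, the study analyzed mothers' and adolescents' reports on the same variables simultaneously. It examined correlations among the adolescent-reported variables, correlations among the mother-reported variables, and correlations at the mother-child dyad level; the model also estimated the mother-specific and adolescent-specific correlations separately, and the authors compared the magnitudes of these correlation coefficients. The results differed from those obtained with traditional approaches.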
Abstract:
>>>
This study contrasts results from different correlational methods for examining links between mother and child (N = 72 dyads) reports of early adolescent (M = 11.5 years) behavior problems and relationship negativity and support. Simple (Pearson) correlations revealed a consistent pattern of statistically significant associations, regardless of whether scores came from the same reporter or from different reporters. When correlations between behavior problems and relationship quality differed, within-reporter correlations were always greater in magnitude than between-reporter correlations. Dyadic (common fate) analyses designed for interdependent data decomposed within-reporter correlations into variance shared across reporters (dyadic correlations) and variance unique to specific reporters (individual correlations). Dyadic correlations were responsible for most associations between adolescent behavior problems and relationship negativity; after partitioning variance shared across reporters, no individual correlations emerged as statistically significant. In contrast, adolescent behavior problems were linked to relationship support via both shared variance and variance unique to maternal perceptions. Dyadic analyses provide a parsimonious alternative to multiple contrasts in instances when identical measures have been collected from multiple reporters. Findings from these analyses indicate that same-reporter variance bias should not be assumed in the absence of dyadic statistical analyses.
<<<
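The common fate model itself is a latent-variable structural equation model, which is beyond a short example. The simulation below only illustrates the problem it addresses: if each reporter has a response style that shifts both of their ratings, correlations computed within one reporter are inflated relative to correlations across reporters. All parameters (true correlation 0.3, unit-variance reporter bias, noise SD 0.5, 20,000 simulated dyads) are assumptions chosen for illustration, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(42)
n_dyads = 20_000
true_r = 0.3  # assumed true correlation between behavior problems and negativity

# Latent "true" scores that both members of a dyad are rating.
behavior = rng.normal(size=n_dyads)
negativity = true_r * behavior + np.sqrt(1 - true_r**2) * rng.normal(size=n_dyads)

# Each reporter's response style shifts BOTH of that reporter's ratings.
mother_bias = rng.normal(size=n_dyads)
child_bias = rng.normal(size=n_dyads)

def rate(true_score, reporter_bias):
    """A rating = true score + the reporter's response style + measurement noise."""
    return true_score + reporter_bias + 0.5 * rng.normal(size=n_dyads)

beh_m, neg_m = rate(behavior, mother_bias), rate(negativity, mother_bias)
beh_c, neg_c = rate(behavior, child_bias), rate(negativity, child_bias)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

within_mother = corr(beh_m, neg_m)  # same reporter: shares the response style
within_child = corr(beh_c, neg_c)
between = corr(beh_m, neg_c)        # cross-reporter: response styles do not overlap
print(f"within mother {within_mother:.2f}, within child {within_child:.2f}, "
      f"between {between:.2f}")
```

With these settings each rating has variance 2.25, so the within-reporter correlations land near 1.3 / 2.25 (about 0.58) and the cross-reporter correlation near 0.3 / 2.25 (about 0.13): the shared reporter bias, not the true association, drives most of the within-reporter value.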
739.
马斯克齊
(2022-09-04 21:57):
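#paper doi:10.19695/j.cnki.cn12-1369, 2022, On the Important Applications of Artificial Intelligence on University Campuses (论人工智能在大学校园的重要应用). Against the backdrop of continuing advances in AI technology and the COVID-19 pandemic, the paper asks how campus learning can keep pace with the times and lists several applications of AI on campus, such as smart teaching, smart libraries, and smart campus life, each of which offers a different experience and is expected to have an important influence on future teaching reform.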
Digital Technology & Application (数字技术与应用),
2022.
DOI: 10.19695/j.cnki.cn12-1369.2022.07.22
Abstract:
>>>
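In recent years, with rapid global economic development, swift progress in computer science and technology, and universities' pressing need to apply technology to smart campus management, cutting-edge technologies such as the Internet of Things, big data management, 5G, and cloud computing have gradually been adopted and popularized in smart university management. Artificial intelligence, currently the most favored high technology in university management, has been deeply applied in teaching and research management, online education, and the improvement of students' campus life, and has achieved notable results. This paper introduces the important current applications of artificial intelligence in university campus management, for reference.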
<<<
740.
张德祥
(2022-09-01 22:03):
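#paper https://doi.org/10.48550/arXiv.2208.11970 Understanding Diffusion Models: A Unified Perspective; the image generation models that have recently become so popular, such as DALL-E 2, are all built on diffusion models. This paper carefully explains where diffusion models come from, building step by step from the ELBO to the VAE, then the hierarchical VAE, and finally diffusion models, and it covers three perspectives on diffusion models as well as their limitations. The derivations throughout are clear and easy to follow, making it an excellent resource for getting to know diffusion models.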
arXiv,
2022.
DOI: 10.48550/arXiv.2208.11970
Abstract:
>>>
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
<<<
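The abstract states that optimizing a variational diffusion model reduces to predicting the clean input, the added noise, or the score. A minimal NumPy sketch of why these three targets are interchangeable, assuming the DDPM-style linear noise schedule (beta from 1e-4 to 0.02 over 1000 steps) and writing `alpha_bar` for the cumulative product of (1 - beta_t); neither choice is taken from the paper. With a perfect noise prediction, each conversion recovers x_0 exactly.

```python
import numpy as np

# Assumed linear beta schedule (the DDPM default), not taken from the paper.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noisify(x0, t, eps):
    """Closed-form forward process: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def x0_from_eps(xt, t, eps_hat):
    """Target 1 from target 2: recover the clean input from a noise prediction."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

def score_from_eps(t, eps_hat):
    """Target 3 from target 2: grad log q(x_t | x_0) = -eps / sqrt(1 - abar_t)."""
    return -eps_hat / np.sqrt(1.0 - alpha_bar[t])

def x0_from_score(xt, t, score):
    """Tweedie's formula: (x_t + (1 - abar_t) * score) / sqrt(abar_t).

    With the true conditional score (used here) this returns x_0 exactly;
    with a learned marginal score it gives the posterior mean E[x_0 | x_t].
    """
    return (xt + (1.0 - alpha_bar[t]) * score) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
eps = rng.normal(size=4)
t = 500
xt = noisify(x0, t, eps)

# With a perfect predictor, all three parameterizations agree.
print(np.allclose(x0_from_eps(xt, t, eps), x0))
print(np.allclose(x0_from_score(xt, t, score_from_eps(t, eps)), x0))
```

In practice a network predicts only one of these quantities, and the other two follow from these closed-form conversions.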