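#paper doi: https://www.nature.com/articles/s41576-023-00586-w Best practices for single-cell analysis across modalities. Nature Reviews Genetics, 2023. This review from Fabian Theis's group is an excellent guide to single-cell analysis. It covers several technologies (scRNA-seq, scATAC-seq, scTCR/BCR, spatial transcriptomics) and, for each, walks through the complete analysis workflow and the best methods currently available. For scRNA-seq, for example, it covers raw data processing, filtering and removal of contaminating signal, normalization and batch-effect correction, dimensionality reduction, clustering and cell-type annotation, pseudotime and RNA velocity analysis, differential gene expression, compositional analysis, and cell-cell communication analysis. For each step, the article summarizes the current best practice where independent benchmarks exist and offers recommendations where they do not. With new single-cell methods appearing constantly, this review is a very useful guide, and I highly recommend it to anyone doing single-cell analysis.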
Recent advances in single-cell technologies have enabled high-throughput molecular profiling of cells across modalities and locations. Single-cell transcriptomics data can now be complemented by chromatin accessibility, surface protein expression, adaptive immune receptor repertoire profiling and spatial information. The increasing availability of single-cell data across modalities has motivated the development of novel computational methods to help analysts derive biological insights. As the field grows, it becomes increasingly difficult to navigate the vast landscape of tools and analysis steps. Here, we summarize independent benchmarking studies of unimodal and multimodal single-cell analysis across modalities to suggest comprehensive best-practice workflows for the most common analysis steps. Where independent benchmarks are not available, we review and contrast popular methods. Our article serves as an entry point for novices in the field of single-cell (multi-)omic analysis and guides advanced users to the most recent best practices. <<<
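#paper Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models doi: https://doi.org/10.1101/2022.12.09.519842 This paper proposes a new protein design method, RFdiffusion, which uses deep generative learning to produce entirely new protein structures. The core is a diffusion model. Protein generation has long been difficult because of the complex geometry of protein backbones and the complicated relationship between amino acid sequence and structure. As I understand it, the paper applies diffusion as follows: 1. RoseTTAFold, the Baker lab's structure prediction network, serves as the denoising network; since the lab's earlier design work relied on the more physics-based Rosetta suite, this choice of denoiser is quite clever. 2. The noising and denoising process acts mainly on the alpha-carbon coordinates, so RFdiffusion generates the backbone structure first. 3. The full protein structure is then obtained through a "backbone tracking" step, which can be thought of as adding the missing bonds and atoms to the predicted alpha carbons using geometric constraints such as bond lengths and angles. 4. Side chains are placed with rotamers: a rotamer library is a precomputed set of side-chain conformations for each amino acid residue, from which the energetically most favorable conformation can be selected. The whole generation process can therefore be seen as a deep generative model plus physical constraints plus post-processing (precomputation). The paper also validates the designs with extensive experiments. The Baker lab has since used RFdiffusion in follow-up design work, including "De novo design of high-affinity protein binders to bioactive helical peptides", and recently open-sourced the RFdiffusion code. Many protein design researchers have started experimenting heavily with RFdiffusion-based design and attempting wet-lab validation, so this is definitely a groundbreaking paper that deserves everyone's attention.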
Abstract: There has been considerable recent progress in designing new proteins using deep learning methods [1–9]. Despite this progress, a general deep learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modeling but limited success when applied to protein modeling, likely due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold Diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of new designs. In a manner analogous to networks which produce images from user-specified inputs, RFdiffusion enables the design of diverse, complex, functional proteins from simple molecular specifications. <<<
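#paper doi:10.1109/TNB.2023.3254514 IEEE Transactions on Nanobioscience, 2023, RBS: A Rotational Coding Based on Blocking Strategy for DNA Storage. Using DNA as a medium for data storage has been one of the hot topics of recent years. Many institutes and companies are racing to work on it, but investment and progress are uneven. It is also one of the directions I am personally interested in, so I noticed this recently published paper and am commenting on it here. The paper is not especially impressive and contains no major breakthrough, but it is a purely algorithmic proof of concept with no molecular experiments, which makes it a good model for an amateur enthusiast like me to imitate. Storing data in DNA runs into various practical problems. For example, GC content must be kept within a certain range, since GC content that is too high or too low causes problems in both synthesis and sequencing; consecutive repeated segments must also be avoided. For this reason, encoding data into DNA letters cannot rely on an arbitrary one-to-one mapping rule; the various molecular constraints have to be considered together. The paper proposes an encoding and decoding algorithm, RBS, tests it on text and image data, and evaluates GC content, the number of repeated segments, Hamming distance, free energy and so on, to confirm the feasibility and efficiency of the algorithm for DNA storage.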
The data volume of global information has grown exponentially in recent years, but the development of silicon-based memory has entered a bottleneck period. Deoxyribonucleic acid (DNA) storage is drawing attention owing to its advantages of high storage density, long storage time, and easy maintenance. However, the base utilization and information density of existing DNA storage methods are insufficient. Therefore, this study proposes a rotational coding based on blocking strategy (RBS) for encoding digital information such as text and images in DNA data storage. This strategy satisfies multiple constraints and produces low error rates in synthesis and sequencing. To illustrate the superiority of the proposed strategy, it was compared and analyzed with existing strategies in terms of entropy value change, free energy size, and Hamming distance. The experimental results show that the proposed strategy has higher information storage density and better coding quality in DNA storage, so it will improve the efficiency, practicality, and stability of DNA storage. <<<
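To make the constraints mentioned in the note concrete, here is a minimal sketch of why an arbitrary one-to-one mapping is not enough. It is not the RBS algorithm from the paper. The 2-bit mapping, the function names, and the thresholds (40% to 60% GC, runs of at most 3 identical bases) are all illustrative assumptions.

```python
# Illustrative only: NAIVE_MAP, the helper names and the thresholds are
# assumptions for this sketch, not values or methods from the RBS paper.
NAIVE_MAP = {"00": "A", "01": "C", "10": "G", "11": "T"}


def naive_encode(data: bytes) -> str:
    """Map every 2 bits to one base with a fixed one-to-one rule."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(NAIVE_MAP[bits[i:i + 2]] for i in range(0, len(bits), 2))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        previous = base
        longest = max(longest, run)
    return longest


def satisfies_constraints(seq: str, gc_min=0.4, gc_max=0.6, max_run=3) -> bool:
    """Check the two constraints discussed above under the assumed thresholds."""
    return gc_min <= gc_content(seq) <= gc_max and longest_homopolymer(seq) <= max_run


if __name__ == "__main__":
    # Two zero bytes become a single long run of one base under the naive rule.
    seq = naive_encode(b"\x00\x00")
    print(seq, gc_content(seq), longest_homopolymer(seq), satisfies_constraints(seq))
```

Low-entropy input such as runs of zero bytes is exactly where a fixed mapping breaks both constraints at once, which is why practical coding schemes add rotation, blocking or similar transformations before mapping to bases.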
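#paper Single-cell transcriptomics dissects hematopoietic cell destruction and T-cell engagement in aplastic anemia. Blood. 2021. Background: Aplastic anemia (AA) is a T cell-mediated autoimmune disease of the hematopoietic system, manifested by severe depletion of hematopoietic stem and progenitor cells (HSPCs). Attack on the patient's own HSPCs by aberrantly activated T lymphocytes is an important mechanism of AA pathogenesis. Challenges: the work is limited by available techniques and by the sparsity of HSPCs in bone marrow failure. Very few residual HSPCs remain in the bone marrow of AA patients, which makes it hard to dissect the pathological changes in each HSPC compartment after marrow injury and the molecular mechanisms of the T-cell immune attack on HSPCs. Sample groups: healthy donors (n = 8), patients with non-severe aplastic anemia (non-SAA, n = 19), and patients with severe aplastic anemia (SAA, n = 4), plus a treated group of patients after immunosuppressive therapy (IST). Sampling: CD34+ HSPCs and CD4+/CD8+ T lymphocytes sorted from bone marrow and peripheral blood. Techniques: STRT-seq (high sequencing depth) + Smart-seq2. Study design: disease and healthy groups -> cell sorting by flow cytometry -> CD34+ HSPCs and CD4+/CD8+ T lymphocytes -> single-cell sequencing (STRT-seq + Smart-seq2) -> 9 HSPC subpopulations defined -> gene expression and transcriptional regulatory network analysis. Results: ① STRT-seq overcame the shortage of residual HSPCs in the bone marrow. HSPCs and T cells from AA patients were profiled, yielding single-cell transcriptomes of 2,385 HSPCs and 4,081 CD4+/CD8+ T cells; 9 HSPC subpopulations were defined, the first pathological atlas of AA hematopoiesis was drawn, and new mechanisms of AA pathogenesis, particularly of malignant transformation, were revealed. ② Residual HSPCs in AA showed lineage-specific alterations in gene expression and transcriptional regulatory networks, suggesting lineage-selective damage to hematopoiesis. ③ Integrated analysis of gene expression in HSPCs and T cells identified cell type-specific ligand-receptor interactions as key molecular mediators of the immune attack in AA. ④ By following patients after IST, the authors found that gene expression in HSPCs and T lymphocytes did not fully return to normal and in some respects stayed close to the pre-treatment state, which may be one of the main reasons AA patients need long-term maintenance immunosuppressive therapy.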
IF:21.000Q1 Blood, 2021-07-08. DOI: 10.1182/blood.2020008966 PMID: 33763704
Aplastic anemia (AA) is a T cell-mediated autoimmune disorder of the hematopoietic system manifested by severe depletion of the hematopoietic stem and progenitor cells (HSPCs). Nonetheless, our understanding of the complex relationship between HSPCs and T cells is still obscure, mainly limited by techniques and the sparsity of HSPCs in the context of bone marrow failure. Here we performed single-cell transcriptome analysis of residual HSPCs and T cells to identify the molecular players from patients with AA. We observed that residual HSPCs in AA exhibited lineage-specific alterations in gene expression and transcriptional regulatory networks, indicating a selective disruption of distinct lineage-committed progenitor pools. In particular, HSPCs displayed frequently altered alternative splicing events and skewed patterns of polyadenylation in transcripts related to DNA damage and repair, suggesting a likely role in AA progression to myelodysplastic syndromes. We further identified cell type-specific ligand-receptor interactions as potential mediators for ongoing HSPCs destruction by T cells. By tracking patients after immunosuppressive therapy (IST), we showed that hematopoiesis remission was incomplete accompanied by IST insensitive interactions between HSPCs and T cells as well as sustained abnormal transcription state. These data collectively constitute the transcriptomic landscape of disrupted hematopoiesis in AA at single-cell resolution, providing new insights into the molecular interactions of engaged T cells with residual HSPCs and render novel therapeutic opportunities for AA. <<<
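#paper DOI: 10.1182/blood.2020006287. Tumor-intrinsic and -extrinsic determinants of response to blinatumomab in adults with B-ALL. Blood. 2021 Jan 28;137(4):471-484. Through integrated genomic analysis of tumor and immune cells, the study shows that both tumor-intrinsic and tumor-extrinsic factors influence patient response to blinatumomab. Bulk tumor and single-cell sequencing were used to study 44 adults with relapsed/refractory B-ALL treated with blinatumomab (including 2 MRD-positive patients). The overall response rate in patients with hematological disease was 55%, and the response rate was high in patients with CRLF2-rearranged Philadelphia chromosome-like ALL (Ph-like ALL) (12 [75%] of 16). At the transcriptome level, pretreatment samples from responders showed a tumor-intrinsic signature of heightened immune response. Increased expression of CD19 ex2part, a CD19 isoform with intraexonic splicing of exon 2, at baseline or during therapy was associated with treatment failure. Future studies could evaluate ex2part as a biomarker of response to CD19-directed immunotherapies (including blinatumomab and CAR19).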
IF:21.000Q1 Blood, 2021-01-28. DOI: 10.1182/blood.2020006287 PMID: 32881995
Blinatumomab, a bispecific antibody that directs CD3+ T cells to CD19+ tumor cells, shows variable efficacy in B-progenitor acute lymphoblastic leukemia (B-ALL). To determine tumor-intrinsic and -extrinsic determinants of response, we studied 44 adults with relapsed or refractory B-ALL (including 2 minimal residual disease positive) treated with blinatumomab using bulk tumor and single-cell sequencing. The overall response rate in patients with hematological disease was 55%, with a high response rate in those with CRLF2-rearranged Philadelphia chromosome-like ALL (12 [75%] of 16). Pretreatment samples of responders exhibited a tumor-intrinsic transcriptomic signature of heightened immune response. Multiple mechanisms resulted in loss of CD19 expression, including CD19 mutations, CD19-mutant allele-specific expression, low CD19 RNA expression, and mutations in CD19 signaling complex member CD81. Patients with low hypodiploid ALL were prone to CD19- relapse resulting from aneuploidy-mediated loss of the nonmutated CD19 allele. Increased expression of a CD19 isoform with intraexonic splicing of exon 2, CD19 ex2part, at baseline or during therapy was associated with treatment failure. These analyses demonstrate both tumor-intrinsic and -extrinsic factors influence blinatumomab response. We show that CD19 mutations are commonly detected in CD19- relapse during blinatumomab treatment. Identification of the CD19 ex2part splice variant represents a new biomarker predictive of blinatumomab therapy failure. <<<
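#paper De novo design of protein interactions with learned surface fingerprints doi: 10.1038/s41586-023-05993-x. The approach has three stages: (1) MaSIF-site predicts buried interface sites with high binding propensity on the target protein surface; (2) MaSIF-seed uses surface fingerprints to search for complementary structural motifs (binding seeds) whose features match the target site; (3) the binding seeds are grafted onto protein scaffolds, and the interface is optimized with Rosetta to add stability and additional contacts. The main conclusion is that the authors used this surface-centric approach to design and experimentally validate de novo binders against four protein targets: SARS-CoV-2 spike, PD-1, PD-L1 and CTLA-4. Some designs were optimized experimentally, while others were generated purely in silico and reached nanomolar affinity. Structural and mutational analyses showed that the predictions were highly accurate. Overall, the method captures the physical and chemical determinants of molecular recognition and offers a route to the de novo design of protein interactions and, more broadly, of functional artificial proteins. The summary above was generated with ChatGPT. My own impression after reading, though, is that this paper is not at the level of Nature itself. The MaSIF algorithms are certainly useful for comparing protein structures, but there is a deeper issue the paper does not address: designing an effective binder for a known protein is already well served by existing algorithms, and papers from the last two years have backed this up with good experimental results. For proteins with entirely new structures, or with no usable binder at all, the challenge is enormous. The paper does not discuss how to approach that case, and its algorithmic novelty does not even match the Baker lab's recent RFdiffusion. In short, this field really is a blue ocean right now, with a great many opportunities.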
IF:50.500Q1 Nature, 2023-05. DOI: 10.1038/s41586-023-05993-x PMID: 37100904
Physical interactions between proteins are essential for most biological processes governing life. However, the molecular determinants of such interactions have been challenging to understand, even as genomic, proteomic and structural data increase. This knowledge gap has been a major obstacle for the comprehensive understanding of cellular protein-protein interaction networks and for the de novo design of protein binders that are crucial for synthetic biology and translational applications. Here we use a geometric deep-learning framework operating on protein surfaces that generates fingerprints to describe geometric and chemical features that are critical to drive protein-protein interactions. We hypothesized that these fingerprints capture the key aspects of molecular recognition that represent a new paradigm in the computational design of novel protein interactions. As a proof of principle, we computationally designed several de novo protein binders to engage four protein targets: SARS-CoV-2 spike, PD-1, PD-L1 and CTLA-4. Several designs were experimentally optimized, whereas others were generated purely in silico, reaching nanomolar affinity with structural and mutational characterization showing highly accurate predictions. Overall, our surface-centric approach captures the physical and chemical determinants of molecular recognition, enabling an approach for the de novo design of protein interactions and, more broadly, of artificial proteins with function. <<<
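#paper doi:10.3390/ph16030328 Chemical Composition and Antimicrobial Potential of a Plant-Based Substance for the Treatment of Seborrheic Dermatitis. Seborrheic dermatitis (SD) is the most common scalp skin disease, affecting up to 50% of adults worldwide. Many factors contribute to SD, but the exact pathogenesis remains unclear. Current treatment is mainly antifungal and anti-inflammatory, but long-term use of these drugs carries a risk of side effects, and the induction of antifungal resistance is a risk that cannot be ignored either. The authors therefore turned to plant-derived agents, here Melaleuca alternifolia leaf oil (tea tree oil, TTO). They first analyzed the composition by GC/MS, identifying 10 antimicrobial monoterpenes and sesquiterpenes characteristic of TTO and 7 terpenes not previously detected in TTO. Subsequent antibacterial and antifungal assays showed that the substance's antimicrobial activity is comparable to that of benzalkonium chloride, ketoconazole and a third reference agent (氯咪唑 in the original note). The substance is therefore a promising candidate for further research and evaluation in developing drugs to treat SD.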
Seborrheic dermatitis (SD) is the most prevalent dermatological disease, occurring in up to 50% of newborns, children, and adults around the world. The antibacterial and antifungal resistance contributed to the search for new natural substances and the development of a novel substance based on () leaf oil (TTO), 1,8-cineole (eucalyptol), and α-(-)-bisabolol. Thus, this work aimed to determine the chemical composition of the novel plant-based substance and to evaluate its antimicrobial activity against standard microorganisms involved in the pathogenesis of SD. Moreover, the chemical composition of the substance was analyzed by gas chromatography coupled with mass spectrometry (GC/MS). (), (), (), and () were used for antimicrobial and antifungal assays by means of the broth microdilution method to determine the minimal inhibitory concentration (MIC). Finally, the substance's ability to inhibit () was evaluated. Eighteen compounds from different chemical groups were identified by GC/MS. The major biologically active compounds of the substance were terpinen-4-ol (20.88%), 1,8-cineole (22.28%), (-)-α-bisabolol (25.73%), and o-cymene (8.16%). The results showed that the substance has a synergistic antimicrobial and antifungal activity, while and strains were the most susceptible. Furthermore, the substance inhibited , which is a main pathogen involved in the pathogenesis of SD and clinical manifestations. It can be concluded that the novel plant-based substance has a promising potential against and scalp commensal bacteria and may be helpful for the development of new drugs for treatment of dandruff and SD. <<<
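#paper An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling DOI: arXiv:1803.01271. I have been giving a lot of talks on time-series problems lately, so I read the original TCN paper carefully. Apart from the RNN family, TCNs are used quite widely. To get a large receptive field without adding too many layers, the TCN uses dilated convolutions, and it avoids leaking future information into the present through padding and cropping. A TCN block consists of two dilated causal convolutions, activation layers, normalization layers and a residual connection. The experiments show that TCNs are relatively insensitive to hyperparameters, although the kernel size k is a key one; dropout and gradient clipping also help considerably.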
For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL . <<<
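The dilated causal convolution described in the note can be sketched in a few lines of plain Python. This is a toy single-channel version, not the authors' released code. The function names are hypothetical, and the receptive-field formula assumes two convolutions per residual block with the same kernel size and dilation, as in the block layout described above.

```python
def causal_dilated_conv(x, weights, dilation):
    """Single-channel causal convolution.

    y[t] = sum_i weights[i] * x[t - i * dilation], where positions before the
    start of the sequence count as zero. This is equivalent to left-padding
    with zeros, so y[t] never depends on x[t + 1], x[t + 2], ...
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                total += w * x[idx]
        y.append(total)
    return y


def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field of a stack of residual blocks.

    Each convolution with kernel size k and dilation d reaches (k - 1) * d
    steps further into the past; assumes convs_per_block convolutions per block.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    print(causal_dilated_conv([1, 2, 3, 4], [1, 1], dilation=2))
    # Doubling the dilation per block grows the receptive field exponentially with depth.
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))
```

The second function shows why the kernel size k matters so much in practice: for a fixed dilation schedule the receptive field scales linearly with k - 1.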
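#paper RhoB affects colitis through modulating cell signaling and intestinal microbiome. Microbiome, 2022 Sep 16;10(1). DOI: 10.1186/s40168-022-01347-3. Background: The pathogenesis of inflammatory bowel disease (IBD) is multifactorial, and diagnostic and treatment strategies for IBD remain to be developed. RhoB regulates many cell functions, but its role in colitis has not been explored. Results: The authors found that RhoB is markedly increased in colon tissue from ulcerative colitis (UC) patients and from mice with DSS-induced colitis. Compared with wild-type mice, RhoB+/- and RhoB-/- mice developed milder DSS-induced colitis, with more goblet cells and increased IEC proliferation. Reduced RhoB promoted goblet cell differentiation and epithelial regeneration by inhibiting the Wnt signaling pathway and activating the p38 MAPK signaling pathway. In addition, more SCFA-producing bacteria and higher SCFA concentrations were detected in the intestinal microbiome of RhoB+/- and RhoB-/- mice, along with upregulated SCFA receptor expression. Conclusions: Higher RhoB levels are associated with UC, and RhoB also contributes to UC development by modulating cell signaling and altering intestinal bacterial composition and metabolites. These observations suggest that RhoB has potential as a biomarker and therapeutic target for UC.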
IF:13.800Q1 Microbiome, 2022-09-16. DOI: 10.1186/s40168-022-01347-3 PMID: 36114582
BACKGROUND: The pathogenesis of inflammatory bowel diseases (IBD) is multifactorial, and diagnostic and treatment strategies for IBD remain to be developed. RhoB regulates multiple cell functions; however, its role in colitis is unexplored. RESULTS: Here, we found RhoB was dramatically increased in colon tissues of ulcerative colitis (UC) patients and mice with DSS-induced colitis. Compared with wild type mice, RhoB+/- and RhoB-/- mice developed milder DSS-induced colitis and increased goblet cell numbers and IEC proliferation. Decreased RhoB promoted goblet cell differentiation and epithelial regeneration through inhibiting Wnt signaling pathway and activating p38 MAPK signaling pathway. Moreover, increased SCFA-producing bacteria and SCFA concentrations were detected in intestinal microbiome of both RhoB+/- and RhoB-/- mice and upregulated SCFA receptor expression was also observed. CONCLUSIONS: Taken together, a higher level of RhoB is associated with UC, which also contributes to UC development through modulating cell signaling and altering intestinal bacterial composition and metabolites. These observations suggest that RhoB has potential as a biomarker and a treatment target for UC. <<<
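#paper doi: 10.1371/journal.pgen.1009325, PLOS Genetics, 2021, Ablation of DNA-methyltransferase 3A in skeletal muscle does not affect energy metabolism or exercise capacity. Earlier studies showed that DNA methylation in muscle fibers changes markedly after exercise, but whether the two are causally related had not been examined in depth. Using mouse soleus muscle, this study examined exercise-induced DNA methylation, exercise-induced remodeling of the gene expression profile, and exercise-induced phenotypic changes after muscle fiber-specific knockout of DNMT3A. The genome-wide demethylation induced by DNMT3A knockout showed no significant association with any of these, so the authors conclude that remodeling of the DNA methylation landscape is merely a by-product of exercise-induced reprogramming of gene expression. An obvious weakness of the study is that both the DNMT3A knockout efficiency and the magnitude of the resulting genome-wide demethylation were insufficient. The study offers useful information on how exercise regulates the DNA methylation landscape and is also a good example of how to report negative results properly.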
IF:4.000Q1 PLoS genetics, 2021-01. DOI: 10.1371/journal.pgen.1009325 PMID: 33513138
In response to physical exercise and diet, skeletal muscle adapts to energetic demands through large transcriptional changes. This remodelling is associated with changes in skeletal muscle DNA methylation which may participate in the metabolic adaptation to extracellular stimuli. Yet, the mechanisms by which muscle-borne DNA methylation machinery responds to diet and exercise and impacts muscle function are unknown. Here, we investigated the function of de novo DNA methylation in fully differentiated skeletal muscle. We generated muscle-specific DNA methyltransferase 3A (DNMT3A) knockout mice (mD3AKO) and investigated the impact of DNMT3A ablation on skeletal muscle DNA methylation, exercise capacity and energy metabolism. Loss of DNMT3A reduced DNA methylation in skeletal muscle over multiple genomic contexts and altered the transcription of genes known to be influenced by DNA methylation, but did not affect exercise capacity and whole-body energy metabolism compared to wild type mice. Loss of DNMT3A did not alter skeletal muscle mitochondrial function or the transcriptional response to exercise however did influence the expression of genes involved in muscle development. These data suggest that DNMT3A does not have a large role in the function of mature skeletal muscle although a role in muscle development and differentiation is likely. <<<
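#paper Ali Madani, Ben Krause, Eric R Greene, Subu Subramanian, Benjamin P Mohr, James M Holton, Jose Luis Olmos Jr, Caiming Xiong, Zachary Z Sun, Richard Socher, James S Fraser, Nikhil Naik. Large language models generate functional protein sequences across diverse families. PMID: 36702895 DOI: 10.1038/s41587-022-01618-2. The paper trains on 280 million protein sequences from more than 19,000 families to build ProGen, a deep learning model analogous to an LLM. ProGen can be further fine-tuned on curated sequences and tags to improve controllable generation for proteins from families with enough homologous samples. Artificial proteins generated after fine-tuning on five distinct lysozyme families showed catalytic efficiencies similar to natural lysozymes, with sequence identity to natural proteins as low as 31.4%. On the same day the paper appeared in Nature Biotechnology, Profluent Bio, the company founded by first author Ali Madani, announced a US$9 million seed round led by Insight Partners. The funding will be used to build a wet lab in Berkeley, California, allowing Profluent to create a tight feedback loop between experimentally generated data and its AI systems, provide robust validation for designing any protein, and keep improving its AI.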
IF:33.100Q1 Nature biotechnology, 2023-08. DOI: 10.1038/s41587-022-01618-2 PMID: 36702895
Deep-learning language models have shown promise in various biotechnological applications, including protein design and engineering. Here we describe ProGen, a language model that can generate protein sequences with a predictable function across large protein families, akin to generating grammatically and semantically correct natural language sentences on diverse topics. The model was trained on 280 million protein sequences from >19,000 families and is augmented with control tags specifying protein properties. ProGen can be further fine-tuned to curated sequences and tags to improve controllable generation performance of proteins from families with sufficient homologous samples. Artificial proteins fine-tuned to five distinct lysozyme families showed similar catalytic efficiencies as natural lysozymes, with sequence identity to natural proteins as low as 31.4%. ProGen is readily adapted to diverse protein families, as we demonstrate with chorismate mutase and malate dehydrogenase. <<<
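#paper Pub Date: 2019-01-23 DOI: 10.1038/s41564-018-0355-8 Harnessing undomesticated life. Only a small fraction of bacteria can be cultured and engineered in the laboratory, which limits our ability to deploy bacteria in harsh environments or to use them to produce important compounds. Recent work has opened up this field by developing new ways to characterize and engineer a range of undomesticated bacterial species. These techniques could be used for environmental engineering and would be of great help if humans colonize outer space in the future.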
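#paper arXiv:2103.00020 Learning Transferable Visual Models From Natural Language Supervision. The day before yesterday I read the CLIP paper and looked into the prompts it mentions. My reading notes are in the blog post: CLIP论文拜读及理解 (Reading and understanding the CLIP paper). Link: https://blog.csdn.net/weixin_44845357/article/details/130206779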
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at this https URL. <<<
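The zero-shot transfer described in the abstract, where class names are turned into text prompts and an image is assigned to the best-matching prompt, can be sketched as follows. This is a toy illustration in plain Python, not the released CLIP code: the three-dimensional embeddings are made up, the function names are hypothetical, and the temperature value is only an illustrative assumption. In a real system the embeddings would come from the trained image and text encoders.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, temperature=0.07):
    """Probability of each text prompt given one image embedding.

    Logits are cosine similarities divided by a temperature (assumed value).
    """
    img = normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = normalize(text_emb)
        logits.append(sum(a * b for a, b in zip(img, txt)) / temperature)
    return softmax(logits)


if __name__ == "__main__":
    prompts = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
    # Made-up embeddings standing in for encoder outputs.
    text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_emb = [0.9, 0.1, 0.0]
    probs = zero_shot_probs(image_emb, text_embs)
    best = max(range(len(prompts)), key=lambda i: probs[i])
    print(prompts[best], probs)
```

No classifier is trained for the new label set: changing the list of prompts is enough to define a new task, which is what makes the approach zero-shot.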
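#paper https://doi.org/10.48550/arXiv.2302.10051 An established normative approach to understanding the algorithmic basis of neural computation is to derive online algorithms from principled computational objectives and assess their compatibility with anatomical and physiological observations. Similarity matching objectives have been a successful starting point for deriving online algorithms that map onto neural networks (NNs) with point neurons and Hebbian/anti-Hebbian plasticity. These NN models explain many anatomical and physiological observations; however, the objectives have limited computational power, and the derived NNs cannot explain the multi-compartmental neuronal structures and non-Hebbian forms of plasticity that are common throughout the brain. The paper reviews and unifies recent extensions of the similarity matching approach to more complex objectives, including a wide range of unsupervised and self-supervised learning tasks that can be formulated as generalized eigenvalue problems or nonnegative matrix factorization problems. Interestingly, the online algorithms derived from these objectives map naturally onto NNs with multi-compartmental neurons and local, non-Hebbian learning rules. This unified extension of similarity matching therefore provides a normative framework for understanding the multi-compartmental neuronal structures and non-Hebbian plasticity found throughout the brain.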
An established normative approach for understanding the algorithmic basis of neural computation is to derive online algorithms from principled computational objectives and evaluate their compatibility with anatomical and physiological observations. Similarity matching objectives have served as successful starting points for deriving online algorithms that map onto neural networks (NNs) with point neurons and Hebbian/anti-Hebbian plasticity. These NN models account for many anatomical and physiological observations; however, the objectives have limited computational power and the derived NNs do not explain multi-compartmental neuronal structures and non-Hebbian forms of plasticity that are prevalent throughout the brain. In this article, we review and unify recent extensions of the similarity matching approach to address more complex objectives, including a broad range of unsupervised and self-supervised learning tasks that can be formulated as generalized eigenvalue problems or nonnegative matrix factorization problems. Interestingly, the online algorithms derived from these objectives naturally map onto NNs with multi-compartmental neurons and local, non-Hebbian learning rules. Therefore, this unified extension of the similarity matching approach provides a normative framework that facilitates understanding the multi-compartmental neuronal structures and non-Hebbian plasticity found throughout the brain. <<<
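#paper Toama W, Wiederin J, Shanley R, Jewett P, Gu C, Shenoy C, Nijjar PS, Blaes AH. Impact of pectoralis muscle loss on cardiac outcome and survival in Cancer patients who received anthracycline based chemotherapy: retrospective study. BMC Cancer. 2022 Jul 13;22(1):763. doi: 10.1186/s12885-022-09882-w. PMID: 35831837; PMCID: PMC9281070. This retrospective study examines, in patients with several cancer types treated with anthracycline-based chemotherapy, how pectoralis muscle mass index (PMI) relates to overall mortality and to major adverse cardiovascular events (MACE). (A few terms explained. Put simply, MACE covers bad events related to the heart, for example: 1. death from cardiac causes; 2. non-fatal myocardial infarction; 3. other non-fatal cardiovascular events. More concretely, in everyday practice this includes recurrent angina, acute myocardial infarction, severe arrhythmia, heart failure, death from coronary heart disease, ischemic cardiovascular events and cardiac death. PMI is an index of pectoralis muscle size, somewhat like the BMI we use in daily life: PMI = pectoralis major area [cm²] / height² [m²].) The paper retrospectively analyzed 474 cancer patients and found that, among patients treated with anthracyclines, a higher pre-treatment PMI was associated with a lower risk of MACE. The authors suggest that measuring PMI before chemotherapy, and in particular intervening before chemotherapy in patients with sarcopenia, could effectively reduce the risk of MACE.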
IF:3.400Q2 BMC cancer, 2022-Jul-13. DOI: 10.1186/s12885-022-09882-w PMID: 35831837
INTRODUCTION: The impact of pectoralis muscle mass index (PMI) on cardiac events is not well studied in cancer patients, especially in those who have received chemotherapy with high potential cardiac toxicity such as anthracyclines. METHODS: Individuals aged ≥18 years with a diagnosis of breast cancer, sarcoma, or lymphoma who received anthracycline-based chemotherapy at the University of Minnesota MHealth Fairview between 2009 and 2014. Eligible patients had to have two CT scans: a baseline CT scan within 6 months prior to chemotherapy and a follow-up CT scan within 2 years after treatment. The PMI was calculated as the right pectoralis muscle area indexed to height squared. Multivariable linear regression was used to analyze factors associated with PMI at follow-up, overall mortality, and major cardiac events (MACE). RESULTS: A total of 474 patients (breast cancer 192; lymphoma 184; sarcoma 98) participated with a median age of 61 years at the time of baseline CT scan; 161 (34%) were male. Almost all patients received anthracyclines except 12% who received trastuzumab only. The median baseline PMI was 5.8 cm²/m² (4.9, 7.7) which decreased 10.5% after chemotherapy, to 5.2 cm²/m² (4.4, 6.4). Baseline PMI was not significantly associated with OS, but we detected lower risks of MACE with larger PMI at baseline. Greater baseline PMI was associated with greater follow-up PMI, but also with greater relative PMI loss. Female gender, older age, and history of smoking were also associated with greater PMI losses. CONCLUSION: Greater pre-treatment pectoralis muscle index in patients treated with anthracyclines have a lower risk of MACE. Early identification of sarcopenia using PMI could trigger proactive engagement for intervention and risk-stratified therapies. <<<
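As a small worked illustration of the index used in this entry, the sketch below computes PMI as muscle area divided by height squared, by analogy with BMI, together with the relative change between two scans for one patient. The function names and the example measurements are hypothetical and are not taken from the study.

```python
def pectoralis_muscle_index(area_cm2: float, height_m: float) -> float:
    """PMI in cm^2/m^2: pectoralis muscle area indexed to height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return area_cm2 / height_m ** 2


def relative_change_percent(baseline: float, follow_up: float) -> float:
    """Signed percentage change from baseline to follow-up for one patient."""
    return (follow_up - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical patient: 1.70 m tall, muscle area measured on two CT scans.
    before = pectoralis_muscle_index(area_cm2=16.8, height_m=1.70)
    after = pectoralis_muscle_index(area_cm2=15.0, height_m=1.70)
    print(round(before, 2), round(after, 2), round(relative_change_percent(before, after), 1))
```

Because height is fixed for a given adult patient, the relative change in PMI between scans equals the relative change in the measured muscle area.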
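#paper doi: 10.1371/journal.pone.0000943. Causal Inference in Multisensory Perception, 2007, PLoS ONE (published in PLoS ONE, but highly cited). The nervous system constantly combines uncertain information from different sensory modalities into an integrated understanding of the causes of sensory stimulation. The signals may come from the same source or from different sources, so how cues are combined must depend on their causal relationship. One approach to multisensory integration is the probabilistic cue-integration model, which assumes a single common cause; later experiments found, however, that this integration breaks down when the visual and auditory stimuli differ greatly. The difference between the signals is called disparity. As the disparity between two cues grows, the influence of cue A on cue B decreases, and vice versa. This disparity effect shows that forced fusion (unconditionally assuming a common cause) does not hold, so the causal relationship between the cues must also be inferred. The model needs an additional interaction prior (a joint prior distribution) to assess whether the two cues are more likely to share a source or to come from different sources. This study proposes a causal inference model that accurately predicts the nonlinear integration of cues by human subjects in two auditory-visual localization tasks. The results show that humans can indeed efficiently infer the causal structure as well as the location of the cue sources. The ability to infer causal structure is not limited to conscious, high-level cognition; it also operates continually and effortlessly in perception.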
IF:2.900Q1 PLoS ONE, 2007. DOI: 10.1371/journal.pone.0000943
Perceptual events derive their significance to an animal from their meaning about the world, that is from the information they carry about their causes. The brain should thus be able to efficiently infer the causes underlying our sensory events. Here we use multisensory cue combination to study causal inference in perception. We formulate an ideal-observer model that infers whether two sensory cues originate from the same location and that also estimates their location(s). This model accurately predicts the nonlinear integration of cues by human subjects in two auditory-visual localization tasks. The results show that indeed humans can efficiently infer the causal structure as well as the location of causes. By combining insights from the study of causal inference with the ideal-observer approach to sensory cue combination, we show that the capacity to infer causal structure is not limited to conscious, high-level cognition; it is also performed continually and effortlessly in perception. <<<
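The inference about whether two cues share a cause can be sketched as a small Bayesian computation. The version below assumes Gaussian sensory noise on the visual and auditory measurements and a Gaussian prior over source location; under a common cause both measurements share one source, otherwise each has its own. All parameter values (noise levels, prior width, prior probability of a common cause) and the function names are illustrative assumptions, not fitted values from the paper.

```python
import math


def likelihood_common(x_v, x_a, sigma_v, sigma_a, sigma_p, mu_p=0.0):
    """p(x_v, x_a | one shared source), with the source location integrated out."""
    var_v, var_a, var_p = sigma_v ** 2, sigma_a ** 2, sigma_p ** 2
    det = var_v * var_a + var_v * var_p + var_a * var_p
    quad = ((x_v - x_a) ** 2 * var_p
            + (x_v - mu_p) ** 2 * var_a
            + (x_a - mu_p) ** 2 * var_v) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def likelihood_separate(x_v, x_a, sigma_v, sigma_a, sigma_p, mu_p=0.0):
    """p(x_v, x_a | two independent sources), each drawn from the same prior."""
    var_v_total = sigma_v ** 2 + sigma_p ** 2
    var_a_total = sigma_a ** 2 + sigma_p ** 2
    quad = (x_v - mu_p) ** 2 / var_v_total + (x_a - mu_p) ** 2 / var_a_total
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(var_v_total * var_a_total))


def posterior_common(x_v, x_a, sigma_v=2.0, sigma_a=8.0, sigma_p=15.0,
                     p_common=0.5, mu_p=0.0):
    """Posterior probability that both cues come from a single source (Bayes' rule)."""
    like_c1 = likelihood_common(x_v, x_a, sigma_v, sigma_a, sigma_p, mu_p)
    like_c2 = likelihood_separate(x_v, x_a, sigma_v, sigma_a, sigma_p, mu_p)
    return like_c1 * p_common / (like_c1 * p_common + like_c2 * (1.0 - p_common))


if __name__ == "__main__":
    # Positions in degrees (hypothetical): the posterior drops as disparity grows.
    for x_v, x_a in [(0.0, 0.0), (-5.0, 5.0), (-15.0, 15.0)]:
        print(x_v, x_a, round(posterior_common(x_v, x_a), 3))
```

With small disparity the common-cause hypothesis dominates and the cues are effectively fused; with large disparity it is rejected and each cue is judged largely on its own, which is the nonlinear behaviour that forced fusion cannot reproduce.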
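#paper doi: 10.1016/j.cell.2018.03.027. Chen H, et al. A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples. Cell. 2018 Apr 5;173(2):386-399.e12. Enhancers are a class of non-coding regulatory DNA elements, usually located near structural genes, whose role in cancer development is increasingly recognized. Using TCGA RNA-seq data from 8,928 tumor samples across 33 cancer types, this study detected and characterized a large number of expressed enhancers genome-wide. Compared with normal tissue, enhancers were globally activated in most cancer types. Enhancer activity was positively associated with aneuploidy but not with mutation load, leading the authors to propose a chromatin-state hypothesis to explain this interplay. To build a causal enhancer-gene regulatory network, the authors integrated eQTL analysis, mRNA co-expression analysis and Hi-C data, ultimately identifying 65 enhancer-gene pairs. Annotated with the CGC (Cancer Gene Census), these pairs include 22 proto-oncogenes and 8 tumor suppressor genes. Finally, the authors used CRISPR/Cas9 to experimentally confirm an enhancer about 140 kb downstream of the PD-L1 gene.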
IF:45.500Q1 Cell, 2018-04-05. DOI: 10.1016/j.cell.2018.03.027 PMID: 29625054
The role of enhancers, a key class of non-coding regulatory DNA elements, in cancer development has increasingly been appreciated. Here, we present the detection and characterization of a large number of expressed enhancers in a genome-wide analysis of 8928 tumor samples across 33 cancer types using TCGA RNA-seq data. Compared with matched normal tissues, global enhancer activation was observed in most cancers. Across cancer types, global enhancer activity was positively associated with aneuploidy, but not mutation load, suggesting a hypothesis centered on "chromatin-state" to explain their interplay. Integrating eQTL, mRNA co-expression, and Hi-C data analysis, we developed a computational method to infer causal enhancer-gene interactions, revealing enhancers of clinically actionable genes. Having identified an enhancer ∼140 kb downstream of PD-L1, a major immunotherapy target, we validated it experimentally. This study provides a systematic view of enhancer activity in diverse tumor contexts and suggests the clinical implications of enhancers. <<<
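#paper 2023. Structural changes induced by pasteurisation and/or high-pressure treatment of skim caprine milk. https://doi.org/10.1016/j.idairyj.2022.105528. The results show that combining pasteurisation with high-pressure (HP) treatment (PHP) of skim caprine milk can alter protein secondary structure and increase surface hydrophobicity. Pasteurisation before HP treatment lowered the α-helix content while raising the β-sheet content, which was linked to changes in the surface hydrophobicity and intrinsic fluorescence of the skim caprine milk samples. For both PHP and HP samples, as pressure level and treatment time increased, α-helix and β-turn content decreased while β-sheet and random coil content increased. PHP treatment could serve as a good alternative technology for the dairy industry to improve the functional properties of skim caprine milk.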
The effects of pasteurisation, high-pressure (HP), and a combination of pasteurisation and high-pressure (PHP) on the physicochemical properties and protein structure of caprine skim milk was investigated. Samples treated by PHP generally had a higher pH, whey protein denaturation, surface hydrophobicity, and intrinsic fluorescence than those treated only with heat or pressure. In contrast, the size of skim milk casein micelles decreased significantly with an increase in pressure level and time; however, the effect was less marked when heat and pressure treatments were combined. For the PHP and HP samples, as the level and time of pressure increased, the α-helix and β-turn content reduced, whereas β-sheet and random coil were induced. Thus, PHP treatment could be used as a good alternative technology in the dairy industry to promote the functional properties of skim caprine milk. <<<
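#paper, BloombergGPT: A Large Language Model for Finance, doi:10.48550/arXiv.2303.17564, The AI boom set off by ChatGPT has reached finance as well: Bloomberg has released BloombergGPT, a large language model (LLM) built for the financial sector. According to the report Bloomberg published on March 30, the company assembled what may be the largest domain-specific dataset to date and used it to train a 50-billion-parameter LLM specialized for finance. Drawing on Bloomberg's extensive financial data sources, the team built a 363-billion-token dataset to support a wide range of tasks in the financial industry. The model outperforms existing models on financial tasks by a wide margin while remaining competitive with them on general-purpose benchmarks. In the reported tests, BloombergGPT ranks first on four of five tasks (ConvFinQA, FiQA SA, FPB and Headline) and second on NER (Named Entity Recognition). BloombergGPT therefore has clear advantages for financial applications.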
The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. As a next step, we plan to release training logs (Chronicles) detailing our experience in training BloombergGPT. <<<
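As a quick sanity check on the data mix described in the abstract, the sketch below computes the share of financial versus general-purpose tokens. It uses only the two token counts quoted above (363 billion and 345 billion); the function and variable names are made up for illustration, and the result says nothing about how the data was actually sampled during training.

```python
# Token counts as quoted in the abstract, in billions of tokens.
FINANCIAL_TOKENS_B = 363
GENERAL_TOKENS_B = 345


def mix_shares(financial_b, general_b):
    """Return (total, financial share, general share) for a two-source corpus."""
    total = financial_b + general_b
    return total, financial_b / total, general_b / total


total_b, fin_share, gen_share = mix_shares(FINANCIAL_TOKENS_B, GENERAL_TOKENS_B)
print(f"total: {total_b}B tokens, financial: {fin_share:.1%}, general: {gen_share:.1%}")
# prints: total: 708B tokens, financial: 51.3%, general: 48.7%
```

By these figures the corpus is split roughly in half between domain-specific and general text, which fits the abstract's claim that the model gains on financial tasks without giving up general performance.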
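#paper doi:10.1136/gutjnl-2018-317178 Gut, 2019, Enteric fungal microbiota dysbiosis and ecological alterations in colorectal cancer. Studies of the microbiome in cancer mostly focus on bacteria; viruses and fungi receive far less attention. This paper analyses the fungi in samples from colorectal cancer (CRC) patients. Fungal sequences usually make up only a small share of the reads in a sample, but they are still worth analysing. The study covers 184 patients with CRC, 197 patients with adenoma and 204 control subjects from Hong Kong, using faecal samples split into a discovery cohort and a validation cohort. Principal component analysis separated CRC and controls into two clusters, and early-stage and late-stage CRC also had distinct fungal communities. Compared with healthy subjects, the faecal Basidiomycota:Ascomycota ratio was higher in CRC patients. At the class level, Malasseziomycetes was enriched in CRC, while Saccharomycetes and Pneumocystidomycetes were depleted. The authors identified 14 fungal markers whose abundances distinguish CRC from controls, with AUCs of 0.74 to 0.93. They also ran an ecological analysis with the SparCC algorithm: compared with controls, CRC showed more co-occurrence relationships within the fungal kingdom and more co-exclusion between fungi and bacteria. The results suggest that faecal fungal markers have diagnostic potential for CRC. Because fungi are rarely analysed, the tooling is worth noting: the study annotated reads mainly with Kraken and built a custom database from public data using Jellyfish.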
IF: 23.000, Q1. Gut, 2019-04. DOI: 10.1136/gutjnl-2018-317178. PMID: 30472682
OBJECTIVES: Bacteriome and virome alterations are associated with colorectal cancer (CRC). Nevertheless, the gut fungal microbiota in CRC remains largely unexplored. We aimed to characterise enteric mycobiome in CRC. DESIGN: Faecal shotgun metagenomic sequences of 184 patients with CRC, 197 patients with adenoma and 204 control subjects from Hong Kong were analysed (discovery cohort: 73 patients with CRC and 92 control subjects; validation cohort: 111 patients with CRC, 197 patients with adenoma and 112 controls from Hong Kong). CRC-associated fungal markers and ecological changes were also validated in additional independent cohorts of 90 patients with CRC, 42 patients with adenoma and 66 control subjects of published repository sequences from Germany and France. Assignment of taxonomies was performed by exact k-mer alignment against an integrated microbial reference genome database. RESULTS: Principal component analysis revealed separate clusters for CRC and control (p<0.0001), with distinct mycobiomes in early-stage and late-stage CRC (p=0.0048). Basidiomycota:Ascomycota ratio was higher in CRC (p=0.0042), with increase in Malasseziomycetes (p<0.0001) and decrease in Saccharomycetes (p<0.0001) and Pneumocystidomycetes (p=0.0017). Abundances of 14 fungal biomarkers distinguished CRC from controls with an area under the receiver-operating characteristic curve (AUC) of 0.93 and validated AUCs of 0.82 and 0.74 in independent Chinese cohort V1 and European cohort V2, respectively. Further ecological analysis revealed higher numbers of co-occurring fungal intrakingdom and co-exclusive bacterial-fungal correlations in CRC (p<0.0001).
Moreover, co-occurrence interactions between fungi and bacteria, mostly contributed by fungal Ascomycota and bacterial Proteobacteria in control, were reverted to co-exclusive interplay in CRC (p=0.00045). CONCLUSIONS: This study revealed CRC-associated mycobiome dysbiosis characterised by altered fungal composition and ecology, signifying that the gut mycobiome might play a role in CRC. <<<
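To make two quantities from this abstract concrete, here is a minimal sketch of how a Basidiomycota:Ascomycota ratio and an AUC can be computed. The abundance values are invented toy numbers, not data from the study, and the function names are hypothetical. The AUC here is taken over the single ratio feature for simplicity; the paper's reported AUCs come from a 14-marker panel, which this sketch does not reproduce.

```python
def phylum_ratio(basidiomycota, ascomycota):
    """Basidiomycota:Ascomycota ratio from two relative abundances."""
    return basidiomycota / ascomycota


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based, Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy (Basidiomycota, Ascomycota) relative abundances; values are made up.
crc_samples = [(0.30, 0.50), (0.40, 0.40), (0.20, 0.60)]
control_samples = [(0.10, 0.80), (0.25, 0.50), (0.15, 0.60)]

crc_ratios = [phylum_ratio(b, a) for b, a in crc_samples]
control_ratios = [phylum_ratio(b, a) for b, a in control_samples]

auc_value = auc(crc_ratios, control_ratios)
print(f"AUC of the ratio on toy data: {auc_value:.3f}")
```

On this toy data the CRC ratio exceeds the control ratio in 8 of the 9 sample pairs, so the AUC is 8/9. A value of 0.5 would mean the feature separates the groups no better than chance.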