Found 1479 paper shares in total; this page shows entries 1 - 20.
1.
颜林林
(2026-04-27 01:07):
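#paper doi:10.1016/j.jmoldx.2026.04.002, The Journal of Molecular Diagnostics, 2026, Evaluation of chimerism testing by next generation sequencing using indel markers: Analytical validation and examples of clinical utilization. The listed authors come from several US states (so it just about counts as a multicenter study), but the paper is really a technical validation of the post-transplant genetic monitoring product from GenDx (a Dutch company, since acquired by a French company), even though GenDx denies taking part in the analysis or conclusions and acknowledges only that it will pay the publication fee once the paper is accepted. The underlying technique is simple: high-throughput sequencing is used to detect low-frequency genomic variants (mainly indels, i.e. short insertions or deletions, at frequencies as low as 0.5%), and from these to track how the proportion of donor blood cells in the transplant recipient's whole blood changes over the post-operative course. In principle this should indeed be more accurate than the traditional STR method (based on microsatellites, i.e. short tandem repeats), so the conclusion is nothing new (if it could not do this, the product would probably be unsellable). The paper mixes patient samples at different ratios to obtain reference samples for performance evaluation and uses them to validate the technique (a routine approach in the industry), confirming that performance is indeed better than the STR method. In addition, to add value (including telling a clinical-application story), the authors followed several real transplant patients over time (the longest for more than 300 days), sampling and testing at multiple time points to determine how the proportion of the transplanted component changed in each patient (with flow cytometry as independent validation), and interpreted the results in relation to phenotype.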
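The arithmetic behind indel-based chimerism can be sketched in a few lines. This is a minimal illustration, not the GenDx pipeline: it assumes each informative marker is an indel allele that the donor carries and the recipient lacks, and the function name, marker tuples and read counts are all hypothetical.

```python
from statistics import median

def donor_fraction(markers):
    """Estimate the donor cell fraction from informative indel markers.

    Each marker is (donor_specific_reads, total_reads, donor_genotype), where
    donor_genotype is "hom" if the donor carries the allele on both copies and
    "het" if on one copy. The recipient is assumed to lack the allele entirely.
    """
    estimates = []
    for donor_reads, total_reads, genotype in markers:
        if total_reads == 0:
            continue  # no coverage, so this marker tells us nothing
        vaf = donor_reads / total_reads
        # A heterozygous donor carries the allele on only half of its
        # chromosomes, so the cell fraction is twice the allele fraction.
        estimates.append(vaf if genotype == "hom" else 2 * vaf)
    if not estimates:
        raise ValueError("no informative markers with coverage")
    # The median damps the effect of a single noisy marker.
    return median(estimates)

# Hypothetical read counts for three markers in one post-transplant sample.
sample = [(50, 10000, "hom"), (26, 10000, "het"), (24, 8000, "het")]
print(f"estimated donor fraction: {donor_fraction(sample):.4f}")
```

With deep coverage per marker, a fraction near 0.5% is supported by dozens of reads, which is the intuition for why sequencing can reach lower than fragment-length STR analysis.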
Shannon Dutterer,
Christle Moore,
Maya Giddens,
Juliet Smith,
Jenifer Williams,
Jessica Magwood,
Jeane Silva,
Cathi Murphey,
Valia Bravo-Egana
2.
惊鸿
(2026-04-24 11:41):
#paper
DOI: 10.1056/NEJMoa2309149
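English title: CRISPR-Cas9 In Vivo Gene Editing of KLKB1 for Hereditary Angioedema
Published: February 1, 2024 (formally published in The New England Journal of Medicine)
Key breakthrough
This study is the first to apply in vivo CRISPR-Cas9 gene editing in humans to target the KLKB1 gene (which encodes prekallikrein), successfully treating hereditary angioedema (HAE). HAE is a rare, potentially life-threatening genetic disease in which C1 inhibitor deficiency leads to excess bradykinin production, causing recurrent swelling of the skin and mucosa.
Technical highlights
One-time in vivo editing: a single intravenous infusion of lipid nanoparticles (LNP) carrying the CRISPR-Cas9 system knocks out KLKB1 directly in the liver, durably lowering prekallikrein levels and reducing bradykinin production at the source.
Effective and durable: after treatment, 97% of patients had no further attacks during the observation period, and plasma prekallikrein levels fell markedly and stayed low, removing the burden of the long-term regular dosing that conventional therapies require.
Good safety: no treatment-related serious adverse events were reported, giving initial evidence for the safety of in vivo CRISPR editing in HAE patients.
Limitations and outlook
Long-term follow-up is still needed: the follow-up reported so far is limited, and long-term safety (such as potential off-target effects) and durability of efficacy will take years of observation.
Scope: the strategy mainly applies to HAE types I/II (C1 inhibitor deficiency); efficacy in other subtypes (such as HAE with normal C1 inhibitor) remains to be explored.
Extensibility: this "liver-targeted LNP delivery + gene knockout" platform could be extended to other genetic diseases caused by abnormal liver-specific proteins.
Summary
This study marks a breakthrough for in vivo CRISPR gene-editing therapy in hereditary angioedema, achieving for the first time near-complete control of attacks with a one-time treatment. It offers HAE patients a potentially curative option and further supports the potential of in vivo gene editing for monogenic diseases, opening a new path for treating more rare diseases.
Original link: https://doi.org/10.1056/NEJMoa2309149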
3.
徐炳祥
(2026-04-22 22:43):
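#paper doi: 10.1186/s13059-026-04004-2 Genome Biology, 2026, Sequence bias in chromatin fragmentation leads to misinterpretation of protein-DNA interactions in vivo. In chromatin accessibility analysis, an important task is to combine the length distribution of protein-protected fragments (the V plot) with sequence motifs to infer transcription factor binding (footprinting). This paper fragmented both chromatin and naked DNA stripped of all protein components using different fragmentation methods (DNase I, MNase and sonication), and found that fragmentation itself has sequence preferences. Because of these preferences, the characteristic fragment-length patterns previously attributed to protection by bound transcription factors can still be observed in naked DNA. The biases cannot be avoided by simple bioinformatic measures such as removing the bases that show sequence preference. Based on these conclusions, transcription factor footprinting results derived from DNase-seq / ATAC-seq / MNase-seq data may be unreliable and should be used with caution.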
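For readers who have not built a V plot, the tally behind it looks roughly like this. It is a minimal sketch under assumptions of my own (half-open coordinates, one motif site, hypothetical fragment list), not the paper's code.

```python
from collections import Counter

def vplot_counts(fragments, motif_center, flank=100):
    """Count fragments by (midpoint offset from motif centre, fragment length).

    fragments: iterable of (start, end) half-open coordinates on one chromosome.
    Returns a Counter keyed by (offset, length); plotting offset on the x axis
    and length on the y axis gives the V plot.
    """
    counts = Counter()
    for start, end in fragments:
        length = end - start
        midpoint = start + length // 2
        offset = midpoint - motif_center
        if abs(offset) <= flank:  # keep only fragments centred near the motif
            counts[(offset, length)] += 1
    return counts

# Hypothetical fragments around a motif centred at position 1000.
frags = [(960, 1040), (970, 1030), (900, 1000), (1000, 1150), (400, 500)]
for (offset, length), n in sorted(vplot_counts(frags, 1000).items()):
    print(offset, length, n)
```

The control the paper argues for amounts to running the same tally on fragments from naked DNA: if the V-shaped pattern appears there too, it cannot be attributed to a bound protein.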
Genome Biology,
2026-2-16.
DOI: 10.1186/s13059-026-04004-2
Laura Durán,
Laura Rodríguez,
Alicia García,
Rodrigo Santamaría,
Mar Sánchez,
Francisco Antequera
4.
ZĒNG Yíngzhū (Zoo) 曾莹珠
(2026-04-21 09:47):
#paper
Sycophantic AI decreases prosocial intentions and promotes dependence.
Myra Cheng et al.
2026
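https://doi.org/10.1126/science.aec8352
The researchers used three datasets (everyday advice-seeking; posts from the Reddit forum Am I the Asshole where the crowd agreed the poster was in the wrong; descriptions of behavior that harms oneself or others) to analyze the degree of sycophancy in 11 mainstream AI models. AI affirmed users more often than humans did, by 47-51%.
They then ran three experiments.
2 (AI sycophancy: yes vs no) × 2 (AI response style: human-like vs machine-like).
2 (AI sycophancy: yes vs no) × 2 (perceived source of the response: human vs AI).
Participants recalled an interpersonal conflict of their own; 2 (AI sycophancy: yes vs no).
The result: participants in the sycophantic condition were less willing to apologize and less likely to take the initiative to improve the situation or change their own behavior. The effect did not depend on the AI's response style or on the perceived source of the response, and it remained significant after controlling for scenario and participant demographics.
Participants trusted the sycophantic AI more, rated the quality of its answers higher, and were more willing to use the sycophantic models again.
My takeaway after reading: as an individual user, you can deliberately ask the AI to point out where you are at fault, give explicit instructions, and challenge the AI.
Another trick is to flip the question: when describing the problem, swap roles and describe it from the other side. For example, when I am actually the one polishing a manuscript and hoping it gets through, I present myself as the person reviewing someone else's manuscript and ask the AI to list the reasons I would reject it.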
Science,
2026-3-26.
DOI: 10.1126/science.aec8352
Myra Cheng,
Cinoo Lee,
Pranav Khadpe,
Sunny Yu,
Dyllan Han,
Dan Jurafsky
Abstract:
Despite rising concerns about sycophancy—excessive agreement or flattery from artificial intelligence (AI) systems—little is known about its prevalence or consequences. We show that sycophancy is widespread and harmful. Across 11 state-of-the-art models, AI affirmed users’ actions 49% more often than humans, even when queries involved deception, illegality, or other harms. In three preregistered experiments (N = 2405), even a single interaction with sycophantic AI reduced participants’ willingness to take responsibility and repair interpersonal conflicts, while increasing their conviction that they were right. Despite distorting judgment, sycophantic models were trusted and preferred. This creates perverse incentives for sycophancy to persist: The very feature that causes harm also drives engagement. Our findings underscore the need for design, evaluation, and accountability mechanisms to protect user well-being.
5.
龙海晨
(2026-04-14 19:15):
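#paper Ma W. Artificial intelligence and multi-omics nominate TAZ as an insomnia-related diagnostic and druggable target for Parkinson's disease patients. Front Aging Neurosci. 2026 Feb 4;18:1727472. doi: 10.3389/fnagi.2026.1727472. PMID: 41717220; PMCID: PMC12913377. This paper combines artificial intelligence with bioinformatics to screen for drug targets. The author downloaded Parkinson's disease (PD) datasets from GEO and an insomnia-related gene list from GeneCards. Genes were screened by DEG identification and WGCNA, followed by KEGG and GO functional enrichment analysis; machine learning narrowed the candidates further, and validation yielded the hub gene TAZ. TAZ was then analyzed in single-cell data, after which DrugRefLector was used for drug screening and AutoDock to identify potential binding sites. qPCR showed that TAZ mRNA expression was significantly elevated in PD cells. Using AI and multi-omics, the study highlights a molecular-level mechanistic link between insomnia and PD progression.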
Frontiers in Aging Neuroscience,
2026-2-4.
DOI: 10.3389/fnagi.2026.1727472
Wenjing Ma
Abstract:
Background
Insomnia is one of the most common non-motor comorbidities of Parkinson’s disease (PD) and often before the onset of motor symptoms. Identifying the molecular mechanisms of insomnia may facilitate the early diagnosis of PD and contribute to therapeutic development.
Methods
Five human PD substantia nigra (SN) bulk-seq datasets (GSE20141, GSE7621, GSE20164, GSE20163, and GSE20333), with an insomnia-related gene list, were acquired from GEO and Genecard databases. First, the integration of GSE20141 and GSE7621 was analyzed to identify insomnia-related DEGs using limma and the WGCNA framework. GSE20164 and GSE20163 combination were used as a training set for insomnia-related hub gene recognition. Furthermore, the aforementioned four datasets, along with an independent validation set (GSE20333), were cross-validated for insomnia-related diagnostic model construction. The human PD-SN single-cell profile (GSE140231) was utilized for exploring the mechanisms underlying the heterogeneity of insomnia-related hub genes in spatial and temporal contexts. Furthermore, a cutting-edge artificial intelligence (AI)-driven framework (DrugRefLector) and molecular docking techniques was used to identify an optimal agent for the treatment of PD based on the GSE20164 and GSE20163 integrated dataset. Finally, an in vitro q-RT-PCR experiment was conducted to estimate the targeted gene expression.
Results
TAZ (WWTR1) is associated with the increased expression of insomnia-related diagnostic markers linked to PD pathogenesis, mainly in neurons, and has excellent predictive performance for PD diagnosis. Furthermore, BRD-K97481123 can be considered as a potential therapeutic agent for the treatment of PD by targeting TAZ.
Conclusion
By integrating AI pipelines and multi-omics, our study first traced TAZ mechanisms in PD pathogenesis and elaborated on TAZ’s predictive and druggable potential for PD patients.
6.
DeDe宝
(2026-04-02 02:27):
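#paper Integration of Memory and Sensory Information in Skilled Sequence Production. The Journal of Neuroscience, 2026
This study focuses on how memory and sensory information are dynamically integrated during sequential movements. When people carry out a series of actions, they need to integrate prior experience (internal memory) with the current stimulus (external sensory information), but earlier work examined these two factors in isolation. The study uses the Discrete Sequence Production (DSP) task to examine both, asking participants to repeat fixed-length sequences of discrete key presses with the five fingers of the right hand. It manipulates the number of visible cues (external sensory information) and sequence repetition (internal memory), and fits an evidence-accumulation computational model to the behavioral data. The findings: early in learning, participants integrate memory with sensory cues; once skilled and when the sequence is fully predictable, they switch to purely memory-driven performance; and when violations introduce uncertainty, they go back to integrating both sources. The model supports the view that the next three movements in a sequence can be planned in parallel and independently, and that sequence memory is deactivated quickly on conflict and reactivated only slowly, after a run of consistent cues. The study reveals core principles of how the brain flexibly integrates internal and external information to produce precise sequential movements.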
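To make the two model ideas in this summary concrete, here is a toy, noise-free sketch. It is not the authors' model: summing the two evidence sources, the threshold, and the `drop` and `recover` parameters are all assumptions chosen for illustration.

```python
import math

def steps_to_threshold(sensory, memory, threshold=1.0, dt=0.01):
    """Time steps a noise-free accumulator needs to reach its threshold.

    sensory and memory are non-negative evidence rates in arbitrary units.
    Simply adding them is an assumption of this sketch.
    """
    drift = sensory + memory
    if drift <= 0:
        return math.inf  # no evidence, so the press is never selected
    return math.ceil(threshold / (drift * dt))

def update_memory(activation, cue_matches_memory, drop=0.2, recover=0.1):
    """Fast deactivation on a violation, slow recovery on consistent cues."""
    if cue_matches_memory:
        return activation + recover * (1.0 - activation)
    return activation * drop

# One violation followed by three consistent cues (hypothetical trial history).
activation = 1.0
for matches in [False, True, True, True]:
    activation = update_memory(activation, matches)
    print(round(activation, 3), steps_to_threshold(sensory=0.5, memory=activation))
```

In this toy version a single violation cuts the memory contribution sharply and the press slows down, then speeds up again only gradually as consistent cues accumulate. Planning several upcoming presses independently would correspond to running one such accumulator per press.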
The Journal of Neuroscience,
2026-4-1.
DOI: 10.1523/JNEUROSCI.1797-25.2026
Amin Nazerzadeh,
Medha Porwal,
J. Andrew Pruszynski,
Jörn Diedrichsen
Abstract:
Sequential movements rely on two information sources: external sensory cues and internal memory representations. Although often both sources jointly drive sequential behavior, previous research has primarily examined them in isolation. To address this, we trained participants (n = 26, 15F) to perform sequences of rapid finger presses in response to numerical cues. Sensory influence was measured by varying the number of visible cues, and memory influence was determined by comparing repeating and random sequences. Early in learning, participants integrated sensory and memory information: repeating sequences were performed more quickly when more cues were visible. After learning, when repeating sequences were predictable with certainty, participants relied solely on memory and ignored sensory cues. However, when this certainty was manipulated by introducing occasional violations within repeating sequences, participants reverted to integrating memory with sensory cues. We propose a computational model that successfully predicted both speed and accuracy of individual presses. Critically, this model relied on the assumption that multiple movements are planned independently of each other. This independence assumption was then validated by examining response patterns to isolated violations in repeating sequences. Finally, we provide evidence into how sequence memories can be flexibly deactivated and reactivated in response to these violations. Together, these results reveal how brain dynamically integrates sensory and memory information to produce sequences of movements.
7.
刘昊辰
(2026-04-01 15:10):
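#paper THE CDE METHOD A TECHNIQUE IN FUNCTIONAL EQUATIONS. This paper presents a new and fairly general method (the CDE method) for solving functional equation problems in secondary-school mathematics competitions, and gives 3 related lemmas, 28 worked examples and a number of exercises. The method has already been used in competitions in recent years and has been discussed on the AoPS forum, so it is worth studying for anyone following trends in secondary-school mathematics competitions. Download: https://arxiv.org/abs/1901.11131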
arXiv,
2019-01-30T22:42:29Z.
DOI: 10.48550/arXiv.1901.11131
Athanasios Kontogeorgis,
Rafail Tsiamis
Abstract:
In this article we present an extremely effective and relatively unknown approach to solving functional equations that appear in mathematical competitions. We aim to explain the philosophy of this novel method through numerous examples, which also highlight how this idea can be paired with other useful techniques to crack challenging problems.
8.
尹志
(2026-03-31 23:30):
#paper, Quantum-HPC hybrid computation of biomolecular excited-state energies, DOI: 10.48550/arXiv.2601.15677.
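Using the ONIOM framework combined with the TE-QSCI algorithm, the authors computed the S0, S1 and T0 energies for the photoisomerization of retinal on a trapped-ion platform. A very nice example of hybrid quantum + HPC computation.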
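For context, the standard two-layer ONIOM extrapolation is E = E_high(model) + E_low(real) - E_low(model). A minimal sketch follows; the energies are hypothetical numbers in hartree, and treating the high-level model-region term as the part supplied by the quantum computer is my assumption about the workflow, not something stated in the post.

```python
def oniom2_energy(e_high_model, e_low_real, e_low_model):
    """Two-layer ONIOM extrapolated energy.

    e_high_model: high-level energy of the small model region
    e_low_real:   low-level energy of the full (real) system
    e_low_model:  low-level energy of the same model region
    """
    return e_high_model + e_low_real - e_low_model

# Hypothetical energies (hartree) for two electronic states of one geometry.
s0 = oniom2_energy(e_high_model=-100.5, e_low_real=-250.0, e_low_model=-100.0)
s1 = oniom2_energy(e_high_model=-100.25, e_low_real=-250.0, e_low_model=-100.0)
print("S0:", s0, "S1:", s1, "vertical gap:", s1 - s0)
```

Only the model-region term changes with the electronic state in this toy setup, which is why the expensive high-level calculation can be confined to the small active region.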
arXiv,
2026-01-22T05:57:54Z.
DOI: 10.48550/arXiv.2601.15677
Abstract:
We develop a workflow within the ONIOM framework and demonstrate it on the hybrid computing system consisting of the supercomputer Fugaku and the Quantinuum Reimei trapped-ion quantum computer. This hybrid platform extends the layered approach for biomolecular chemical reactions to accurately treat the active site, such as a protein, and the large and often weakly correlated molecular environment. Our result marks a significant milestone in enabling scalable and accurate simulation of complex biomolecular reactions
9.
徐炳祥
(2026-03-31 22:57):
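#paper doi: 10.1038/s41576-026-00939-1 Nature Reviews Genetics, 2026, Gene regulatory networks from correlative models to causal explanations. This review discusses in detail the recent bottlenecks and trends in gene regulatory network (GRN) modeling. GRN models have been shifting from traditional mechanistic explanations toward complex statistical-correlation models, which makes it hard for them to capture causal relationships between molecules accurately. The authors point out that, because real regulatory networks are large, dynamically complex and subject to model "sloppiness", inference from single-cell omics data alone faces serious challenges. In response, the article proposes a representation-learning framework and argues for three principles for building more explanatory models: first, models must be grounded in cell and evolutionary biology and have intrinsic mechanisms; second, molecular constraints should be used to narrow the learning space; third, predictions should be validated with precise experimental perturbations and synthetic-biology engineering. The framework aims to use multiple levels of abstraction (such as the computational, representational and implementation levels) to get from large-scale data to new biological understanding. The review is a forward-looking reference for research on gene expression regulation, single-cell omics and synthetic biology.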
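The gap between correlation and causation that the review stresses can be shown with a deliberately tiny example. Everything here is hypothetical: a hidden regulator C drives genes A and B, there is no A-to-B edge, and the linear coefficients are made up.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(c_levels, knock_out_a=False):
    """Toy network: C activates A and C activates B; A does not regulate B."""
    a = [0.0 if knock_out_a else 2.0 * c for c in c_levels]
    b = [3.0 * c for c in c_levels]
    return a, b

cells = [1.0, 2.0, 3.0, 4.0]            # hidden regulator level per cell
a_obs, b_obs = simulate(cells)
print("observed correlation A~B:", pearson(a_obs, b_obs))

_, b_after_ko = simulate(cells, knock_out_a=True)
print("B unchanged after knocking out A:", b_after_ko == b_obs)
```

A purely correlative model would draw an edge between A and B, while the perturbation shows there is none, which is the kind of check the review's third principle calls for.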
Nature Reviews Genetics,
2026-3-9.
DOI: 10.1038/s41576-026-00939-1
Rory J. Maizels,
James Briscoe
10.
小年
(2026-03-31 22:08):
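#paper arXiv:2603.12457 (preprint), The Single-Model Illusion in AI-Driven Drug Discovery: Introducing a Systems-Level Multi-Model Framework for Translational Discovery
Addressing the "single-model illusion" in current AI-driven drug development, where widespread reliance on a single model brings prediction bias, weak generalization and a low rate of clinical translation, the work proposes a systems-level framework that integrates multiple models. By comparing the limitations of single predictive models in molecular design, target binding, pharmacokinetics and toxicity assessment, it shows that over-reliance on one model leaves research results disconnected from clinical reality. The multi-model system built on this basis integrates computational models at several levels, including target modeling, molecule generation, physicochemical property prediction and validation at the cell and animal level, to provide cross-validation and decision optimization across the whole process from molecule screening to translational research. According to the reported tests, the framework reduces false positives and misleading model output and improves the reliability and translatability of drug candidates, offering a new research paradigm for getting past current AI drug-discovery bottlenecks and building a more robust translational discovery system.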
Zenodo,
2026/3/26.
DOI: 10.5281/zenodo.19240171
Melinda Chu
Abstract:
Recent advances in AI-driven drug discovery have led to widespread narratives suggesting that a single model or platform can generate viable therapeutic candidates and, when combined with automated laboratory systems, rapidly progress to clinical development. These narratives often imply that AI-driven design coupled with robotic execution can substantially compress the path to Phase I trials and accelerate the treatment of complex diseases within a few years. However, practical implementation reveals a significant gap between model-level performance and end-to-end drug development success.
11.
cellsarts
(2026-03-31 21:54):
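#paper DOI: 10.3389/fpls.2018.00902, 2018-06-29, Functional Microbial Features Driving Community Assembly During Seed Germination and Emergence
Frontiers in Plant Science. Impact factor: 4.1; JCR quartile: Q2 (Biochemistry & Molecular Biology), Q1 (Plant Sciences); CAS zone: 2; annual publications: 294.
Microbial interactions occurring on and around seeds are especially important for plant fitness, because seed-borne microorganisms are the initial inoculum of the plant microbiota. In this study we analyzed the structural and functional changes within the plant microbiota during the early stages of the plant life cycle, namely germination and emergence. To this end, we performed shotgun DNA sequencing of the microbial assemblages associated with seeds, germinating seeds and seedlings of two plant species: bean and radish. We observed a marked increase in the abundance of Enterobacterales and Pseudomonadales during emergence, and identified a set of functional traits linked to copiotrophic metabolism that are likely selected by the increased nutrient availability after germination. Representative bacterial isolates from seedlings did indeed grow faster than seed-associated isolates. Finally, by binning metagenomic contigs, we reconstructed population genomes of the major bacterial taxa associated with the samples. Taken together, our results indicate that, although seed microbiota differ between plant species, nutrient availability during germination triggers changes in microbial community composition, potentially selecting for taxa with functional traits linked to copiotrophic metabolism. The data presented here are the first empirical assessment of changes in the microbial community during plant emergence and move us toward a more holistic understanding of the plant microbiome.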
Frontiers in Plant Science,
2018-6-29.
DOI: 10.3389/fpls.2018.00902
Gloria Torres-Cortés,
Sophie Bonneau,
Olivier Bouchez,
Clémence Genthon,
Martial Briand,
Marie-Agnès Jacques,
Matthieu Barret
12.
白鸟
(2026-03-31 21:49):
#paper DOI:10.1101/2025.01.29.635579, A SNP Foundation Model: Application in Whole-Genome Haplotype Phasing and Genotype Imputation.
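SNPBag is a Transformer-based foundation model designed for genome-wide SNP analysis. Its tasks include genotype imputation, haplotype phasing, genome embedding, ancestry inference and kinship inference. It addresses the scalability, efficiency and reference-dependence problems of traditional tools and achieves a 10-100x speedup.
SNPBag demonstrates the potential of foundation models in SNP analysis and provides a unified, efficient framework. Its advantages include needing no reference panel (global genetic patterns are modeled directly through pretraining), faster analysis and compressed storage.
Limitations: it relies on simulated data and may not fully capture real variation; performance is lower in high-diversity populations such as African populations; and recall is limited for distant relatives in kinship inference.
In the future it could be extended to more tasks (such as GWAS and PRS), integrate multimodal data, and be fine-tuned on larger real datasets.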
bioRxiv,
2025-10-11.
DOI: 10.1101/2025.01.29.635579
Augix Guohua Xu,
Yu Xu,
Yiming Xing,
Pengchao Luo,
Jianbo Yang,
Yinqi Bai,
Kun Tang
Abstract:
Millions of human genomes have been genotyped by national biobanks worldwide. Training large language models (LLM) with this data may lead to a universal model of human genome with tremendous potential. Yet the quadrillions (10^15) of nucleotides—resulting from genome length multiplied by population size—pose formidable challenges for modeling. In this study, we propose a novel AI framework designed to scale with this data and support diverse analytical tasks. To demonstrate this scheme, we developed SNPBag—a foundation model focusing on single nucleotide polymorphism (SNP). With 0.8 billion parameters, it is trained on one million synthesized human genomes, corresponding to a total of 6 trillion SNP tokens. SNPBag showed superior performance in benchmarking of multiple tasks. In genotype imputation, it achieves state-of-the-art (SOTA) accuracy. In haplotype phasing, it rivals the best method with a 72-fold speedup. By encoding 6 million SNPs per genome into a 0.75 MB embedding, SNPBag enables efficient storage, transfer and downstream applications. In particular, the genome embeddings facilitate rapid ancestry inference across global populations and detection of genetic relationships up to 12th-degree relatives. Collectively, SNPBag introduces a new paradigm for scalable, unified and multitask analysis of the ever-growing human variation data.
13.
林海onrush
(2026-03-31 20:08):
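#paper, Wasserstein Distance Rivals Kullback-Leibler Divergence for Knowledge Distillation, DOI: 10.48550/arXiv.2412.08139. The paper proposes Wasserstein Distance (WD) as a replacement for KL Divergence, long the mainstream choice in knowledge distillation. The authors argue that KL is only good at aligning probabilities category by category, cannot explicitly exploit similarity relations between categories, and is also a poor fit for intermediate-layer feature distillation, where data are high-dimensional, sparse and have non-overlapping distributions. They therefore design WKD-L, based on discrete WD, for logit distillation and WKD-F, based on continuous WD, for feature distillation, and report better results than a range of KL-based methods and strong baselines on ImageNet, CIFAR-100, Self-KD and MS-COCO, showing that WD is not only usable in knowledge distillation but in many settings better than KL divergence.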
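A small numeric example shows why cross-category geometry matters. This is a simplified sketch, not WKD-L itself: it assumes categories sit on a line with unit spacing so that the 1D closed form for Wasserstein-1 applies, whereas the paper uses costs derived from category interrelations. The distributions are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with q clipped at eps."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def wasserstein_1d(p, q):
    """Wasserstein-1 distance between distributions on ordered unit-spaced bins.

    In one dimension W1 equals the summed absolute difference of the CDFs.
    """
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        total += abs(cdf_gap)
    return total

teacher = [1.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0]   # student mass on the neighbouring category
far = [0.0, 0.0, 0.0, 1.0]    # student mass on the most distant category

for name, student in [("near", near), ("far", far)]:
    print(name, kl_divergence(teacher, student), wasserstein_1d(teacher, student))
```

KL gives the same large value for both students because their distributions do not overlap the teacher's, while the Wasserstein distance grows with how far the mass has to move.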
arXiv,
2024/12/11.
Jiaming Lv, Haoyuan Yang, Peihua Li
Abstract:
Since pioneering work of Hinton et al., knowledge distillation based on Kullback-Leibler Divergence (KL-Div) has been predominant, and recently its variants have achieved compelling performance. However, KL-Div only compares probabilities of the corresponding category between the teacher and student while lacking a mechanism for cross-category comparison. Besides, KL-Div is problematic when applied to intermediate layers, as it cannot handle non-overlapping distributions and is unaware of geometry of the underlying manifold. To address these downsides, we propose a methodology of Wasserstein Distance (WD) based knowledge distillation. Specifically, we propose a logit distillation method called WKD-L based on discrete WD, which performs cross-category comparison of probabilities and thus can explicitly leverage rich interrelations among categories. Moreover, we introduce a feature distillation method called WKD-F, which uses a parametric method for modeling feature distributions and adopts continuous WD for transferring knowledge from intermediate layers. Comprehensive evaluations on image classification and object detection have shown (1) for logit distillation WKD-L outperforms very strong KL-Div variants; (2) for feature distillation WKD-F is superior to the KL-Div counterparts and state-of-the-art competitors. The source code is available at https://peihuali.org/WKD
14.
半面阳光
(2026-03-31 18:15):
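#paper doi: 10.1101/gr.278413.123. Genome Res. 2025. Artificial intelligence and machine learning in cell-free-DNA-based diagnostics. This review does not propose a new algorithm; it systematically summarizes how AI and machine learning are used in cfDNA (cell-free DNA) diagnostics, especially in the two main settings of NIPT and cancer liquid biopsy. The authors first review the biological characteristics of cfDNA, then introduce common ML/AI methods, and finally focus on how these methods handle high-dimensional, multi-feature data such as cfDNA.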
Genome Research,
2025-1.
DOI: 10.1101/gr.278413.123
W.H. Adrian Tsui,
Spencer C. Ding,
Peiyong Jiang,
Y.M. Dennis Lo
Abstract:
The discovery of circulating fetal and tumor cell-free DNA (cfDNA) molecules in plasma has opened up tremendous opportunities in noninvasive diagnostics such as the detection of fetal chromosomal aneuploidies and cancers and in posttransplantation monitoring. The advent of high-throughput sequencing technologies makes it possible to scrutinize the characteristics of cfDNA molecules, opening up the fields of cfDNA genetics, epigenetics, transcriptomics, and fragmentomics, providing a plethora of biomarkers. Machine learning (ML) and/or artificial intelligence (AI) technologies that are known for their ability to integrate high-dimensional features have recently been applied to the field of liquid biopsy. In this review, we highlight various AI and ML approaches in cfDNA-based diagnostics. We first introduce the biology of cell-free DNA and basic concepts of ML and AI technologies. We then discuss selected examples of ML- or AI-based applications in noninvasive prenatal testing and cancer liquid biopsy. These applications include the deduction of fetal DNA fraction, plasma DNA tissue mapping, and cancer detection and localization. Finally, we offer perspectives on the future direction of using ML and AI technologies to leverage cfDNA fragmentation patterns in terms of methylomic and transcriptional investigations.
15.
Vincent
(2026-03-31 15:00):
#paper https://doi.org/10.1038/s41586-026-10265-5
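Nature 2026. Towards end-to-end automation of AI research. This paper is the first to build an AI system that automates the whole research process end to end, covering the full loop from idea generation and experiment execution to paper writing and peer review. The system uses a multi-agent architecture and explores the research space systematically through a staged experimental workflow and agentic tree search. The experiments show that AI-generated papers can reach the quality of real research: one of them passed blind review at an ICLR workshop, meeting the acceptance threshold applied to human submissions. The automated reviewer agrees with human reviewers about as well as humans agree with each other, and is used to evaluate generated results at scale. The study further finds that paper quality improves markedly with model capability and test-time compute, suggesting that research ability scales. Although implementation errors and hallucinations remain, the work formalizes the research process as a searchable computational problem and marks a paradigm shift from "AI-assisted research" to "AI-automated research".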
Nature,
2026-3-26.
DOI: 10.1038/s41586-026-10265-5
Chris Lu,
Cong Lu,
Robert Tjarko Lange,
Yutaro Yamada,
Shengran Hu,
Jakob Foerster,
David Ha,
Jeff Clune
Abstract:
The automation of science is a long-standing ambition in artificial intelligence (AI) research [1,2]. Although the community has made substantial progress in automating individual components of the scientific process, a system that autonomously navigates the entire research life cycle—from conception to publication—has remained out of reach. Here we present a pipeline for automating the entire scientific process end to end. We present The AI Scientist, which creates research ideas, writes code, runs experiments, plots and analyses data, writes the entire scientific manuscript, and performs its own peer review. Its ideas, execution and presentation are of sufficient quality that the manuscript generated by this AI system passed the first round of peer review for a workshop of a top-tier machine learning conference. The workshop had an acceptance rate of 70%. Our system leverages modern foundation models [3–5] within a complex agentic system. We evaluate The AI Scientist in two settings: a focused mode using human-provided code templates as an initial scaffold for conducting research on a specific topic and a template-free, open-ended mode that leverages agentic search for wider scientific exploration [6,7]. Both settings produce diverse ideas and automatically test, report on and evaluate them. This achievement demonstrates the growing capacity of AI for making scientific contributions and signifies a potential paradigm shift in how research is conducted. As with any impactful new technology, there could be important risks, including taxing overwhelmed review systems and adding noise to the scientific literature. However, if developed responsibly, such autonomous systems could greatly accelerate scientific discovery.
16.
钟鸣
(2026-03-31 14:02):
#paper doi:10.1038/S41467-024-47345-X Multicore fiber optic imaging reveals that astrocyte calcium activity in the mouse cerebral cortex is modulated by internal motivational state
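Research on glial cells is growing rapidly, but all existing calcium imaging techniques have limitations: head-mounted microscopes (and other techniques that require an implanted lens) cause substantial tissue damage and trigger reactive gliosis; two-photon microscopy requires head fixation and so only suits experiments with restrained animals; and fiber-optic recording systems have too low a spatial resolution. To address this, the paper develops a multicore fiber optic imaging system, built by precisely coupling a multicore fiber bundle containing 30,000 fibers to a miniature lens, that achieves 2.8 μm lateral resolution without penetrating the cortex or restricting the animal's movement. A series of carefully controlled experiments then explores the patterns of astrocyte calcium activity, emphasizing how dynamic it is under different physiological conditions.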
Nature Communications,
2024-4-8.
DOI: 10.1038/S41467-024-47345-X
Yung-Tian A. Gau,
Eric T. Hsu,
Richard J. Cha,
Rebecca W. Pak,
Loren L. Looger,
Jin U. Kang,
Dwight E. Bergles
Abstract:
Astrocytes are a direct target of neuromodulators and can influence neuronal activity on broad spatial and temporal scales in response to a rise in cytosolic calcium. However, our knowledge about how astrocytes are recruited during different animal behaviors remains limited. To measure astrocyte activity calcium in vivo during normative behaviors, we utilize a high-resolution, long working distance multicore fiber optic imaging system that allows visualization of individual astrocyte calcium transients in the cerebral cortex of freely moving mice. We define the spatiotemporal dynamics of astrocyte calcium changes during diverse behaviors, ranging from sleep-wake cycles to the exploration of novel objects, showing that their activity is more variable and less synchronous than apparent in head-immobilized imaging conditions. In accordance with their molecular diversity, individual astrocytes often exhibit distinct thresholds and activity patterns during explorative behaviors, allowing temporal encoding across the astrocyte network. Astrocyte calcium events were induced by noradrenergic and cholinergic systems and modulated by internal state. The distinct activity patterns exhibited by astrocytes provides a means to vary their neuromodulatory influence in different behavioral contexts and internal states.
17.
符毓
(2026-03-31 12:30):
#paper doi:10.1080/17452759.2026.2613185 Virtual and Physical Prototyping, 2026, Fully 3D-Printed electric motor manufactured via multi-modal, multi-material extrusion.
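All key components of the motor, including the coils, soft magnetic core, hard magnets and mechanical coupling, are made on a commercial desktop 3D printer. With minimal assembly steps, the study produces the first fully 3D-printed electric motor, demonstrating the technology's potential for monolithic manufacturing of complex electromagnetic hardware. Along the way it demonstrates fully 3D-printed coils that can generate a 2 mT magnetic field, nearly four times the strength previously reported.
This proof-of-concept motor can be described as a coil-magnet linear actuator. It is the first motor to be fully 3D printed with a single technology, material extrusion, requiring only that the hard magnets be magnetized afterwards.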
Virtual and Physical Prototyping,
2026-12-31.
DOI: 10.1080/17452759.2026.2613185
Jorge Cañada,
Zoey Bigelow,
Luis Fernando Velásquez-García
18.
理但哪
(2026-03-31 11:28):
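#paper Yan Gengwang (严耕望) is a world-renowned historian with outstanding achievements in the history of Chinese political institutions and in Chinese historical geography. He treated scholarship as his life, held himself throughout to the ideal of being a "steadfast and pure scholar", and was deeply respected in academic circles. His great scholarly achievements reflect a distinctive style of scholarship. Its main features: in scholarly approach, reaching breadth through specialization; in research interest, a preference for solid, concrete studies over abstract theoretical ones; in the use of sources, insistence on building research on the basic materials; in method, reaching conclusions mainly through critical examination, induction and statistics of historical sources rather than relying on novel theories and methods. As a prominent historian, his view of the "Four Great Masters" of modern Chinese historiography and his assessments of the two Chens (二陈), Lü Simian (吕思勉), Fu Sinian (傅斯年) and historical materialism all have their own original perspective and value. Link: https://kns.cnki.net/kcms2/article/abstract?v=FFXUSKHsLlY6foEHiVSX7k0IyXE_sDPFBBEzQQyNBqrW8locvd2Idwhv7ck_fEzMS-XRDgqCNS697BcXilo1GZgXh_3MgUf6GJRI_HajqHxOwewdQDtv0qdIk6mTZjMVrSsi9MjgV_CW3X1ny8kpYu0d6n31CN4HtRLqXPYTHaEgC2V1IVYhTA==&uniplatform=NZKPT&language=CHS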
史学史研究,
2017-07-18.
周文玖
19.
李翛然
(2026-03-30 23:30):
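#paper Learning the All-Atom Equilibrium Distribution of Biomolecular Interactions at Scale doi:10.64898/2026.03.10.710952v1 ByteDance and Anew Therapeutics introduce AnewSampling, which predicts the equilibrium conformations of molecular interactions directly, skipping lengthy simulations. The model is trained on a database of more than 15 million conformations, builds on the AlphaFold3 architecture, and uses both LoRA and full-parameter fine-tuning; across multiple benchmarks, its generated results are statistically indistinguishable from molecular dynamics simulations. It handles complex dynamic processes such as those of the CDK2 kinase efficiently, in some cases going beyond what conventional simulation can reach, and offers a dynamic perspective for drug design. Current limitations include reliance on structural templates, a focus on protein-ligand systems, and a fixed thermodynamic environment.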
bioRxiv,
2026-3-13.
DOI: 10.64898/2026.03.10.710952
Abstract:
Biomolecular functions are governed by dynamic conformational ensembles rather than static structures. While models like AlphaFold have revolutionized static structure prediction, accurately capturing the equilibrium distribution of all-atom biomolecular interactions remains a significant challenge due to the high computational cost of molecular dynamics (MD). We present AnewSampling, a transferable generative foundation framework designed for the high-fidelity sampling of all-atom equilibrium distributions, which is the first model to faithfully reproduce MD at the all-atom level. It uses a novel quotient-space generative framework to ensure mathematical consistency and leverages the largest self-curated database of protein-ligand trajectories to date, with over 15 million conformations. Statistically, AnewSampling consistently outperforms all prior generative methods on the ATLAS monomer benchmark, and the all-atom capabilities of AnewSampling enable close statistical alignment with ground-truth MD for evaluating atomic biomolecular interactions in protein-ligand dynamics. Furthermore, AnewSampling successfully recovers coupled ligand and side-chain motions in CDK2 systems, overcoming a major sampling hurdle inherent to conventional MD. AnewSampling enables rapid exploration of conformational landscapes prior to intensive simulations, elucidating fundamental biophysical mechanisms and accelerating the broader design of functional biomolecules.
20.
颜林林
(2026-03-30 23:19):
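#paper doi:10.1038/s41568-025-00900-0, Nature Reviews Cancer, 2026, Artificial intelligence agents in cancer research and oncology. This review explains many basic concepts and ideas behind LLMs and agents in considerable detail. By comparison, the cancer research content that is its nominal subject takes up relatively little space, presumably because the aim is to introduce these topics to a readership of cancer researchers and help them apply the knowledge more efficiently in their own work. I largely agree with the article's depiction of how bioinformatics pipelines iterate (a figure in the main text). As for the long-term direction, though, I believe it will not look entirely as shown there, as if one were using a complete black box, and is more likely to be deep human-AI collaboration across multiple dimensions and levels. This is also why the article "worries" that giving an AI the goal of reducing cancer mortality and letting it explore autonomously could end up sacrificing things such as people's quality of life, and why it calls for more attention to the human element.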
Nature Reviews Cancer,
2026-4.
DOI: 10.1038/s41568-025-00900-0
Daniel Truhn,
Shekoofeh Azizi,
James Zou,
Leonor Cerda-Alberich,
Faisal Mahmood,
Jakob Nikolas Kather