魏魏魏 (2022-07-31 21:37):
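#paper doi:10.1186/s12991-020-0260-4 Annals of General Psychiatry, (2020), Association between personal values in adolescence and mental health and well-being in adulthood: a cross-cultural study of working populations in Japan and the United States. Mental health is a core concern for psychologists, and since the rise of positive psychology, well-being has become a research focus as well; yet the relationship of both to personal values has received little attention. This cross-cultural study examines how mental health and well-being in working adults in Japan and the United States relate to the personal values they held in adolescence, and how that relationship differs between the two countries. Unlike the other variables, adolescent values were measured by recall. The statistical analysis used multiple-group path analysis. The relationship differed between the two cultures: for example, belief and challenge were positively associated with good mental health and well-being only among US participants, while the pursuit of financial success was associated with poorer mental health and well-being only in Japan. The findings are highly suggestive for research on mental health and well-being in middle-aged adults. However, the study has weaknesses in design and method: a longitudinal design would have been better, since it could measure personal values at the time they are held and could also establish the relationship between the independent and dependent variables.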
BACKGROUND: For promoting mental health and well-being of individuals, it is important to investigate its association with personal values. However, in Eastern Asian countries, no study has yet investigated the association between personal values in adolescence and mental health and well-being in adulthood. To fill that research gap, we conducted a cross-sectional study based on two online surveys of working populations in Japan and the United States. METHODS: A total of 516 workers from each of the two countries, aged 30-49 years, completed a questionnaire that measured personal values in adolescence, current psychological distress, health-related quality of life, and subjective well-being (satisfaction and happiness). Personal values were measured by items based on Schwartz's theory of basic values and people's commitment to those ten values. Multiple group path analysis was performed to examine the associations between personal values in adolescence and health-related outcomes, grouped by country. RESULTS: Care, graduating from school, and commitment to values were associated with better mental health and well-being in Japanese participants. Belief and challenging were associated with better mental health and well-being in US participants. On the other hand, financial success was associated with poor mental health and well-being in Japanese participants. Avoiding causing trouble and positive evaluation were associated with poor mental health and well-being in the US participants. CONCLUSIONS: Certain personal values and commitment to those values in adolescence may be associated with mental health and well-being in adulthood. To address the limitations of this study, future studies should use a longitudinal design and investigate the interactions among the types of personal values and commitment to the values. <<<
张贝 (2022-07-31 21:13):
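#paper DOI: 10.1038/s41564-021-00993-x, Nat Microbiol 2021, Aspergillus fumigatus pan-genome analysis identifies genetic variants associated with human infection. Aspergillus fumigatus is an environmental saprobe and opportunistic fungal pathogen. Although it causes more than 300,000 cases of invasive disease worldwide every year, its genomic diversity (including the diversity of virulence factors and antifungal resistance genes) is still not comprehensively understood. This paper performs a pan-genome analysis of 300 A. fumigatus isolates from around the world (83 clinical and 217 environmental) and finds 7,563 core and 3,344 accessory orthogroups. Using this large genomic dataset of environmental and clinical samples, the authors find that clinical isolates are enriched in genetic cluster 5, whose genomes also contain more genes encoding transmembrane transporters and proteins with iron-binding activity, as well as genes involved in carbohydrate and amino-acid metabolism. Finally, the authors use genome-wide association studies to identify genetic variation associated with clinical isolates and triazole resistance, and characterize variation in known virulence genes. By analyzing the pan-genome of 300 isolates, the paper helps explain the diversity of pathogenic mechanisms in A. fumigatus, with the eventual goal of better management of these infections.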
IF:20.500Q1 Nature microbiology, 2021-12. DOI: 10.1038/s41564-021-00993-x PMID: 34819642
Aspergillus fumigatus is an environmental saprobe and opportunistic human fungal pathogen. Despite an estimated annual occurrence of more than 300,000 cases of invasive disease worldwide, a comprehensive survey of the genomic diversity present in A. fumigatus-including the relationship between clinical and environmental isolates and how this genetic diversity contributes to virulence and antifungal drug resistance-has been lacking. In this study we define the pan-genome of A. fumigatus using a collection of 300 globally sampled genomes (83 clinical and 217 environmental isolates). We found that 7,563 of the 10,907 unique orthogroups (69%) are core and present in all isolates and the remaining 3,344 show presence/absence of variation, representing 16-22% of the genome of each isolate. Using this large genomic dataset of environmental and clinical samples, we found an enrichment for clinical isolates in a genetic cluster whose genomes also contain more accessory genes, including genes coding for transmembrane transporters and proteins with iron-binding activity, and genes involved in both carbohydrate and amino-acid metabolism. Finally, we leverage the power of genome-wide association studies to identify genomic variation associated with clinical isolates and triazole resistance as well as characterize genetic variation in known virulence factors. This characterization of the genomic diversity of A. fumigatus allows us to move away from a single reference genome that does not necessarily represent the species as a whole and better understand its pathogenic versatility, ultimately leading to better management of these infections. <<<
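The core/accessory split described in the abstract can be illustrated with a minimal Python sketch. It assumes a simple presence/absence table mapping each orthogroup to the set of isolates that carry it; the function name, isolate names, and orthogroup names are hypothetical and the toy data are not from the paper. Only the final line uses the abstract's own counts.

```python
def split_pan_genome(presence, isolates):
    """Split orthogroups into core (present in every isolate) and accessory."""
    all_isolates = set(isolates)
    core, accessory = [], []
    for orthogroup, carriers in presence.items():
        if all_isolates <= set(carriers):
            core.append(orthogroup)
        else:
            accessory.append(orthogroup)
    return core, accessory


# Toy presence/absence data (hypothetical, for illustration only).
isolates = ["iso1", "iso2", "iso3"]
presence = {
    "OG1": {"iso1", "iso2", "iso3"},
    "OG2": {"iso1", "iso3"},
    "OG3": {"iso1", "iso2", "iso3"},
    "OG4": {"iso2"},
}
core, accessory = split_pan_genome(presence, isolates)
core_fraction = len(core) / len(presence)

# Counts quoted in the abstract: 7,563 core out of 10,907 orthogroups.
reported_core_fraction = 7563 / 10907
```

With the abstract's numbers, `reported_core_fraction` rounds to the 69% stated there, and 10,907 minus 7,563 gives the 3,344 accessory orthogroups.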
Vincent (2022-07-31 17:30):
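#paper doi: 10.1093/bioinformatics/btab083 DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Gene regulatory code is highly complex because of sequence polysemy and distant semantic relationships. In recent years, studies have found that DNA sequences, especially non-coding regions, resemble natural language in alphabet, grammar, and semantics, while BERT, a machine learning tool built on the transformer attention mechanism, has excelled in natural language processing. This paper follows the same line of thinking to develop DNABERT, a pre-trained model that represents DNA features based on sequence context. To show the model's usefulness and performance, the paper tackles several classic computational tasks: promoter prediction, splice site prediction, and transcription factor binding site prediction. The model is first used to encode DNA sequences and is then fine-tuned for each specific task, and it comfortably outperforms other algorithms in accuracy. To address the poor interpretability of deep learning methods, the tool also offers visualization options that show the importance of individual sites and their relationships to other sites (the attention mechanism). The work further finds that a model pre-trained on the human genome also performs well when applied to other organisms, which suggests that the encoding transfers (it does not memorize, but actually captures sequence-level features).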
MOTIVATION: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture especially in data-scarce scenarios. RESULTS: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferrable understanding of genomic DNA sequences based on up and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that pre-trained DNABERT with human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analyses tasks. AVAILABILITY AND IMPLEMENTATION: The source code, pretrained and finetuned model for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. <<<
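To make "encoding a DNA sequence" a little more concrete, here is a small Python sketch of overlapping k-mer tokenization, which turns a nucleotide string into word-like tokens a BERT-style model can consume. Neither the post nor the abstract describes the tokenizer, so the k-mer scheme, the function name, and the default `k` are assumptions for illustration rather than a description of the released code.

```python
def kmer_tokens(sequence, k=6):
    """Return overlapping k-mers (stride 1) of a DNA sequence as tokens."""
    sequence = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Toy example with k=3 on an 8-base sequence.
tokens = kmer_tokens("ATGCGTAC", k=3)
```

An 8-base sequence with `k=3` gives six tokens, each sharing two bases with its neighbor, so every token carries some local context.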
小年 (2022-07-31 16:40):
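#paper doi:10.1038/ng.3972 The draft genome of tropical fruit durian (Durio zibethinus) This paper comes from a cancer research institute and was published in Nature Genetics in 2017; it presents a draft genome assembly of durian (the Musang King cultivar). (Not exactly their day job.) It describes the assembly method and how the completeness and correctness of the assembly were validated, which makes it a useful reference. The paper also uses transcriptome sequencing: RNA-seq of three cultivars, Musang King, Monthong, and 小猫山王 (literally "little Musang King"), is compared with other fruits. Enrichment analysis of the differences identifies the pathways behind the odor difference between durian and other fruits, and the paper also analyzes the pathways controlling taste and odor differences among durian cultivars. Combined genomic and transcriptomic analysis suggests a potential link between durian odor and fruit ripening. After reading this, I started wondering which other expensive fruits have not had their genomes sequenced yet.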
IF:31.700Q1 Nature genetics, 2017-Nov. DOI: 10.1038/ng.3972 PMID: 28991254
Durian (Durio zibethinus) is a Southeast Asian tropical plant known for its hefty, spine-covered fruit and sulfury and onion-like odor. Here we present a draft genome assembly of D. zibethinus, representing the third plant genus in the Malvales order and first in the Helicteroideae subfamily to be sequenced. Single-molecule sequencing and chromosome contact maps enabled assembly of the highly heterozygous durian genome at chromosome-scale resolution. Transcriptomic analysis showed upregulation of sulfur-, ethylene-, and lipid-related pathways in durian fruits. We observed paleopolyploidization events shared by durian and cotton and durian-specific gene expansions in MGL (methionine γ-lyase), associated with production of volatile sulfur compounds (VSCs). MGL and the ethylene-related gene ACS (aminocyclopropane-1-carboxylic acid synthase) were upregulated in fruits concomitantly with their downstream metabolites (VSCs and ethylene), suggesting a potential association between ethylene biosynthesis and methionine regeneration via the Yang cycle. The durian genome provides a resource for tropical fruit biology and agronomy. <<<
JZY (2022-07-31 16:12):
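#paper DOI: 10.1109/TNSRE.2021.3110665 Multi-View Spatial-Temporal Graph Convolutional Networks With Domain Generalization for Sleep Stage Classification This paper was accepted by IEEE TNSRE, a top journal in brain-computer interfaces. It proposes a multi-view spatial-temporal graph convolutional network with domain generalization for sleep staging. The domain generalization method effectively handles differences between subjects, extracting subject-invariant sleep features without needing target-domain data and so improving generalization. The model also fully captures the spatial properties of multi-view brain networks (a functional connectivity view and a spatial distance view). Compared with existing SOTA models, it achieves the best performance.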
Sleep stage classification is essential for sleep assessment and disease diagnosis. Although previous attempts to classify sleep stages have achieved high classification performance, several challenges remain open: 1) How to effectively utilize time-varying spatial and temporal features from multi-channel brain signals remains challenging. Prior works have not been able to fully utilize the spatial topological information among brain regions. 2) Due to the many differences found in individual biological signals, how to overcome the differences of subjects and improve the generalization of deep neural networks is important. 3) Most deep learning methods ignore the interpretability of the model to the brain. To address the above challenges, we propose a multi-view spatial-temporal graph convolutional networks (MSTGCN) with domain generalization for sleep stage classification. Specifically, we construct two brain view graphs for MSTGCN based on the functional connectivity and physical distance proximity of the brain regions. The MSTGCN consists of graph convolutions for extracting spatial features and temporal convolutions for capturing the transition rules among sleep stages. In addition, attention mechanism is employed for capturing the most relevant spatial-temporal information for sleep stage classification. Finally, domain generalization and MSTGCN are integrated into a unified framework to extract subject-invariant sleep features. Experiments on two public datasets demonstrate that the proposed model outperforms the state-of-the-art baselines. <<<
符毓 Yu (2022-07-31 15:10):
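#paper doi:10.1115/1.2429697 Journal of Mechanical Design, 2007, Review of Metamodeling Techniques in Support of Engineering Design Optimization This paper gives an overview of metamodeling and its applications in engineering design optimization. The techniques are grouped by the needs of design engineers: model approximation, design space exploration, problem formulation, and optimization support. Challenges and future developments are also discussed. It can help people who are just starting out in this field and works well as a framework for further study.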
Computation-intensive design problems are becoming increasingly common in manufacturing industries. The computation burden is often caused by expensive analysis and simulation processes in order to reach a comparable level of accuracy as physical testing data. To address such a challenge, approximation or metamodeling techniques are often used. Metamodeling techniques have been developed from many different disciplines including statistics, mathematics, computer science, and various engineering disciplines. These metamodels are initially developed as “surrogates” of the expensive simulation process in order to improve the overall computation efficiency. They are then found to be a valuable tool to support a wide scope of activities in modern engineering design, especially design optimization. This work reviews the state-of-the-art metamodel-based techniques from a practitioner’s perspective according to the role of metamodeling in supporting design optimization, including model approximation, design space exploration, problem formulation, and solving various types of optimization problems. Challenges and future development of metamodeling in support of engineering design is also analyzed and discussed. <<<
小张(快乐科研版🦭) (2022-07-31 12:37):
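#paper Plant Volatiles as Mate-Finding Cues for Insects • DOI: 10.1016/j.tplants.2017.11.004 Plant volatiles do more than help herbivorous insects locate host plants for feeding: they also provide cues and meeting sites for insect mating, and they help the natural enemies of herbivorous insects locate their prey. Flowers, fruits, and seeds each emit different volatiles that act together with various pheromones to attract insects. The volatiles stimulate insects to release pheromones, which attracts more conspecifics to mate, while volatiles induced in damaged plants attract the insects' natural enemies. Using examples of volatiles from leaves, fruits, and flowers and of herbivore-induced volatiles, and their synergistic or inhibitory interactions with insect pheromones, this review describes the volatile-mediated interspecific relationships among plants, insects, and natural enemies.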
Plant volatiles are used not only by herbivorous insects to find their host plants, but also by the natural enemies of the herbivores to find their prey. There is also increasing evidence that plant volatiles, in addition to species-specific pheromones, help these insects to find mating partners. Plant structures such as flowers, fruit, and leaves are frequently rendezvous sites for mate-seeking insects. Here we propose that the combined use of plant volatiles and pheromones can efficiently guide insects to these sites, where they will have access to both mates and food. This notion is supported by the fact that plant volatiles can stimulate the release of sex pheromones and can render various insects more receptive to potential mates. <<<
四叶草 (2022-07-31 10:05):
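#paper DOI: 10.1038/s41586-020-2352-3 hair-bearing human skin generated entirely from pluripotent stem cells A 2020 Nature paper on differentiating human pluripotent stem cells into skin organoids, accompanied by the formation of skin appendage structures. By controlling the TGFβ, BMP, and FGF pathways, the authors differentiate embryoid bodies (EBs) formed from stem cells stepwise, via non-neural ectoderm, into skin, and then induce the skin to self-organize into a multilayered structure. After 3 months of in vitro culture, hair follicle growth is clearly visible. Transplantation into nude mice further confirms that the organoids can stratify in vivo and form sebaceous glands and hair follicles containing sensory receptor cells. The system offers a model for skin development and a potential donor source for skin grafts.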
IF:50.500Q1 Nature, 2020-06. DOI: 10.1038/s41586-020-2352-3 PMID: 32494013
The skin is a multilayered organ, equipped with appendages (that is, follicles and glands), that is critical for regulating body temperature and the retention of bodily fluids, guarding against external stresses and mediating the sensation of touch and pain. Reconstructing appendage-bearing skin in cultures and in bioengineered grafts is a biomedical challenge that has yet to be met. Here we report an organoid culture system that generates complex skin from human pluripotent stem cells. We use stepwise modulation of the transforming growth factor β (TGFβ) and fibroblast growth factor (FGF) signalling pathways to co-induce cranial epithelial cells and neural crest cells within a spherical cell aggregate. During an incubation period of 4-5 months, we observe the emergence of a cyst-like skin organoid composed of stratified epidermis, fat-rich dermis and pigmented hair follicles that are equipped with sebaceous glands. A network of sensory neurons and Schwann cells form nerve-like bundles that target Merkel cells in organoid hair follicles, mimicking the neural circuitry associated with human touch. Single-cell RNA sequencing and direct comparison to fetal specimens suggest that the skin organoids are equivalent to the facial skin of human fetuses in the second trimester of development. Moreover, we show that skin organoids form planar hair-bearing skin when grafted onto nude mice. Together, our results demonstrate that nearly complete skin can self-assemble in vitro and be used to reconstitute skin in vivo. We anticipate that our skin organoids will provide a foundation for future studies of human skin development, disease modelling and reconstructive surgery. <<<
小W (2022-07-31 09:59):
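#paper doi: https://doi.org/10.1016/j.cell.2022.05.013 Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq Perturb-seq is an experimental method that maps the transcriptional effects of genetic perturbations by combining CRISPR-based genetic screens with single-cell RNA sequencing phenotypes. This paper uses CRISPRi to target all expressed genes in chronic myeloid leukemia cells (K562) and all DepMap (the Cancer Dependency Map database) essential genes in retinal pigment epithelial cells (RPE1), and relies on the intrinsic interpretability of the CRISPRi gene-to-RNA phenotype to link each gene to its role in the cell. It describes applications of genome-scale Perturb-seq screens in the following directions: 1. predicting the features of genetic perturbations that cause transcriptional phenotypes; 2. annotating gene function from transcriptional phenotypes; 3. hypothesis-driven study of complex phenotypes; 4. stress-specific regulation of the mitochondrial genome. The paper is a per-gene genetic perturbation analysis using Perturb-seq. Its sequencing data and expression (and differential analysis) data have been released, along with the sgRNA library (which I could not find). The main experimental methods and analysis scripts follow another paper, "Scalable single-cell CRISPR screens by direct guide RNA capture and targeted library enrichment, Nature Biotechnology 2020".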
IF:45.500Q1 Cell, 2022-07-07. DOI: 10.1016/j.cell.2022.05.013 PMID: 35688146
A central goal of genetics is to define the relationships between genotypes and phenotypes. High-content phenotypic screens such as Perturb-seq (CRISPR-based screens with single-cell RNA-sequencing readouts) enable massively parallel functional genomic mapping but, to date, have been used at limited scales. Here, we perform genome-scale Perturb-seq targeting all expressed genes with CRISPR interference (CRISPRi) across >2.5 million human cells. We use transcriptional phenotypes to predict the function of poorly characterized genes, uncovering new regulators of ribosome biogenesis (including CCDC86, ZNF236, and SPATA5L1), transcription (C7orf26), and mitochondrial respiration (TMEM242). In addition to assigning gene function, single-cell transcriptional phenotypes allow for in-depth dissection of complex cellular phenomena-from RNA processing to differentiation. We leverage this ability to systematically identify genetic drivers and consequences of aneuploidy and to discover an unanticipated layer of stress-specific regulation of the mitochondrial genome. Our information-rich genotype-phenotype map reveals a multidimensional portrait of gene and cellular function. <<<
笑对人生 (2022-07-31 09:15):
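#paper Phasing analysis of lung cancer genomes using a long read sequencer. Nat Commun. 2022 Jun 16;13(1):3464. doi: 10.1038/s41467-022-31133-6 Background: An SNV (single nucleotide variant) is a site in the genome where a single base has changed. An SNP (single nucleotide polymorphism) is a DNA sequence polymorphism caused by variation at a single nucleotide. An SNV describes a base change in an individual genome, whereas an SNP is more of a population-level property. A clearer definition in English: A haplotype is a physical grouping of genomic variants (or polymorphisms) that tend to be inherited together. A specific haplotype typically reflects a unique combination of variants that reside near each other on a chromosome. In other words, a haplotype is the combination of linked allelic variant sites within a region of one chromosome, and each distinct combination is one haplotype. Determining haplotypes, that is, deciding whether variants lie on the same chromosome, is called phasing (also haplotype estimation). The typing or variation referred to here is usually the result of a comparison; in population genetics this may be a comparison of one individual with the rest of the population, or of offspring with parents, and it describes the outcome of evolution or mutation (my own understanding). An allele (also allelomorph) generally refers to one of a pair of genes that sit at the same position on a pair of homologous chromosomes (one paternal, one maternal) and control different states of the same trait. English definitions: Any one of a series of two or more different genes that may occupy the same locus on a specific chromosome; An allele is a variant form of a gene. With current second- or third-generation sequencing, each read comes from a single chromosome, but a read by itself does not reveal whether it is of paternal or maternal origin. Compared with second-generation sequencing, however, third-generation sequencing has long reads that can span most neighboring SNP sites, which allows more accurate haplotype phasing. In addition, third-generation sequencing can detect copy number variants (CNVs) precisely and, while phasing the sequence, carries base-modification information such as methylation. Study aim: Earlier detection of SNVs, indels, and CNVs in tumors has mostly relied on second-generation sequencing. Because of its short reads, second-generation sequencing cannot accurately resolve high-GC regions, repetitive regions, or large chromosomal rearrangements. The very long reads of third-generation sequencing should therefore give a more complete picture of the variation events in tumors. The recently released ONT Q20+ chemistry achieves >99% raw-read (single-strand) accuracy, or about Q30 duplex accuracy. The aim of this study is to use ONT nanopore sequencing for phasing analysis, copy number variation, and chromothripsis-like events in non-small cell lung cancer at the tissue and cell level. Methods: For 20 non-small cell lung cancer patients, normal tissue was whole-genome sequenced with both second- and third-generation sequencing, while tumor tissue was sequenced with third-generation sequencing only. Methylation signals from the sequencing data and ONT full-length transcriptome sequencing were used to explore the relationship between variants and phenotype. To further explore the clonal structure of the tumor cells, ONT-based scDNA-seq was also performed on 2 samples. Results: By using second-generation data to correct the third-generation data, the study obtains phased blocks with an N50 length of 834 kb, with SNV calls close to 99% concordant with public second-generation WGS data. Combining methylation data with full-length transcriptome sequencing, phased-block variation (haplotype variation) was found to correlate with methylation and transcriptional regulation in only two samples. Analysis of large chromosomal rearrangements shows that EGFR mutation-positive lung adenocarcinoma tissue has characteristic chromothripsis-like events, suggesting that aberrations in the EGFR pathway may affect telomerase activity.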
IF:14.700Q1 Nature communications, 2022-06-16. DOI: 10.1038/s41467-022-31133-6 PMID: 35710642
Chromosomal backgrounds of cancerous mutations still remain elusive. Here, we conduct the phasing analysis of non-small cell lung cancer specimens of 20 Japanese patients. By the combinatory use of short and long read sequencing data, we obtain long phased blocks of 834 kb in N50 length with >99% concordance rate. By analyzing the obtained phasing information, we reveal that several cancer genomes harbor regions in which mutations are unevenly distributed to either of two haplotypes. Large-scale chromosomal rearrangement events, which resemble chromothripsis events but have smaller scales, occur on only one chromosome, and these events account for the observed biased distributions. Interestingly, the events are characteristic of EGFR mutation-positive lung adenocarcinomas. Further integration of long read epigenomic and transcriptomic data reveal that haploid chromosomes are not always at equivalent transcriptomic/epigenomic conditions. Distinct chromosomal backgrounds are responsible for later cancerous aberrations in a haplotype-specific manner. <<<
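Since the headline metric here is the N50 of the phased blocks, a short Python sketch of how N50 is computed may help: it is the length L such that blocks of length at least L together cover at least half of the total block length. The function name and the toy block lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Length L such that blocks of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Toy phased-block lengths in kb (hypothetical values).
toy_blocks_kb = [100, 200, 300, 400]
toy_n50 = n50(toy_blocks_kb)
```

For the toy blocks, the total is 1,000 kb; the two longest blocks (400 + 300) are the first to reach the 500 kb halfway point, so the N50 is 300 kb.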
翁凯 (2022-07-31 08:58):
#paper doi:10.1016/j.ccell.2015.09.018 Cancer Cell, 2015, RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. The paper finds that mRNA carried by platelets can predict whether a person has cancer (96% accuracy) and can further predict the tissue of origin (71% accuracy).
IF:48.800Q1 Cancer cell, 2015-Nov-09. DOI: 10.1016/j.ccell.2015.09.018 PMID: 26525104 PMCID:PMC4644263
Tumor-educated blood platelets (TEPs) are implicated as central players in the systemic and local responses to tumor growth, thereby altering their RNA profile. We determined the diagnostic potential of TEPs by mRNA sequencing of 283 platelet samples. We distinguished 228 patients with localized and metastasized tumors from 55 healthy individuals with 96% accuracy. Across six different tumor types, the location of the primary tumor was correctly identified with 71% accuracy. Also, MET or HER2-positive, and mutant KRAS, EGFR, or PIK3CA tumors were accurately distinguished using surrogate TEP mRNA profiles. Our results indicate that blood platelets provide a valuable platform for pan-cancer, multiclass cancer, and companion diagnostics, possibly enabling clinical advances in blood-based "liquid biopsies". <<<
颜林林 (2022-07-31 07:26):
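#paper doi:10.1016/j.ccell.2022.07.003 Cancer Cell, 2022, Dark genome, bright ideas: Recent approaches to harness transposable elements in immunotherapies. Transposable elements (TEs), which make up nearly half of the human genome, still need much deeper study. This commentary gives a quick review of the relationship between TEs and immunity: TEs are immunogenic, they can activate DNA or RNA sensors, and they can trigger immune responses, all of which may lead to new immunotherapies. The article describes in turn how TE expression affects anti-tumor immunity and how tumors can be treated with immunotherapy by modulating TE expression, modulating TE immunogenicity, or assisting CAR-T cells. A personal note: studying repetitive sequences at the DNA level has always been quite difficult, which is why these regions are often called the "dark genome". The difficulty is like trying to infer a large number of objects floating in the air from their shadows on the ground, where many shadows overlap and cannot be told apart. Fortunately, new technologies such as long reads and multi-omics are letting us peel away the fog layer by layer.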
IF:48.800Q1 Cancer cell, 2022-08-08. DOI: 10.1016/j.ccell.2022.07.003 PMID: 35907399
Transposable elements (TEs), which make up almost half of the human genome, often display altered expression in cancers. Here, we review recent progress in elucidating the role of TEs as mediators of immune responses in cancer and discuss how novel therapeutic strategies can harness TE immunogenicity for cancer immunotherapy. <<<
尹志 (2022-07-30 22:41):
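#paper https://doi.org/10.48550/arXiv.2205.01529 Masked Generative Distillation ECCV 2022. This is a knowledge distillation paper that achieves distillation by generating features, in a way loosely similar to contrastive learning. Knowledge distillation is a general technique that has been used in all kinds of machine learning tasks, including vision tasks such as classification, segmentation, and detection. Distillation algorithms usually improve the representation power of the student's features by making the student imitate the teacher's features. This paper proposes that the student need not imitate the teacher's features directly and can generate them instead: the student's features are randomly masked, and the remaining part of the student's features is used to generate the teacher's features. This gives the student's features stronger representation power. I find the idea interesting. As an analogy (perhaps not a perfect one): the student was supposed to learn the teacher's every move, but now the teacher rarely shows up, so direct imitation is inconvenient. Instead the student, under supervision, guesses blind what the teacher's features look like; after enough guesses, once every guess is on target, the student knows the teacher well, which means the student's representation power is strong. With this approach, the authors run extensive experiments on image classification, object detection, semantic segmentation, and instance segmentation, across different datasets and models, and find that performance improves in every case (mostly by 2-3 points; see the paper for exact numbers).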
Knowledge distillation has been applied to various tasks successfully. The current distillation algorithm usually improves students' performance by imitating the output of the teacher. This paper shows that teachers can also improve students' representation power by guiding students' feature recovery. From this point of view, we propose Masked Generative Distillation (MGD), which is simple: we mask random pixels of the student's feature and force it to generate the teacher's full feature through a simple block. MGD is a truly general feature-based distillation method, which can be utilized on various tasks, including image classification, object detection, semantic segmentation and instance segmentation. We experiment on different models with extensive datasets and the results show that all the students achieve excellent improvements. Notably, we boost ResNet-18 from 69.90% to 71.69% ImageNet top-1 accuracy, RetinaNet with ResNet-50 backbone from 37.4 to 41.0 Boundingbox mAP, SOLO based on ResNet-50 from 33.1 to 36.2 Mask mAP and DeepLabV3 based on ResNet-18 from 73.20 to 76.02 mIoU. Our codes are available at this https URL. <<<
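The mask-then-generate idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the authors' implementation: features are 1D lists instead of image feature maps, the paper's "simple block" is replaced by a single hypothetical 3-tap convolution (`kernel`), and all function names are made up for illustration. In real training the generation block and the student would be learned jointly by minimizing this loss.

```python
import random


def mask_feature(feature, mask_ratio, rng):
    """Zero out each position of the student feature with probability mask_ratio."""
    return [0.0 if rng.random() < mask_ratio else x for x in feature]


def conv1d_same(feature, kernel):
    """Zero-padded 1D convolution; stand-in for the paper's generation block."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(feature) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(feature))
    ]


def mgd_loss(student, teacher, kernel, mask_ratio=0.5, seed=0):
    """Squared error between the feature generated from the masked student
    feature and the teacher's full feature."""
    rng = random.Random(seed)
    masked = mask_feature(student, mask_ratio, rng)
    generated = conv1d_same(masked, kernel)
    return sum((g - t) ** 2 for g, t in zip(generated, teacher))


student = [1.0, 2.0, 3.0]
teacher = [1.0, 2.0, 3.0]
identity_kernel = [0.0, 1.0, 0.0]

loss_unmasked = mgd_loss(student, teacher, identity_kernel, mask_ratio=0.0)
loss_fully_masked = mgd_loss(student, teacher, identity_kernel, mask_ratio=1.0)
```

With nothing masked and an identity kernel the loss is zero; with everything masked, the generator sees only zeros and the loss equals the teacher's squared norm. Between those extremes, a kernel that mixes neighboring positions has to reconstruct the masked entries from context, which is the pressure the method relies on.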
哪有情可长 (2022-07-30 21:34):
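#paper doi: 10.1126/science.abl7392 Gametophyte genome activation occurs at pollen mitosis I in maize. The sporophyte undergoes meiosis to produce haploid spores, which then proliferate and differentiate to form the gametophyte. The main function of the gametophyte generation is to produce haploid gametes, and fusion of sperm and egg cells gives rise to a new sporophyte, completing one life cycle. Maternal genes control most early events after fertilization in plants; this is followed by the maternal-to-zygotic transition, in which degradation of maternal products is coordinated with activation of the zygotic genome. This study sequences the RNA content of single maize pollen precursors and pollen grains over the 26 days from the start of meiosis to pollen shedding. It finds that about halfway through pollen development, the haploid genome of the pollen grain takes over control from the diploid parental genome; this sporophyte-to-gametophyte transition lays the foundation for the growth and development of the next generation.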
Flowering plants alternate between multicellular haploid (gametophyte) and diploid (sporophyte) generations. Pollen actively transcribes its haploid genome, providing phenotypic diversity even among pollen grains from a single plant. In this study, we used allele-specific RNA sequencing of single pollen precursors to follow the shift to haploid expression in maize pollen. We observed widespread biallelic expression for 11 days after meiosis, indicating that transcripts synthesized by the diploid sporophyte persist long into the haploid phase. Subsequently, there was a rapid and global conversion to monoallelic expression at pollen mitosis I, driven by active new transcription from the haploid genome. Genes showed evidence of increased purifying selection if they were expressed after (but not before) pollen mitosis I. This work establishes the timing during which haploid selection may act in pollen. <<<
张浩彬 (2022-07-30 17:14):
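#paper doi:10.1287/ijoc.2021.1147, Improving Sales Forecasting Accuracy: A Tensor Factorization Approach with Demand Awareness The paper addresses sales forecasting for multiple products across multiple stores. Borrowing from collaborative filtering, it treats the data as a high-dimensional tensor and factorizes it to better extract relevant information and contextual relationships. The factorized features are then fed into the time-series frameworks SARIMA and LSTM, giving better results than traditional methods.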
Because of the accessibility of big data collections from consumers, products, and stores, advanced sales forecasting capabilities have drawn great attention from many businesses, especially those in retail, because of the importance of forecasting in decision making. Improvement of forecasting accuracy, even by a small percentage, may have a substantial impact on companies’ production and financial planning, marketing strategies, inventory controls, and supply chain management. Specifically, our research goal is to forecast the sales of each product in each store in the near future. Motivated by tensor factorization methodologies for context-aware recommender systems, we propose a novel approach called the advanced temporal latent factor approach to sales forecasting, or ATLAS for short, which achieves accurate and individualized predictions for sales by building a single tensor factorization model across multiple stores and products. Our contribution is a combination of a tensor framework (to leverage information across stores and products), a new regularization function (to incorporate demand dynamics), and extrapolation of the tensor into future time periods using state-of-the-art statistical (seasonal autoregressive integrated moving-average models) and machine-learning (recurrent neural networks) models. The advantages of ATLAS are demonstrated on eight product category data sets collected by Information Resources, Inc., where we analyze a total of 165 million weekly sales transactions of over 15,560 products from more than 1,500 grocery stores. Summary of Contribution: Sales forecasting has been a task of long-standing importance. Accurate sales forecasting provides critical managerial implications for companies’ decision making and operations. Improvement of forecasting accuracy may have a substantial impact on companies’ production planning, marketing strategies, inventory controls, and supply chain management, among other things. 
This paper proposes a novel computational (machine-learning-based) approach to sales forecasting and thus is positioned directly at the intersection of computing and business/operations research. <<<
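To illustrate what factorizing a store × product × time sales tensor buys you, here is a minimal Python sketch that reconstructs one tensor entry from latent factors. It assumes a CP-style (sum of rank-one terms) factorization, which the abstract does not spell out; the function name, factor values, and labels are hypothetical. In the approach described above, the time factors for future weeks would come from extrapolating the fitted ones with a time-series model.

```python
def predict_sales(store_factors, product_factors, time_factors,
                  store, product, week):
    """CP-style reconstruction: sum over latent dimensions of the product
    of the store, product, and time factor values."""
    return sum(
        s * p * t
        for s, p, t in zip(
            store_factors[store], product_factors[product], time_factors[week]
        )
    )


# Toy rank-2 factors (hypothetical values).
store_factors = {"store_A": [1.0, 0.5]}
product_factors = {"milk": [2.0, 4.0]}
time_factors = {"week_1": [3.0, 1.0]}

forecast = predict_sales(
    store_factors, product_factors, time_factors, "store_A", "milk", "week_1"
)
```

Because every store and product shares the same latent dimensions, a single model can borrow information across stores and products, which is the motivation stated in the abstract.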
颜林林 (2022-07-30 01:17):
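#paper doi:10.15252/msb.202211017 Molecular Systems Biology, 2022, Computational estimation of quality and clinical relevance of cancer cell lines. This is a review of cancer cell lines that focuses on the quality of the publicly available, widely used lines. It first surveys the current public cell line resources for different cancer types, including the associated multi-omics data. It then describes factors that can affect cell line quality, such as cross-contamination, accumulation of mutations during passaging, the absence of microenvironmental factors, and heterogeneity at the molecular and cell-state levels. Next, it reviews the computational methods (and tools) available for assessing each of these problems. Finally, the discussion looks ahead to future improvements such as multi-omics integration, transfer learning, the use of single-cell data, and better interpretability. Cell lines are an important system for cancer research, and this paper systematically summarizes how to choose among the resources and how to evaluate them.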
Immortal cancer cell lines (CCLs) are the most widely used system for investigating cancer biology and for the preclinical development of oncology therapies. Pharmacogenomic and genome-wide editing screenings have facilitated the discovery of clinically relevant gene-drug interactions and novel therapeutic targets via large panels of extensively characterised CCLs. However, tailoring pharmacological strategies in a precision medicine context requires bridging the existing gaps between tumours and in vitro models. Indeed, intrinsic limitations of CCLs such as misidentification, the absence of tumour microenvironment and genetic drift have highlighted the need to identify the most faithful CCLs for each primary tumour while addressing their heterogeneity, with the development of new models where necessary. Here, we discuss the most significant limitations of CCLs in representing patient features, and we review computational methods aiming at systematically evaluating the suitability of CCLs as tumour proxies and identifying the best patient representative in vitro models. Additionally, we provide an overview of the applications of these methods to more complex models and discuss future machine-learning-based directions that could resolve some of the arising discrepancies. <<<
洪媛媛 (2022-07-29 14:23):
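#paper https://doi.org/10.1038/s41467-022-31765-8 Nat Commun 13, 4248 (2022). Accurate somatic variant detection using weakly supervised deep learning. Somatic mutation calling in tumors is usually done with statistical methods combined with filtering criteria. This paper uses a deep learning method named "VarNet" that calls somatic mutations from paired tumor and normal DNA data. VarNet takes tumor DNA with known mutation and non-mutation labels together with the matched normal DNA sequence, encodes the base, base quality, mapping quality, strand bias, and reference base at each site into a multi-dimensional matrix to train the model, and predicts the probability of a mutation at each position. The authors then compare VarNet with 4 other published methods on 4 publicly available benchmark datasets in terms of the precision and recall of mutation calling, showing that VarNet outperforms the 4 existing methods.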
IF:14.700Q1 Nature communications, 2022-07-22. DOI: 10.1038/s41467-022-31765-8 PMID: 35869060
Identification of somatic mutations in tumor samples is commonly based on statistical methods in combination with heuristic filters. Here we develop VarNet, an end-to-end deep learning approach for identification of somatic variants from aligned tumor and matched normal DNA reads. VarNet is trained using image representations of 4.6 million high-confidence somatic variants annotated in 356 tumor whole genomes. We benchmark VarNet across a range of publicly available datasets, demonstrating performance often exceeding current state-of-the-art methods. Overall, our results demonstrate how a scalable deep learning approach could augment and potentially supplant human engineered features and heuristic filters in somatic variant calling. <<<
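The benchmark comparison above rests on precision and recall of the called variants against a truth set. The following Python sketch shows that calculation on a toy example; the function name and the variant tuples (chromosome, position, reference base, alternate base) are hypothetical and are not taken from the paper or its benchmarks.

```python
def precision_recall(called, truth):
    """Precision and recall of a variant call set against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy variants as (chromosome, position, ref, alt); hypothetical values.
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 200, "G", "C"),
    ("chr2", 50, "C", "T"),
    ("chr3", 10, "T", "G"),
}
called = {
    ("chr1", 100, "A", "T"),
    ("chr2", 50, "C", "T"),
    ("chr2", 75, "G", "A"),  # false positive
}
precision, recall = precision_recall(called, truth)
```

Here two of the three calls are correct (precision 2/3) and two of the four true variants are found (recall 1/2).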
沈么是快乐星球 (2022-07-29 08:53):
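#paper doi:10.1038/s41467-020-19681-1 Nature Communications, 2020, Genome-enabled discovery of anthraquinone biosynthesis in Senna tora. Senna tora is a Chinese medicinal herb whose main active compounds are its abundant anthraquinones, found mostly in the seeds. Through whole-genome sequencing and comparative genomics, the paper finds that the CHS gene family has undergone a lineage-specific, rapid expansion in S. tora, concentrated on chromosome 7. By measuring metabolites and transcriptomes of seeds at different developmental stages, the authors shortlist 3 candidate genes, settle on one candidate based on expression pattern, phylogenetic relationship, and gene structure, and choose a more distantly related member of the CHS gene family as a negative control. Finally, they validate the candidate with in vitro enzymatic reactions (candidate protein, inactivated candidate protein, and negative control protein; only the candidate protein catalyzes the substrate into the downstream product). The approach is simple and clear. When screening candidates, they start from clusters of genes whose expression patterns resemble the metabolite accumulation patterns, build a "metabolic library", and analyze the metabolic pathways it is mainly enriched in. I have not yet studied the enzymatic reactions in detail, since they involve a lot of metabolic background knowledge.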
IF:14.700Q1 Nature communications, 2020-11-18. DOI: 10.1038/s41467-020-19681-1 PMID: 33208749
Senna tora is a widely used medicinal plant. Its health benefits have been attributed to the large quantity of anthraquinones, but how they are made in plants remains a mystery. To identify the genes responsible for plant anthraquinone biosynthesis, we reveal the genome sequence of S. tora at the chromosome level with 526 Mb (96%) assembled into 13 chromosomes. Comparison among related plant species shows that a chalcone synthase-like (CHS-L) gene family has lineage-specifically and rapidly expanded in S. tora. Combining genomics, transcriptomics, metabolomics, and biochemistry, we identify a CHS-L gene contributing to the biosynthesis of anthraquinones. The S. tora reference genome will accelerate the discovery of biologically active anthraquinone biosynthesis pathways in medicinal plants. <<<
颜林林 (2022-07-29 08:21):
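#paper doi:10.1093/nar/gkac586 Nucleic Acids Research, 2022, De novo assembly of human genome at single-cell levels. The authors previously developed a technique called SMOOTH-seq. Roughly, it works as follows: the Tn5 transposon is inserted into genomic DNA to fragment it randomly; barcoded primers then perform strand displacement and amplification of the fragments; a sequence is ligated to each end of the double-stranded product so that it circularizes; and rolling-circle amplification yields long molecules suitable for long-read sequencing. Each long molecule carries multiple copies of the original fragment, so its bases can be corrected accurately. This paper improves on that method and uses two platforms, PacBio HiFi and Oxford Nanopore Technologies (ONT), for single-cell sequencing of the K562 and HG002 cell lines. It reports the first highly contiguous human genome assembly at the single-cell level. The results include: 95 K562 cells with a total sequencing depth of about 37x (if I understand correctly, that is 37/95 ≈ 0.4x per cell) and an NG50 of about 2 Mb; and 30 HG002 cells with about 1 Gb of sequence per cell (equivalent to about 0.33x) and an NG50 of about 1.3 Mb. In the abstract's words, this "opened a new chapter on the practice of single-cell genome de novo assembly". The topic looks highly novel, but on closer inspection it raises some questions. Single-cell genome sequencing can hardly distinguish different cell populations, so assembly should only be done separately for each single cell; pooling many cells from different populations for assembly would defeat the original purpose. But the genome coverage of a single cell cannot be very complete (the paper reports an average coverage of about 41.7%, and I suspect that more sequencing data would not improve this much), which severely limits the assembly itself, so in the end one can only focus on the structural variants identified. In addition, single-cell genome results are hard to validate: it is difficult to use results from other cells to judge whether the result for the cell being sequenced is accurate, which looks like a real logical weakness. So, beyond the single-cell genome sequencing data for the two cell lines, the main contribution of this paper is probably the optimization of the experimental protocol and the establishment of the technique, although the data analysis workflow is also worth consulting.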
IF:16.600Q1 Nucleic acids research, 2022-07-22. DOI: 10.1093/nar/gkac586 PMID: 35819189 PMCID:PMC9303314
Genome assembly has been benefited from long-read sequencing technologies with higher accuracy and higher continuity. However, most human genome assembly require large amount of DNAs from homogeneous cell lines without keeping cell heterogeneities, since cell heterogeneity could profoundly affect haplotype assembly results. Herein, using single-cell genome long-read sequencing technology (SMOOTH-seq), we have sequenced K562 and HG002 cells on PacBio HiFi and Oxford Nanopore Technologies (ONT) platforms and conducted de novo genome assembly. For the first time, we have completed the human genome assembly with high continuity (with NG50 of ∼2 Mb using 95 individual K562 cells) at single-cell levels, and explored the impact of different assemblers and sequencing strategies on genome assembly. With sequencing data from 30 diploid individual HG002 cells of relatively high genome coverage (average coverage ∼41.7%) on ONT platform, the NG50 can reach over 1.3 Mb. Furthermore, with the assembled genome from K562 single-cell dataset, more complete and accurate set of insertion events and complex structural variations could be identified. This study opened a new chapter on the practice of single-cell genome de novo assembly. <<<
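The per-cell depth estimates in the post are simple arithmetic, reproduced here as a Python sketch. The 3 Gb genome size is an assumption implied by the post's 0.33x figure, not a number stated in the paper, and the function names are hypothetical.

```python
def per_cell_depth(total_depth, n_cells):
    """Average depth per cell when the total depth is pooled over n_cells."""
    return total_depth / n_cells


def depth_from_bases(sequenced_bases, genome_size):
    """Sequencing depth as sequenced bases divided by genome size."""
    return sequenced_bases / genome_size


# K562: about 37x pooled over 95 cells.
k562_depth = per_cell_depth(37, 95)

# HG002: about 1 Gb per cell; genome size of 3 Gb is an assumed round figure.
hg002_depth = depth_from_bases(1e9, 3e9)
```

Both values come out well below 1x per cell, which is consistent with the post's concern that a single cell's genome coverage is far from complete.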
李翛然 (2022-07-28 13:15):
#paper DOI: 10.1126/science.aba2374 Preventing Engrailed-1 activation in fibroblasts yields wound regeneration without scarring This skin wound-repair target, published in Science in April 2021, is very interesting: it reportedly heals without leaving a scar. So far one old drug has been found to act on this target; no other drug has entered clinical trials, although other inhibitors exist. That old drug, Verteporfin, is used with laser irradiation of the eye to treat ruptured ocular blood vessels. I wonder whether its efficacy on the new skin-injury target will be limited by the old drug's original target (assuming the two targets differ). This target has successfully caught my company's attention.
Skin scarring, the end result of adult wound healing, is detrimental to tissue form and function. Engrailed-1 lineage-positive fibroblasts (EPFs) are known to function in scarring, but Engrailed-1 lineage-negative fibroblasts (ENFs) remain poorly characterized. Using cell transplantation and transgenic mouse models, we identified a dermal ENF subpopulation that gives rise to postnatally derived EPFs by activating Engrailed-1 expression during adult wound healing. By studying ENF responses to substrate mechanics, we found that mechanical tension drives Engrailed-1 activation via canonical mechanotransduction signaling. Finally, we showed that blocking mechanotransduction signaling with either verteporfin, an inhibitor of Yes-associated protein (YAP), or fibroblast-specific transgenic YAP knockout prevents Engrailed-1 activation and promotes wound regeneration by ENFs, with recovery of skin appendages, ultrastructure, and mechanical strength. This finding suggests that there are two possible outcomes to postnatal wound healing: a fibrotic response (EPF-mediated) and a regenerative response (ENF-mediated). <<<