Papers from the journal BMC Bioinformatics.
5 paper shares found.
1.
颜林林 (2022-10-02 15:26):
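#paper doi:10.1186/s12859-022-04948-9 BMC Bioinformatics, 2022, Visualizing the knowledge structure and evolution of bioinformatics. This paper applies data-analysis and visualization methods commonly used in bioinformatics to study the discipline of bioinformatics itself. By analyzing the abstracts of relevant papers published over the past several decades, it shows how research patterns have shifted (for example, from purely theoretical model computation to stacking machine-learning models) and how the knowledge structure has changed along with them. The approach is novel, and presenting the knowledge structure as UMAP scatter plots in the main text is also creative.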
IF: 2.900 (Q1). BMC Bioinformatics, 2022-Sep-30. DOI: 10.1186/s12859-022-04948-9 PMID: 36180852
Abstract:
BACKGROUND: Bioinformatics has gained much attention as a fast growing interdisciplinary field. Several attempts have been conducted to explore the field of bioinformatics by bibliometric analysis, however, such works did not elucidate the role of visualization in analysis, nor focus on the relationship between sub-topics of bioinformatics.
RESULTS: First, the hotspot of bioinformatics has moderately shifted from traditional molecular biology to omics research, and the computational method has also shifted from mathematical model to data mining and machine learning. Second, DNA-related topics are bridge topics in bioinformatics research. These topics gradually connect various sub-topics that are relatively independent at first. Third, only a small part of topics we have obtained involves a number of computational methods, and the other topics focus more on biological aspects. Fourth, the proportion of computing-related topics hit a trough in the 1980s. During this period, the use of traditional calculation methods such as mathematical model declined in a large proportion while the new calculation methods such as machine learning have not been applied in a large scale. This proportion began to increase gradually after the 1990s. Fifth, although the proportion of computing-related topics is only slightly higher than the original, the connection between other topics and computing-related topics has become closer, which means the support of computational methods is becoming increasingly important for the research of bioinformatics.
CONCLUSIONS: The results of our analysis imply that research on bioinformatics is becoming more diversified and the ranking of computational methods in bioinformatics research is also gradually improving.
2.
颜林林 (2022-08-18 00:34):
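#paper doi:10.1186/s12859-022-04876-8 BMC Bioinformatics, 2022, IMSE: interaction information attention and molecular structure based drug drug interaction extraction. Having machines read large numbers of papers automatically and distill useful information from them is a dream for many people, and models such as BERT are gradually making it a reality. This paper builds on BioBERT pre-training over PubMed abstracts and PMC full texts and uses it to improve performance on the DDIExtraction 2013 task, which aims to extract drug-drug interactions (DDIs) from free text in the biomedical domain. Much work already exists on this task. This paper introduces an interaction attention vector and adds drug molecular structure (to exploit its feature-space information), among other changes, to improve model performance and interpretability, and it achieves good results.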
IF: 2.900 (Q1). BMC Bioinformatics, 2022-Aug-14. DOI: 10.1186/s12859-022-04876-8 PMID: 35965308
Abstract:
BACKGROUND: Extraction of drug drug interactions from biomedical literature and other textual data is an important component to monitor drug-safety and this has attracted attention of many researchers in healthcare. Existing works are more pivoted around relation extraction using bidirectional long short-term memory networks (BiLSTM) and BERT model which does not attain the best feature representations.
RESULTS: Our proposed DDI (drug drug interaction) prediction model provides multiple advantages: (1) The newly proposed attention vector is added to better deal with the problem of overlapping relations, (2) The molecular structure information of drugs is integrated into the model to better express the functional group structure of drugs, (3) We also added text features that combined the T-distribution and chi-square distribution to make the model more focused on drug entities and (4) it achieves similar or better prediction performance (F-scores up to 85.16%) compared to state-of-the-art DDI models when tested on benchmark datasets.
CONCLUSIONS: Our model that leverages state of the art transformer architecture in conjunction with multiple features can bolster the performances of drug drug interaction tasks in the biomedical domain. In particular, we believe our research would be helpful in identification of potential adverse drug reactions.
3.
颜林林 (2022-07-02 00:24):
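#paper doi:10.1186/s12859-022-04798-5 BMC Bioinformatics, 2022, DeepPN: a deep parallel neural network based on convolutional neural network and graph convolutional network for predicting RNA-protein binding sites. Identifying the binding sites of RNA-binding proteins (RBPs) is an important part of studying gene regulation. Traditionally, high-throughput screening and measurement rely on methods such as immunoprecipitation, but experimental methods have many limitations, so many computational tools have been developed to predict RBP binding sites, most of them mathematical methods based on sequence and structure information. Deep learning, which can automatically learn important and complex hidden features from data, has gradually been applied to this problem as well. Among deep learning techniques, this study adopts ChebNet, a graph convolutional network (GCN). This method has mostly been used on spectral data in the past, and recent work has also achieved good results in areas such as fMRI and image semantic segmentation. The paper therefore builds a parallel deep neural network called DeepPN from a CNN and ChebNet and tests it on 24 real datasets, with better results than other comparable methods. The better performance is presumably because the method uses statistical frequencies to supplement the features.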
IF: 2.900 (Q1). BMC Bioinformatics, 2022-Jun-29. DOI: 10.1186/s12859-022-04798-5 PMID: 35768792
Abstract:
BACKGROUND: Addressing the laborious nature of traditional biological experiments by using an efficient computational approach to analyze RNA-binding proteins (RBPs) binding sites has always been a challenging task. RBPs play a vital role in post-transcriptional control. Identification of RBPs binding sites is a key step for the anatomy of the essential mechanism of gene regulation by controlling splicing, stability, localization and translation. Traditional methods for detecting RBPs binding sites are time-consuming and computationally-intensive. Recently, the computational method has been incorporated in researches of RBPs. Nevertheless, lots of them not only rely on the sequence data of RNA but also need additional data, for example the secondary structural data of RNA, to improve the performance of prediction, which needs the pre-work to prepare the learnable representation of structural data.
RESULTS: To reduce the dependency of those pre-work, in this paper, we introduce DeepPN, a deep parallel neural network that is constructed with a convolutional neural network (CNN) and graph convolutional network (GCN) for detecting RBPs binding sites. It includes a two-layer CNN and GCN in parallel to extract the hidden features, followed by a fully connected layer to make the prediction. DeepPN discriminates the RBP binding sites on learnable representation of RNA sequences, which only uses the sequence data without using other data, for example the secondary or tertiary structure data of RNA. DeepPN is evaluated on 24 datasets of RBPs binding sites with other state-of-the-art methods. The results show that the performance of DeepPN is comparable to the published methods.
CONCLUSION: The experimental results show that DeepPN can effectively capture potential hidden features in RBPs and use these features for effective prediction of binding sites.
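The ChebNet layer mentioned in the comment above filters a signal on a graph with Chebyshev polynomials of the scaled Laplacian, using the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}. The following is only a minimal sketch of that recurrence on a toy 3-node path graph. How DeepPN builds a graph from an RNA sequence is not described here, so the graph, the signal, and the coefficients are all hypothetical, and the sketch assumes the common simplification lambda_max ~= 2.

```python
import math


def scaled_laplacian(adjacency):
    """Return L~ = L - I = -D^{-1/2} A D^{-1/2} for a graph with no isolated nodes.

    Assumes lambda_max ~= 2 for the normalized Laplacian (a common simplification).
    """
    degrees = [sum(row) for row in adjacency]
    n = len(adjacency)
    return [
        [-adjacency[i][j] / math.sqrt(degrees[i] * degrees[j]) for j in range(n)]
        for i in range(n)
    ]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def chebyshev_filter(lap, signal, coefficients):
    """Compute sum_k theta_k * T_k(L~) x with T_0 x = x and T_1 x = L~ x."""
    t_prev = list(signal)
    output = [coefficients[0] * v for v in t_prev]
    if len(coefficients) == 1:
        return output
    t_curr = mat_vec(lap, signal)
    output = [o + coefficients[1] * v for o, v in zip(output, t_curr)]
    for theta in coefficients[2:]:
        # Chebyshev recurrence: T_k x = 2 L~ (T_{k-1} x) - T_{k-2} x
        t_next = [2 * a - b for a, b in zip(mat_vec(lap, t_curr), t_prev)]
        output = [o + theta * v for o, v in zip(output, t_next)]
        t_prev, t_curr = t_curr, t_next
    return output


# Toy path graph 0 - 1 - 2 with a one-dimensional signal on the nodes.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
lap = scaled_laplacian(path_graph)
filtered = chebyshev_filter(lap, [1.0, 0.0, 0.0], [0.5, 0.3, 0.2])
print(filtered)
```

A filter with K coefficients only mixes information from nodes at most K-1 hops apart, which is why the polynomial order acts like a receptive-field size.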
4.
颜林林 (2022-06-23 07:02):
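#paper doi:10.1186/s12859-022-04768-x BMC Bioinformatics, 2022, Using BERT to identify drug-target interactions from whole PubMed. This paper uses the BERT model from natural language processing to analyze the entire PubMed and PMC databases in bulk, identifying drug and protein information in articles and extracting drug-target interaction (DTI) data, including important details such as the category of experimental method used. The method newly identified 600,000 articles, none of which were included in public DTI databases. Manual spot checks and cross-validation confirmed the method's accuracy (about 99%). Literature mining and curation of this kind of data usually depend on manual work, which severely limits efficiency. AI methods like the one in this paper will help with drug discovery and repurposing and with speeding up drug development.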
IF: 2.900 (Q1). BMC Bioinformatics, 2022-Jun-21. DOI: 10.1186/s12859-022-04768-x PMID: 35729494
Abstract:
BACKGROUND: Drug-target interactions (DTIs) are critical for drug repurposing and elucidation of drug mechanisms, and are manually curated by large databases, such as ChEMBL, BindingDB, DrugBank and DrugTargetCommons. However, the number of curated articles likely constitutes only a fraction of all the articles that contain experimentally determined DTIs. Finding such articles and extracting the experimental information is a challenging task, and there is a pressing need for systematic approaches to assist the curation of DTIs. To this end, we applied Bidirectional Encoder Representations from Transformers (BERT) to identify such articles. Because DTI data intimately depends on the type of assays used to generate it, we also aimed to incorporate functions to predict the assay format.
RESULTS: Our novel method identified 0.6 million articles (along with drug and protein information) which are not previously included in public DTI databases. Using 10-fold cross-validation, we obtained ~ 99% accuracy for identifying articles containing quantitative drug-target profiles. The F1 micro for the prediction of assay format is 88%, which leaves room for improvement in future studies.
CONCLUSION: The BERT model in this study is robust and the proposed pipeline can be used to identify previously overlooked articles containing quantitative DTIs. Overall, our method provides a significant advancement in machine-assisted DTI extraction and curation. We expect it to be a useful addition to drug mechanism discovery and repurposing.
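The abstract above reports a micro-averaged F1 for assay-format prediction. As a rough illustration of what that metric measures, the sketch below pools true positives, false positives, and false negatives across all classes before computing precision and recall. The label names and predictions are hypothetical and are not taken from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over every class, then compute F1 once."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label and truth != label:
                fp += 1
            elif pred != label and truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical assay-format labels for five articles.
y_true = ["biochemical", "cell_based", "biochemical", "organism", "cell_based"]
y_pred = ["biochemical", "biochemical", "biochemical", "organism", "cell_based"]

score = micro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, accuracy)
```

When every article receives exactly one label, each misclassification counts as one false positive and one false negative, so micro-F1 coincides with plain accuracy. The two differ only in multi-label settings.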
5.
颜林林 (2022-06-15 06:27):
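#paper doi:10.1186/s12859-022-04783-y BMC Bioinformatics, 2022, CancerNet: a unified deep learning network for pan-cancer diagnostics. This paper builds a general-purpose deep neural network that detects cancer and its tissue of origin from methylation data for 33 cancer types in TCGA. Work on the same task already existed in 2022, reaching an overall accuracy of 96%. This paper combines unsupervised and supervised learning: while outputting 34 classes (33 cancer types plus 1 normal, non-cancer class), the model also generates a reconstructed set of CpG-island methylation values, compares them with the CpG-island methylation values used as model input, and includes that difference in the loss function. This further improves overall model performance, bringing overall accuracy to 99.6%. The paper also examines how confounders such as age and metastasis affect the model, and it lays groundwork for future research and development on model interpretability. The whole study is hosted on OSF (Open Science Framework), with data and source code fully open, which makes it good learning material.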
IF: 2.900 (Q1). BMC Bioinformatics, 2022-Jun-13. DOI: 10.1186/s12859-022-04783-y PMID: 35698059
Abstract:
BACKGROUND: Despite remarkable advances in cancer research, cancer remains one of the leading causes of death worldwide. Early detection of cancer and localization of the tissue of its origin are key to effective treatment. Here, we leverage technological advances in machine learning or artificial intelligence to design a novel framework for cancer diagnostics. Our proposed framework detects cancers and their tissues of origin using a unified model of cancers encompassing 33 cancers represented in The Cancer Genome Atlas (TCGA). Our model exploits the learned features of different cancers reflected in the respective dysregulated epigenomes, which arise early in carcinogenesis and differ remarkably between different cancer types or subtypes, thus holding a great promise in early cancer detection.
RESULTS: Our comprehensive assessment of the proposed model on the 33 different tissues of origin demonstrates its ability to detect and classify cancers to a high accuracy (> 99% overall F-measure). Furthermore, our model distinguishes cancers from pre-cancerous lesions to metastatic tumors and discriminates between hypomethylation changes due to age related epigenetic drift and true cancer.
CONCLUSIONS: Beyond detection of primary cancers, our proposed computational model also robustly detects tissues of origin of secondary cancers, including metastatic cancers, second primary cancers, and cancers of unknown primaries. Our assessment revealed the ability of this model to characterize pre-cancer samples, a significant step forward in early cancer detection. Deployed broadly this model can deliver accurate diagnosis for a greatly expanded target patient population.
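The comment on this paper describes a loss that combines a supervised classification term with an unsupervised reconstruction term. The sketch below shows one simple way such a combined loss could look. It is an illustration only: the use of mean squared error for reconstruction, the `recon_weight` parameter, and the toy sizes (3 classes instead of 34, 4 CpG-island values instead of the full input) are assumptions, not the paper's actual formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])


def mean_squared_error(original, reconstructed):
    """Average squared difference between input and reconstructed methylation values."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def combined_loss(logits, label, original, reconstructed, recon_weight=1.0):
    """Supervised classification term plus a weighted reconstruction term.

    recon_weight is a hypothetical knob for balancing the two terms.
    """
    return cross_entropy(logits, label) + recon_weight * mean_squared_error(
        original, reconstructed
    )


# Toy sample: class scores, true class 0, and input vs. reconstructed values.
logits = [2.0, 0.5, -1.0]
original = [0.10, 0.80, 0.45, 0.30]
reconstructed = [0.12, 0.75, 0.50, 0.30]

loss = combined_loss(logits, 0, original, reconstructed)
print(round(loss, 4))
```

Because both terms are summed into one scalar, the shared layers are pushed both to separate the classes and to preserve enough information to rebuild the input, which is the intuition behind the comment's description.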