颜林林
(2022-06-26 22:13):
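#paper doi:10.1371/journal.pcbi.1009730 PLOS Computational Biology, 2022, Improved transcriptome assembly using a hybrid of long and short reads with StringTie. This paper, from Johns Hopkins, presents a tool that assembles transcriptomes from a mix of long-read and short-read sequencing data. In high-throughput sequencing, short-read platforms are highly accurate, but their reads are too short to cover a full transcript. Long-read platforms can span multiple exons and so help determine how a transcript is spliced, but their relatively poor base accuracy easily causes alignment errors, which in turn interfere with identifying the transcript. The paper shows the "noisy" alignments that sequencing errors produce and the sharp growth in search space that results. By using a solution to the maximum-flow problem from graph theory, and by using the more accurate short-read data locally around the "noisy" alignments to help pin down the correct splice sites, the tool combines the strengths of both platforms (long-read and short-read), and its running speed is no worse than that of earlier tools and algorithms that use a single data type. To evaluate the tool, the paper uses simulated data as well as several real datasets from Arabidopsis thaliana, mouse, and human; in both assembly precision and the number of output transcripts that can be correctly annotated, it shows the expected improvement.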
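To make the maximum-flow idea mentioned above more concrete, here is a minimal Python sketch of the textbook Edmonds-Karp algorithm run on a toy graph. It is only an illustration of the general graph problem, not StringTie's actual implementation: the function name `max_flow`, the graph `splice_graph`, its node names, and the capacities (imagined here as read support between exons) are all hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Textbook Edmonds-Karp: push flow along shortest augmenting paths."""
    # residual[u][v] is the capacity still available on edge u -> v.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            # Make sure every edge has a reverse edge (capacity 0 if absent).
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left, so the flow is maximal

        # Find the bottleneck capacity along the path that was found.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path and update residuals.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy graph: capacities stand in for read support between exons.
splice_graph = {
    "source": {"exon1": 10},
    "exon1": {"exon2": 6, "exon3": 4},
    "exon2": {"exon3": 6},
    "exon3": {"sink": 10},
}

print(max_flow(splice_graph, "source", "sink"))  # prints 10
```

In this toy graph, 6 units of flow follow the path exon1 → exon2 → exon3 and 4 units skip exon2, which loosely mirrors two alternatively spliced paths sharing the available read support.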
Improved transcriptome assembly using a hybrid of long and short reads with StringTie
Abstract:
Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.
Related Links: