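#paper doi:10.1101/2022.07.17.500374 bioRxiv, 2022, Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations. This is an example of "taking a small thing seriously." In genomic analysis we often agonize over whether to use hg19 or hg38, and sometimes end up running two nearly identical analyses in parallel, one per reference genome, to avoid the information loss caused by coordinate-conversion tools such as liftOver. This paper targets that small (and arguably not even very common) pain point: while staying compatible with the existing VCF format, it lets a single result file carry two sets of genomic coordinates, so existing tools keep working and either coordinate system can be extracted from the file at any time. The idea is simple and the implementation is not hard, yet it does effectively solve some real practical problems.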
Genozip Dual-Coordinate VCF format enables efficient genomic analyses and alleviates liftover limitations
We introduce Dual Coordinate VCF (DVCF), a file format that records genomic variants against two different reference genomes simultaneously and is fully compliant with the current VCF specification. As implemented in the Genozip platform, DVCF enables bioinformatics pipelines to seamlessly operate across two coordinate systems by leveraging the system most advantageous to each pipeline step, simplifying bioinformatics workflows and reducing file generation and associated data storage burden. Moreover, our benchmarking of Genozip DVCF shows that it produces more complete, less erroneous, and less biased translations across coordinate systems than two widely used alternative tools (i.e., LiftoverVcf and CrossMap).
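To make the "two coordinate systems in one spec-compliant file" idea concrete, here is a minimal Python sketch. It is not the DVCF encoding described in the paper: the INFO tag name `OTHERCOORD`, its `chrom,pos` layout, and the example record are all hypothetical, chosen only for illustration. The sketch also ignores strand flips and REF/ALT changes, which a real cross-reference translation has to handle.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical INFO tag for this illustration only; NOT the tag DVCF uses.
OTHER_TAG = "OTHERCOORD"


def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (flags map to '')."""
    fields = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def render(line: str, use_other: bool) -> Optional[Variant]:
    """Return the variant in the primary or the other coordinate system.

    Returns None when the other system is requested but the record
    carries no mapping to it.
    """
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
    if not use_other:
        return Variant(chrom, int(pos), ref, alt)
    other = parse_info(info).get(OTHER_TAG)
    if not other:
        return None
    other_chrom, other_pos = other.split(",")
    # Simplification: alleles are assumed identical in both references.
    return Variant(other_chrom, int(other_pos), ref, alt)


# Made-up record: positions are toy values, not real hg19/hg38 coordinates.
record = "chr1\t1000\t.\tC\tT\t50\tPASS\tDP=30;OTHERCOORD=chr1,1200"
primary_view = render(record, use_other=False)
other_view = render(record, use_other=True)
```

Because the second coordinate lives in an ordinary INFO key, a tool that knows nothing about it still reads the record as a normal VCF line, which is the compatibility property the note highlights.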