The Variant Call Format Dual Coordinate Extension (DVCF) Specification
This specification defines a format derived from VCF and fully compliant with the VCF specification, called the Dual Coordinate VCF (DVCF). A DVCF file contains information about genetic variants in two different coordinate systems. The key feature of DVCF is that it can be rendered in two ways: the Primary rendition and the Luft rendition. Both renditions are VCF-compliant files that contain precisely the same information, rendered in two different coordinate systems. Because they contain the same information, they can be losslessly cross-rendered back and forth. Cross-rendering is a fast operation that requires neither a reference file nor a chain file. Once a VCF file has been lifted to a DVCF file, it can be processed through an analytical pipeline, and because the data can be rendered in either coordinate system, each stage of the pipeline can operate on whichever coordinate system it requires. Importantly, rendering continues to work as fields and annotations are added, removed or modified while the data moves down the pipeline.
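The idea that cross-rendering needs no reference or chain file can be illustrated with a toy model. The sketch below is not the DVCF encoding: the class names, field names and the `cross_render` function are hypothetical, and it assumes the simplest case, in which REF and ALT are identical in both coordinate systems (no REF/ALT switch and no strand change). It only shows that when a record carries both coordinates, switching renditions is a swap that can be undone exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coord:
    """A position in one coordinate system (hypothetical toy type)."""
    chrom: str
    pos: int


@dataclass(frozen=True)
class Rendition:
    """One rendering of a dual-coordinate record (illustrative only).

    `main` is the coordinate shown in the CHROM/POS columns; `other` is the
    coordinate in the second system, carried along as an annotation.
    Assumption: REF and ALT are the same in both systems.
    """
    system: str  # "primary" or "luft"
    main: Coord
    other: Coord
    ref: str
    alt: str


def cross_render(record: Rendition) -> Rendition:
    """Swap which coordinate is displayed.

    Both coordinates already live in the record, so no reference or chain
    file is consulted, and applying the swap twice returns the original.
    """
    new_system = "luft" if record.system == "primary" else "primary"
    return Rendition(new_system, record.other, record.main,
                     record.ref, record.alt)


# Usage: render a Primary record in Luft coordinates and back again.
primary = Rendition("primary", Coord("chr1", 1000), Coord("chr1", 1200), "A", "G")
luft = cross_render(primary)
round_trip = cross_render(luft)
```

In this toy model `round_trip` equals `primary`, which is the lossless property described above. A real implementation would also have to handle the cases excluded here, such as variants whose REF allele differs between the two references.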