March 2025
The Illumina Infinium MethylationEPIC v2.0 BeadChip (EPICv2 array) is a microarray for assessment of the human epigenome. Sites on the EPICv2 array are annotated with an open-source file provided by Illumina, the EPICv2 manifest. Of the 923,452 unique genomic sites targeted by the EPICv2 array, the Illumina manifest identifies just 214,808 as mapping to a gene, excluding many sites located within a gene body. Using the genomic coordinates of the probes and publicly available data, we have mapped each site assayed on the EPICv2 array and comprehensively annotated its affiliated genes and regulatory elements. We found that a total of 700,392 EPICv2 array sites are located within a gene body (exon, intron, or UTR) according to the GENCODE Human release 47 (GENCODEv47) database. Of these, 509,940 sites were not annotated as being within a gene in the Illumina EPICv2 manifest, primarily because the Illumina manifest does not annotate introns: 498,407 of these unannotated sites (97.74%) are located within an intron of at least one transcript. The Illumina EPICv2 manifest annotates 358,539 sites as being within 1500 bp of a transcription start site (TSS). Using a distance-based approach, we have labelled 267,183 sites as being within promoter distance of a gene (<1500 bp upstream or <500 bp downstream of the TSS), and 140,123 sites as being within enhancer distance (1501-5000 bp upstream of the TSS, excluding sites located within a gene body). The re-annotated manifest thus uses GENCODEv47 data to label intragenic features and a distance-based approach to label the regulatory genome. We also include a column indicating whether a site is located in any promoter or enhancer according to the GeneHancer database. The re-annotated manifest additionally labels the sites required by the Horvath DNA Methylation Age Calculator and MethylDetectR epigenetic clocks, to facilitate data preparation for these tools.
In conclusion, we have re-annotated the EPICv2 manifest, allowing more complete assessment of EPICv2 sites associated with gene bodies and regulatory regions during the interpretation of epigenetic studies. The re-annotated manifest is publicly available – see the Data Availability section of this article.
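The distance-based labelling described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the strand handling, and the treatment of boundary values are assumptions made for illustration, with the thresholds read literally from the text (<1500 bp upstream or <500 bp downstream of the TSS for promoters; 1501-5000 bp upstream and outside any gene body for enhancers).

```python
from typing import Optional

# Thresholds taken from the text; all names below are hypothetical.
PROMOTER_UPSTREAM_BP = 1500    # promoter: strictly less than this far upstream
PROMOTER_DOWNSTREAM_BP = 500   # promoter: strictly less than this far downstream
ENHANCER_MIN_UPSTREAM_BP = 1501
ENHANCER_MAX_UPSTREAM_BP = 5000


def signed_distance_to_tss(site_pos: int, tss_pos: int, strand: str) -> int:
    """Distance from TSS to site in bp; negative means upstream.

    Assumption: "upstream" is interpreted relative to the transcript strand,
    so on the minus strand larger coordinates are upstream.
    """
    if strand == "+":
        return site_pos - tss_pos
    if strand == "-":
        return tss_pos - site_pos
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def classify_site(
    site_pos: int, tss_pos: int, strand: str, in_gene_body: bool = False
) -> Optional[str]:
    """Return 'promoter', 'enhancer', or None for one site/TSS pair."""
    d = signed_distance_to_tss(site_pos, tss_pos, strand)

    # Promoter distance: <1500 bp upstream or <500 bp downstream of the TSS.
    if -PROMOTER_UPSTREAM_BP < d < PROMOTER_DOWNSTREAM_BP:
        return "promoter"

    # Enhancer distance: 1501-5000 bp upstream, excluding gene-body sites.
    upstream = -d
    if ENHANCER_MIN_UPSTREAM_BP <= upstream <= ENHANCER_MAX_UPSTREAM_BP:
        return None if in_gene_body else "enhancer"

    # Read literally, a site exactly 1500 bp upstream falls in neither class.
    return None


if __name__ == "__main__":
    print(classify_site(1000, 2000, "+"))  # 1000 bp upstream, plus strand
    print(classify_site(7000, 2000, "-"))  # 5000 bp upstream, minus strand
```

In practice each site would be compared against every annotated TSS (for example with an interval-overlap tool), and a site may receive labels for several genes; the sketch covers only the single-pair decision.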