Genome-wide Association Study of Long COVID

Abstract and Figures

Infections can lead to persistent or long-term symptoms and diseases such as shingles after varicella zoster, cancers after human papillomavirus, or rheumatic fever after streptococcal infections(1,2). Similarly, infection by SARS-CoV-2 can result in Long COVID, a condition characterized by symptoms of fatigue and pulmonary and cognitive dysfunction(3-5). The biological mechanisms that contribute to the development of Long COVID remain to be clarified. We leveraged the COVID-19 Host Genetics Initiative(6,7) to perform a genome-wide association study for Long COVID including up to 6,450 Long COVID cases and 1,093,995 population controls from 24 studies across 16 countries. We identified the first genome-wide significant association for Long COVID at the FOXP4 locus. FOXP4 has been previously associated with COVID-19 severity(6), lung function(8), and cancers(9), suggesting a broader role for lung function in the pathophysiology of Long COVID. While we identify COVID-19 severity as a causal risk factor for Long COVID, the impact of the genetic risk factor located in the FOXP4 locus could not be solely explained by its association to severe COVID-19. Our findings further support the role of pulmonary dysfunction and COVID-19 severity in the development of Long COVID.
Authors
Vilma Lammi*, Tomoko Nakanishi*, Samuel E. Jones*, Shea J. Andrews, Juha Karjalainen, Beatriz
Cortés, Heath E. O'Brien, Brian E. Fulton-Howard, Hele H. Haapaniemi, Axel Schmidt, Ruth E. Mitchell,
Abdou Mousas, Massimo Mangino, Alicia Huerta-Chagoya, Nasa Sinnott-Armstrong, Elizabeth T.
Cirulli, Marc Vaudel, Alex S.F. Kwong, Amit K. Maiti, Minttu Marttila, Chiara Batini, Francesca Minnai,
Anna R. Dearman, C.A. Robert Warmerdam, Celia B. Sequeros, Thomas W. Winkler, Daniel M. Jordan,
Lindsay Guare, Ekaterina Vergasova, Eirini Marouli, Pasquale Striano, Ummu Afeera Zainulabid,
Ashutosh Kumar, Hajar Fauzan Ahmad, Ryuya Edahiro, Shuhei Azekawa, Long COVID Host Genetics
Initiative, FinnGen, DBDS Genomic Consortium, GEN-COVID Multicenter Study, Joseph J. Grzymski,
Makoto Ishii, Yukinori Okada, Noam D. Beckmann, Meena Kumari, Ralf Wagner, Iris M. Heid,
Catherine John, Patrick J. Short, Per Magnus, Karina Banasik, Frank Geller, Lude H. Franke, Alexander
Rakitko, Emma L. Duncan, Alessandra Renieri, Konstantinos K. Tsilidis, Rafael de Cid, Ahmadreza
Niavarani, Teresa Tusié-Luna, Shefali S. Verma, George Davey Smith, Nicholas J. Timpson, Mark J.
Daly, Andrea Ganna, Eva C. Schulte, J. Brent Richards, Kerstin U. Ludwig, Michael Hultström, Hugo
Zeberg, and Hanna M. Ollila
Author list footnotes:
*Joint first author
†Joint last author/corresponding author: hanna.m.ollila@helsinki.fi, hugo.zeberg@ki.se
See Authors_LongCOVIDHGI.xlsx for complete list of authors contributing to the Long COVID Host
Genetics Initiative and other consortium banner authors
medRxiv preprint doi: https://doi.org/10.1101/2023.06.29.23292056; this version posted July 1, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
Keywords
Post-acute sequelae of COVID-19 (PASC), Long COVID, Long-haul COVID-19, Post-COVID-19
syndrome, Post Covid Conditions (PCC), Long-term COVID-19 sequelae, FOXP4, Forkhead box
transcription factors, GWAS, Meta-analysis, COVID-19, Genetics, Genomics.
Introduction
The COVID-19 pandemic has led to the recognition of a new condition known as post-acute sequelae of
COVID-19 (PASC), post COVID-19 condition, or Long COVID. The current World Health Organization
definition includes any symptoms that present after COVID-19 and persist after three months(10). Common
symptoms include fatigue, pulmonary dysfunction, muscle and chest pain, dysautonomia and cognitive
disturbances(11–15). The incidence of Long COVID varies widely, with estimates ranging from 10% to
70%(5). Long COVID is more common in individuals who have been hospitalized or treated in the
intensive care unit due to COVID-19, suggesting that COVID-19 severity could contribute to the risk of
Long COVID(5,16). However, Long COVID can also occur in those with initially mild COVID-19
symptoms(17). Potential mechanisms for Long COVID include persistent COVID-19 infection,
autoimmunity, reactivation of latent pathogens such as varicella zoster virus or Epstein-Barr virus, disrupted blood
clotting, and dysregulation of the autonomic nervous system(5).
The COVID-19 Host Genetics Initiative (COVID-19 HGI) (https://www.covid19hg.org) was
launched to investigate the role of host genetics in COVID-19 and its various clinical subtypes(18,19). The
COVID-19 HGI has identified 51 distinct genome-wide significant loci associated with COVID-19
critical illness, hospitalization and reported SARS-CoV-2 infection. These variants largely implicate
canonical pathways involved in viral entry, mucosal airway defence, and type I interferon response(6,7,20,21).
To better understand the underlying causes of Long COVID, we conducted the first genome-wide
association study (GWAS) specifically focused on Long COVID. Our study includes data from 24 studies
conducted in 16 countries, totalling 6,450 individuals diagnosed with Long COVID and 1,093,995
controls (Fig. 1).
Results
Genetic variants in the FOXP4 locus are associated with Long COVID
We analysed 24 independent GWAS of Long COVID and computed four GWAS meta-analyses based on
two case definitions and two control definitions. A strict Long COVID case definition required having an
earlier test-verified SARS-CoV-2 infection (strict case definition), while a broader Long COVID case
definition also included self-reported or clinician-diagnosed SARS-CoV-2 infection (broad case
definition). The broad definition included all of the contributing studies whereas the strict definition was
met by 11 studies (Extended Table S1). Controls were either population controls, i.e., genetically
ancestry-matched samples without known Long COVID (broad control definition), or people that had
recovered from SARS-CoV-2 infection without Long COVID (strict control definition) (Fig. 1, Extended
Table S2). Data were obtained from 16 countries in total, representing populations from 6 genetic
ancestries, and each meta-analysis combined data across ancestries. The most common symptoms in the
questionnaire-based studies with available symptom information were fatigue, shortness of breath, and
problems with memory and concentration. However, there was some heterogeneity in the frequency of
symptoms (Extended Fig. S1).
The GWAS meta-analysis using the strict case definition (N = 3,018) and the broad control
definition (N = 994,582) from 11 studies identified a genome-wide significant association within the
FOXP4 locus (chr6:41,515,652G>C, GRCh38, rs9367106, as the lead variant; P = 1.8×10⁻¹⁰, Fig. 2,
Table S3). The C allele at rs9367106 was associated with an increased risk of Long COVID (OR = 1.63,
95% confidence interval (CI): 1.40-1.89, risk allele frequency = 4.2%).
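As a rough consistency check on the numbers above, the log-odds effect, its standard error, and a two-sided P value can be approximately recovered from the reported odds ratio and 95% confidence interval. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale and a normal test statistic; the function name and the comparison against the conventional 5×10⁻⁸ threshold (and a four-test Bonferroni version of it, as described in the Fig. 2 caption) are illustrative and are not taken from the authors' pipeline.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate beta, SE, z and two-sided P from an OR and its 95% CI.

    Assumes the interval is beta +/- z_crit * SE on the log-odds scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Values reported for rs9367106-C (strict cases, broad controls).
beta, se, z, p = wald_from_or_ci(1.63, 1.40, 1.89)

GENOME_WIDE = 5e-8
BONFERRONI_FOUR = GENOME_WIDE / 4  # threshold corrected for four meta-analyses

print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
print("passes 5e-8:", p < GENOME_WIDE, "| passes 1.25e-8:", p < BONFERRONI_FOUR)
```

Because the inputs are rounded to two decimals, the recovered P value is only expected to agree with the reported one in order of magnitude.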
We observed an association, albeit not genome-wide significant, between rs9367106-C and Long
COVID in all three other meta-analyses as well, including our largest meta-analysis with the broad case
definition (N = 6,450) and the broad control definition (N = 1,093,995) from 24 studies (OR = 1.34, 95%
CI: 1.20-1.49, P = 1.1×10⁻⁷, Extended Fig. S2, S3). Analyses with the strict case definition (N = 2,975)
and strict control definition (N = 37,935) (OR = 1.30, 95% CI: 1.09-1.56, P = 3.8×10⁻³), or the broad case
definition (N = 6,407) and strict control definition (N = 46,208) (OR = 1.16, 95% CI: 1.02-1.32, P =
0.023), further supported our finding (Extended Fig. S3).
To examine the consistency of the FOXP4 signal across the contributing studies, we investigated
the effect in each study. As a subset of studies lacked data for the lead variant, we used the available
variant in highest linkage disequilibrium (LD) as a proxy (Fig. 2b). Genetic variants in the meta-analysis
had varying statistical power owing to missingness, genotyping and imputation quality, and
differences in allele frequency between populations. Therefore, the genetic variant that was
present in the majority of the studies was the most significant variant, not because it is the causal variant
but because it had the best statistical power. Moreover, variants in high LD with the same, or similar,
effect could display different association strengths because of differences in statistical power across the
variants. We therefore examined the effect size of variants within 30 kb of the most significant
variant (rs9367106), including variants in even weak LD with the lead variant (r² > 0.01 in individuals
of European ancestry in the Human Genome Diversity Project(22) and 1000 Genomes Project(23,24)) and with an
effective sample size of at least one third of that of the lead variant. Through this analysis, we identified a
haplotype spanning the genomic region chr6:41,512,355-41,537,458 (Genome Reference Consortium
Human Build 38, GRCh38), located upstream of the FOXP4 gene, for which variants had a similar effect size
to the lead variant (Fig. 3b) and P values less than 5×10⁻⁷. This analysis identified 15 variants (Extended
Table S4). Relying on LD in the 1000 Genomes Project among Europeans, we found 18 variants
co-segregating with the lead variant (r² > 0.5, Extended Table S5). Nine variants overlapped between these two
analyses. None of the variants identified were protein coding.
Frequency of Long COVID variants at FOXP4 varies across ancestries
The allele frequency of rs9367106-C at the FOXP4 locus varied greatly among the different study
populations, with frequencies ranging from 1.6% in studies with non-Finnish Europeans to higher
frequencies such as 7.1% in Finnish, 19% in admixed Americans, and 36% in East Asians (Extended
Fig. S4). Most of the studies included in our analysis had individuals of primarily European descent
(Extended Fig. S5). Despite smaller sample sizes, we observed significant associations for the FOXP4
variant in the studies with admixed American, East Asian, and Finnish ancestries (Fig. 2b), owing to the
higher allele frequency, and thus larger statistical power to detect an association with the rs9367106
variant in these cohorts.
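The link between allele frequency and power can be made concrete with a small calculation. Under an additive model and Hardy-Weinberg equilibrium, the variance of the allele dosage is 2p(1−p), and for a fixed per-allele effect and effective sample size the information about that effect scales roughly with this quantity. The sketch below applies that textbook approximation to the frequencies quoted above; the equal-effect and equal-sample-size conditions are simplifying assumptions for illustration, not properties of the contributing cohorts.

```python
def genotype_variance(freq):
    """Variance of additive allele dosage under Hardy-Weinberg equilibrium."""
    return 2 * freq * (1 - freq)

# rs9367106-C frequencies quoted in the text.
FREQS = {
    "non-Finnish European": 0.016,
    "Finnish": 0.071,
    "admixed American": 0.19,
    "East Asian": 0.36,
}

reference = genotype_variance(FREQS["non-Finnish European"])

# Information per sample relative to non-Finnish Europeans, assuming the
# same per-allele effect and the same effective sample size everywhere.
relative_information = {
    pop: genotype_variance(freq) / reference for pop, freq in FREQS.items()
}

for pop, ratio in relative_information.items():
    print(f"{pop}: {ratio:.1f}x")
```

Under these assumptions a sample with the East Asian frequency carries more than ten times the per-individual information of a non-Finnish European sample, which is in line with the observation that smaller cohorts could still show significant associations.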
FOXP4 risk variants increase the expression of FOXP4 in the lung and are associated with
COVID-19 severity
The genomic region (+/-100 kilobases) surrounding the lead variant associated with Long COVID
contains four genes (FOXP4, FOXP4-AS1, LINC01276, MIR4641). Since no variant in LD with the lead
variant is coding, we investigated if any of these variants were associated with differential expression of
any of the surrounding genes within a 100 kb window. We used rs12660421 as a proxy (rs12660421-A
allele is correlated with the Long COVID risk allele rs9367106-C, r² = 0.97 among European ancestry
individuals), given that the lead variant was not included in the GTEx dataset V8, and analysed
differential gene expression across all tissues included in the dataset. We found that rs12660421-A is
associated with an increase in FOXP4 expression in the lung (P = 5.3×10⁻⁹, normalized effect size (NES)
= 0.56) and in the hypothalamus (P = 2.6×10⁻⁶, NES = 1.4) (Fig. 4, Extended Fig. S6). None of the other
genes demonstrated any differential expression regarding the Long COVID-associated haplotype. FOXP4
is a transcription factor gene which has a broad tissue expression pattern and is expressed in nearly all
tissues, with the highest expression in the cervix, the thyroid, the vasculature, the stomach, and the
testis(25). The expression also spans a broad set of cell types, including endothelial lung cells, immune cells,
and myocytes(26). A colocalization analysis (Methods) suggested that the association signal of Long
COVID is the same signal that associates with the differential expression of FOXP4 in the lung (posterior
probability = 0.91) (Extended Fig. S7a,b, Extended Table S6).
Furthermore, variants in this region have also been identified as risk factors for hospitalization
due to COVID-19 in the COVID-19 HGI meta-analyses(6) and in Biobank Japan (Extended Fig. S8,
Extended Table S7). Our colocalization analysis demonstrated the FOXP4 risk haplotype identified here
as the same haplotype identified for COVID-19 severity (posterior probability > 0.97) (Extended Fig.
S7e,f, Extended Table S6).
Single cell sequencing data supports FOXP4 expression in alveolar and immune cells in the
lung
As lung tissue consists of several cell types, we wanted to elucidate the relevant cells that express FOXP4
and may contribute to Long COVID. To understand the role of FOXP4 in healthy lung before
SARS-CoV-2 infection, we analysed single cell sequencing data from the Tabula Sapiens (data available through
the Human Protein Atlas; https://www.proteinatlas.org/), a previously published atlas of single cell
sequencing data in healthy individuals free of COVID-19(27). We observed the highest expression of
FOXP4 in type 2 alveolar cells (Fig. 4), a cell type that is capable of mounting robust innate immune
responses, thus participating in the immune regulation in the lung(28). Furthermore, type 2 alveolar cells
secrete surfactant, keep the alveoli free from fluid, and serve as progenitor cells repopulating damaged
epithelium after injury(29). In addition, we observed nearly equally high expression of FOXP4 in
granulocytes, which similarly participate in the regulation of innate immune responses. Overall, the findings
suggest a possible role of both immune and alveolar cells of the lung in Long COVID.
The Long COVID FOXP4 variants are located at active chromatin in the lung
To understand the regulatory effects behind the variant association, we utilized data from the
Regulome database(30,31), ENCODE(32), and VannoPortal(33). While the majority of the
Long COVID variants showed enhancer activity or transcription factor binding in a few ENCODE
experiments, we identified four variants of interest with possible additional functional consequences
(Extended Tables S8, S9). These variants had direct evidence of transcription factor binding based on
ChIP sequencing experiments. rs2894439, located at the beginning of the risk haplotype, was bound by
eight transcription factors, including POLR2A and EP300. rs7741164 and rs55889968 were both bound
by six transcription factors, including EP300 and FOXA1. Finally, one variant (rs9381074) was
located directly in a region that had histone modification marks across multiple tissues, including immune
and lung cells (H3K27me3, H3K4me1, H3K4me2, H3K4me3, and H3K27ac), and had
evidence of transcriptional activity from 49 different transcription factors, of which FOXA1 showed the most
consistent direct binding, across 55 experiments. Furthermore, we downloaded DNase
sequencing data from the ENCODE project and observed that rs9381074 was positioned directly on a
DNase hypersensitivity site in the lung (see Supplementary Methods for accession numbers).
The Long COVID FOXP4 variant is associated with lung cancer
To further understand the genetic variant that increases the risk of Long COVID, we examined whether
the FOXP4 variant was also associated with any other diseases. Specifically, we focused on Biobank
Japan(34), as the Long COVID risk allele frequency is highest in East Asia. A phenome-wide association
study between rs9367106 and all phenotypes in Biobank Japan (N = 262 phenotypes) revealed that the Long COVID risk
allele was associated with lung cancer (P = 1.2×10⁻⁶, Bonferroni P = 3.1×10⁻⁴, OR = 1.13, 95% CI: 1.07-1.18)
(Extended Fig. S8, Extended Table S7). Furthermore, the Long COVID risk allele is in LD with
the known risk variants for non-small cell lung carcinoma in Chinese and European populations(35)
(rs1853837, r² = 0.88 in East Asians(36)) and for lung cancer in never-smoking Asian women(37) (rs7741164,
r² = 0.98 in East Asians(36)). Colocalization analysis supported that the associations in this locus (within 500
kb of rs9367106) for Long COVID and lung cancer shared the same genetic signal (colocalization
posterior probability = 0.98, Extended Fig. S7c,d).
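The Bonferroni-adjusted P value quoted for lung cancer can be reproduced by multiplying the raw P value by the number of tests. The sketch below reads N = 262 as the number of phenotypes tested in the phenome-wide scan, which is an assumption that happens to be consistent with the reported arithmetic; the helper name is illustrative.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)

RAW_P = 1.2e-6   # reported lung cancer association
N_TESTS = 262    # assumed: number of phenotypes in the scan

adjusted_p = bonferroni(RAW_P, N_TESTS)
per_test_threshold = 0.05 / N_TESTS

print(f"adjusted P = {adjusted_p:.2e}")
print("significant after correction:", RAW_P < per_test_threshold)
```

The adjusted value rounds to 3.1×10⁻⁴, matching the figure given in the text.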
Long COVID and other phenotypes
We investigated the relationship between Long COVID and cardiometabolic, behavioural, and psychiatric
traits(7) (Fig. 5, Extended Table S10). We found positive genetic correlations between Long COVID and
insomnia symptoms, depression, risk tolerance, asthma, diabetes, and SARS-CoV-2 infection, while we
saw negative correlations with red and white blood cell counts (Fig. 5a). However, the identified correlations
were only nominally significant, i.e. without correction for multiple testing (P < 0.05; Extended Table S11). The
estimated heritability of Long COVID was h² = 0.023 in the meta-analysis using the strict case and
control definitions.
We used Mendelian randomization (MR) to estimate potential risk factors by analysing the same
traits mentioned above. Genetically predicted earlier smoking initiation (P = 0.022), more cigarettes
consumed per day (P = 0.046), higher levels of high-density lipoproteins (P = 0.029), and higher body-
mass index (P = 0.046) were nominally significant causal risk factors of Long COVID (Fig. 5b,
Extended Table S12). However, none of these associations survived correction for multiple comparisons.
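For readers unfamiliar with MR, the inverse-variance weighted (IVW) estimator, which the text names later for the hospitalization analysis, is one common way to combine per-variant effects into a single causal estimate. The sketch below is a minimal fixed-effect version using entirely hypothetical per-variant effects; it is not the authors' implementation, and the text does not state which estimator produced the P values in this paragraph.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW MR: weighted regression through the origin.

    Weights are 1 / se_outcome**2; returns (slope, standard error).
    """
    weights = [1 / s ** 2 for s in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1 / denominator)

# Hypothetical instruments: each outcome effect is half the exposure effect.
beta_exposure = [0.10, 0.20, 0.15]
beta_outcome = [0.05, 0.10, 0.075]
se_outcome = [0.02, 0.03, 0.025]

slope, slope_se = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
print(f"IVW slope = {slope:.3f} (SE {slope_se:.3f})")
```

With these made-up inputs the slope comes out at 0.5 by construction, which makes the weighting easy to verify by hand.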
The FOXP4 signals cannot be explained simply by severity of acute COVID-19
Earlier research has suggested that COVID-19 severity may be a risk factor for Long COVID(3,16,38,39). We
investigated the relationship between COVID-19 hospitalization and Long COVID by performing a
two-sample MR (Extended Table S13). In terms of causality, we caution that COVID-19 hospitalization as a
causal exposure is difficult to interpret because Long COVID and COVID-19 hospitalization are both
outcomes of the same underlying infection. Nevertheless, the relationship between the effect size for
Long COVID versus the effect size for COVID-19 severity can shed some light on the role of COVID-19
severity in Long COVID. To perform two-sample MR without overlapping samples, we excluded
the studies that contributed to the current Long COVID freeze 4 and re-ran a meta-analysis of COVID-19
susceptibility and hospitalization in the remaining cohorts of the COVID-19 HGI. We observed a causal
relationship of susceptibility and hospitalization on Long COVID (inverse-variance weighted MR, P
2.4×10⁻³, for the strict case and the broad control definition) with no evidence of pleiotropy (MR Egger
intercept P 0.47) (Fig. 5c,d, Extended Table S13). Nevertheless, the Wald ratio of Long COVID to
COVID-19 hospitalization for the FOXP4 variant is 1.97 (95% CI: 1.36-2.57), which is significantly
greater than the slope of the MR-estimated relationship between COVID-19 hospitalization and Long
COVID (0.35, 95% CI: 0.12-0.57). The same phenomenon was seen when comparing with susceptibility
(Fig. 5c). Thus, the FOXP4 signal demonstrates a stronger association with Long COVID than expected,
meaning that it cannot simply be explained by its association with either susceptibility or severity alone
(Fig. 5c,d). A recent systematic review of epidemiological data found a positive association between
COVID-19 hospitalization and Long COVID with a relationship on a log-odds scale of 0.91 (95% CI:
0.68-1.14)(40). Even assuming this stronger relationship between COVID-19 hospitalization and Long
COVID, the observed effect of the FOXP4 variant on Long COVID still exceeds what would be expected
based on the association with severity alone.
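The comparison in this paragraph can be illustrated with the reported numbers. The Wald ratio for a single variant is its effect on the outcome divided by its effect on the exposure; here it is compared with the MR slope and with the epidemiological estimate using a simple z-test on the difference. The sketch assumes the three estimates are independent, normally distributed, and have intervals symmetric on the reported scale, with standard errors back-calculated from the 95% CIs. It is an illustration of the reasoning, not the test the authors used.

```python
import math

def se_from_ci(low, high, z_crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (high - low) / (2 * z_crit)

def z_difference(est_a, se_a, est_b, se_b):
    """z statistic and two-sided P for the difference of two estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Estimates quoted in the text (log-odds scale).
wald, wald_se = 1.97, se_from_ci(1.36, 2.57)  # FOXP4 Wald ratio
mr, mr_se = 0.35, se_from_ci(0.12, 0.57)      # MR-estimated slope
epi, epi_se = 0.91, se_from_ci(0.68, 1.14)    # systematic review estimate

z_mr, p_mr = z_difference(wald, wald_se, mr, mr_se)
z_epi, p_epi = z_difference(wald, wald_se, epi, epi_se)

print(f"Wald ratio vs MR slope:  z = {z_mr:.2f}, P = {p_mr:.1e}")
print(f"Wald ratio vs epi slope: z = {z_epi:.2f}, P = {p_epi:.1e}")
```

Under these assumptions the FOXP4 Wald ratio exceeds both reference slopes by several standard errors, consistent with the statement that the variant's effect on Long COVID is larger than severity alone would predict.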
Discussion
In this study, we aimed to understand the host genetic factors that contribute to Long COVID, using data
from 24 studies across 16 countries. Our analysis identified genetic variants within the FOXP4 locus as a
risk factor for Long COVID. The FOXP4 gene is expressed in the lung and the genetic variants associated
with Long COVID are also associated with differential expression of FOXP4 and with lung cancer and
COVID-19 severity. Additionally, using MR, we characterized COVID-19 severity as a causal risk factor
for Long COVID. Overall, our findings provide genomic evidence consistent with previous
epidemiological and clinical reports of Long COVID, indicating that Long COVID, similarly to other
post-viral conditions, is a heterogeneous disease entity where likely both individual genetic variants and
the environmental risk factors contribute to disease risk.
Our analysis revealed a connection between Long COVID and pulmonary endpoints through both
individual variants at FOXP4, a transcription factor-coding gene previously linked to lung cancer, and
MR analysis identifying smoking and COVID-19 severity as risk factors. Furthermore, expression
analysis of the lung, and cell type-specific single-cell sequencing analysis, showed FOXP4 expression in
both alveolar cell types and immune cells of the lung. FOXP4 belongs to the subfamily P of the forkhead
box transcription factor family genes and is expressed in various tissues, including the lungs and gut(41,42).
Moreover, it is highly expressed in mucus-secreting cells of the stomach and intestines(43), as well as in
naïve B, natural killer, and memory T-reg cells(44), and is required for normal T-cell memory function
following infection(45). FOXP1/2/4 are also required for promoting lung endoderm development by
repressing expression of non-pulmonary transcription factors(46), and the loss of FOXP1/4 adversely affects
airway epithelial regeneration(8). Furthermore, FOXP4 has been implicated in airway fibrosis(47) and
promotion of lung cancer growth and invasion(48). We find that the haplotype associated with Long COVID
is also associated with lung cancer in Biobank Japan(34). These observations, together with the present study,
suggest that the connection between FOXP4 and Long COVID may be rooted in both lung function
and immunology. Furthermore, the observation of FOXP4 expression in both alveolar and immune cells
in the lung, and the association with severe COVID-19 and pulmonary diseases such as cancer, suggest
that FOXP4 may participate in local immune responses in the lung.
We also discovered a causal relationship from COVID-19 infection to Long COVID, as expected,
and an additional causal risk from severe, hospital treatment-requiring COVID-19 to Long COVID. This
finding is in agreement with earlier epidemiological studies in which a higher prevalence of Long COVID was
seen among individuals who had severe acute COVID-19(3,16,38,39). The relationship between
COVID-19 severity and Long COVID raises an interesting question: given that SARS-CoV-2 infection is
a prerequisite for COVID-19, including severe COVID-19, are all genetic variants that increase COVID-19
susceptibility or severity equally strong risk factors for Long COVID? In the current study, we aimed to
answer this question through examining variant effect sizes between SARS-CoV-2 infection
susceptibility, COVID-19 severity, and Long COVID. We discovered that the majority of variants
affected only SARS-CoV-2 susceptibility or COVID-19 severity. In contrast, the FOXP4 variants had
higher effect size for Long COVID than expected, suggesting an independent role of FOXP4 for Long
COVID that was not observed with overall COVID-19 severity variants. Such observation offers clues on
biological mechanisms, such as FOXP4 affecting pulmonary function and immunity, which then
contribute to the development of Long COVID. Overall, our study elucidates genetic risk factors for Long
COVID, the relationship between Long COVID and severe COVID-19, and finally possible mechanisms
of how FOXP4 contributes to the risk of Long COVID. Future studies and iterations of this work will
likely grow the number of observed genetic variants and further clarify the biological mechanisms
underlying Long COVID.
We recognize that the symptomatology of Long COVID is variable and includes, in addition to
lung symptoms, other symptom domains such as fatigue and cognitive deficits(3–5). In addition, the
long-term effects of COVID-19 are still being studied, and more research is needed to understand the full
extent of the long-term damage caused by SARS-CoV-2 and Long COVID. We also recognize
that the Long COVID diagnosis is still evolving. Nevertheless, our study provides direct genetic evidence
that lung pathophysiology can play an integral part in the development of Long COVID.
Figures
Fig. 1 | Geographic overview of studies contributing to analysis of Long COVID.
The 24 studies contributing to the Long COVID HGI Data Freeze 4 GWAS meta-analyses. Each colour
represents a meta-analysis with specific case and control definitions. Strict case definition = Long COVID
after test-verified SARS-CoV-2 infection, broad case definition = Long COVID after any SARS-CoV-2
infection. Strict control definition = individuals that had SARS-CoV-2 but did not develop Long COVID,
broad control definition = population control, i.e. all individuals in each study that did not meet the Long
COVID criteria. Effective sample sizes are shown as the size of each diamond shape. For more detailed
sample sizes, see Extended Table S1.
Fig. 2 | Meta-analysis of 11 GWAS studies of Long COVID shows an association at the FOXP4 locus.
a) Manhattan plot of Long COVID after test-verified SARS-CoV-2 infection (strict case definition, N =
3,018) compared to all other individuals in each data set (population controls, broad control definition, N
= 994,582). A genome-wide significant association with Long COVID was found on chromosome 6,
upstream of the FOXP4 gene (chr6:41515652:G:C, GRCh38, rs9367106, as the lead variant; P = 1.76×10⁻¹⁰,
Bonferroni P = 7.06×10⁻¹⁰, increased risk with the C allele, OR = 1.63, 95% CI: 1.40-1.89). Horizontal
lines indicate genome-wide significance thresholds before (P < 5×10⁻⁸, dashed line) and after
(1.25×10⁻⁸) Bonferroni correction over the four Long COVID meta-analyses.
b) Chromosome 6 lead variant across the contributing studies and ancestries in the GWAS meta-analysis of
Long COVID with the strict case definition and the broad control definition. The lead variant rs9367106 is shown
with solid lines; where it was missing, it was imputed, for illustrative purposes, by the variant in highest linkage
disequilibrium (LD) with the lead variant, i.e. rs12660421 (r = 0.98 in Europeans in 1000G+HGDP samples(49), dotted lines).
For the imputed variants, beta was weighted by multiplying by the LD correlation coefficient (r = 0.98).
Fig. 3 | The chromosome 6 region (chr6:41,490,001-41,560,000 (70 kb); FOXP4 locus) in the Long
COVID GWAS meta-analysis.
Long COVID meta-analysis with strict case and broad control definition (see Fig. 2). X-axis shows the
position on chromosome 6 (Genome Reference Consortium Human Build 38). The Long COVID lead
variant (rs9367106) is depicted with a triangle in each plot. a) Locus zoom plot with each variant coloured
by effective sample size and showing statistical significance on y-axis. b) Each variant coloured by
statistical significance (-log10(P value)) and showing effect sizes (beta ± standard error). c) Each variant
coloured by ancestry and showing linkage disequilibrium (r) with our lead variant on y-axis. AFR,
African; AMR, Admixed American; EAS, East Asian; EUR, European. d) Ensembl genes in the region
(FOXP4 not fully shown) (www.ensembl.org)50.
Fig. 4 | FOXP4 expression in the lung.
a) The lead variant rs9367106 was not found in the GTEx dataset, but a proxy variant (rs12660421,
chr6:41520640) in high LD (r² = 0.97; the rs12660421-A allele is correlated with the Long COVID risk
allele rs9367106-C) showed a significant expression quantitative trait locus (eQTL), increasing FOXP4
expression in the lung (P = 5.3×10⁻⁹, normalized effect size (NES) = 0.56,
https://gtexportal.org/home/snp/rs12660421). For other tissues, see the multi-tissue eQTL plot in the
Supplemental information (Extended Fig. S6).
b) Colocalization analysis using eQTL data from GTEx v8 tissue type and Long COVID association data.
Plots illustrate -log10 P value for Long COVID (x-axis) and for FOXP4 expression in the Lung (y-axis),
regional association of the FOXP4 locus variants with Long COVID (top right), and regional association
of the FOXP4 variants with RNA expression measured in the Lung in GTEx (bottom right). Variants are
coloured by 1000 Genomes European-ancestry LD r² with the lead variant (rs12660421) for FOXP4
expression in lung tissue. The most significant Long COVID variant overlapping the GTEx v8 dataset
(rs9381074) is also annotated.
c) Human Protein Atlas RNA single cell type tissue cluster data (transcript expression levels summarized
per gene and cluster) of lung (GSE130148) showing FOXP4 expression in unaffected individuals. The
values were visualized using log10(protein-transcripts per million [pTPM]) values. Each c-X annotation
is taken from the clustering results performed in Human Protein Atlas.
Fig. 5 | Genetic correlations and Mendelian randomization causal estimates between Long COVID
and potential risk factors, biomarkers and diseases.
a) Linkage disequilibrium score regression (LDSC, upper panel; Extended Table S11) and b) inverse
variance-weighted Mendelian randomization (MR, bottom panel; Extended Table S12) were used for
calculating two-sided P values. Size of each coloured square corresponds to statistical significance (P
values < 0.01 [**] full-sized square, <0.05 [*] full-sized square, <0.1 large square, <0.5 medium square,
and >0.5 small square; not corrected for multiple comparisons). BMI, body mass index; CRP, C-reactive
protein; eGFR, estimated glomerular filtration rate; ADHD, attention-deficit hyperactivity disorder.
c) MR scatter plot with effect sizes (beta±SE) of each variant on COVID-19 susceptibility (reported
SARS-CoV-2 infection) as exposure and Long COVID (strict case, broad control definition) as outcome
(P IVW = 2.4×10⁻³, pleiotropy P = 0.47; Extended Table S13).
d) Similarly, MR with COVID-19 hospitalization as exposure and Long COVID as outcome (P IVW =
7.5×10⁻⁵, pleiotropy P = 0.55; Extended Table S13).
Methods
Contributing studies
A total of 24 studies contributed to the analysis, with a total sample size of 6,450 Long COVID cases with
46,208 COVID-19 positive controls and 1,093,955 population controls. Participants provided informed
consent to participate in the respective study, with recruitment and ethics following study-specific
protocols approved by their respective Institutional Review Boards (Details are provided in Extended
Table S2). The effective sample sizes for each study shown in Fig. 1 were calculated for display using the
formula:
(4 × N_case × N_control)/(N_case + N_control). The Long COVID Host Genetics Initiative is a global
and ongoing collaboration, open to all studies around the world that have data to run Long COVID
GWAS using our phenotypic criteria described below.
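As an illustration of the effective sample size formula above, the following minimal Python sketch computes it for a single study. The function name `effective_sample_size` is hypothetical and is not taken from the study's code.

```python
def effective_sample_size(n_case: int, n_control: int) -> float:
    """Effective sample size: 4 * N_case * N_control / (N_case + N_control)."""
    total = n_case + n_control
    if total == 0:
        raise ValueError("at least one case or control is required")
    return 4.0 * n_case * n_control / total


if __name__ == "__main__":
    # A balanced design keeps its full size; an imbalanced one is capped
    # at roughly four times the number of cases.
    print(effective_sample_size(1000, 1000))
    print(effective_sample_size(1000, 1_000_000))
```

The formula shows why adding very large numbers of population controls gives diminishing returns: the effective size approaches, but never exceeds, four times the case count.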
Phenotype definitions
We used the following criteria for assigning case-control status for Long COVID, aligning with the World
Health Organization guidelines10 (Supplementary Methods). Study participants were defined as Long
COVID cases if, at least three months after SARS-CoV-2 infection or COVID-19 onset, they met any of
the following criteria:
1. presence of one or more self-reported COVID-19 symptoms that cannot be explained by an
alternative diagnosis
2. report of ongoing significant impact on day-to-day activities
3. any diagnosis codes of Long COVID (e.g. Post COVID-19 condition, ICD-10 code U09(.9))
Criteria 1 and 2 were applied only to questionnaire-based cohorts, whereas 3 was used in studies with
electronic health records (EHR). Detailed phenotyping criteria and diagnosis codes of each study are
provided in Extended Table S2.
We used two Long COVID case definitions, a strict definition requiring a test-verified SARS-
CoV-2 infection and a broad definition including self-reported or clinician-diagnosed SARS-CoV-2
infection (any Long COVID).
We applied two control definitions. First, we used population controls, i.e. everybody that is not a case.
Population controls were genetic ancestry-matched individuals who were not defined as Long COVID
cases using the above-mentioned questionnaire or EHR-based definition. In the second analysis, we
compared Long COVID cases to individuals who had had SARS-CoV-2 infection but who did not meet
the criteria of Long COVID, i.e. had fully recovered within 3 months from the infection.
In total, we used four case-control definitions to generate four GWASs:
1) Long COVID cases after test-verified SARS-CoV-2 infection vs population controls (the strict
case definition vs the broad control definition)
2) Long COVID within test-verified SARS-CoV-2 infection (the strict case definition vs the strict
control definition)
3) Any Long COVID cases vs population controls (the broad case definition vs the broad control
definition)
4) Long COVID within any SARS-CoV-2 infection (the broad case definition vs the strict control
definition)
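The four case-control definitions above can be sketched as a small Python function. This is an illustrative reading of the text, not the consortium's phenotyping code: the `Participant` fields and the analysis labels are hypothetical, and the sketch assumes that a Long COVID case who does not satisfy a given case definition is excluded from that analysis rather than used as a control.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Participant:
    long_covid: bool     # meets any of the three Long COVID criteria
    infected: bool       # any SARS-CoV-2 infection (self-reported, diagnosed or tested)
    test_verified: bool  # infection confirmed by a test


def _status(is_case: bool, is_control: bool) -> Optional[int]:
    """1 = case, 0 = control, None = excluded from the analysis."""
    if is_case:
        return 1
    if is_control:
        return 0
    return None


def assign_status(p: Participant) -> Dict[str, Optional[int]]:
    infected = p.infected or p.test_verified
    strict_case = p.long_covid and p.test_verified
    broad_case = p.long_covid and infected
    not_case = not p.long_covid  # population control: not a Long COVID case
    return {
        "strict_case_vs_broad_control": _status(strict_case, not_case),
        "strict_case_vs_strict_control": _status(strict_case, not_case and p.test_verified),
        "broad_case_vs_broad_control": _status(broad_case, not_case),
        "broad_case_vs_strict_control": _status(broad_case, not_case and infected),
    }
```

Under these assumptions, a participant with no recorded infection serves as a population control in analyses 1 and 3 and is excluded from analyses 2 and 4.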
Genome-wide association studies
We largely applied the GWAS analysis plans used in the COVID-19 HGI6. Each study performed its
own sample collection, genotyping, genotype and sample quality control (QC), imputation and
association analyses independently, according to our central analysis plan
(https://docs.google.com/document/d/1XRQgDOEp62TbWaqLYi1RAk1OHVP5T3XZqfs_6PoPt_k),
before submitting the results for meta-analysis (details are provided in Extended Table S2). We
recommended that GWAS be run using REGENIE51 on chromosomes 1–22 and X, though some
studies used SAIGE52 or PLINK53. The minimum set of covariates to be included at runtime was age,
age², sex, age × sex and the first 10 genetic principal components. We advised studies to include any
additional study-specific covariates where needed, such as those related to genotype batches or other
demographic and technical factors that could lead to stratification within the cohort. Studies performing
the GWAS using software that does not account for sample relatedness (such as PLINK) were advised
to exclude related individuals.
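The minimum covariate set described above can be sketched as follows. This is a hypothetical helper, not part of the analysis plan: the function names, the column labels and the 0/1 coding of sex are assumptions made for illustration.

```python
from typing import List, Sequence

N_PCS = 10  # first 10 genetic principal components


def covariate_names() -> List[str]:
    return ["age", "age2", "sex", "age_x_sex"] + [f"PC{i}" for i in range(1, N_PCS + 1)]


def covariate_row(age: float, sex: int, pcs: Sequence[float]) -> List[float]:
    """Build one participant's covariate vector: age, age^2, sex, age*sex, PC1..PC10.

    `sex` is assumed to be coded 0/1; studies may use a different coding.
    """
    if len(pcs) < N_PCS:
        raise ValueError(f"expected at least {N_PCS} principal components")
    return [age, age ** 2, float(sex), age * sex, *pcs[:N_PCS]]
```

Study-specific covariates such as genotyping batch would be appended to this vector where needed.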
GWAS meta-analyses
The meta-analysis pipeline was also adopted from the COVID-19 HGI flagship paper
[https://www.nature.com/articles/s41586-021-03767-x]. The code is available at LongCOVID HGI
GitHub (https://github.com/long-covid-hg/META_ANALYSIS/), and is a modified version of the
pipeline developed for the COVID-19 HGI (https://github.com/covid19-hg/META_ANALYSIS). To
ensure that individual study results did not suffer from excessive inflation, deflation and false positives,
we manually investigated plots of the reported allele frequencies against aggregated gnomAD v3.0 49
allele frequencies in the same population. We also evaluated whether the association standard errors were
excessively small, given the calculated effective sample size, to identify studies deviating from the
expected trend. Where these issues were detected, the studies were contacted to reperform the association
analysis, if needed, and resubmit their results.
Prior to the meta-analysis itself, the summary statistics were standardized, filtered (excluding
variants with allele frequency <0.1% or imputation INFO score <0.6), lifted over to reference genome
build GRCh38 (in studies imputed to GRCh37), and harmonized to gnomAD v3.0 through matching by
chromosome, position and alleles (Supplementary Methods).
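The pre-meta-analysis variant filter described above can be sketched in Python as follows. The dictionary keys `af` and `info` are hypothetical field names, and the sketch applies the allele frequency threshold literally as stated in the text; the actual pipeline may define the frequency filter differently.

```python
from typing import Dict, Iterable, List

MIN_AF = 0.001   # allele frequency threshold of 0.1%
MIN_INFO = 0.6   # imputation INFO score threshold


def passes_filter(variant: Dict[str, float]) -> bool:
    """Keep a variant unless its allele frequency is <0.1% or its INFO score is <0.6."""
    return variant["af"] >= MIN_AF and variant["info"] >= MIN_INFO


def filter_variants(variants: Iterable[Dict[str, float]]) -> List[Dict[str, float]]:
    return [v for v in variants if passes_filter(v)]
```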
The meta-analysis was performed using a fixed-effects inverse-variance weighted (IVW) method
on variants that were present in at least two studies contributing to the specific phenotype being analysed.
To assess if one study was primarily driving any associations, we simultaneously ran a
leave-most-significant-study-out (LMSSO) meta-analysis for each variant (based on the variant’s study-level P
value). Heterogeneity between studies was estimated using Cochran’s Q-test54. Each set of meta-analysis
results was then filtered to exclude variants whose total effective sample size (in the non-LMSSO
analysis) was less than 1/3 of the total effective sample size of all studies contributing to that
meta-analysis. We report significant loci that pass the genome-wide significance threshold (P < 5×10⁻⁸ / 4 =
1.25×10⁻⁸), accounting for the number of GWAS meta-analyses we performed.
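For a single variant, the fixed-effects IVW estimate, Cochran's Q statistic and the leave-most-significant-study-out analysis can be sketched as below. This is a simplified illustration with hypothetical function names, not the META_ANALYSIS pipeline itself; P values are computed from a normal approximation.

```python
import math
from typing import List, Sequence, Tuple

BONFERRONI_THRESHOLD = 5e-8 / 4  # four GWAS meta-analyses


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def ivw_fixed(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effects inverse-variance weighted estimate: (beta, SE, P)."""
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    return beta, se, two_sided_p(beta / se)


def cochran_q(betas: Sequence[float], ses: Sequence[float]) -> float:
    """Cochran's Q: weighted squared deviations of study effects from the pooled effect."""
    pooled, _, _ = ivw_fixed(betas, ses)
    return sum((b - pooled) ** 2 / se ** 2 for b, se in zip(betas, ses))


def lmsso(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Repeat the IVW meta-analysis after dropping the study with the smallest P value."""
    p_values: List[float] = [two_sided_p(b / se) for b, se in zip(betas, ses)]
    drop = p_values.index(min(p_values))
    kept_betas = [b for i, b in enumerate(betas) if i != drop]
    kept_ses = [se for i, se in enumerate(ses) if i != drop]
    return ivw_fixed(kept_betas, kept_ses)
```

Under the null hypothesis of no heterogeneity, Q is compared with a chi-square distribution with (number of studies − 1) degrees of freedom; that lookup is omitted here to keep the sketch dependency-free.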
Principal component projection
In a similar fashion to the COVID-19 HGI, we asked each study to project their cohort onto a multi-ethnic
genetic principal component space (Extended Fig. S5), by providing studies with pre-computed PC
loadings and reference allele frequencies from unrelated samples from the 1000 Genomes Project23,24 and
the Human Genome Diversity Project. The loadings and frequencies were generated for a set of 117,221
autosomal, common (MAF > 0.1%) and LD-pruned (r² < 0.8; 500-kb window) single-nucleotide
polymorphisms (SNPs) that would be available in the imputed data of most studies. The projecting and
plotting scripts were made available to the studies at https://github.com/long-covid-hg/pca_projection.
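A common way to project a sample onto pre-computed PC loadings is to standardize each genotype dosage with the reference allele frequency and take the dot product with the loadings. The sketch below illustrates that general approach in plain Python; it is an assumption about the method, not a copy of the pca_projection scripts, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def project_sample(
    dosages: Sequence[Optional[float]],   # alternate-allele dosage (0..2) per SNP, None if missing
    ref_af: Sequence[float],              # reference allele frequency per SNP
    loadings: Sequence[Sequence[float]],  # per SNP: one loading per principal component
) -> List[float]:
    """Project one sample onto a pre-computed principal component space."""
    n_pcs = len(loadings[0])
    scores = [0.0] * n_pcs
    for g, p, row in zip(dosages, ref_af, loadings):
        if g is None or p <= 0.0 or p >= 1.0:
            continue  # missing or monomorphic SNP contributes nothing (mean imputation)
        x = (g - 2.0 * p) / math.sqrt(2.0 * p * (1.0 - p))
        for k in range(n_pcs):
            scores[k] += x * row[k]
    return scores
```

Because every study uses the same loadings and reference frequencies, the resulting scores are comparable across cohorts and can be plotted in a shared space.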
eQTL, PheWAS and colocalization
For the single (Bonferroni-corrected) genome-wide significant lead variant, rs9367106, we used the
GTEx portal (https://gtexportal.org/)26 to understand if this variant had any tissue-specific effects on gene
expression. As rs9367106 was not available in the GTEx database, we first identified a proxy variant,
rs12660421 (r2 = 0.90) using all individuals from the 1000 Genomes Project23 and then performed a
lookup in the portal’s GTEx v8 dataset25.
To identify other phenotypes associated with rs9367106, we used the Biobank Japan PheWeb
portal (https://pheweb.jp/)9 to perform a phenome-wide association analysis, as the minor allele frequency
of rs9367106 is highest in East Asia. To assess whether the FOXP4 association is shared between Long
COVID and tissue-specific eQTLs, lung cancer, and COVID-19 hospitalization, we extracted a 1-Mb
region centred on rs9367106 (chr6:41,015,652-42,015,652) from the lung cancer and COVID-19
hospitalization summary statistics and the GTEx v8 data and performed colocalization analyses using the
R package coloc (v5.1.0.1)55,56 in R v4.2.2. Colocalization locus zoom plots were created using the
LocusCompareR R package v1.0.0 57 with LD r² estimated using 1000 Genomes European-ancestry
individuals23,24.
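The 1-Mb colocalization window can be reproduced from the lead variant position reported in Fig. 2 (chr6:41515652, GRCh38). The following Python sketch uses hypothetical helper names and shows only the region arithmetic, not the coloc analysis itself.

```python
from typing import Tuple

Region = Tuple[str, int, int]


def region_around(chrom: str, pos: int, width: int = 1_000_000) -> Region:
    """Return a window of `width` base pairs centred on `pos`."""
    half = width // 2
    return chrom, pos - half, pos + half


def in_region(chrom: str, pos: int, region: Region) -> bool:
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


if __name__ == "__main__":
    lead = ("chr6", 41_515_652)  # rs9367106, GRCh38
    print(region_around(*lead))
```

The same window would be applied to each summary statistics file so that all colocalization inputs cover an identical set of positions.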
Genetic correlation and Mendelian Randomization
We assessed the genetic overlap and causal associations between Long COVID outcomes and the same
set of risk factors, biomarkers and disease liabilities as in the COVID-19 HGI flagship paper6.
Additionally, we tested the overlap and causal impact of COVID-19 susceptibility and hospitalization
risk. Genetic correlations were assessed using Linkage Disequilibrium Score Regression (LDSC) v1.0.1
58,59. Where there were sufficient genome-wide significant variants, the causal impact was tested in a
two-sample Mendelian Randomization framework using the TwoSampleMR (v0.5.6) R package60 with R
v4.0.3. To avoid sample overlap between exposure GWASs (here COVID-19 hospitalization and
SARS-CoV-2 reported infection) and outcome GWASs (here Long COVID phenotypes), we performed
meta-analyses of COVID-19 hospitalization and SARS-CoV-2 reported infection using data freeze 7 of the
COVID-19 HGI, excluding studies that participated in the Long COVID (freeze 4) effort. Independent
significant exposure variants with P < 5×10⁻⁸ were identified by LD-clumping the full set of summary
statistics using an LD r² threshold of 0.001 (based on the 1000 Genomes European-ancestry reference
samples23) and a 10-Mb clumping window. For each exposure-outcome pair, these variants were then
harmonized to remove variants with mismatched alleles and ambiguous palindromic variants (MAF
>45%). Fixed-effects inverse variance weighted meta-analysis was used as the primary MR method,
with MR-Egger, Weighted Median Estimator, Weighted Mode Based Estimator and MR-PRESSO used in
sensitivity analyses. Heterogeneity was assessed using the MR-PRESSO global test and pleiotropy using
the MR-Egger intercept. The genetic correlation and Mendelian Randomization analyses were
implemented as a Snakemake workflow made available at https://github.com/marcoralab/MRcovid.
Summaries of the exposure GWAS are provided in Extended Table S10 and the association statistics for
all exposure variants are provided in Extended Table S14.
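Two of the steps above, the removal of ambiguous palindromic variants and the fixed-effects IVW estimate, can be sketched in plain Python. This is a simplified illustration with hypothetical function names, not the TwoSampleMR implementation or the MRcovid workflow; the P value uses a normal approximation.

```python
import math
from typing import Sequence, Tuple


def is_ambiguous_palindromic(a1: str, a2: str, maf: float, maf_cutoff: float = 0.45) -> bool:
    """A/T or C/G variants whose strand cannot be resolved because MAF is close to 50%."""
    palindromic = {a1.upper(), a2.upper()} in ({"A", "T"}, {"C", "G"})
    return palindromic and maf > maf_cutoff


def ivw_mr(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float, float]:
    """Fixed-effects IVW causal estimate: weighted regression through the origin
    of outcome effects on exposure effects, with weights 1 / se_outcome^2."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    estimate = num / den
    se = math.sqrt(1.0 / den)
    p = math.erfc(abs(estimate / se) / math.sqrt(2))
    return estimate, se, p
```

The estimate is on the scale of the outcome effect per unit of the exposure effect, so with log odds ratios on both sides it is a log odds ratio of Long COVID per unit increase in the log odds of the exposure.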
Data availability
We have made the results of these GWAS meta-analyses publicly available for variants passing post-
meta-analysis filtering for minor allele frequency >=1% and effective sample size >1/3 of the maximum
effective sample size for each meta-analysis. These can be accessed at LocusZoom, where the
associations can be visually explored and the summary statistics exported for further scientific
discovery61.
Strict case definition (Long COVID after test-verified SARS-CoV-2 infection) vs broad control definition
(population control):
https://my.locuszoom.org/gwas/192226/?token=09a18cf9138243db9cdf79ff6930fdf8
Broad case definition (Long COVID after any SARS-CoV-2 infection) vs broad control definition:
https://my.locuszoom.org/gwas/826733/?token=c7274597af504bf3811de6d742921bc8
Strict case definition vs strict control definition (individuals that had SARS-CoV-2 but did not develop
Long COVID):
https://my.locuszoom.org/gwas/793752/?token=0dc986619af14b6e8a564c580d3220b4
Broad case definition vs strict control definition:
https://my.locuszoom.org/gwas/91854/?token=723e672edf13478e817ca44b56c0c068
References
1. Coates, M. M. et al. Burden of non-communicable diseases from infectious causes in 2017: a
modelling study. Lancet Glob. Health 8, e1489–e1498 (2020).
2. Patil, A., Goldust, M. & Wollina, U. Herpes zoster: A Review of Clinical Manifestations and
Management. Viruses 14, (2022).
3. Sudre, C. H. et al. Attributes and predictors of long COVID. Nat. Med. 27, 626–631 (2021).
4. Castanares-Zapatero, D. et al. Pathophysiology and mechanism of long COVID: a comprehensive
review. Ann. Med. 54, 1473–1487 (2022).
5. Davis, H. E., McCorkell, L., Vogel, J. M. & Topol, E. J. Long COVID: major findings, mechanisms
and recommendations. Nat. Rev. Microbiol. 21, 133–146 (2023).
6. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
7. A first update on mapping the human genetic architecture of COVID-19. Nature 608, E1–E10
(2022).
8. Li, S. et al. Foxp1/4 control epithelial cell fate during lung development and regeneration through
regulation of anterior gradient 2. Dev. Camb. Engl. 139, 2500–2509 (2012).
9. Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat.
Genet. 53, 1415–1424 (2021).
10. Soriano, J. B., Murthy, S., Marshall, J. C., Relan, P. & Diaz, J. V. A clinical case definition of post-
COVID-19 condition by a Delphi consensus. Lancet Infect. Dis. 22, e102–e107 (2022).
11. Desai, A. D., Lavelle, M., Boursiquot, B. C. & Wan, E. Y. Long-term complications of COVID-19.
Am. J. Physiol. Cell Physiol. 322, C1–C11 (2022).
12. Mehandru, S. & Merad, M. Pathological sequelae of long-haul COVID. Nat. Immunol. 23, 194–202
(2022).
13. Hugon, J., Msika, E.-F., Queneau, M., Farid, K. & Paquet, C. Long COVID: cognitive complaints
(brain fog) and dysfunction of the cingulate cortex. J. Neurol. 269, 44–46 (2022).
14. Ceban, F. et al. Fatigue and cognitive impairment in Post-COVID-19 Syndrome: A systematic
review and meta-analysis. Brain. Behav. Immun. 101, 93–135 (2022).
15. Sykes, D. L. et al. Post-COVID-19 Symptom Burden: What is Long-COVID and How Should We
Manage It? Lung 199, 113–119 (2021).
16. Global Burden of Disease Long COVID Collaborators et al. Estimated Global Proportions of
Individuals With Persistent Fatigue, Cognitive, and Respiratory Symptom Clusters Following
Symptomatic COVID-19 in 2020 and 2021. JAMA 328, 1604–1615 (2022).
17. Mizrahi, B. et al. Long covid outcomes at one year after mild SARS-CoV-2 infection: nationwide
cohort study. BMJ 380, e072529 (2023).
18. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic
factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. EJHG
28, 715–718 (2020).
19. Nakanishi, T. et al. Age-dependent impact of the major common genetic risk factor for COVID-19
on severity and mortality. J. Clin. Invest. 131, (2021).
20. Ellinghaus, D. et al. Genomewide Association Study of Severe Covid-19 with Respiratory Failure.
N. Engl. J. Med. 383, 1522–1534 (2020).
21. Pairo-Castineira, E. et al. Genetic mechanisms of critical illness in COVID-19. Nature 591, 92–98
(2021).
22. Bergström, A. et al. Insights into human genetic variation and population history from 929 diverse
genomes. Science 367, (2020).
23. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
24. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291
(2016).
25. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369,
1318–1330 (2020).
26. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585
(2013).
27. Jones, R. C. et al. The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans.
Science 376, eabl4896 (2022).
28. Corbière, V. et al. Phenotypic characteristics of human type II alveolar epithelial cells suitable for
antigen presentation to T lymphocytes. Respir. Res. 12, 15 (2011).
29. Mason, R. J. Biology of alveolar type II cells. Respirol. Carlton Vic 11 Suppl, S12-15 (2006).
30. Boyle, A. P. et al. Annotation of functional variation in personal genomes using RegulomeDB.
Genome Res. 22, 1790–1797 (2012).
31. Dong, S. et al. Annotating and prioritizing human non-coding variants with RegulomeDB v.2. Nat.
Genet. 55, 724–726 (2023).
32. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome.
Nature 489, 57–74 (2012).
33. Huang, D. et al. VannoPortal: multiscale functional annotation of human genetic variants for
interrogating molecular mechanism of traits and diseases. Nucleic Acids Res. 50, D1408–D1416
(2022).
34. Nagai, A. et al. Overview of the BioBank Japan Project: Study design and profile. J. Epidemiol. 27,
S2–S8 (2017).
35. Dai, J. et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale
prospective cohort study in Chinese populations. Lancet Respir. Med. 7, 881–891 (2019).
36. Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific
haplotype structure and linking correlated alleles of possible functional variants. Bioinforma. Oxf.
Engl. 31, 3555–3557 (2015).
37. Wang, Z. et al. Meta-analysis of genome-wide association studies identifies multiple lung cancer
susceptibility loci in never-smoking Asian women. Hum. Mol. Genet. 25, 620–629 (2016).
38. Subramanian, A. et al. Symptoms and risk factors for long COVID in non-hospitalized adults. Nat.
Med. 28, 1706–1714 (2022).
39. Resendez, S. et al. Defining the Subtypes of Long COVID and Risk Factors for Prolonged Disease.
2023.05.19.23290234 Preprint at https://doi.org/10.1101/2023.05.19.23290234 (2023).
40. Tsampasian, V. et al. Risk Factors Associated With Post-COVID-19 Condition: A Systematic
Review and Meta-analysis. JAMA Intern. Med. e230750 (2023)
doi:10.1001/jamainternmed.2023.0750.
41. Lu, M. M., Li, S., Yang, H. & Morrisey, E. E. Foxp4: a novel member of the Foxp subfamily of
winged-helix genes co-expressed with Foxp1 and Foxp2 in pulmonary and gut tissues. Mech. Dev.
119 Suppl 1, S197-202 (2002).
42. Takahashi, K., Liu, F.-C., Hirokawa, K. & Takahashi, H. Expression of Foxp4 in the developing and
adult rat forebrain. J. Neurosci. Res. 86, 3106–3116 (2008).
43. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419
(2015).
44. Schmiedel, B. J. et al. Impact of Genetic Polymorphisms on Human Immune Cell Gene Expression.
Cell 175, 1701-1715.e16 (2018).
45. Wiehagen, K. R. et al. Foxp4 is dispensable for T cell development, but required for robust recall
responses. PloS One 7, e42273 (2012).
46. Li, S. et al. Foxp transcription factors suppress a non-pulmonary gene expression program to permit
proper lung development. Dev. Biol. 416, 338–346 (2016).
47. Chen, Y. et al. Downregulation of microRNA-423-5p suppresses TGF-β1-induced EMT by
targeting FOXP4 in airway fibrosis. Mol. Med. Rep. 26, 242 (2022).
48. Yang, T. et al. FOXP4 modulates tumor growth and independently associates with miR-138 in non-
small cell lung cancer cells. Tumour Biol. J. Int. Soc. Oncodevelopmental Biol. Med. 36, 8185–8191
(2015).
49. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456
humans. Nature 581, 434–443 (2020).
50. Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988–D995 (2022).
51. Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary
traits. Nat. Genet. 53, 1097–1103 (2021).
52. Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-
scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
53. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets.
GigaScience 4, 7 (2015).
54. Neupane, B., Loeb, M., Anand, S. S. & Beyene, J. Meta-analysis of genetic association studies under
heterogeneity. Eur. J. Hum. Genet. EJHG 20, 1174–1181 (2012).
55. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A Simple New Approach to Variable Selection
in Regression, with Application to Genetic Fine Mapping. J. R. Stat. Soc. 82, 1273–1300 (2020).
56. Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal
variants. PLoS Genet. 17, e1009440 (2021).
57. Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations
with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).
58. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in
genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
59. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet.
47, 1236–1241 (2015).
60. Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between imprecisely
measured traits using GWAS summary data. PLoS Genet. 13, e1007081 (2017).
61. Boughton, A. P. et al. LocusZoom.js: interactive and embeddable visualization of genetic association
study results. Bioinforma. Oxf. Engl. 37, 3017–3018 (2021).
. CC-BY-NC 4.0 International licenseIt is made available under a
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 1, 2023. ; https://doi.org/10.1101/2023.06.29.23292056doi: medRxiv preprint
... Several pathogenic mechanisms have been suggested for PASC and other PAIS. These include persistence of virus and/or viral components, virus-induced tissue damage, endothelial dysfunction, coagulopathy, autonomic dysfunction, chronic inflammation, and autoimmunity [35,36,[47][48][49][50][51]. The risk of developing PASC is two to fourfold higher after infection with a pre-omicron variant [52,53], and COVID-19 vaccination was shown to reduce the risk of PASC, depending on the triggering virus variant [53][54][55][56]. ...
Article
Full-text available
This review summarizes current knowledge on post-acute sequelae of COVID-19 (PASC) and post-COVID-19 condition (PCC) in children and adolescents. A literature review was performed to synthesize information from clinical studies, expert opinions, and guidelines. PASC also termed Long COVID — at any age comprise a plethora of unspecific symptoms present later than 4 weeks after confirmed or probable infection with severe respiratory syndrome corona virus type 2 (SARS-CoV-2), without another medical explanation. PCC in children and adolescents was defined by the WHO as PASC occurring within 3 months of acute coronavirus disease 2019 (COVID-19), lasting at least 2 months, and limiting daily activities. Pediatric PASC mostly manifest after mild courses of COVID-19 and in the majority of cases remit after few months. However, symptoms can last for more than 1 year and may result in significant disability. Frequent symptoms include fatigue, exertion intolerance, and anxiety. Some patients present with postural tachycardia syndrome (PoTS), and a small number of cases fulfill the clinical criteria of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). To date, no diagnostic marker has been established, and differential diagnostics remains challenging. Therapeutic approaches include appropriate self-management as well as the palliation of symptoms by non-pharmaceutical and pharmaceutical strategies. Conclusion: PASC in pediatrics present with heterogenous severity and duration. A stepped, interdisciplinary, and individualized approach is essential for appropriate clinical management. Current health care structures have to be adapted, and research was extended to meet the medical and psychosocial needs of young people with PASC or similar conditions. What is Known: • Post-acute sequelae of coronavirus 2019 (COVID-19) (PASC) — also termed Long COVID — in children and adolescents can lead to activity limitation and reduced quality of life. 
• PASC belongs to a large group of similar post-acute infection syndromes (PAIS). Specific biomarkers and causal treatment options are not yet available. What is New: • In February 2023, a case definition for post COVID-19 condition (PCC) in children and adolescents was provided by the World Health Organization (WHO), indicating PASC with duration of at least 2 months and limitation of daily activities. PCC can present as myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). • Interdisciplinary collaborations are necessary and have been established worldwide to offer harmonized, multimodal approaches to diagnosis and management of PASC/PCC in children and adolescents.
... For example, a recent large genome-wide association analysis demonstrated that the Forkhead Box P4 gene (FOXP4) has a signi cant association to long COVID. 5 Further, there have been over 20 genetic variants identi ed with signi cant associations to COVID-19 contraction and hospitalization. 6 Given these ndings, it is plausible that these genetic variants may also have an effect on a person's risk to have long lasting side effects after contracting COVID-19. ...
Preprint
Over 200 million SARS-CoV-2 patients have or will develop persistent symptoms (long COVID). Given this pressing research priority, the National COVID Cohort Collaborative (N3C) developed a machine learning model using only electronic health record data to identify potential patients with long COVID. We hypothesized that additional data from health surveys, mobile devices, and genotypes could improve prediction ability. In a cohort of SARS-CoV-2 infected individuals (n=17,755) in the All of Us program, we applied and expanded upon the N3C long COVID prediction model, testing machine learning infrastructures, assessing model performance, and identifying factors that contributed most to the prediction models. For the survey/mobile device information and genetic data, extreme gradient boosting and a convolutional neural network delivered the best performance for predicting long COVID, respectively. Combined survey, genetic, and mobile data increased specificity and the Area Under the Receiver Operating Characteristic Curve (AUROC) score versus the original N3C model.
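The two metrics named in this abstract, specificity and the area under the ROC curve, can be computed directly from labels and model scores. The sketch below is only an illustration: the labels, scores, and the 0.5 decision threshold are hypothetical and are not taken from the All of Us cohort or the N3C model.

```python
def specificity(y_true, y_pred):
    """True negative rate: TN / (TN + FP)."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = long COVID) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print(f"specificity = {specificity(y_true, y_pred):.3f}")
print(f"AUROC       = {auroc(y_true, scores):.3f}")
```

Specificity depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two are typically reported together.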
... A higher platelet activity can lead to severe forms of COVID-19 through interactions with other platelets or leukocytes by aggregation, spreading, and adhesion, amplifying the dysfunction of the endothelium [48]. A recent GWAS study identified FOXP4 as a locus associated with long COVID [49], a gene involved in ciliogenesis and mucus production in the epithelium [50,51] and the effector cytokine production by T cells during specific antigen recall responses [51]. As such, TFs can perform different roles in several tissues or cell lineages during COVID-19 infection with synergistic or contradictory effects on the immune responses, simultaneously or sequentially. ...
Preprint
Long COVID, also known as post-acute sequelae of SARS-CoV-2 infection (PASC), has emerged as a significant health concern following the COVID-19 pandemic. Molecular mechanisms underlying the occurrence and progression of long COVID include viral persistence, immune dysregulation, endothelial dysfunction, and neurological involvement, and highlight the need for further research to develop targeted therapies for this condition. While a clearer picture of the clinical symptomatology is shaping, many molecular mechanisms are yet to be unraveled, given their complexity and high level of interaction with other metabolic pathways. This review summarizes some of the most important symptoms and associated molecular mechanisms that occur in long COVID, as well as the most relevant molecular techniques that can be used in understanding the viral pathogen, its affinity towards the host and the possible outcomes of host-pathogen interaction.
Article
One in ten severe acute respiratory syndrome coronavirus 2 infections result in prolonged symptoms termed long coronavirus disease (COVID), yet disease phenotypes and mechanisms are poorly understood¹. Here we profiled 368 plasma proteins in 657 participants ≥3 months following hospitalization. Of these, 426 had at least one long COVID symptom and 233 had fully recovered. Elevated markers of myeloid inflammation and complement activation were associated with long COVID. IL-1R2, MATN2 and COLEC12 were associated with cardiorespiratory symptoms, fatigue and anxiety/depression; MATN2, CSF3 and C1QA were elevated in gastrointestinal symptoms and C1QA was elevated in cognitive impairment. Additional markers of alterations in nerve tissue repair (SPON-1 and NFASC) were elevated in those with cognitive impairment and SCG3, suggestive of brain–gut axis disturbance, was elevated in gastrointestinal symptoms. Severe acute respiratory syndrome coronavirus 2-specific immunoglobulin G (IgG) was persistently elevated in some individuals with long COVID, but virus was not detected in sputum. Analysis of inflammatory markers in nasal fluids showed no association with symptoms. Our study aimed to understand inflammatory processes that underlie long COVID and was not designed for biomarker discovery. Our findings suggest that specific inflammatory pathways related to tissue damage are implicated in subtypes of long COVID, which might be targeted in future therapeutic trials.
Article
The pandemic of coronavirus disease 2019 (COVID-19), etiologically related to the SARS-CoV-2 virus (severe acute respiratory syndrome coronavirus-2), has drawn attention to new clinical and fundamental problems in the immunopathology of human diseases associated with virus-induced autoimmunity and autoinflammation. The provision that “the experience gained in rheumatology in the process of studying the pathogenetic mechanisms and pharmacotherapy of immunoinflammatory rheumatic diseases as the most common and severe forms of autoimmune and autoinflammatory pathology in humans will be in demand for deciphering the nature of the pathological processes underlying COVID-19 and developing approaches to effective pharmacotherapy” was confirmed in numerous studies conducted over the next 3 years in the midst of the COVID-19 pandemic. The main focus will be on a critical analysis of data regarding the role of autoimmune inflammation, which forms the basis of the pathogenesis of immune-mediated rheumatic diseases in the context of the immunopathology of COVID-19.
Article
The COVID-19 pandemic led to the rapid and worldwide development of highly effective vaccines against SARS-CoV-2. However, there is significant individual-to-individual variation in vaccine efficacy due to factors including viral variants, host age, immune status, environmental and host genetic factors. Understanding those determinants driving this variation may inform the development of more broadly protective vaccine strategies. While host genetic factors are known to impact vaccine efficacy for respiratory pathogens such as influenza and tuberculosis, the impact of host genetic variation on vaccine efficacy against COVID-19 is not well understood. To model the impact of host genetic variation on SARS-CoV-2 vaccine efficacy, while controlling for the impact of non-genetic factors, we used the Diversity Outbred (DO) mouse model. We found that DO mice immunized against SARS-CoV-2 exhibited high levels of variation in vaccine-induced neutralizing antibody responses. While the majority of the vaccinated mice were protected from virus-induced disease, similar to human populations, we observed vaccine breakthrough in a subset of mice. Importantly, we found that this variation in neutralizing antibody, virus-induced disease, and viral titer is heritable, indicating that the DO serves as a useful model system for studying the contribution of genetic variation of both vaccines and disease outcomes.
Article
Background Long COVID is a clinical entity characterized by persistent health problems or development of new diseases, without an alternative diagnosis, following SARS-CoV-2 infection that affects a significant proportion of individuals globally. It can manifest with a wide range of symptoms due to dysfunction of multiple organ systems including but not limited to cardiovascular, hematologic, neurological, gastrointestinal, and renal organs, revealed by observational studies. However, a causal association between the genetic predisposition to COVID-19 and many post-infective abnormalities in long COVID remains unclear. Methods Here we employed Mendelian randomization (MR), a robust genetic epidemiological approach, to investigate the potential causal associations between genetic predisposition to COVID-19 and long COVID symptoms, namely pulmonary (pneumonia and airway infections including bronchitis, emphysema, asthma, and rhinitis), neurological (headache, depression, and Parkinson’s disease), cardiac (heart failure and chest pain) diseases, and chronic fatigue. Using two-sample MR, we leveraged genetic data from a large COVID-19 genome-wide association study and various disorder-specific datasets. Results This analysis revealed that a genetic predisposition to COVID-19 was significantly causally linked to an increased risk of developing pneumonia, airway infections, headache, and heart failure. It also showed a strong positive correlation with chronic fatigue, a frequently observed symptom in long COVID patients. However, our findings on Parkinson’s disease, depression, and chest pain were inconclusive. Conclusion Overall, these findings provide valuable insights into the genetic underpinnings of long COVID and its diverse range of symptoms. Understanding these causal associations may aid in better management and treatment of long COVID patients, thereby alleviating the substantial burden it poses on global health and socioeconomic systems.
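Two-sample Mendelian randomization, as described in this abstract, combines per-variant effects on the exposure with per-variant effects on the outcome. The abstract does not say which estimator was used; the sketch below shows the common inverse-variance weighted (IVW) estimator as an assumption, with entirely hypothetical variant effect sizes.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate (fixed-effect form).

    Each variant contributes a ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    numer = sum(
        bx * by / se**2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denom = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return numer / denom, math.sqrt(1 / denom)


# Hypothetical summary statistics for three variants (illustration only):
# effects on the exposure, effects on the outcome, and outcome standard errors.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.05, 0.10, 0.075]
se_y = [0.01, 0.02, 0.015]

beta, se = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW causal estimate = {beta:.3f} (SE {se:.3f})")
```

The IVW estimate is only valid under the usual instrumental-variable assumptions, so published analyses typically add sensitivity estimators alongside it.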
Article
Background Long COVID is a debilitating chronic condition that has affected over 100 million people globally. It is characterized by a diverse array of symptoms, including fatigue, cognitive dysfunction and respiratory problems. Studies have so far largely failed to identify genetic associations, the mechanisms behind the disease, or any common pathophysiology with other conditions such as myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) that present with similar symptoms. Methods We used a combinatorial analysis approach to identify combinations of genetic variants significantly associated with the development of long COVID and to examine the biological mechanisms underpinning its various symptoms. We compared two subpopulations of long COVID patients from Sano Genetics’ Long COVID GOLD study cohort, focusing on patients with severe or fatigue dominant phenotypes. We evaluated the genetic signatures previously identified in an ME/CFS population against this long COVID population to understand similarities with other fatigue disorders that may be triggered by a prior viral infection. Finally, we also compared the output of this long COVID analysis against known genetic associations in other chronic diseases, including a range of metabolic and neurological disorders, to understand the overlap of pathophysiological mechanisms. Results Combinatorial analysis identified 73 genes that were highly associated with at least one of the long COVID populations included in this analysis. Of these, 9 genes have prior associations with acute COVID-19, and 14 were differentially expressed in a transcriptomic analysis of long COVID patients. A pathway enrichment analysis revealed that the biological pathways most significantly associated with the 73 long COVID genes were mainly aligned with neurological and cardiometabolic diseases. 
Expanded genotype analysis suggests that specific SNX9 genotypes are a significant contributor to the risk of or protection against severe long COVID infection, but that the gene-disease relationship is context dependent and mediated by interactions with KLF15 and RYR3. Comparison of the genes uniquely associated with the Severe and Fatigue Dominant long COVID patients revealed significant differences between the pathways enriched in each subgroup. The genes unique to Severe long COVID patients were associated with immune pathways such as myeloid differentiation and macrophage foam cells. Genes unique to the Fatigue Dominant subgroup were enriched in metabolic pathways such as MAPK/JNK signaling. We also identified overlap in the genes associated with Fatigue Dominant long COVID and ME/CFS, including several involved in circadian rhythm regulation and insulin regulation. Overall, 39 SNPs associated in this study with long COVID can be linked to 9 genes identified in a recent combinatorial analysis of ME/CFS patients from UK Biobank. Among the 73 genes associated with long COVID, 42 are potentially tractable for novel drug discovery approaches, with 13 of these already targeted by drugs in clinical development pipelines. From this analysis, for example, we identified TLR4 antagonists as repurposing candidates with potential to protect against long term cognitive impairment pathology caused by SARS-CoV-2. We are currently evaluating the repurposing potential of these drug targets for use in treating long COVID and/or ME/CFS. Conclusion This study demonstrates the power of combinatorial analytics for stratifying heterogeneous populations in complex diseases that do not have simple monogenic etiologies.
These results build upon the genetic findings from combinatorial analyses of severe acute COVID-19 patients and an ME/CFS population and we expect that access to additional independent, larger patient datasets will further improve the disease insights and validate potential treatment options in long COVID.
Article
Long COVID-19 is a recognized entity that affects millions of people worldwide. Its broad clinical symptoms include thrombotic events, brain fog, myocarditis, shortness of breath, fatigue, muscle pains, and others. Due to the binding of the virus with ACE-2 receptors, expressed in many organs, it can potentially affect any system; however, it most often affects the cardiovascular, central nervous, respiratory, and immune systems. Age, high body mass index, female sex, previous hospitalization, and smoking are some of its risk factors. Despite great efforts to define its pathophysiology, gaps remain to be explained. The main mechanisms described in the literature involve viral persistence, hypercoagulopathy, immune dysregulation, autoimmunity, hyperinflammation, or a combination of these. The exact mechanisms may differ from system to system, but some share the same pathways. This review aims to describe the most prevalent pathophysiological pathways explaining this syndrome.
Preprint
Importance There have been over 759 million confirmed cases of COVID-19 worldwide. A significant portion of these infections will lead to long COVID and its attendant morbidities and costs. Objective To empirically derive a long COVID case definition consisting of significantly increased signs, symptoms, and diagnoses to support clinical, public health, research, and policy initiatives related to the pandemic. Design Case-Crossover Population-based study. Setting Veterans Affairs (VA) medical centers across the United States between January 1, 2020 and August 18, 2022. Participants 367,148 individuals with positive COVID-19 tests and preexisting ICD-10-CM codes recorded in the VA electronic health record were enrolled. Trigger SARS-CoV-2 infection documented by positive laboratory test. Case Window One to seven months following positive COVID testing. Main Outcomes and Measures We defined signs, symptoms, and diagnoses as being associated with long COVID if they had a novel case frequency of >= 1:1000 and they were significantly increased in our entire cohort after a positive COVID test when compared to case frequencies before COVID testing. We present odds ratios with confidence intervals for long COVID signs, symptoms, and diagnoses, organized by ICD-10-CM functional groups and medical specialty. We used our definition to assess long COVID risk based upon a patient’s demographics, Elixhauser score, vaccination status, and COVID disease severity. Results We developed a long COVID definition consisting of 323 ICD-10-CM diagnosis codes grouped into 143 ICD-10-CM functional groups that were significantly increased in our 367,148 patient post-COVID population. We define seventeen medical-specialty long COVID subtypes such as cardiology long COVID. COVID-19 positive patients developed signs, symptoms, or diagnoses included in our long COVID definition at a proportion of at least 59.7% (based on all COVID positive patients). 
Patients with more severe cases of COVID-19 and multiple comorbidities were more likely to develop long COVID. Conclusions and Relevance An actionable, empirical definition for long COVID can help clinicians screen for and diagnose long COVID, allowing identified patients to be admitted into appropriate monitoring and treatment programs. An actionable long COVID definition can also support public health, research and policy initiatives. COVID patients with low oxygen saturation levels or multiple co-morbidities should be preferentially watched for the development of long COVID.
Article
Importance: Post-COVID-19 condition (PCC) is a complex heterogeneous disorder that has affected the lives of millions of people globally. Identification of potential risk factors to better understand who is at risk of developing PCC is important because it would allow for early and appropriate clinical support. Objective: To evaluate the demographic characteristics and comorbidities that have been found to be associated with an increased risk of developing PCC. Data sources: Medline and Embase databases were systematically searched from inception to December 5, 2022. Study selection: The meta-analysis included all published studies that investigated the risk factors and/or predictors of PCC in adult (≥18 years) patients. Data extraction and synthesis: Odds ratios (ORs) for each risk factor were pooled from the selected studies. For each potential risk factor, the random-effects model was used to compare the risk of developing PCC between individuals with and without the risk factor. Data analyses were performed from December 5, 2022, to February 10, 2023. Main outcomes and measures: The risk factors for PCC included patient age; sex; body mass index, calculated as weight in kilograms divided by height in meters squared; smoking status; comorbidities, including anxiety and/or depression, asthma, chronic kidney disease, chronic obstructive pulmonary disease, diabetes, immunosuppression, and ischemic heart disease; previous hospitalization or ICU (intensive care unit) admission with COVID-19; and previous vaccination against COVID-19. Results: The initial search yielded 5334 records of which 255 articles underwent full-text evaluation, which identified 41 articles and a total of 860 783 patients that were included. 
The findings of the meta-analysis showed that female sex (OR, 1.56; 95% CI, 1.41-1.73), age (OR, 1.21; 95% CI, 1.11-1.33), high BMI (OR, 1.15; 95% CI, 1.08-1.23), and smoking (OR, 1.10; 95% CI, 1.07-1.13) were associated with an increased risk of developing PCC. In addition, the presence of comorbidities and previous hospitalization or ICU admission were found to be associated with high risk of PCC (OR, 2.48; 95% CI, 1.97-3.13 and OR, 2.37; 95% CI, 2.18-2.56, respectively). Patients who had been vaccinated against COVID-19 with 2 doses had a significantly lower risk of developing PCC compared with patients who were not vaccinated (OR, 0.57; 95% CI, 0.43-0.76). Conclusions and relevance: This systematic review and meta-analysis demonstrated that certain demographic characteristics (eg, age and sex), comorbidities, and severe COVID-19 were associated with an increased risk of PCC, whereas vaccination had a protective role against developing PCC sequelae. These findings may enable a better understanding of who may develop PCC and provide additional evidence for the benefits of vaccination. Trial registration: PROSPERO Identifier: CRD42022381002.
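The abstract states that odds ratios were pooled with a random-effects model but does not name the estimator. As a minimal sketch, assuming the widely used DerSimonian-Laird method, the code below converts a reported OR and 95% CI into a log-OR with standard error and pools several studies. The three per-study inputs are hypothetical and are not data from the meta-analysis.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Convert a reported OR and its 95% CI to a log-OR and standard error."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return log_or, se


def dersimonian_laird(effects, ses):
    """Pool study-level log-ORs with DerSimonian-Laird random effects."""
    w = [1 / s**2 for s in ses]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance tau2.
    w_re = [1 / (s**2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    pooled_se = math.sqrt(1 / sum(w_re))
    return pooled, pooled_se, tau2


# Hypothetical per-study results (OR, CI low, CI high); NOT from the meta-analysis.
studies = [(1.4, 1.1, 1.8), (1.7, 1.3, 2.2), (1.5, 1.2, 1.9)]
effects, ses = zip(*(log_or_and_se(*s) for s in studies))
pooled, pooled_se, tau2 = dersimonian_laird(effects, ses)

low = math.exp(pooled - Z95 * pooled_se)
high = math.exp(pooled + Z95 * pooled_se)
print(f"pooled OR = {math.exp(pooled):.2f} (95% CI {low:.2f}-{high:.2f}), tau^2 = {tau2:.4f}")
```

The same conversion applied to the reported female-sex estimate (OR 1.56; 95% CI 1.41-1.73) gives a log-OR standard error of roughly 0.052, which shows how narrow the pooled interval is on the log scale.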
Article
Objectives To determine the clinical sequelae of long covid for a year after infection in patients with mild disease and to evaluate its association with age, sex, SARS-CoV-2 variants, and vaccination status. Design Retrospective nationwide cohort study. Setting Electronic medical records from an Israeli nationwide healthcare organisation. Population 1 913 234 Maccabi Healthcare Services members of all ages who did a polymerase chain reaction test for SARS-CoV-2 between 1 March 2020 and 1 October 2021. Main outcome measures Risk of an evidence based list of 70 reported long covid outcomes in unvaccinated patients infected with SARS-CoV-2 matched to uninfected people, adjusted for age and sex and stratified by SARS-CoV-2 variants, and risk in patients with a breakthrough SARS-CoV-2 infection compared with unvaccinated infected controls. Risks were compared using hazard ratios and risk differences per 10 000 patients measured during the early (30-180 days) and late (180-360 days) time periods after infection. Results Covid-19 infection was significantly associated with increased risks in early and late periods for anosmia and dysgeusia (hazard ratio 4.59 (95% confidence interval 3.63 to 5.80), risk difference 19.6 (95% confidence interval 16.9 to 22.4) in early period; 2.96 (2.29 to 3.82), 11.0 (8.5 to 13.6) in late period), cognitive impairment (1.85 (1.58 to 2.17), 12.8 (9.6 to 16.1); 1.69 (1.45 to 1.96), 13.3 (9.4 to 17.3)), dyspnoea (1.79 (1.68 to 1.90), 85.7 (76.9 to 94.5); 1.30 (1.22 to 1.38), 35.4 (26.3 to 44.6)), weakness (1.78 (1.69 to 1.88), 108.5 (98.4 to 118.6); 1.30 (1.22 to 1.37), 50.2 (39.4 to 61.1)), and palpitations (1.49 (1.35 to 1.64), 22.1 (16.8 to 27.4); 1.16 (1.05 to 1.27), 8.3 (2.4 to 14.1)) and with significant but lower excess risk for streptococcal tonsillitis and dizziness. Hair loss, chest pain, cough, myalgia, and respiratory disorders were significantly increased only during the early phase.
Male and female patients showed minor differences, and children had fewer outcomes than adults during the early phase of covid-19, which mostly resolved in the late period. Findings remained consistent across SARS-CoV-2 variants. Vaccinated patients with a breakthrough SARS-CoV-2 infection had a lower risk for dyspnoea and similar risk for other outcomes compared with unvaccinated infected patients. Conclusions This nationwide study suggests that patients with mild covid-19 are at risk for a small number of health outcomes, most of which are resolved within a year from diagnosis.
Article
Importance: Some individuals experience persistent symptoms after initial symptomatic SARS-CoV-2 infection (often referred to as Long COVID). Objective: To estimate the proportion of males and females with COVID-19, younger or older than 20 years of age, who had Long COVID symptoms in 2020 and 2021 and their Long COVID symptom duration. Design, setting, and participants: Bayesian meta-regression and pooling of 54 studies and 2 medical record databases with data for 1.2 million individuals (from 22 countries) who had symptomatic SARS-CoV-2 infection. Of the 54 studies, 44 were published and 10 were collaborating cohorts (conducted in Austria, the Faroe Islands, Germany, Iran, Italy, the Netherlands, Russia, Sweden, Switzerland, and the US). The participant data were derived from the 44 published studies (10 501 hospitalized individuals and 42 891 nonhospitalized individuals), the 10 collaborating cohort studies (10 526 and 1906), and the 2 US electronic medical record databases (250 928 and 846 046). Data collection spanned March 2020 to January 2022. Exposures: Symptomatic SARS-CoV-2 infection. Main outcomes and measures: Proportion of individuals with at least 1 of the 3 self-reported Long COVID symptom clusters (persistent fatigue with bodily pain or mood swings; cognitive problems; or ongoing respiratory problems) 3 months after SARS-CoV-2 infection in 2020 and 2021, estimated separately for hospitalized and nonhospitalized individuals aged 20 years or older by sex and for both sexes of nonhospitalized individuals younger than 20 years of age. Results: A total of 1.2 million individuals who had symptomatic SARS-CoV-2 infection were included (mean age, 4-66 years; males, 26%-88%). 
In the modeled estimates, 6.2% (95% uncertainty interval [UI], 2.4%-13.3%) of individuals who had symptomatic SARS-CoV-2 infection experienced at least 1 of the 3 Long COVID symptom clusters in 2020 and 2021, including 3.2% (95% UI, 0.6%-10.0%) for persistent fatigue with bodily pain or mood swings, 3.7% (95% UI, 0.9%-9.6%) for ongoing respiratory problems, and 2.2% (95% UI, 0.3%-7.6%) for cognitive problems after adjusting for health status before COVID-19, comprising an estimated 51.0% (95% UI, 16.9%-92.4%), 60.4% (95% UI, 18.9%-89.1%), and 35.4% (95% UI, 9.4%-75.1%), respectively, of Long COVID cases. The Long COVID symptom clusters were more common in women aged 20 years or older (10.6% [95% UI, 4.3%-22.2%]) 3 months after symptomatic SARS-CoV-2 infection than in men aged 20 years or older (5.4% [95% UI, 2.2%-11.7%]). Both sexes younger than 20 years of age were estimated to be affected in 2.8% (95% UI, 0.9%-7.0%) of symptomatic SARS-CoV-2 infections. The estimated mean Long COVID symptom cluster duration was 9.0 months (95% UI, 7.0-12.0 months) among hospitalized individuals and 4.0 months (95% UI, 3.6-4.6 months) among nonhospitalized individuals. Among individuals with Long COVID symptoms 3 months after symptomatic SARS-CoV-2 infection, an estimated 15.1% (95% UI, 10.3%-21.1%) continued to experience symptoms at 12 months. Conclusions and relevance: This study presents modeled estimates of the proportion of individuals with at least 1 of 3 self-reported Long COVID symptom clusters (persistent fatigue with bodily pain or mood swings; cognitive problems; or ongoing respiratory problems) 3 months after symptomatic SARS-CoV-2 infection.
Article
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infection is associated with a range of persistent symptoms impacting everyday functioning, known as post-COVID-19 condition or long COVID. We undertook a retrospective matched cohort study using a UK-based primary care database, Clinical Practice Research Datalink Aurum, to determine symptoms that are associated with confirmed SARS-CoV-2 infection beyond 12 weeks in non-hospitalized adults and the risk factors associated with developing persistent symptoms. We selected 486,149 adults with confirmed SARS-CoV-2 infection and 1,944,580 propensity score-matched adults with no recorded evidence of SARS-CoV-2 infection. Outcomes included 115 individual symptoms, as well as long COVID, defined as a composite outcome of 33 symptoms by the World Health Organization clinical case definition. Cox proportional hazards models were used to estimate adjusted hazard ratios (aHRs) for the outcomes. A total of 62 symptoms were significantly associated with SARS-CoV-2 infection after 12 weeks. The largest aHRs were for anosmia (aHR 6.49, 95% CI 5.02–8.39), hair loss (3.99, 3.63–4.39), sneezing (2.77, 1.40–5.50), ejaculation difficulty (2.63, 1.61–4.28) and reduced libido (2.36, 1.61–3.47). Among the cohort of patients infected with SARS-CoV-2, risk factors for long COVID included female sex, belonging to an ethnic minority, socioeconomic deprivation, smoking, obesity and a wide range of comorbidities. The risk of developing long COVID was also found to be increased along a gradient of decreasing age. SARS-CoV-2 infection is associated with a plethora of symptoms that are associated with a range of sociodemographic and clinical risk factors. A retrospective analysis of primary care records in the United Kingdom reveals individual symptoms associated with SARS-CoV-2 infections, which persisted for 12 weeks or more after infection, as well as risk factors associated with developing long COVID.
Article
Airway fibrosis (AF) is a common disease that can severely affect patient prognosis. Epithelial‑mesenchymal transition (EMT) participates in the pathophysiological development of AF and several studies have demonstrated that some microRNAs (miRNAs) contribute to the development of EMT. The aim of this study was to investigate the function of miR‑423‑5p in the EMT process and its possible underlying mechanism in BEAS‑2B cells. The present study utilized the BEAS‑2B cell line to model EMT in AF. Online tools, fluorescence in situ hybridization analysis and an RNA pull‑down assay were used to identify potential target genes of miR‑423‑5p. In addition, immunohistochemistry, wound healing assays, Transwell migration assays, flow cytometry, enzyme‑linked immunosorbent assay, reverse transcription‑quantitative PCR, western blot analysis and immunofluorescence staining were used to determine the function of miR‑423‑5p and its target gene in the EMT process in AF. The results indicated that the miR‑423‑5p expression in AF tissues and BEAS‑2B cells stimulated with 10 ng/ml TGF‑β1 for 24 h was significantly increased compared with that in the control group. Overexpression of miR‑423‑5p facilitated TGF‑β1‑induced EMT in BEAS‑2B cells; by contrast, downregulation of miR‑423‑5p suppressed TGF‑β1‑induced EMT in BEAS‑2B cells. Furthermore, forkhead box p4 (FOXP4) was identified as a potential target gene of miR‑423‑5p and changes in the miR‑423‑5p and FOXP4 expression were shown to significantly affect the expression of PI3K/AKT/mTOR pathway members. In summary, overexpression of miR‑423‑5p promoted the EMT process in AF by downregulating FOXP4 expression and the underlying mechanism may partly involve activation of the PI3K/AKT/mTOR pathway.
Article
Background: After almost 2 years of fighting against the SARS-CoV-2 pandemic, the number of patients enduring persistent symptoms long after acute infection is a matter of concern. This set of symptoms was referred to as "long COVID", and it was defined more recently as "Post COVID-19 condition" by the World Health Organization (WHO). Although studies have revealed that long COVID can manifest whatever the severity of inaugural illness, the underlying pathophysiology is still enigmatic. Aim: To conduct a comprehensive review to address the putative pathophysiology underlying the persisting symptoms of long COVID. Method: We searched 11 bibliographic databases (Cochrane Library, JBI EBP Database, Medline, Embase, PsycInfo, CINAHL, Ovid Nursing Database, Journals@Ovid, SciLit, EuropePMC, and CoronaCentral). We selected studies that put forward hypotheses on the pathophysiology, as well as those that encompassed long COVID patients in their research investigation. Results: A total of 98 articles were included in the systematic review, 54 of which exclusively addressed hypotheses on pathophysiology, while 44 involved COVID patients. Studies that included patients displayed heterogeneity with respect to the severity of initial illness, timing of analysis, or presence of a control group. Although long COVID likely results from long-term organ damage due to acute-phase infection, specific mechanisms following the initial illness could contribute to the later symptoms possibly affecting many organs. As such, autonomic nervous system damage could account for many symptoms without clear evidence of organ damage. Immune dysregulation, auto-immunity, endothelial dysfunction, occult viral persistence, as well as coagulation activation are the main underlying pathophysiological mechanisms so far. Conclusion: Evidence on why persistent symptoms occur is still limited, and available studies are heterogeneous.
Apart from long-term organ damage, many hints suggest that specific mechanisms following acute illness could be involved in long COVID symptoms. Key messages: • Long-COVID is a multisystem disease that develops regardless of the initial disease severity. Its clinical spectrum comprises a wide range of symptoms. • The mechanisms underlying its pathophysiology are still unclear. Although organ damage from the acute infection phase likely accounts for symptoms, specific long-lasting inflammatory mechanisms have been proposed, as well. • Existing studies involving Long-COVID patients are highly heterogeneous, as they include patients with various COVID-19 severity levels and different time frame analysis, as well.
Article
Molecular characterization of cell types using single-cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues, and tissue-specific variation in gene expression. Using multiple tissues from a single donor enabled identification of the clonal distribution of T cells between tissues, identification of the tissue-specific mutation rate in B cells, and analysis of the cell cycle state and proliferative potential of shared cell types across tissues. Cell type–specific RNA splicing was discovered and analyzed across tissues within an individual.
Article
Long COVID is an often debilitating illness that occurs in at least 10% of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections. More than 200 symptoms have been identified with impacts on multiple organ systems. At least 65 million individuals worldwide are estimated to have long COVID, with cases increasing daily. Biomedical research has made substantial progress in identifying various pathophysiological changes and risk factors and in characterizing the illness; further, similarities with other viral-onset illnesses such as myalgic encephalomyelitis/chronic fatigue syndrome and postural orthostatic tachycardia syndrome have laid the groundwork for research in the field. In this Review, we explore the current literature and highlight key findings, the overlap with other conditions, the variable onset of symptoms, long COVID in children and the impact of vaccinations. Although these key findings are critical to understanding long COVID, current diagnostic and treatment options are insufficient, and clinical trials must be prioritized that address leading hypotheses. Additionally, to strengthen long COVID research, future studies must account for biases and SARS-CoV-2 testing issues, build on viral-onset research, be inclusive of marginalized populations and meaningfully engage patients throughout the research process. Long COVID is an often debilitating illness of severe symptoms that can develop during or following COVID-19. In this Review, Davis, McCorkell, Vogel and Topol explore our knowledge of long COVID and highlight key findings, including potential mechanisms, the overlap with other conditions and potential treatments. They also discuss challenges and recommendations for long COVID research and care.