Alexey Kamelin’s research while affiliated with National Research University Higher School of Economics and other places


Publications (3)


Graphs of AUC-ROC values measured for different machine learning methods. Each value at the X axis corresponds to a phenotype with a certain contribution of epistasis. For each AUC-ROC value, the boundaries of the 95% confidence interval are indicated. Each graph corresponds to a different dataset composition and feature-to-instance ratio. In all cases, 3-loci epistasis model with heritability of 0.25 was used.
Distributions of metrics measured for different machine learning models. Three theoretical forms of epistasis and their corresponding datasets were generated using GAMETES (LR, Lasso regression; GB, Gradient Boosting).
Deep learning captures the effect of epistasis in multifactorial diseases
Article · Full-text available · January 2025 · 13 Reads

Vladislav Perelygin · Alexey Kamelin · [...] · Maria Poptsova

Background Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by a linear regression model that assumes an independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer’s disease, diabetes, cardiovascular disease, cancer, and others, the association between individual SNPs and disease could be non-linear due to epistatic interactions. The aim of the presented study is to explore the power of non-linear machine learning algorithms and deep learning models to predict the risk of multifactorial diseases with epistasis. Methods Simulated data with 2- and 3-locus interactions were generated using GAMETES, and three models of epistasis were tested: additive, multiplicative, and threshold. Penetrance tables were generated using the PyTOXO package. The machine learning methods were a multilayer perceptron (MLP), a convolutional neural network (CNN), a recurrent neural network (RNN), Lasso regression, random forest, and gradient boosting. Model performance was assessed using accuracy, AUC-ROC, AUC-PR, recall, precision, and F1 score. Results First, we tested ensemble tree methods and deep learning neural networks against the LASSO linear regression model on simulated data with different types and strengths of epistasis. The results showed that as the strength of the epistatic effect increases, non-linear models significantly outperform linear ones. The higher performance of non-linear models over linear ones was then confirmed on real genetic data for multifactorial phenotypes such as obesity, type 1 diabetes, and psoriasis. Among the non-linear models, gradient boosting appeared to be the best model for obesity and psoriasis, while deep learning methods significantly outperformed linear approaches for type 1 diabetes.
Conclusion Overall, our study underscores the efficacy of non-linear models and deep learning approaches in more accurately accounting for the effects of epistasis in simulations with specific configurations and in the context of certain diseases.
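Why a per-SNP linear model can miss epistasis can be illustrated with a small penetrance table. The sketch below is not taken from the article: the table values, the allele frequency of 0.5, and the function names are illustrative assumptions. It shows a 2-locus model in which disease risk depends on the genotype combination, yet each locus taken alone carries no signal.

```python
from itertools import product

# Assumed Hardy-Weinberg genotype frequencies for allele frequency 0.5
# (genotype coded as the number of minor alleles: 0, 1, 2).
GENOTYPE_FREQ = {0: 0.25, 1: 0.5, 2: 0.25}

# Hypothetical penetrance table: P(disease | genotype at locus A, genotype at locus B).
# Illustrative values only, not the tables used in the study.
PENETRANCE = {
    (0, 0): 0.0, (0, 1): 0.1, (0, 2): 0.0,
    (1, 0): 0.1, (1, 1): 0.0, (1, 2): 0.1,
    (2, 0): 0.0, (2, 1): 0.1, (2, 2): 0.0,
}


def marginal_penetrance(locus):
    """Return P(disease | genotype at one locus), averaged over the other locus."""
    result = {}
    for genotype in GENOTYPE_FREQ:
        total = 0.0
        for other, freq in GENOTYPE_FREQ.items():
            key = (genotype, other) if locus == 0 else (other, genotype)
            total += freq * PENETRANCE[key]
        result[genotype] = total
    return result


def prevalence():
    """Return the population-wide disease probability under the table above."""
    return sum(
        GENOTYPE_FREQ[a] * GENOTYPE_FREQ[b] * PENETRANCE[(a, b)]
        for a, b in product(GENOTYPE_FREQ, repeat=2)
    )


if __name__ == "__main__":
    print("prevalence:", prevalence())
    print("marginal, locus A:", marginal_penetrance(0))
    print("marginal, locus B:", marginal_penetrance(1))
```

Under these assumed values, every marginal penetrance equals the overall prevalence (about 0.05), so a model that weights each SNP independently has nothing to fit, while the joint table ranges from 0 to 0.1. Models that can represent interactions, such as tree ensembles or neural networks, are not limited in this way.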


Manhattan plot with the GWAS results for the severe COVID-19 phenotype (P-value < 5 × 10⁻⁸).
Scatter plot of comparison of odds ratios (OR) from the top associated SNPs from the COVID-19 Host Genetics Initiative with our results (OR_hgi indicates results from COVID-19 HGI GWAS and OR_gwas indicates results from the current study). Horizontal and vertical bars represent 95% confidence intervals.
Quantile plot of the severe course of COVID-19 PRS developed by LDPred2. The odds ratio represents comparison of PRS odds from different quantiles with the reference quantile (40%−60%). The bars represent the standard deviation (SD).
GWAS and polygenic risk score of severe COVID-19 in Eastern Europe

September 2024 · 45 Reads · 2 Citations

Background COVID-19 has infected more than 772 million people, leading to 7 million deaths. Although the severe course of COVID-19 can be prevented using appropriate treatments, effective interventions require thorough research into the genetic factors involved in its pathogenesis. Methods We conducted a genome-wide association study (GWAS) on 7,124 individuals (comprising 6,400 controls who had mild to moderate COVID-19 and 724 cases with severe COVID-19). The inclusion criteria were acute respiratory distress syndrome (ARDS), acute respiratory failure (ARF) requiring respiratory support, or CT scans indicative of severe COVID-19 infection without any competing diseases. We also developed a polygenic risk score (PRS) model to identify individuals at high risk. Results We identified two genome-wide significant loci (P-value < 5 × 10⁻⁸) and one locus of near genome-wide significance (P-value = 5.92 × 10⁻⁸ to 6.15 × 10⁻⁸). The most significant variants were located in the leucine zipper transcription factor like 1 (LZTFL1) gene, which has been highlighted in several previous GWAS. Our PRS model results indicated that individuals in the top 10% of the PRS had about twice the risk of a severe course of the disease compared to those at median risk [odds ratio = 2.18 (1.66, 2.86), P-value = 8.9 × 10⁻⁹]. Conclusion We conducted one of the largest studies to date on the genetics of severe COVID-19 in an Eastern European cohort. Our results are consistent with previous research and will guide further epidemiologic studies on host genetics, as well as the development of targeted treatments.
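An odds ratio with a 95% confidence interval, like the one reported for the top PRS decile, can be computed from a 2×2 table of cases and controls. The sketch below uses the standard Wald interval on the log odds ratio. It is not necessarily the method the authors used (a PRS analysis may well rely on logistic regression with covariates), and the function name and the counts in the usage example are hypothetical, not data from the study.

```python
import math


def odds_ratio_ci(cases_high, controls_high, cases_ref, controls_ref, z=1.96):
    """Odds ratio of the high-risk group versus the reference group, with a Wald CI.

    z=1.96 corresponds to a two-sided 95% interval under a normal approximation.
    All four counts must be positive.
    """
    odds_ratio = (cases_high * controls_ref) / (controls_high * cases_ref)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(
        1 / cases_high + 1 / controls_high + 1 / cases_ref + 1 / controls_ref
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    or_, lo, hi = odds_ratio_ci(
        cases_high=40, controls_high=160, cases_ref=20, controls_ref=180
    )
    print(f"OR = {or_:.2f} ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1, as the reported (1.66, 2.86) does, indicates that the difference in odds between the two groups is unlikely to be due to chance at the 5% level.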


Deep Learning captures the effect of epistasis in multifactorial diseases

March 2024 · 106 Reads

Background Polygenic risk score (PRS) prediction is widely used to assess the risk of diagnosis and progression of many diseases. Routinely, the weights of individual SNPs are estimated by a linear regression model that assumes an independent and linear contribution of each SNP to the phenotype. However, for complex multifactorial diseases such as Alzheimer's disease, diabetes, cardiovascular disease, cancer, and others, the association between individual SNPs and disease could be non-linear due to epistatic interactions. The aim of the presented study is to explore the power of non-linear machine learning algorithms and deep learning models to predict the risk of multifactorial diseases with epistasis. Results First, we tested ensemble tree methods and deep learning neural networks against the LASSO linear regression model on simulated data with different types and strengths of epistasis. The results showed that as the strength of the epistatic effect increases, non-linear models significantly outperform linear ones. The higher performance of non-linear models over linear ones was then confirmed on real genetic data for multifactorial phenotypes such as obesity, type 1 diabetes, and psoriasis. Among the non-linear models, gradient boosting appeared to be the best model for obesity and psoriasis, while deep learning methods significantly outperformed linear approaches for type 1 diabetes. Conclusions Overall, our study underscores the efficacy of non-linear models and deep learning approaches in more accurately accounting for the effects of epistasis in simulations with specific configurations and in the context of certain diseases.

Citations (1)


... The LZTFL1 gene participates in several biological processes, such as cell differentiation and the control of immunological responses; it can be found in the epithelium of the normal lung and is involved in the transport of proteins to the cilia of the respiratory epithelial cells [156]. The SNP rs10490770 (chr3:45823240, T>C) in the LZTFL1 gene has been recognized as having a direct causative link with the severity of both COVID-19 and CHD [145]. ...

Reference:

Genetic and Epigenetic Intersections in COVID-19-Associated Cardiovascular Disease: Emerging Insights and Future Directions
GWAS and polygenic risk score of severe COVID-19 in Eastern Europe