Evgeny Karpulevich’s research while affiliated with Russian Academy of Sciences and other places


Publications (23)


Structure of the wild-type ALK gene and the corresponding protein. MAM—methylthioalkymalate synthase-like domain; LDLa—low-density lipoprotein receptor class A; EGF-like—epidermal growth factor-like domain; TM—transmembrane domain; and TK—tyrosine kinase domain.
Schematic representation of the formation and functional role of an ALK fusion. FAM150—ALK ligand Augmentor α (FAM150A) or Augmentor β (FAM150B); DD—dimerization domain; and TK—tyrosine kinase domain.
The ALK coverage plots are based on RNAseq data and normalized to the exon length and total number of reads in the sample. The coverage of the ALK sense reads is shown on a positive scale, while the ALK antisense reads are shown on a negative scale. (a) The ALK_9 sample showed pronounced coverage asymmetry and overexpression of exons 20–29. (b) The NS_20 sample showed uniformly high coverage of all exons. TK—tyrosine kinase domain-related exons; non-TK—exons not related to the tyrosine kinase domain; and non-TK/TK coverage—ratios of the mean coverage of five non-TK exons (exons 2–6) and five TK exons (exons 20–24).
ALK immunostaining using clone D5F3 (Ventana) for sample LuC_103.
ALK coverage plots based on targeted RNAseq data, obtained with the TruSight panel, normalized to exon length and the total number of reads in the sample. Coverage of ALK sense reads is shown on a positive scale, while ALK antisense reads are shown on a negative scale. (a) The ALK_10 sample demonstrates clear ALK coverage asymmetry and a high number of fusion-supporting reads. (b) The NS_20 sample shows the detectable expression of all ALK exons. (c–f) ALK_4, ALK_5, ALK_12, and ALK_16 samples, respectively, with ALK coverage asymmetry but very few (or no) fusion-supporting reads. TK—tyrosine kinase domain-related exons; non-TK—exons not related to the kinase domain; and non-TK and TK coverage—mean coverage of five non-TK exons (exons 2–6) and of five TK exons (exons 20–24), respectively.


A New Approach of Detecting ALK Fusion Oncogenes by RNA Sequencing Exon Coverage Analysis
  • Article
  • Full-text available

November 2024 · 18 Reads

Galina Zakharova · [...] · Anton Buzdin

Background: In clinical practice, various methods are used to identify ALK gene rearrangements in tumor samples, ranging from “classic” techniques, such as IHC, FISH, and RT-qPCR, to more advanced highly multiplexed approaches, such as NanoString technology and NGS panels. Each of these methods has its own advantages and disadvantages, but they share the drawback of detecting only a restricted (although sometimes quite extensive) set of preselected biomarkers. At the same time, whole transcriptome sequencing (WTS, RNAseq) can, in principle, be used to detect gene fusions while simultaneously analyzing an incomparably wide range of tumor characteristics. However, WTS is not widely used in practice due to purely analytical limitations and the high complexity of bioinformatic analysis, which requires considerable expertise. In particular, methods to detect gene fusions in RNAseq data rely on the identification of chimeric reads. However, the typically low number of true fusion reads in RNAseq limits its sensitivity. In a previous study, we observed asymmetry in the RNAseq exon coverage of the 3′ partners of some fusion transcripts. In this study, we conducted a comprehensive evaluation of the accuracy of ALK fusion detection through an analysis of differences in the coverage of its tyrosine kinase exons. Methods: A total of 906 human cancer biosamples were subjected to analysis using experimental RNAseq data, with the objective of determining the extent of asymmetry in ALK coverage. A total of 50 samples were analyzed, comprising 13 samples with predicted ALK fusions and 37 samples without predicted ALK fusions. These samples were assessed by targeted sequencing with two NGS panels that were specifically designed to detect fusion transcripts (the TruSight RNA Fusion Panel and the OncoFu Elite panel). Results: ALK fusions were confirmed in 11 out of the 13 predicted cases, with an overall accuracy of 96% (sensitivity 100%, specificity 94.9%). Two discordant cases exhibited low ALK coverage depth, which could be addressed algorithmically to enhance the accuracy of the results. It was also important to consider read strand specificity due to the presence of antisense transcripts involving parts of ALK. In a limited patient sample undergoing ALK-targeted therapy, the algorithm successfully predicted treatment efficacy. Conclusions: RNAseq exon coverage analysis can effectively detect ALK rearrangements.
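The reported figures can be checked with a short calculation. The sketch below is not the authors' pipeline. It assumes the 50 validated samples break down as 11 true positives, 2 false positives, 37 true negatives and 0 false negatives (inferred from "11 out of the 13 predicted cases" and the 37 samples without predicted fusions), and it illustrates the non-TK/TK coverage ratio defined in the figure captions using invented coverage values. The function names and the example coverage profile are hypothetical.

```python
from statistics import mean

NON_TK_EXONS = range(2, 7)    # exons 2-6, as in the figure captions
TK_EXONS = range(20, 25)      # exons 20-24, as in the figure captions


def non_tk_to_tk_ratio(coverage_by_exon):
    """Ratio of mean non-TK coverage to mean TK coverage.

    coverage_by_exon maps exon number -> normalized sense-strand coverage.
    A ratio far below 1 means the kinase-domain exons are covered much
    more deeply than the 5' exons, i.e. the coverage is asymmetric.
    """
    non_tk = mean(coverage_by_exon[e] for e in NON_TK_EXONS)
    tk = mean(coverage_by_exon[e] for e in TK_EXONS)
    if tk == 0:
        raise ValueError("no TK coverage; ratio undefined")
    return non_tk / tk


def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical coverage profile with asymmetry (values are invented).
asymmetric = {e: 1.0 for e in range(1, 20)}
asymmetric.update({e: 40.0 for e in range(20, 30)})
ratio = non_tk_to_tk_ratio(asymmetric)

# Counts inferred from the abstract (an assumption, see the lead-in).
metrics = diagnostic_metrics(tp=11, fp=2, tn=37, fn=0)
```

With these counts the accuracy is 48/50 and the specificity is 37/39, which round to the 96% and 94.9% quoted in the abstract.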


MamT^4: Multi-view Attention Networks for Mammography Cancer Classification

November 2024 · 14 Reads

In this study, we introduce a novel method, called MamT^4, which is used for simultaneous analysis of four mammography images. A decision is made based on one image of a breast, with attention also devoted to three additional images: another view of the same breast and two images of the other breast. This approach enables the algorithm to closely replicate the practice of a radiologist who reviews the entire set of mammograms for a patient. Furthermore, this paper emphasizes the preprocessing of images, specifically proposing a cropping model (U-Net based on ResNet-34) to help the method remove image artifacts and focus on the breast region. To the best of our knowledge, this study is the first to achieve a ROC-AUC of 84.0 ± 1.7 and an F1 score of 56.0 ± 1.3 on an independent test dataset of Vietnam digital mammography (VinDr-Mammo), which is preprocessed with the cropping model.
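The idea of classifying one image while attending to the other three can be illustrated with plain scaled dot-product attention. This is a minimal sketch, not the architecture from the paper: the embeddings, their dimensionality, the view labels (L-CC, L-MLO, R-CC, R-MLO) and the function names are all hypothetical, and a real model would learn separate query, key and value projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(target, views):
    """Fuse a target-view embedding with all views via dot-product attention.

    target: embedding of the image being classified (list of floats).
    views:  embeddings of all four images, the target included.
    Returns the attention weights and the weighted sum of the views.
    """
    scale = math.sqrt(len(target))
    scores = [sum(q * k for q, k in zip(target, v)) / scale for v in views]
    weights = softmax(scores)
    fused = [sum(w * v[i] for w, v in zip(weights, views))
             for i in range(len(target))]
    return weights, fused


# Hypothetical 4-dimensional embeddings for the four images of one patient.
views = {
    "L-CC": [0.9, 0.1, 0.0, 0.2],
    "L-MLO": [0.8, 0.2, 0.1, 0.1],
    "R-CC": [0.1, 0.9, 0.3, 0.0],
    "R-MLO": [0.0, 0.8, 0.4, 0.1],
}
weights, fused = attend(views["L-CC"], list(views.values()))
```

In this toy example the target view receives the largest weight and the second view of the same breast the next largest, while the fused vector still carries some information from the opposite breast.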


General pipeline of the study.
Principal component analysis of the groups from different regions of the world. PCA was conducted on whole-genome data consisting of 8,137,497 SNPs from groups representing different regions of the world (A), and on whole-genome data comprising 6,996,820 SNPs from European ancestor populations and the Utah population (B).
Heatmap showing intersecting genes among revealed pathways. The heatmap was created by Heatmap (v. 3.6.2.) (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/heatmap).
Europeans and Americans of European origin show differences between their biological pathways related to the major histocompatibility complex

September 2024 · 21 Reads · 1 Citation

In this study, we analysed biological pathway diversity among Europeans and Northern Americans of European origin, the groups of people that share a common genetic ancestry but live in different geographic regions. We used a novel complex approach for analysing genomic data: we studied the total effects of multiple weak selection signals, accumulated from independent SNPs within a pathway. We found significant differences between immunity-related biological pathways from the two groups. All identified pathways included genes belonging to the major histocompatibility complex (MHC) system, which plays an important role in adaptive immune responses. We suggest that the ways of evolution were different for the MHC-I and MHC-II gene groups at least in Europeans and Americans of European origin. We hypothesise that the observed variability between the two populations was triggered by selection pressures due to the different pathogen landscapes and pathogen loads on the two continents. Our findings can be important for epidemic prevention and control, as well as for analysing processes related to allergies, organ transplantation, and autoimmune diseases.
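One generic way to accumulate weak per-SNP signals into a pathway-level test is sketched below. The abstract does not specify the statistic or the null model, so everything here is an assumption: the per-SNP scores, the simple sum, the permutation scheme, and all names are hypothetical and only illustrate the general idea.

```python
import random


def pathway_score(snp_scores, pathway_snps):
    """Sum per-SNP scores over the SNPs assigned to one pathway."""
    return sum(snp_scores[s] for s in pathway_snps)


def empirical_p_value(snp_scores, pathway_snps, n_draws=2000, seed=0):
    """Share of random same-size SNP sets scoring at least as high."""
    rng = random.Random(seed)
    observed = pathway_score(snp_scores, pathway_snps)
    all_snps = sorted(snp_scores)
    size = len(pathway_snps)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(all_snps, size)
        if pathway_score(snp_scores, draw) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


# Invented data: 200 SNPs with low background scores, 10 of them in a
# pathway where every SNP carries a modestly elevated score.
rng = random.Random(1)
snp_scores = {f"snp{i}": rng.uniform(0.0, 0.1) for i in range(200)}
pathway = [f"snp{i}" for i in range(10)]
for s in pathway:
    snp_scores[s] += 0.1

p = empirical_p_value(snp_scores, pathway)
```

No single SNP in the toy pathway stands far outside the background range, yet the summed score is rarely matched by random SNP sets, which is the kind of effect a pathway-level analysis is designed to pick up.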



Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index

July 2024 · 81 Reads · 1 Citation

BMC Bioinformatics

Motivation: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through Next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods. Results: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
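The role of the k and w parameters, and the idea of adding known variants to a linear index, can be illustrated with a toy (w, k)-minimizer index. This is a sketch under several assumptions and not the minimap2_index_modifier implementation: it uses tiny k and w instead of 27 and 14, orders k-mers lexicographically instead of by hash, ignores reverse complements and indels, and the function names and the reference string are invented.

```python
def minimizers(seq, k, w):
    """Return the set of (kmer, position) pairs that are (w, k)-minimizers.

    In every window of w consecutive k-mers the lexicographically smallest
    k-mer is kept (leftmost on ties).
    """
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        picked.add(min(kmers[start:start + w]))
    return picked


def build_index(reference, k, w, snvs=()):
    """Map minimizer k-mer -> sorted reference positions.

    snvs is an iterable of (position, alt_base) pairs. For each SNV the
    minimizers of the surrounding alternate-allele sequence are added with
    reference coordinates, so reads carrying the variant can still seed.
    """
    index = {}
    for kmer, pos in minimizers(reference, k, w):
        index.setdefault(kmer, set()).add(pos)
    for pos, alt in snvs:
        # Span every window that contains a k-mer overlapping the variant.
        lo = max(0, pos - (k + w - 2))
        hi = min(len(reference), pos + k + w - 1)
        alt_seq = reference[lo:pos] + alt + reference[pos + 1:hi]
        for kmer, offset in minimizers(alt_seq, k, w):
            # Keep only k-mers that actually overlap the variant base.
            if offset <= pos - lo < offset + k:
                index.setdefault(kmer, set()).add(lo + offset)
    return {kmer: sorted(p) for kmer, p in index.items()}


reference = "ACGTTGCATGGACCTAGCTA"   # invented 20-base reference
plain = build_index(reference, k=5, w=3)
modified = build_index(reference, k=5, w=3, snvs=[(10, "T")])
new_seeds = set(modified) - set(plain)
```

Because the extra seeds are stored with ordinary reference coordinates, downstream steps see a normal linear alignment, which is consistent with the abstract's point that the rest of the pipeline does not need to change.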



Enhancing SNV identification in whole-genome sequencing data through the incorporation of known population genetic variants into the minimap2 index

February 2024 · 27 Reads

Motivation: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human NGS whole-genome sequencing data. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study (GWAS), depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods. Results: In this paper we present the minimap2_index_modifier tool, which allows the construction of a modified index of a reference genome using known SNVs and indels of a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 2000, and the number of false positives decreased by more than 200.



EndoNet: A Model for the Automatic Calculation of H-Score on Histological Slides

December 2023 · 97 Reads · 4 Citations

Informatics

H-score is a semi-quantitative method used to assess the presence and distribution of proteins in tissue samples by combining the intensity of staining and the percentage of stained nuclei. It is widely used but time-consuming and can be limited in terms of accuracy and precision. Computer-aided methods may help overcome these limitations and improve the efficiency of pathologists’ workflows. In this work, we developed a model EndoNet for automatic H-score calculation on histological slides. Our proposed method uses neural networks and consists of two main parts. The first is a detection model which predicts the keypoints of centers of nuclei. The second is an H-score module that calculates the value of the H-score using mean pixel values of predicted keypoints. Our model was trained and validated on 1780 annotated tiles with a shape of 100 × 100 µm and we achieved 0.77 mAP on a test dataset. We obtained our best results in H-score calculation; these results proved superior to QuPath predictions. Moreover, the model can be adjusted to a specific specialist or whole laboratory to reproduce the manner of calculating the H-score. Thus, EndoNet is effective and robust in the analysis of histology slides, which can improve and significantly accelerate the work of pathologists.
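For reference, the H-score is conventionally computed as 1 × (% weakly stained) + 2 × (% moderately stained) + 3 × (% strongly stained), giving a value between 0 and 300. The sketch below applies that formula to a list of per-nucleus stain levels. It is not the EndoNet H-score module: the stain levels in [0, 1], the intensity cutoffs and the function names are placeholders chosen for illustration.

```python
def intensity_class(stain_level, cutoffs=(0.25, 0.50, 0.75)):
    """Map a stain level in [0, 1] to class 0 (negative) .. 3 (strong).

    The cutoffs are placeholders, not values from the paper.
    """
    return sum(stain_level >= c for c in cutoffs)


def h_score(stain_levels, cutoffs=(0.25, 0.50, 0.75)):
    """H = 1*%weak + 2*%moderate + 3*%strong, ranging from 0 to 300."""
    if not stain_levels:
        raise ValueError("no nuclei detected")
    classes = [intensity_class(s, cutoffs) for s in stain_levels]
    return 100.0 * sum(classes) / len(classes)


# Invented stain levels for ten detected nuclei.
levels = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
score = h_score(levels)
```

Here 20% of the nuclei are weak, 20% moderate and 30% strong, so the score is 20 + 40 + 90 = 150. Moving the cutoffs is one simple way to mimic how a particular specialist or laboratory grades intensity, in the spirit of the adjustment the abstract describes.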


Figure 2: Architecture of EndoNet model. Tiles go through Image-to-Image model to be converted into heatmaps, Keypoint Extractor gets coordinates and classes of the centers of nuclei and passes them to H-score Model to calculate H-score in stroma and epithelium.
Figure 3: General pipeline of pre-training process.
Figure 4: Pre-training process with SimCLR [26]. Here t ∈ τ and t′ ∈ τ are two augmentations taken from the same family of augmentations. f(·) is a base encoding network and g(·) is a projection head that maps hidden representation to another space, where contrastive loss is applied. X is initial image, x_i and x_j are augmented images, h_i and h_j are hidden representations of corresponding augmented images, and z_i and z_j are output of decoding network. Optimization task here is to maximize agreement between z_j and z_i.
Figure 5: Distributions of pixels of a) the whole tile, b) blue and brown nuclei, stained in their colors, c) blue and brown nuclei, where red are brown nuclei, blue are blue nuclei.
Figure 8: H-scores by pathologists and EndoNet model for 6 slides in a) stroma and b) epithelium.
EndoNet: model for automatic calculation of H-score on histological slides

August 2023 · 47 Reads

H-score is a semi-quantitative method used to assess the presence and distribution of proteins in tissue samples by combining the intensity of staining and percentage of stained nuclei. It is widely used but time-consuming and can be limited in accuracy and precision. Computer-aided methods may help overcome these limitations and improve the efficiency of pathologists' workflows. In this work, we developed a model EndoNet for automatic calculation of H-score on histological slides. Our proposed method uses neural networks and consists of two main parts. The first is a detection model which predicts keypoints of centers of nuclei. The second is a H-score module which calculates the value of the H-score using mean pixel values of predicted keypoints. Our model was trained and validated on 1780 annotated tiles with a shape of 100 × 100 μm and performed 0.77 mAP on a test dataset. Moreover, the model can be adjusted to a specific specialist or whole laboratory to reproduce the manner of calculating the H-score. Thus, EndoNet is effective and robust in the analysis of histology slides, which can improve and significantly accelerate the work of pathologists.


Citations (8)


... Furthermore, as the study population is mainly European, the findings may not generalize to other populations. Some studies have identified genetic differences between populations in different regions, which may influence the associations between diseases [53,54]. Future research should aim to replicate this study in ethnically and geographically diverse populations to enhance the validity and applicability of the results globally. ...

Reference:

Exploring the Genetic Relationship Between Type 2 Diabetes and Cardiovascular Disease: A Large-Scale Genetic Association and Polygenic Risk Score Study
Europeans and Americans of European origin show differences between their biological pathways related to the major histocompatibility complex

... Ushakov et al. [12] presented EndoNet, a model that automates H-score. H-score is a semi-quantitative method for assessing protein presence in tissue samples, but it can be time-consuming and limited in accuracy. ...

EndoNet: A Model for the Automatic Calculation of H-Score on Histological Slides

Informatics

... The architecture efficiently maps low-resolution encoder features to high-resolution inputs through a decoder that uses pooling indices from the encoder for precise pixelwise classification. This setup enables U-Net to accurately delineate detailed features in medical images, crucial for identifying and segmenting various anatomical structures and abnormalities [11]. Its ability to handle small datasets effectively and its adaptability to various medical imaging modalities have made U-Net a standard choice in medical image analysis, enhancing diagnostic accuracy and aiding in clinical decision-making. ...

Deep Semantic Segmentation of Angiogenesis Images

International Journal of Molecular Sciences

... While existing point annotation-based approaches have made commendable progress in improving annotation efficiency by significantly reducing manual effort, they remain insufficient for adapting to the unique characteristics of histopathological images. First, histopathological images exhibit a high degree of visual complexity, with cells varying in size, density, and morphological characteristics [17]. This complexity necessitates the analysis of features at different receptive fields and the integration of contextual information from multiple scales. ...

EndoNuke: Nuclei Detection Dataset for Estrogen and Progesterone Stained IHC Endometrium Scans

Data

... 9 More recent alternative, patient-derived tumor organoids (PDTOs) and spheroids, have emerged as a robust and reliable in vitro model for precision medicine. 10 Increasing evidence confirms the phenotypic and genotypic correspondence between original tumor tissues and PDTOs across various cancers. [11][12][13][14][15] Some clinical studies have demonstrated the high prognostic value of PDTOs for evaluation of patient responses to therapies. ...

Biomedical Applications of Non-Small Cell Lung Cancer Spheroids

... proprietary algorithms due to access limitations. [44] CNN -ResNet1D50 ...

Assessment of the impact of non-architectural changes in the predictive model on the quality of ECG classification

Proceedings of the Institute for System Programming of RAS

... Literature data concerning this issue are quite controversial. Increasing the expression of the CD86 is most often mentioned in connection with monocyte differentiating into macrophages, which is most characteristic for dominate classical subset [44]. Moreover, Borst et al., 2018 revealed the protective and immunomodulating role of stimulating type I IFN signaling in peripheral blood monocytes. ...

The response of two polar monocyte subsets to inflammation

Biomedicine & Pharmacotherapy

... For example, we have shown a greater sensitivity to activation factors of monocyte-derived macrophages compared with Kupffer cells. 13,19 This difference in sensitivity persisted despite the preliminary long-term maintenance of macrophages in culture. It was also found that upon activation, Kupffer cells demonstrated a faster increase in the expression of anti-inflammatory cytokine genes compared with monocyte-derived macrophages. ...

Comparative Analysis of the Transcriptome, Proteome, and miRNA Profile of Kupffer Cells and Monocytes