Gaurav Pandey

  • Student at The NorthCap University

About

122
Publications
23,846
Reads
4,170
Citations
Introduction
Current institution
The NorthCap University
Current position
  • Student

Publications (122)
Preprint
Full-text available
Several machine learning algorithms have been developed for the prediction of Alzheimer's disease and related dementia (ADRD) from spontaneous speech. However, none of these algorithms have been translated for the prediction of broader cognitive impairment (CI), which in some cases is a precursor and risk factor of ADRD. In this paper, we evaluated...
Preprint
Full-text available
Objectives. To identify youth subgroups based on lifestyle, BMI, and sociodemographic characteristics and examine the association between group membership and prediabetes/diabetes (preDM/DM) status. Methods. We analyzed data from 1,278 adolescents (ages 12-17) from the 2011-2018 National Health and Nutrition Examination Surveys. PreDM/DM was define...
Preprint
Full-text available
Effectively modeling multimodal longitudinal data is a pressing need in various application areas, especially biomedicine. Despite this, few approaches exist in the literature for this problem, with most not adequately taking into account the multimodality of the data. In this study, we developed multiple configurations of a novel multimodal and lo...
Article
Full-text available
Interactions between protein kinases and their substrates are critical for the modulation of complex signaling pathways. Currently, there is a large amount of information available about kinases and their substrates in disparate public databases. However, these data are difficult to interpret in the context of cellular systems, which can be facilit...
Article
Full-text available
Efficiently and objectively analyzing the complex, diverse multimodal data collected from patients at risk for dementia can be difficult in the clinical setting, contributing to high rates of underdiagnosis or misdiagnosis of this serious disorder. Patients with mild cognitive impairment (MCI) are especially at risk of developing dementia in the fu...
Article
Full-text available
Protein kinase function and interactions with drugs are controlled in part by the movement of the DFG and αC-Helix motifs that are related to the catalytic activity of the kinase. Small molecule ligands elicit therapeutic effects with distinct selectivity profiles and residence times that often depend on the active or inactive kinase conformation(s...
Article
Full-text available
Understanding the dynamics of intracellular signaling pathways, such as ERK1/2 (ERK) and Akt1/2 (Akt), in the context of cell fate decisions is important for advancing our knowledge of cellular processes and diseases, particularly cancer. While previous studies have established associations between ERK and Akt activities and proliferative cell fate...
Article
Full-text available
Background Allergic rhinitis is a common inflammatory condition of the nasal mucosa that imposes a considerable health burden. Air pollution has been observed to increase the risk of developing allergic rhinitis. We addressed the hypotheses that early life exposure to air toxics is associated with developing allergic rhinitis, and that these effect...
Article
Full-text available
Background: The prevalence of Type 2 diabetes (DM) and prediabetes (preDM) has been increasing among youth in recent decades in the United States, prompting an urgent need for understanding and identifying their associated risk factors. Such efforts, however, have been hindered by the lack of easily accessible youth preDM/DM data. Objective: We...
Article
Full-text available
Objectives: To determine rates of previously undetected cognitive impairment among patients with depression in primary care. Methods: Patients ages 55 and older with no documented history of dementia or mild cognitive impairment were recruited from primary care practices in New York City, NY and Chicago, IL (n = 855). Cognitive function was assesse...
Preprint
Full-text available
Understanding the dynamics of intracellular signaling pathways, such as ERK1/2 (ERK) and Akt1/2 (Akt), in the context of cell fate decisions is important for advancing our knowledge of cellular processes and diseases, particularly cancer. While previous studies have established associations between ERK and Akt activities and proliferative cell fate...
Article
Background Prediction of future dementia for patients with mild cognitive impairment (MCI) is a significant clinical goal, so that the identified cases can benefit from treatments. Currently, the clinical standard for diagnosing dementia utilizes multimodal data like cognitive tests, MRI and PET scans, and other biomarkers. However, efficiently and...
Preprint
Interactions between protein kinases and their substrates are critical for the modulation of complex signaling pathways. Currently, there is a large amount of information available about kinases and their substrates in disparate public databases. However, these data are difficult to interpret in the context of cellular systems, which can be facilit...
Preprint
BACKGROUND The prevalence of type 2 diabetes mellitus (DM) and pre–diabetes mellitus (pre-DM) has been increasing among youth in recent decades in the United States, prompting an urgent need for understanding and identifying their associated risk factors. Such efforts, however, have been hindered by the lack of easily accessible youth pre-DM/DM dat...
Preprint
Protein kinase function and interactions with drugs are controlled in part by the movement of the DFG and αC-Helix motifs, which enable kinases to adopt various conformational states. Small molecule ligands elicit therapeutic effects with distinct selectivity profiles and residence times that often depend on the kinase conformation(s) they bind. Ho...
Preprint
Full-text available
The prevalence of type 2 diabetes mellitus (DM) and prediabetes (preDM) is rapidly increasing among youth, posing significant health and economic consequences. To address this growing concern, we created the most comprehensive youth-focused diabetes dataset to date derived from National Health and Nutrition Examination Survey (NHANES) data from 199...
Article
Full-text available
Background: The Society of Thoracic Surgeons risk scores are widely used to assess risk of morbidity and mortality in specific cardiac surgeries but may not perform optimally in all patients. In a cohort of patients undergoing cardiac surgery, we developed a data-driven, institution-specific machine learning-based model inferred from multi-modal e...
Article
Full-text available
One of the promising opportunities of digital health is its potential to lead to more holistic understandings of diseases by interacting with the daily life of patients and through the collection of large amounts of real-world data. Validating and benchmarking indicators of disease severity in the home setting is difficult, however, given the large...
Preprint
Full-text available
A growing number of algorithms are being developed to automatically identify disorders or disease biomarkers from digitally recorded audio of patient speech. An important step in these analyses is to identify and isolate the patient's speech from that of other speakers or noise that are captured in a recording. However, current algorithms, such as...
Article
Full-text available
Biochemical correlates of stochastic single-cell fates have been elusive, even for the well-studied mammalian cell cycle. We monitored single-cell dynamics of the ERK and Akt pathways, critical cell cycle progression hubs and anti-cancer drug targets, and paired them to division events in the same single cells using the non-transformed MCF10A epith...
Article
Full-text available
Motivation Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a un...
Preprint
Full-text available
Audio and speech have several implicit characteristics that have the potential for the identification and quantification of clinical disorders. This PRISMA-guided review is designed to provide an overview of the landscape of automated clinical audio processing to build data-driven predictive models and infer phenotypes of a variety of neuropsychiat...
Preprint
Full-text available
One of the promising opportunities of digital health is its potential to lead to more holistic understandings of diseases by interacting with the daily life of patients and through the collection of large amounts of real world data. Validating and benchmarking indicators of disease severity in the home setting is difficult, however, given the large...
Article
Full-text available
Air pollution is a well-known contributor to asthma. Air toxics are hazardous air pollutants that cause or may cause serious health effects. While individual air toxics have been associated with asthma, only a limited number of studies have specifically examined combinations of air toxics associated with the disease. We geocoded air toxic levels fr...
Preprint
Full-text available
Predictive determinants of stochastic single-cell fates have been elusive, even for the well-studied mammalian cell cycle. What drives proliferation decisions of single cells at any given time? We monitored single-cell dynamics of the ERK and Akt pathways, critical cell cycle progression hubs and anti-cancer drug targets, and paired them to divisio...
Article
Full-text available
Prediabetes and diabetes mellitus (preDM/DM) have become alarmingly prevalent among youth in recent years. However, simple questionnaire-based screening tools to reliably assess diabetes risk are only available for adults, not youth. As a first step in developing such a tool, we used a large-scale dataset from the National Health and Nutritional Ex...
Preprint
Full-text available
Heterogeneous ensembles that can aggregate an unrestricted number and variety of base predictors can effectively address challenging prediction problems. In particular, accurate ensembles that are also parsimonious, i.e., consist of as few base predictors as possible, can help reveal potentially useful knowledge about the target problem domain. Alt...
Article
Full-text available
Introduction Despite the availability of several pre-processing software tools, poor peak integration remains a prevalent problem in untargeted metabolomics data generated using liquid chromatography high-resolution mass spectrometry (LC–MS). As a result, the output of these tools may retain incorrectly calculated metabolite abundances...
Article
Full-text available
Background: The COVID-19 pandemic has affected millions of individuals and caused hundreds of thousands of deaths worldwide. Predicting mortality among patients with COVID-19 who present with a spectrum of complications is very difficult, hindering the prognostication and management of the disease. We aimed to develop an accurate prediction model...
Preprint
In this paper, we will show that the recently introduced graphical model: Conditional Random Fields (CRF) provides a template to integrate micro-level information about biological entities into a mathematical model to understand their macro-level behavior. More specifically, we will apply the CRF model to an important classification problem in prot...
Preprint
Full-text available
Motivation Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a un...
Preprint
Full-text available
Background: The coronavirus disease 2019 (COVID-19) pandemic has affected millions of individuals and caused hundreds of thousands of deaths worldwide. It can be difficult to accurately predict mortality among COVID-19 patients presenting with a spectrum of complications, hindering the prognostication and management of the disease. Methods: We...
Article
Full-text available
Importance Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased eva...
Preprint
Full-text available
Background Prediabetes and diabetes mellitus (preDM/DM) have become alarmingly prevalent among youth in recent years. However, simple questionnaire-based screening tools to reliably assess diabetes risk are only available for adults, not youth. Methods As a first step in developing such a tool, we used a large-scale dataset from the National Healt...
Article
Full-text available
Biological and regulatory mechanisms underlying many multi-gene expression-based disease biomarkers are often not readily evident. We describe an innovative framework, NeTFactor, that combines network analyses with gene expression data to identify transcription factors (TFs) that significantly and maximally regulate such a biomarker. NeTFactor uses...
Article
Full-text available
Background: 10-20% of patients develop long-term toxicity following radiotherapy for prostate cancer. Identification of common genetic variants associated with susceptibility to radiotoxicity might improve risk prediction and inform functional mechanistic studies. Methods: We conducted an individual patient data meta-analysis of six genome-wide...
Article
Full-text available
Multiparametric magnetic resonance imaging (mpMRI) has become increasingly important for the clinical assessment of prostate cancer (PCa), but its interpretation is generally variable due to its relatively subjective nature. Radiomics and classification methods have shown potential for improving the accuracy and objectivity of mpMRI-based PCa asses...
Article
Full-text available
The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral...
Article
Full-text available
Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this abil...
Article
Full-text available
Asthma is a common, under-diagnosed disease affecting all ages. We sought to identify a nasal brush-based classifier of mild/moderate asthma. 190 subjects with mild/moderate asthma and controls underwent nasal brushing and RNA sequencing of nasal samples. A machine learning-based pipeline identified an asthma classifier consisting of 90 genes inter...
Preprint
Heterogeneous ensembles built from the predictions of a wide variety and large number of diverse base predictors represent a potent approach to building predictive models for problems where the ideal base/individual predictor may not be obvious. Ensemble selection is an especially promising approach here, not only for improving prediction performan...
Preprint
Full-text available
Respiratory viruses are highly infectious; however, the variation of individuals' physiologic responses to viral exposure is poorly understood. Most studies examining molecular predictors of response focus on late stage predictors, typically near the time of peak symptoms. To determine whether pre- or early post-exposure factors could predict respo...
Preprint
Asthma is a common, under-diagnosed disease affecting all ages. We sought to identify a nasal brush-based classifier of mild/moderate asthma. 190 subjects with mild/moderate asthma and controls underwent nasal brushing and RNA sequencing of nasal samples. A machine learning-based pipeline identified an asthma classifier consisting of 90 genes inter...
Article
Data-driven machine learning methods present an opportunity to simultaneously assess the impact of multiple air pollutants on health outcomes. The goal of this study was to apply a two-stage, data-driven approach to identify associations between air pollutant exposure profiles and children's cognitive skills. Data from 6900 children enrolled in the...
Conference Paper
Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to impr...
Article
Variability in induced pluripotent stem cell (iPSC) lines remains a concern for disease modeling and regenerative medicine. We have used RNA-sequencing analysis and linear mixed models to examine the sources of gene expression variability in 317 human iPSC lines from 101 individuals. We found that ∼50% of genome-wide expression variability is expla...
Article
Full-text available
Nature Communications 7: Article number: 12460, 10.1038/ncomms12460 (2016); Published: 23 August 2016; Updated: 10 October 2016. The HTML version of this Article incorrectly duplicated the authors S.
Article
Full-text available
Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in ∼one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment eff...
Data
Supplementary Figures 1-6, Supplementary Tables 1-4, Supplementary Note 1 and Supplementary References
Article
Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in ∼one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment eff...
Article
Full-text available
A genetic interaction (GI) is a type of interaction where the effect of one gene is modified by the effect of one or several other genes. These interactions are important for delineating functional relationships among genes and their corresponding proteins, as well as elucidating complex biological processes and diseases. An important type of GI –...
Article
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal pre...
Article
Full-text available
The ability to computationally predict the effects of toxic compounds on humans could help address the deficiencies of current chemical safety testing. Here, we report the results from a community-based DREAM challenge to predict toxicities of environmental compounds with potential adverse health effects for human populations. We measured the cytot...
Conference Paper
Induced pluripotent stem cells (iPSCs) offer a unique opportunity to model human diseases and serve as an unlimited cell source for regenerative medicine. However, patient based and iPSC clonal variability remain a limitation for their use. Additionally, the effect of genetic variation on the molecular pathways driving self-renewal and differentiat...
Article
The study of complex cardiovascular disease (CVD) has been hampered by the lack of appropriate human cellular model systems. In response, the NHLBI sponsored the NextGen Consortium, which encompasses 9 independent efforts spanning the portfolio of NHLBI-related phenotypes. The goals of the consortium include: 1. To develop and improve methods for l...
Conference Paper
The study of complex cardiovascular disease has often been hampered by the lack of appropriate human cellular model systems. In response to this need, the NHLBI sponsored the NextGen Consortium, which encompasses nine independent efforts spanning a representative portfolio of NHLBI-related phenotypes. The overall goals of the consortium include: 1....
Conference Paper
Insulin resistance (IR) is a major health issue in western countries and a risk factor for the development of cardiovascular disease. The vasculature of insulin resistant individuals shows a set of common features, mainly defined by a reduction in the bioactivity of nitric oxide, an increase in reactive oxygen species production and a chronic pro-i...
Article
The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their effi...
Data
Model scores in METABRIC2 and MicMa evaluations. Models, and corresponding model scores, used in the METABRIC2 and MicMa evaluations are at syn1646909 and syn1642232, respectively. (DOCX)
Data
Complete details of all the models submitted to the pilot competition in the uncontrolled experimental design. Source code for all models is available using the Synapse IDs listed in this table (see Methods for description of how to view model source code). (XLSX)
Data
Complete details of all the models evaluated in the controlled experiment. Source code for all models is available using the Synapse IDs listed in this table. (XLSX)
Data
Association of gene expression and CNA with survival and p-values of the association between gene expression and survival and between CNA and survival for the 10 probes with lowest P-value. (a) Top ten gene expression probes associated with survival marginally. (b) Top ten copy number probes associated with survival marginally. (c) Top ten gene exp...
Data
Performance of models from the controlled experiment in the METABRIC2 (A) and MicMa (B) dataset. (PNG)
Data
Top 50 oncogenes and transcription factors inferred by the MASP feature selection algorithm. (XLSX)
Article
Full-text available
Breast cancer is the most common malignancy in women and is responsible for hundreds of thousands of deaths annually. As with most cancers, it is a heterogeneous disease and different breast cancer subtypes are treated differently. Understanding the difference in prognosis for breast cancer based on its molecular and phenotypic features is one aven...
Article
Full-text available
The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Submicron particles, such as ultrafine particles (UFP, aerodynamic diameter ≤ 100 nm) and particulate matter ≤ 1.0 micrometers (PM1.0), are an unregulated emerging health threat to humans, but the...
Article
Full-text available
Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we explore the use of the concept of common...
Article
Systems biology approaches that utilize large genomic data sets hold great potential for deciphering complex immunological processes. In this paper, we propose such an approach to derive informative modules and networks from large gene expression data sets. Our approach starts with the clustering of such data sets to derive groups of tightly co-expre...
Article
Full-text available
Although much progress has been made in the understanding of the ontogeny and function of dendritic cells (DCs), the transcriptional regulation of the lineage commitment and functional specialization of DCs in vivo remains poorly understood. We made a comprehensive comparative analysis of CD8(+), CD103(+), CD11b(+) and plasmacytoid DC subsets, as w...
Article
Full-text available
Discriminative patterns can provide valuable insights into data sets with class labels, that may not be available from the individual features or the predictive models built using them. Most existing approaches work efficiently for sparse or low-dimensional data sets. However, for dense and high-dimensional data sets, they have to use high threshol...
Article
Full-text available
Genetic interactions provide a powerful perspective into gene function, but our knowledge of the specific mechanisms that give rise to these interactions is still relatively limited. The availability of a global genetic interaction map in Saccharomyces cerevisiae, covering ∼30% of all possible double mutant combinations, provides an unprecedented o...
Data
Full-text available
Supplementary figures and tables. (0.22 MB PDF)
Article
Full-text available
Genetic interactions occur when a combination of mutations results in a surprising phenotype. These interactions capture functional redundancy, and thus are important for predicting function, dissecting protein complexes into functional pathways, and exploring the mechanistic underpinnings of common human diseases. Synthetic sickness and lethality...
Article
Full-text available
Genetic Interaction (GI) data provides a means for exploring the structure and function of pathways in a cell. Coherent value bicliques (submatrices) in GI data represent functionally similar gene modules or protein complexes. However, no systematic approach has been proposed for exhaustively enumerating all coherent value submatrices in such d...
Article
In this paper, we study methods to identify differential coexpression patterns in case-control gene expression data. A differential coexpression pattern consists of a set of genes that have substantially different levels of coherence of their expression profiles across the two sample-classes, i.e., highly coherent in one class, but not in the other...
Conference Paper
Full-text available
The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, the...
Conference Paper
Full-text available
Association analysis is one of the most popular analysis paradigms in data mining. In this paper, we present different types of association patterns and discuss some of their applications in bioinformatics. We present a case study showing the usefulness of association analysis-based techniques for pre-processing protein interaction networks. Finall...
