Daniel E Platt

  • PhD
  • Research Staff Member at IBM

About

154
Publications
132,833
Reads
4,044
Citations
Current institution
IBM
Current position
  • Research Staff Member

Publications (154)
Article
Full-text available
While a broad consensus about the first successful migration of modern humans out of Africa seems established, the peopling of Arabia remains somewhat enigmatic. Identifying the ancestral populations that contributed to the gene pool of the current populations inhabiting Arabia and the impact of their contributions remains a challenging task. We inves...
Article
Principal Component Analysis (PCA) is a powerful multivariate tool allowing the projection of data in low-dimensional representations. Nevertheless, datapoint distances on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables a...
Preprint
Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach, that integrates molecular views of graph, image and t...
Preprint
Full-text available
Chronic kidney disease (CKD) is a complex condition where the kidneys are damaged and progressively lose their ability to filter blood. Ten percent of the world population has the disease, which often goes undetected until it is too late for intervention. Using the UK Biobank (UKBB) we constructed a CKD cohort of patients (n=46,986) with genomic, clinic...
Article
Full-text available
Lebanon’s rich history as a cultural crossroad spanning millennia has significantly impacted the genetic composition of its population through successive waves of migration and conquests from surrounding regions. Within modern-day Lebanon, the Koura district stands out with its unique cultural foundations, primarily characterized by a notably high...
Article
Motivation The emergence of COVID-19 (C19) created incredible worldwide challenges but offers unique opportunities to understand the physiology of its risk factors and their interactions with complex disease conditions, such as metabolic syndrome. To address the challenges of discovering clinically relevant interactions, we employed a unique approa...
Article
Full-text available
GWAS focuses on significance losing false positives; machine learning probes sub-significant features relying on predictivity. Yet, these are far from orthogonal. We sought to explore how these inform each other in sub-genome-wide significant situations to define relevance for predictive features. We introduce the SVM-based RubricOE that selects h...
Preprint
Full-text available
In recent years, there has been tremendous progress in the development of quantum computing hardware, algorithms and services leading to the expectation that in the near future quantum computers will be capable of performing simulations for natural science applications, operations research, and machine learning at scales mostly inaccessible to clas...
Article
Full-text available
Background and objectives: High homocysteine levels are associated with increased risk of hypertension and stroke. Homocysteine is metabolized by the methylenetetrahydrofolate reductase (MTHFR). We aimed to investigate the levels of homocysteine and their association with hypertension, stroke, and antihypertensive medication usage in patients with...
Article
Full-text available
Background: Forced displacement and war trauma cause high rates of post-traumatic stress, anxiety disorders and depression in refugee populations. We investigated the impact of forced displacement on mental health status, gender, presentation of type 2 diabetes (T2D) and associated inflammatory markers among Syrian refugees in Lebanon. Methods:...
Preprint
Full-text available
We discuss the inadequacy of covariances/correlations and other measures in L2 as relative distance metrics under some conditions. We propose a computationally simple heuristic to transform a map based on standard principal component analysis (PCA) (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based...
Preprint
Full-text available
The role of race in medical decision-making has been a contentious issue. Insights from history and population genetics suggest considering race as a differentiating marker for medical practices can be influenced by systemic bias, leading to serious errors. This may negatively impact treatment of complex diseases such as cardiovascular disease (CVD...
Article
Full-text available
Backgrounds and Aims The role of Lipoprotein(a) (Lp(a)) in increasing the risk of cardiovascular diseases is reported in several populations. The aim of this study is to investigate the correlation of high Lp(a) levels with the degree of coronary artery stenosis. Methods Two hundred and sixty-eight patients were enrolled for this study. Patients w...
Article
Biological pathways play a crucial role in the properties of diseases and are important in drug discovery. Identifying the logical relationships among distinctive phenotypic clusters could reveal possible connections to the underlying pathways. However, this process is challenging since clinical phenotypes are often available through unstructured e...
Article
Full-text available
Background The COVID-19 pandemic claimed millions of lives worldwide without clear signs of abating despite several mitigation efforts and vaccination campaigns. There has been tremendous interest in understanding the etiology of the disease, particularly in what makes it severe and fatal in certain patients. Studies have shown that COVID-19 patie...
Preprint
Full-text available
Background: The COVID-19 pandemic claimed millions of lives worldwide without clear signs of abating. There has been tremendous interest in understanding the etiology of the disease, particularly in what makes it fatal in certain patients. Methods: This study investigated 819 COVID-19 patients admitted to the COVID-19 ward at a tertiary care cente...
Preprint
Full-text available
The emergence of COVID-19 created incredible worldwide challenges but offers unique opportunities to understand the physiology of its risk factors and their interactions with complex disease conditions, such as metabolic syndrome. Epidemiological analysis powered by topological data analysis (TDA) is a novel approach to uncover these clinically rele...
Chapter
COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop t...
Article
Full-text available
Papillomaviruses (PVs) are a heterogeneous group of DNA viruses that can infect fish, birds, reptiles, and mammals. PVs infecting humans (HPVs) phylogenetically cluster into five genera (Alpha-, Beta-, Gamma-, Mu- and Nu-PV), with differences in tissue tropism and carcinogenicity. The evolutionary features associated with the divergence of Papillom...
Preprint
Full-text available
Parkinson's Disease (PD) is a progressive neurodegenerative movement disorder characterized by loss of striatal dopaminergic neurons. Progression of PD is usually captured by a host of clinical features represented in different rating scales. PD diagnosis is associated with a broad spectrum of non-motor symptoms such as depression, sleep disorder a...
Article
Full-text available
As studies move into deeper characterization of the impact of selection through non-neutral mutations in whole genome population genetics, modeling for selection becomes crucial. Moreover, epistasis has long been recognized as a significant component in understanding the evolution of complex genetic systems. We present a backward coalescent model,...
Preprint
Full-text available
Genetic epidemiology has been a growing area of interest in recent years due to the availability of genetic data with the decreasing cost of sequencing. Machine learning (ML) algorithms can be a very useful tool to study the effect of genetic factors on disease incidence or on different traits characterizing a population. There are many challenges that plague t...
Preprint
Full-text available
As studies move into deeper characterization of the impact of selection through non-neutral mutations in whole genome population genetics, modeling for selection becomes crucial. Moreover, epistasis has long been recognized as a significant component in understanding evolution of complex genetic systems. We present a backward coalescent model EpiSi...
Article
Full-text available
India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification. While geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity to...
Preprint
Full-text available
COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop t...
Article
Full-text available
We sought to investigate whether epidemiological parameters that define epidemic models could be determined from the epidemic trajectory of infections, recovery, and hospitalizations prior to peak, and also to evaluate the comparability of data between jurisdictions reporting their statistics. We found that, analytically, the pre-peak growth of an...
Article
Full-text available
Currently, there are 18 different religious communities living in Lebanon. While evolving primarily within Lebanon, these communities show a level of local isolation as demonstrated previously from their Y-haplogroup distributions. In order to trace the origins and migratory patterns that may have led to the genetic isolation and autosomal clusteri...
Article
Full-text available
Objective To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Methods Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants by extracting pairwise alignment to the reference genome NC_0455...
Preprint
Full-text available
An opportunity exists in exploring epidemic modeling as a novel way to determine physiological and demic parameters for genetic association studies on a population/environmental (quasi) epidemiological study level. First, the spread of SARS-CoV-2 has produced population-specific lineages; second, epidemic spread model parameters are tied directly t...
Preprint
Full-text available
We have analyzed COVID-19 variants from 48 publicly available genomes. Co-occurrence of the 8782C>T and 28144T>C variants is frequently found among travelers but not in Wuhan samples. Thus, we named it the traveler substrain.
Preprint
Full-text available
We report high coverage whole genome sequencing data from 46 Yemeni individuals as well as genome wide genotyping data from 169 Yemenis from diverse locations. We use this dataset to define the genetic diversity in Yemen and how it relates to people elsewhere in the Near East. Yemen is a vast region with substantial cultural and geographic diversit...
Article
Full-text available
The Phoenicians emerged in the Northern Levant around 1800 BCE and by the 9th century BCE had spread their culture across the Mediterranean Basin, establishing trading posts, and settlements in various European Mediterranean and North African locations. Despite their widespread influence, what is known of the Phoenicians comes from what was written...
Data
Median average read depth and coverage across genomes of 14 ancient Phoenician samples. (PDF)
Data
DNA damage patterns for Phoenician samples. Base frequency of 5’ and 3’ of strand breaks (top) and C to T nucleotide misincorporations for the first and last 25 bases of endogenous mtDNA fragments for merged reads (bottom), red = C to T and blue = G to A misincorporation. (PDF)
Data
Aerial view of the site of Monte Sirai. (TIF)
Data
DNA fragment length distribution for each of 14 ancient Phoenician samples. (PDF)
Data
Modern Lebanese haplogroup assignments and Genbank accession numbers. (XLSX)
Data
Haplogroup assignments, coverage information and variable sites identified for modern Lebanese samples. (XLSX)
Data
Haplogroup assignments, coverage information, ContamMix results and variable sites identified for all ancient samples sequenced. (XLSX)
Data
Maximum parsimony tree with 14 ancient Phoenician samples placed within the 438 published ancient mitogenomes shown in Figure S5 of Olivieri et al. (2017). (XLSX)
Article
Full-text available
Background: Waterpipe smoking is a rising global public health epidemic perceived by many users to be less harmful, though its toxicity overlaps or even exceeds that of cigarette smoking. Short-term cardiovascular changes due to waterpipe smoking are well established, but longer-term health impacts are still not fully elucidated. Objective: We aim...
Preprint
Full-text available
India represents an intricate tapestry of population sub-structure shaped by geography, language, culture and social stratification operating in concert [1-3]. To date, no study has attempted to model and evaluate how these evolutionary forces have interacted to shape the patterns of genetic diversity within India. Geography has been shown to clos...
Article
Full-text available
Background Elevated homocysteine (Hc) levels have a well-established and clear causal relationship to epithelial damage leading to coronary artery disease. Furthermore, it is strongly associated with other metabolic syndrome variables, such as hypertension, which is correlated with type II diabetes mellitus (T2DM). Studies on T2DM in relation to Hc...
Article
Full-text available
Aboriginal Australians represent one of the oldest continuous cultures outside Africa, with evidence indicating that their ancestors arrived in the ancient landmass of Sahul (present-day New Guinea and Australia) ~55 thousand years ago. Genetic studies, though limited, have demonstrated both the uniqueness and antiquity of Aboriginal Australian gen...
Article
Full-text available
Archaeological, palaeontological and geological evidence shows that post-glacial warming released human populations from their various climate-bound refugia. Yet specific connections between these refugia and the timing and routes of post-glacial migrations that ultimately established modern patterns of genetic variation remain elusive. Here, we us...
Article
Full-text available
Background In a cohort of children in Cyprus, we recently reported low levels of high density lipoprotein cholesterol (HDL-C) to be associated with asthma. We examined whether genetic polymorphisms that were previously linked individually to asthma, obesity, or HDL-C are associated with both asthma and HDL-C levels in the Cyprus cohort. Methods We...
Article
Full-text available
Aboriginal Australians are one of the more poorly studied populations from the standpoint of human evolution and genetic diversity. Thus, to investigate their genetic diversity, the possible date of their ancestors’ arrival and their relationships with neighboring populations, we analyzed mitochondrial DNA (mtDNA) diversity in a large sample of Abo...
Article
Full-text available
Background Complex diseases may have multiple pathways leading to disease. For example, coronary artery disease evolves from damage to the epithelial layers of the arteries, but has multiple causal pathways. More challenging, those pathways are highly correlated within metabolic syndrome. The challenge is to identify specific clusters of phenotype characteristi...
Article
Full-text available
Cultural, dietary, and lifestyle factors are the main modulators of type 2 diabetes mellitus (T2DM) disease risk. Coffee is one of the most popular worldwide beverages, and recent epidemiological studies have shown that coffee consumption is associated with a lower risk of T2DM. This study investigates the impact of coffee intake on T2DM risk and...
Article
Full-text available
Cultural, dietary, and lifestyle factors are the main modulators of Type 2 Diabetes Mellitus (T2DM) disease risk. Coffee is one of the most popular worldwide beverages, and recent epidemiological studies showed that coffee consumption is associated with a lower risk of T2DM. This study investigates the impact of coffee intake on T2DM risk and assesses the effect...
Article
Full-text available
Background: More evidence is emerging on the strong association between chronic kidney disease (CKD) and cardiovascular disease. We assessed the relationship between coronary artery disease (CAD) and renal dysfunction level (RDL) in a group of Lebanese patients. Methods: A total of 1268 patients undergoing cardiac catheterization were sequential...
Article
Full-text available
The role of inflammation in coronary artery disease (CAD) pathogenesis is well recognized. Moreover, smoking inhalation increases the activity of inflammatory mediators through an increase in leukotriene synthesis essential in atherosclerosis pathogenesis. The aim of this study is to investigate the effect of "selected" genetic variants within the...
Article
Full-text available
Genome-wide association studies (GWAS) of multiple populations with distinctive genetic and lifestyle backgrounds are crucial to the understanding of Type 2 Diabetes Mellitus (T2DM) pathophysiology. We report a GWAS on the genetic basis of T2DM in 3,286 Lebanese participants. More than 5,000,000 SNPs were directly genotyped or imputed using the 1...
Article
Full-text available
The onset of coronary artery disease (CAD) is influenced by cardiovascular risk factors that often occur in clusters and may build on one another. The objective of this study is to examine the relationship between hypertension and CAD age of onset in the Lebanese population. This retrospective analysis was performed on data extracted from Lebanese...
Article
Full-text available
The burden of diabetes in Lebanon requires well-targeted interventions for screening type 2 diabetes mellitus (T2DM) and prediabetes and prevention of risk factors. A total of 998 newly recruited Lebanese individuals, in addition to 7,292 already available, were studied to investigate the prevalence of diabetes, prediabetes and their associated risk factors....
Article
Full-text available
A main underlying pathology of coronary artery disease is the deposition of cholesterol in the arteries supplying blood to the heart that leads to stenosis and myocardial infarction. We tested if dyslipidemia is a risk factor for coronary artery disease in the Lebanese population, and studied the role of the total cholesterol/HDL cholesterol (TC/HD...
Article
Accessible biotechnology is enabling the cataloging of genetic variants in individuals in populations at unprecedented scales. The use of phylogeny of the individuals within populations allows a model-based approach to studying these variations, which is important in understanding relationships between and across populations. For the somatic genome...
Article
Full-text available
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitoc...
Data
Raw coancestry matrix shows relationships between the Levantines and the world populations. A) Intensity of the colors reflects the number of haplotype chunks donated to the Levantines. The vertical line is a visual aid to reflect the Levantine split observed in the tree. Horizontal lines distinguish the major geographic regions. B) coancestry matr...
Data
World population structure inferred by ADMIXTURE analysis of >240K autosomal SNPs. A) Each horizontal line represents ancestry probabilities of an individual in 2–10 constructed ancestral populations. Levantine population names are shown in blue. B) Cross-validation plot for the world dataset. (TIF)
Data
Ancestry probabilities of individuals considering 10 ancestral populations. Highlighted cells indicate individuals have >60% of one component. Standard errors were estimated using 200 bootstrap replicates. (XLS)
Data
Full-text available
Description of the ROLLOFF analysis. (PDF)
Data
Stratified random sampling of 75 Lebanese samples. A) 25 samples from each of the three main religion groups in Lebanon were randomly chosen from the 1,341 samples illustrated in Figure 1. B) Map of Lebanon showing the distribution of the samples. (TIF)
Data
Principal component analysis generated with fineSTRUCTURE using ChromoPainter's coancestry matrix showing the top two components. A) Plot shows global diversity using 50 populations. B) Magnification of West Asia region showing the Levantine populations in their regional and religion context. (TIF)
Article
Full-text available
The Levant is a region in the Near East with an impressive record of continuous human existence and major cultural developments since the Paleolithic period. Genetic and archeological studies present solid evidence placing the Middle East and the Arabian Peninsula as the first stepping-stone outside Africa. There is, however, little understanding o...
Article
Full-text available
The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies ha...
Data
Fisher exact tests for haplogroup frequencies vs. population within the Middle East. (XLS)
Data
Populations comparison based on Y haplogroups a) Principal Component Analysis of relative frequencies of Y haplogroups within populations, b) with mean-linkage (UPGMA) dendrogram determined from Euclidean distances. (TIF)
Data
mtDNA FST distances between populations. (XLS)
Data
Y STR RST distances between populations. (XLS)
Article
Full-text available
The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies ha...
Chapter
The dispersal of the human population to all the continents of the globe is a compelling story that can possibly be unravelled from the genetic landscape of the current populations. Indeed, a grasp on this strengthens the understanding of relationship between populations for anthropological as well as medical applications. While the collective geno...
Article
Full-text available
Previous studies that pooled Indian populations from a wide variety of geographical locations have obtained contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions we took advantage that...
Data
Modal tree obtained by BATWING indicating the coalescence time divergence estimates (in years) among Major Populations Groups (MPG) using 17 STRs from haplogroup (a) F-M89, (b) H1-M52, (c) L1-M26/M72. (TIFF)
Data
List of Y chromosome SNPs and haplotype data for the 1680 individuals from 31 tribal and non-tribal populations presented in this study. (XLS)
Data
Modal tree obtained by BATWING indicating the coalescence time divergence estimates (in years) among endogamous populations within (a) HTF and HTK groups, (b) DLF, (c) BRH and HTC, using 17 STRs from all haplogroups. (TIFF)
Data
AMOVA analysis of various population groupings based on the 17-STR haplotype and 95% CI based on re-sampling of the samples across populations. (XLS)
Data
PCA plot showing the first two principal components of haplogroup frequencies for 97 non-tribal (circles) and tribal (squares) populations of India and nearby regions from previous publications, compared to the non-tribal (horizontal ovals) and tribal (diamonds) populations from the present study. Symbols have been colored according to linguistic c...
Data
Reduced median network of 17 microsatellite haplotypes within haplogroup. (a) HG C-M130 using 74 chromosomes, (b) HG H1-M52 using 292 chromosomes, (c) HG H-M69 using 79 chromosomes, (d) HG L1-M27/M76 using 235 chromosomes, (e) HG R1a1-M17 using 214 chromosomes. Circles are colored based on the 7 Major Population Groups as shown in Figure 1, and t...
Data
List of population codes and their publication references used in Figure S1. (XLS)
Data
Fisher's exact test p-values for the NRY HG frequencies among the 7 Major Populations Groups and among the 31 sampled populations. (XLS)
Article
Full-text available
The manifestation of coronary artery disease (CAD) follows a well-choreographed series of events that includes damage of arterial endothelial cells and deposition of lipids in the sub-endothelial layers. Genome-wide association studies (GWAS) of multiple populations with distinctive genetic and lifestyle backgrounds are a crucial step in understand...
Article
Full-text available
For decades, the peopling of the Americas has been explored through the analysis of uniparentally inherited genetic systems in Native American populations and the comparison of these genetic data with current linguistic groupings. In northern North America, two language families predominate: Eskimo-Aleut and Na-Dene. Although the genetic evidence f...
Article
Full-text available
Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions an...
Data
Suggested origins of the main ethnic groups in Afghanistan. (DOC)
Data
Populations selected for this study. (XLS)
Data
AMOVA results. Comparing populations grouped according to their country or region of origin with populations grouped according to Barrier structures. (DOC)
Data
Y-chromosome haplogroups frequencies in Afghanistan's ethnic groups. (XLS)
Data
BATWING topologies and dates with 95% confidence intervals of population splits derived from multiple combinations of population subsets. (XLS)
Data
Reduced median networks. (A) C-M130, (B) R1a1a-M17, (C) E1b1b1-M35, and (D) B-M60 showing STR haplotype distributions among populations; area is proportional to haplotype frequency, and color indicates populations. Connecting lines represent putative phylogenetic relationships between haplotypes. (TIF)
Data
Y-chromosome haplogroups and haplotypes in 204 unrelated individuals from Afghanistan. (XLS)
Article
Full-text available
Basque people have received considerable attention from anthropologists, geneticists, and linguists during the last century due to the singularity of their language and to other cultural and biological characteristics. Despite the multidisciplinary efforts performed to address the questions of the origin, uniqueness, and heterogeneity of Basques, t...
