
Daniel E Platt
- PhD
- Research Staff Member at IBM
154 publications, 132,833 reads, 4,044 citations
Publications (154)
While a broad consensus about the first successful migration of modern humans out of Africa seems established, the peopling of Arabia remains somewhat enigmatic. Identifying the ancestral populations that contributed to the gene pool of the current populations inhabiting Arabia and the impact of their contributions remains a challenging task. We inves...
Principal Component Analysis (PCA) is a powerful multivariate tool allowing the projection of data in low-dimensional representations. Nevertheless, datapoint distances on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables a...
Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach, that integrates molecular views of graph, image and t...
Chronic kidney disease (CKD) is a complex condition in which the kidneys are damaged and progressively lose their ability to filter blood. The disease affects 10% of the world population and often goes undetected until it is too late for intervention. Using the UK Biobank (UKBB) we constructed a CKD cohort of patients (n=46,986) with genomic, clinic...
Lebanon’s rich history as a cultural crossroad spanning millennia has significantly impacted the genetic composition of its population through successive waves of migration and conquests from surrounding regions. Within modern-day Lebanon, the Koura district stands out with its unique cultural foundations, primarily characterized by a notably high...
Motivation
The emergence of COVID-19 (C19) created incredible worldwide challenges but offers unique opportunities to understand the physiology of its risk factors and their interactions with complex disease conditions, such as metabolic syndrome. To address the challenges of discovering clinically relevant interactions, we employed a unique approa...
GWAS focuses on significance loosing false positives; machine learning probes sub-significant features relying on predictivity. Yet, these are far from orthogonal. We sought to explore how these inform each other in sub-genome-wide significant situations to define relevance for predictive features. We introduce the SVM-based RubricOE that selects h...
In recent years, there has been tremendous progress in the development of quantum computing hardware, algorithms and services leading to the expectation that in the near future quantum computers will be capable of performing simulations for natural science applications, operations research, and machine learning at scales mostly inaccessible to clas...
Background and objectives:
High homocysteine levels are associated with increased risk of hypertension and stroke. Homocysteine is metabolized by the methylenetetrahydrofolate reductase (MTHFR). We aimed to investigate the levels of homocysteine and their association with hypertension, stroke, and antihypertensive medication usage in patients with...
Background:
Forced displacement and war trauma cause high rates of post-traumatic stress, anxiety disorders and depression in refugee populations. We investigated the impact of forced displacement on mental health status, gender, presentation of type 2 diabetes (T2D) and associated inflammatory markers among Syrian refugees in Lebanon.
Methods:...
We discuss the inadequacy of covariances/correlations and other measures in L2 as relative distance metrics under some conditions. We propose a computationally simple heuristic to transform a map based on standard principal component analysis (PCA) (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based...
The role of race in medical decision-making has been a contentious issue. Insights from history and population genetics suggest considering race as a differentiating marker for medical practices can be influenced by systemic bias, leading to serious errors. This may negatively impact treatment of complex diseases such as cardiovascular disease (CVD...
Backgrounds and Aims
The role of Lipoprotein(a) (Lp(a)) in increasing the risk of cardiovascular diseases is reported in several populations. The aim of this study is to investigate the correlation of high Lp(a) levels with the degree of coronary artery stenosis.
Methods
Two hundred and sixty-eight patients were enrolled for this study. Patients w...
Biological pathways play a crucial role in the properties of diseases and are important in drug discovery. Identifying the logical relationships among distinctive phenotypic clusters could reveal possible connections to the underlying pathways. However, this process is challenging since clinical phenotypes are often available through unstructured e...
Background
The COVID-19 pandemic claimed millions of lives worldwide without clear signs of abating despite several mitigation efforts and vaccination campaigns. There has been tremendous interest in understanding the etiology of the disease, particularly in what makes it severe and fatal in certain patients. Studies have shown that COVID-19 patie...
Background: The COVID-19 pandemic claimed millions of lives worldwide without clear signs of abating. There has been tremendous interest in understanding the etiology of the disease, particularly in what makes it fatal in certain patients.
Methods: This study investigated 819 COVID-19 patients admitted to the COVID-19 ward at a tertiary care cente...
The emergence of COVID19 created incredible worldwide challenges but offers unique opportunities to understand the physiology of its risk factors and their interactions with complex disease conditions, such as metabolic syndrome. Epidemiological analysis powered by topological data analysis (TDA) is a novel approach to uncover these clinically rele...
COVID-19 has caused thousands of deaths around the world and also resulted in a large international economic disruption. Identifying the pathways associated with this illness can help medical researchers to better understand the properties of the condition. This process can be carried out by analyzing the medical records. It is crucial to develop t...
Papillomaviruses (PVs) are a heterogeneous group of DNA viruses that can infect fish, birds, reptiles, and mammals. PVs infecting humans (HPVs) phylogenetically cluster into five genera (Alpha-, Beta-, Gamma-, Mu- and Nu-PV), with differences in tissue tropism and carcinogenicity. The evolutionary features associated with the divergence of Papillom...
Parkinson's Disease (PD) is a progressive neurodegenerative movement disorder characterized by loss of striatal dopaminergic neurons. Progression of PD is usually captured by a host of clinical features represented in different rating scales. PD diagnosis is associated with a broad spectrum of non-motor symptoms such as depression, sleep disorder a...
As studies move into deeper characterization of the impact of selection through non-neutral mutations in whole genome population genetics, modeling for selection becomes crucial. Moreover, epistasis has long been recognized as a significant component in understanding the evolution of complex genetic systems. We present a backward coalescent model,...
Genetic epidemiology has been a growing area of interest in recent years due to the availability of genetic data with the decreasing cost of sequencing. Machine learning (ML) algorithms can be a very useful tool to study the effect of genetic factors on disease incidence or on different traits characterizing a population. There are many challenges that plague t...
As studies move into deeper characterization of the impact of selection through non-neutral mutations in whole genome population genetics, modeling for selection becomes crucial. Moreover, epistasis has long been recognized as a significant component in understanding evolution of complex genetic systems. We present a backward coalescent model EpiSi...
India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification. While geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further levels of complexity to...
We sought to investigate whether epidemiological parameters that define epidemic models could be determined from the epidemic trajectory of infections, recovery, and hospitalizations prior to peak, and also to evaluate the comparability of data between jurisdictions reporting their statistics. We found that, analytically, the pre-peak growth of an...
Currently, there are 18 different religious communities living in Lebanon. While evolving primarily within Lebanon, these communities show a level of local isolation as demonstrated previously from their Y-haplogroup distributions. In order to trace the origins and migratory patterns that may have led to the genetic isolation and autosomal clusteri...
Objective To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2).
Methods Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants by extracting pairwise alignment to the reference genome NC_0455...
An opportunity exists in exploring epidemic modeling as a novel way to determine physiological and demic parameters for genetic association studies on a population/environmental (quasi) epidemiological study level. First, the spread of SARS-CoV-2 has produced population-specific lineages; second, epidemic spread model parameters are tied directly t...
We have analyzed COVID-19 variants from 48 publicly available genomes. Co-occurrence of the 8782C>T and 28144T>C variants is frequently found among travelers but not in Wuhan samples. Thus, we named it the traveler substrain.
We report high coverage whole genome sequencing data from 46 Yemeni individuals as well as genome wide genotyping data from 169 Yemenis from diverse locations. We use this dataset to define the genetic diversity in Yemen and how it relates to people elsewhere in the Near East. Yemen is a vast region with substantial cultural and geographic diversit...
The Phoenicians emerged in the Northern Levant around 1800 BCE and by the 9th century BCE had spread their culture across the Mediterranean Basin, establishing trading posts, and settlements in various European Mediterranean and North African locations. Despite their widespread influence, what is known of the Phoenicians comes from what was written...
Median average read depth and coverage across genomes of 14 ancient Phoenician samples.
(PDF)
DNA damage patterns for Phoenician samples.
Base frequency of 5’ and 3’ of strand breaks (top) and C to T nucleotide misincorporations for the first and last 25 bases of endogenous mtDNA fragments for merged reads (bottom), red = C to T and blue = G to A misincorporation.
(PDF)
Aerial view of the site of Monte Sirai.
(TIF)
DNA fragment length distribution for each of 14 ancient Phoenician samples.
(PDF)
Modern Lebanese haplogroup assignments and Genbank accession numbers.
(XLSX)
Haplogroup assignments, coverage information and variable sites identified for modern Lebanese samples.
(XLSX)
Haplogroup assignments, coverage information, ContamMix results and variable sites identified for all ancient samples sequenced.
(XLSX)
Maximum parsimony tree with 14 ancient Phoenician samples placed within the 438 published ancient mitogenomes shown in Figure S5 of Olivieri et al. (2017).
(XLSX)
Background: Waterpipe smoking is a rising global public health epidemic perceived by many users to be less harmful, though its toxicity overlaps or even exceeds that of cigarette smoking. Short-term cardiovascular changes due to waterpipe smoking are well established, but longer-term health impacts are still not fully elucidated.
Objective: We aim...
India represents an intricate tapestry of population sub-structure shaped by geography, language, culture and social stratification operating in concert [1-3]. To date, no study has attempted to model and evaluate how these evolutionary forces have interacted to shape the patterns of genetic diversity within India. Geography has been shown to clos...
Background
Elevated homocysteine (Hc) levels have a well-established and clear causal relationship to epithelial damage leading to coronary artery disease. Furthermore, it is strongly associated with other metabolic syndrome variables, such as hypertension, which is correlated with type II diabetes mellitus (T2DM). Studies on T2DM in relation to Hc...
Aboriginal Australians represent one of the oldest continuous cultures outside Africa, with evidence indicating that their ancestors arrived in the ancient landmass of Sahul (present-day New Guinea and Australia) ~55 thousand years ago. Genetic studies, though limited, have demonstrated both the uniqueness and antiquity of Aboriginal Australian gen...
Archaeological, palaeontological and geological evidence shows that post-glacial warming released human populations from their various climate-bound refugia. Yet specific connections between these refugia and the timing and routes of post-glacial migrations that ultimately established modern patterns of genetic variation remain elusive. Here, we us...
Background
In a cohort of children in Cyprus, we recently reported low levels of high density lipoprotein cholesterol (HDL-C) to be associated with asthma. We examined whether genetic polymorphisms that were previously linked individually to asthma, obesity, or HDL-C are associated with both asthma and HDL-C levels in the Cyprus cohort.
Methods
We...
Aboriginal Australians are one of the more poorly studied populations from the standpoint of human evolution and genetic diversity. Thus, to investigate their genetic diversity, the possible date of their ancestors’ arrival and their relationships with neighboring populations, we analyzed mitochondrial DNA (mtDNA) diversity in a large sample of Abo...
Background
Complex diseases may have multiple pathways leading to disease. For example, coronary artery disease evolves from damage to the epithelial layers of the arteries, but has multiple causal pathways. More challenging, those pathways are highly correlated within metabolic syndrome. The challenge is to identify specific clusters of phenotype characteristi...
Cultural, dietary, and lifestyle factors are the main modulators of type 2 diabetes mellitus (T2DM) disease risk. Coffee is one of the most popular worldwide beverages, and recent epidemiological studies have shown that coffee consumption is associated with a lower risk of T2DM. This study investigates the impact of coffee intake on T2DM risk and...
Cultural, dietary, and lifestyle factors are main modulators of Type 2 Diabetes Mellitus (T2DM) disease risk. Coffee is one of the most popular worldwide beverages, and recent epidemiological studies showed that coffee consumption is associated with a lower risk of T2DM. This study investigates the impact of coffee intake on T2DM risk and assesses the effect...
Background:
More evidence is emerging on the strong association between chronic kidney disease (CKD) and cardiovascular disease. We assessed the relationship between coronary artery disease (CAD) and renal dysfunction level (RDL) in a group of Lebanese patients.
Methods:
A total of 1268 patients undergoing cardiac catheterization were sequential...
The role of inflammation in coronary artery disease (CAD) pathogenesis is well recognized. Moreover, smoking inhalation increases the activity of inflammatory mediators through an increase in leukotriene synthesis essential in atherosclerosis pathogenesis.
The aim of this study is to investigate the effect of "selected" genetic variants within the...
Genome-wide association studies (GWAS) of multiple populations with distinctive genetic and lifestyle backgrounds are crucial to the understanding of Type 2 Diabetes Mellitus (T2DM) pathophysiology. We report a GWAS on the genetic basis of T2DM in 3,286 Lebanese participants. More than 5,000,000 SNPs were directly genotyped or imputed using the 1...
Supplementary Info
The onset of coronary artery disease (CAD) is influenced by cardiovascular risk factors that often occur in clusters and may build on one another. The objective of this study is to examine the relationship between hypertension and CAD age of onset in the Lebanese population.
This retrospective analysis was performed on data extracted from Lebanese...
The burden of diabetes in Lebanon requires well-targeted interventions for screening type 2 diabetes mellitus (T2DM) and prediabetes and for prevention of risk factors. A total of 998 newly recruited Lebanese individuals, in addition to 7,292 already available, were studied to investigate the prevalence of diabetes, prediabetes and their associated risk factors....
A main underlying pathology of coronary artery disease is the deposition of cholesterol in the arteries supplying blood to the heart that leads to stenosis and myocardial infarction. We tested if dyslipidemia is a risk factor for coronary artery disease in the Lebanese population, and studied the role of the total cholesterol/HDL cholesterol (TC/HD...
Accessible biotechnology is enabling the cataloging of genetic variants in individuals in populations at unprecedented scales. The use of phylogeny of the individuals within populations allows a model-based approach to studying these variations, which is important in understanding relationships between and across populations. For the somatic genome...
Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitoc...
Raw coancestry matrix shows relationships between the Levantines and the world populations. A) Intensity of the colors reflects the number of haplotype chunks donated to the Levantines. The vertical line is a visual aid to reflect the Levantine split observed in the tree. Horizontal lines distinguish the major geographic regions. B) coancestry matr...
World population structure inferred by ADMIXTURE analysis of >240K autosomal SNPs. A) Each horizontal line represents ancestry probabilities of an individual in 2–10 constructed ancestral populations. Levantine population names are shown in blue. B) Cross-validation plot for the world dataset.
(TIF)
Ancestry probabilities of individuals considering 10 ancestral populations. Highlighted cells indicate individuals have >60% of one component. Standard errors were estimated using 200 bootstrap replicates.
(XLS)
Description of the ROLLOFF analysis.
(PDF)
Stratified random sampling of 75 Lebanese samples. A) 25 samples from each of the three main religion groups in Lebanon were randomly chosen from the 1,341 samples illustrated in Figure 1. B) Map of Lebanon showing the distribution of the samples.
(TIF)
Principal component analysis generated with fineSTRUCTURE using ChromoPainter's coancestry matrix showing the top two components. A) Plot shows global diversity using 50 populations. B) Magnification of West Asia region showing the Levantine populations in their regional and religion context.
(TIF)
The Levant is a region in the Near East with an impressive record of continuous human existence and major cultural developments since the Paleolithic period. Genetic and archeological studies present solid evidence placing the Middle East and the Arabian Peninsula as the first stepping-stone outside Africa. There is, however, little understanding o...
The Middle East was a funnel of human expansion out of Africa, a staging area for the Neolithic Agricultural Revolution, and the home to some of the earliest world empires. Post LGM expansions into the region and subsequent population movements created a striking genetic mosaic with distinct sex-based genetic differentiation. While prior studies ha...
Fisher exact tests for haplogroup frequencies vs. population within the Middle East.
(XLS)
Populations comparison based on Y haplogroups a) Principal Component Analysis of relative frequencies of Y haplogroups within populations, b) with mean-linkage (UPGMA) dendrogram determined from Euclidean distances.
(TIF)
mtDNA FST distances between populations.
(XLS)
Y STR RST distances between populations.
(XLS)
The dispersal of the human population to all the continents of the globe is a compelling story that can possibly be unravelled from the genetic landscape of the current populations. Indeed, a grasp on this strengthens the understanding of relationship between populations for anthropological as well as medical applications. While the collective geno...
Previous studies that pooled Indian populations from a wide variety of geographical locations have obtained contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions we took advantage that...
Modal tree obtained by BATWING indicating the coalescence time divergence estimates (in years) among Major Populations Groups (MPG) using 17 STRs from haplogroup (a) F-M89, (b) H1-M52, (c) L1-M26/M72.
(TIFF)
List of Y chromosome SNPs and haplotype data for the 1680 individuals from 31 tribal and non-tribal populations presented in this study.
(XLS)
Modal tree obtained by BATWING indicating the coalescence time divergence estimates (in years) among endogamous populations within (a) HTF and HTK groups, (b) DLF, (c) BRH and HTC, using 17 STRs from all haplogroups.
(TIFF)
AMOVA analysis of various population groupings based on the 17-STR haplotype, with 95% CI based on re-sampling of the samples across populations.
(XLS)
PCA plot showing the first two principal components of haplogroup frequencies for 97 non-tribal (circles) and tribal (squares) populations of India and nearby regions from previous publications, compared to the non-tribal (horizontal ovals) and tribal (diamonds) populations from the present study. Symbols have been colored according to linguistic c...
Reduced median network of 17 microsatellite haplotypes within haplogroup. (a) HG C-M130 using 74 chromosomes, (b) HG H1-M52 using 292 chromosomes (c) HG H- M69 using 79 chromosomes, (d) HG L1 – M27/M76 using 235 chromosomes, (e) HG R1a1-M17 using 214 chromosomes. Circles are colored based on the 7 Major Population Groups as shown in Figure 1, and t...
List of population codes and their publication references used in Figure S1.
(XLS)
Fisher's exact test p-values for the NRY HG frequencies among the 7 Major Populations Groups and among the 31 sampled populations.
(XLS)
The manifestation of coronary artery disease (CAD) follows a well-choreographed series of events that includes damage of arterial endothelial cells and deposition of lipids in the sub-endothelial layers. Genome-wide association studies (GWAS) of multiple populations with distinctive genetic and lifestyle backgrounds are a crucial step in understand...
For decades, the peopling of the Americas has been explored through the analysis of uniparentally inherited genetic systems in Native American populations and the comparison of these genetic data with current linguistic groupings. In northern North America, two language families predominate: Eskimo-Aleut and Na-Dene. Although the genetic evidence f...
Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions an...
Suggested origins of the main ethnic groups in Afghanistan.
(DOC)
Populations selected for this study.
(XLS)
AMOVA results. Comparing populations grouped according to their country or region of origin with populations grouped according to Barrier structures.
(DOC)
Y-chromosome haplogroups frequencies in Afghanistan's ethnic groups.
(XLS)
BATWING topologies and dates with 95% confidence intervals of population splits derived from multiple combinations of population subsets.
(XLS)
Reduced median networks. (A) C-M130, (B) R1a1a-M17, (C) E1b1b1-M35, and (D) B-M60 showing STR haplotype distributions among populations; area is proportional to haplotype frequency, and color indicates populations. Connecting lines represent putative phylogenetic relationships between haplotypes.
(TIF)
Y-chromosome haplogroups and haplotypes in 204 unrelated individuals from Afghanistan.
(XLS)
Basque people have received considerable attention from anthropologists, geneticists, and linguists during the last century due to the singularity of their language and to other cultural and biological characteristics. Despite the multidisciplinary efforts performed to address the questions of the origin, uniqueness, and heterogeneity of Basques, t...