About
83
Publications
13,044
Reads
1,103
Citations
Introduction
Highly motivated individual with a penchant for problem solving in biomedical research. Background and interests include big data analytics, machine learning, bioinformatics, health informatics, ontologies, data mining, artificial intelligence, biomedical data integration, epigenetics and omics in general.
Current institution
Additional affiliations
Education
August 2012 - June 2013
October 2010 - April 2014
August 2003 - September 2008
Publications (83)
Purpose
Next-generation sequencing has implicated some risk variants for human spina bifida (SB), but the genome-wide contribution of structural variation to this complex genetic disorder remains largely unknown. We examined copy-number variant (CNV) participation in the genetic architecture underlying SB risk.
Methods
A high-confidence ensemble a...
The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element fa...
Significance
Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to inves...
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRedict...
Background
Polygenic Risk Scores (PRS) are important in predicting disease risk and usually rely on markers selected by thresholding p‐values from genome‐wide association studies (GWAS). In traditional approaches, a single model is built to calculate risk scores, employing effect size to determine additive risk. However, this traditional meth...
Purpose
Spina bifida (SB) arises from complex genetic interactions that converge to interfere with neural tube closure. Understanding the precise patterns conferring SB risk requires a deep exploration of the genomic networks and molecular pathways that govern neurulation. This study aims to delineate genome-wide regulatory signatures underlying SB...
Minor intron-containing genes (MIGs) account for less than 2% of all human protein coding genes and are uniquely dependent on the minor spliceosome for proper excision. Despite their low numbers, we surprisingly found significant enrichment of MIG-encoded proteins (MIG-Ps) in protein-protein interactomes and host factors of positive sense RNA virus...
Autism Spectrum Disorder (ASD) is a group of complex neurodevelopmental disorders that affects about 1% of the world’s population, impacting the quality of life of not only the diagnosed individuals but also their communities. Early detection and intervention are paramount to limit its effect on a child’s development; however, overlap with other dis...
Tandem repeats (TRs) are polymorphic sequences of DNA that are composed of repeating units of motifs ranging from 2 to 6 base pairs in length. Expansions of TRs are responsible for approximately 50 monogenic diseases, compared to over 4,300 disease-causing genes disrupted by single nucleotide variants and small indels. It thus appears reasonable to ex...
The lung microbiome impacts lung function, making any smoking-induced changes in the lung microbiome potentially significant. The complex co-occurrence and co-avoidance patterns between the bacterial taxa in the lower respiratory tract (LRT) microbiome were explored for a cohort of active (AS), former (FS) and never (NS) smokers. Bronchoalveolar...
The lung microbiome impacts lung function, making any smoking-induced changes in the lung microbiome potentially significant. The complex co-occurrence and co-avoidance patterns between the bacterial taxa in the LRT microbiome were explored for a cohort of active (AS), former (FS), and never (NS) smokers. Bronchoalveolar lavages (BAL) were collecte...
The pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) revealed the world's unpreparedness to deal with the emergence of novel pathogenic viruses, pointing to the urgent need to identify targets for broad-spectrum antiviral strategies. Here, we report that proteins encoded by Minor Intron-containing Genes (MIGs) are...
Neural Tube Defects (NTDs) are congenital malformations resulting from abnormal embryonic development of the brain, spine, or spinal column. The genetic etiology of human NTDs remains poorly understood despite intensive investigation. CIC, homolog of the Capicua transcription repressor, has been reported to interact with ataxin‐1 (ATXN1) and partic...
Neural Tube Defects (NTDs) are congenital malformations resulting from abnormal embryonic development of the brain, spine, or spinal column. The genetic etiology of human NTDs remains poorly understood despite intensive investigation. CIC, homolog of the Capicua transcription repressor, has been reported to interact with ataxin-1 (ATXN1) and partic...
Information extracted from electronic health records (EHRs) is used for predictive tasks and clinical pattern recognition. Machine learning techniques also allow the extraction of knowledge from EHR. This study is a continuation of previous work in which EHRs were exploited to make predictions about patients with respiratory diseases. In this study...
Spina bifida (SB) is a debilitating birth defect caused by multiple gene and environment interactions. Though SB shows non-Mendelian inheritance, genetic factors contribute to an estimated 70% of cases. Nevertheless, identifying human mutations conferring SB risk is challenging due to its relative rarity, genetic heterogeneity, incomplete penetranc...
A Correction to this paper has been published: https://doi.org/10.1038/s41422-021-00475-z
Tissue‐specific differentially methylated regions (tDMRs) are regions of the genome with methylation patterns that modulate gene expression in those tissue types. The detection of tDMRs in forensic evidence can permit the identification of body fluids at trace levels. In this report we have performed a bioinformatic analysis of an existing array da...
The human sperm is one of the smallest cells in the body, but also one of the most important, as it serves as the entire paternal genetic contribution to a child. Investigating RNA and mutations in sperm is especially relevant for diseases such as autism spectrum disorders (ASD), which have been correlated with advanced paternal age. Historically,...
The exploitation of electronic health records (EHRs) has multiple utilities, from predictive tasks and clinical decision support to pattern recognition. Artificial Intelligence (AI) makes it practical to extract knowledge from EHR data. In this study, we aim to construct a Machine Learning model from EHR data to make predictions about patien...
The novel betacoronavirus named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after initially emerging in Wuhan, China. Here we applied a novel, comprehensive bioinformatic strategy to public RNA sequencing and viral genome sequencing data, to better understand how SARS-CoV-2 interacts with huma...
As part of the virtual BioHackathon 2020, we formed a working group that focused on the analysis of gene expression in the context of COVID-19. More specifically, we performed transcriptome analyses on published datasets in order to better understand the interaction between the human host and the SARS-CoV-2 virus. The ideas proposed during this hack...
DNA damage response (DDR) genes, which orchestrate the network of DNA repair and cell cycle control, are essential for the rapid proliferation of neural progenitor cells (NPC). To date, the potential association between specific DDR genes and the risk of human neural tube defects (NTDs) has not been investigated. Using whole‐genome sequencing (WGS) and tar...
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Background
Computing centrality is a foundational concept in social networking that involves finding the most “central” or important nodes. In some biological networks defining importance is difficult, which then creates challenges in finding an appropriate centrality algorithm.
Results
We instead generalize the results of any k centrality algorit...
The proliferation of technology to collect and generate data has opened new opportunities for global health informatics. In this context, applications that use data analytics and decision support methods are becoming increasingly important. In this chapter we review recent literature that falls under these two categories. Within the knowledge discov...
Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different are...
Background
The notion of centrality is used to identify “important” nodes in social networks. Importance of nodes is not well-defined, and many different notions exist in the literature. The challenge of defining centrality in meaningful ways when network edges can be positively or negatively weighted has not been adequately addressed in the litera...
Objectives:
To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different ar...
This work aims at predicting the symptom severity and contagiousness of a person infected with a respiratory virus, using time series gene expression data. Four different respiratory viruses were studied – RSV, H1N1, H3N2, and Rhinovirus. Predictive models were built for each virus for each time point. Partial least squares discriminant analysis wa...
Microbiomes are ubiquitous and are found in the ocean, the soil, and in/on other living organisms. Changes in the microbiome can impact the health of the environmental niche in which they reside. In order to learn more about these communities, different approaches based on data from multiple omics have been pursued. Metagenomics produces a taxonomi...
Using microarray and bioinformatics, we examined the gene expression profiles in transgenic mouse hearts expressing mutations in the myosin regulatory light chain shown to cause hypertrophic cardiomyopathy (HCM). We focused on two malignant RLC-mutations, Arginine58→Glutamine (R58Q) and Aspartic Acid166→Valine (D166V), and one benign, Lysine104→Glu...
Epigenetics is the study of heritable changes in gene expression resulting from modifications in chromatin structure, without involving changes in the genetic information stored in DNA. In addition to their critical role in regulating cell differentiation and development, epigenetic modifications have also been linked to human diseases, most notabl...
Background. Harmful Algal Blooms (HABs) responsible for Diarrhetic Shellfish Poisoning (DSP) represent a major threat for human consumers of shellfish. The biotoxin Okadaic Acid (OA), a well-known phosphatase inhibitor and tumor promoter, is the primary cause of acute DSP intoxications. Although several studies have described the molecular effects...
The complex information encoded into the element connectivity of a system makes it possible to process divisible systems graphically using graph theory. One such application is the quantitative characterization of the molecular topologies of drugs, proteins and nucleic acids, in order to build mathematical models as Quanti...
Background: Diarrhetic Shellfish Poisoning (DSP) Harmful Algal Blooms (HABs) represent a major threat for human consumers of shellfish. The biotoxin Okadaic Acid (OA), a well-known phosphatase inhibitor and tumor promoter, is the primary cause of acute DSP intoxications. Although several studies have described the molecular effects of high OA co...
Knowledge Management (KM) can be seen as the process of capturing, developing, sharing, and effectively using organizational knowledge. In this context, the work presented here proposes a KM System to be used in the scope of chronic patient control and monitoring for distributed research projects. It was designed in order to enable communication be...
This work presents the results of applying two clustering techniques to gene expression data from the mussel Mytilus galloprovincialis. The objective of the study presented in this paper was to cluster the different genes involved in the experiment, in order to find those most closely related based on their expression patterns. A self-organising ma...
The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. The virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecu...
Schizophrenia is a complex disease, with both genetic and environmental influence. Machine learning techniques can be used to associate different genetic variations at different genes with a (schizophrenic or non-schizophrenic) phenotype. Several machine learning techniques were applied to schizophrenia data to obtain the results presented in this...
Okadaic Acid (OA) constitutes the main active principle in Diarrhetic Shellfish Poisoning (DSP) toxins produced during Harmful Algal Blooms (HABs), representing a serious threat for human consumers of edible shellfish. Furthermore, OA conveys critical deleterious effects for marine organisms due to its genotoxic potential. Many efforts have been de...
In recent years, in the post genomic era, more and more data is being generated by biological high throughput technologies, such as proteomics and transcriptomics. This omics data can be very useful, but the real challenge is to analyze all this data, as a whole, after integrating it. Biomedical data integration enables making queries to different,...
In this work, a data integration approach using a federated model based on a service oriented architecture (SOA) is presented. The BioMOBY middleware was used to implement each service which is part of the integration process. As an example of usage of this architecture, a web tool for candidate SNP selection has been developed. Thus, several BioMO...
ANNs are one of the most successful learning systems. For this reason, many techniques have been published for obtaining feed-forward networks. However, few works describe techniques for developing recurrent networks. This work uses a genetic algorithm for automatic recurrent ANN development. This system has been applied to solve a we...
Aging and quality of life are important research topics nowadays in areas such as life sciences, chemistry, and pharmacology. People live longer and thus want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, named antioxidant proteins, that may influence the aging proc...
Data mining, a part of the Knowledge Discovery in Databases process (KDD), is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of...
In the near future, each person's sequenced genome will be incorporated into his or her electronic health record. At that point, genomic medicine will be fundamental to clinical practice, as an essential component of personalized medicine. All the genomic data, as well as other 'omics' and clinical data necessary for personalized medicine, are s...
Data mining, a part of the Knowledge Discovery in Databases process (KDD), is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of biomedical data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of...
Fast cancer diagnosis represents a real necessity in applied medicine due to the importance of this disease. Thus, theoretical models can help as prediction tools. Graph theory representation is one option because it permits us to numerically describe any real system such as the protein macromolecules by transforming real properties into molecular...
In this work, the authors have adapted, in a simple way, the behavior of DNA into an artificial model. This model is intended to be used to extract classification rules for a set of well-known problems. In testing, this approach showed behavior similar to that of the most important referen...
This work presents a method based on genetic algorithms (GAs) which follows the Iterative Rule Learning (IRL) approach for association rule mining. It was applied to real data from schizophrenic patients, as well as simulated data generated with the HAP-SAMPLE software. A comparison with a widely-used software based on dimensionality reduction, (MD...
This work presents a tool which is an online implementation of the best machine learning-based model obtained after an exhaustive computational study. Twelve techniques were applied to schizophrenia data to obtain the results of this study and, with these, Quantitative Genotype – Disease Relationships (QDGRs) for disease prediction. Thus, the tool...
Randic connectivity index (X1) is a well known quantitative measure of connectedness patterns in molecular graphs, as developed by Randic. This index has been demonstrated to be a successful predictor in quantitative structure – activity/property studies for molecules. Different authors have used X1 to predict carbonic anhydrase inhibitors, lipophi...
Medical practice today requires, at the patient Point-of-Care (POC), personalised knowledge that can be adjusted at any moment to the clinical needs of each patient, in order to support decision-making processes that take personalised information into account. To achieve this, hospital information systems must be adapted. Thus, there is...
In this article, a new method based on the differential evolution paradigm has been developed, adding a variable-length genotype feature adapted to work with special time series prediction problems. This approach has been tested on rainfall data in order to predict in time...
Single nucleotide polymorphisms (SNPs) can be used as inputs in disease computational studies such as pattern searching and classification models. Schizophrenia is an example of a complex disease with an important social impact. The multiple causes of this disease create the need for new genetic or proteomic patterns that can diagnose patients using...
A new algorithm is presented for finding genotype-phenotype association rules from data related to complex diseases. The algorithm was based on genetic algorithms, a technique of evolutionary computation. The algorithm was compared to several traditional data mining techniques and it was proved that it obtained better classification scores and foun...
The theoretical study of van der Waals interactions by transforming ab initio Coupled Cluster interaction energies of Ne-Ar, N2-Ar, acetylene-Ar, cyclopropane-Ar and fluorobenzene-Ar into complex networks is proposed. The topics include the general topology, the local structure (triadic census), the node degree distribution, and the shortest van der...
Differential evolution is a successful approach to solving optimization problems. The way it creates individuals allows spontaneous self-adaptation to the function. In this paper, a new method based on the differential evolution paradigm has been developed. An innovative feature has been added: the variable length of the genotype,...
The impact of cancer on society has created the need for new and faster theoretical models that may allow earlier cancer detection. The present review covers cancer prediction using star graphs of protein sequences and proteome mass spectra, building Quantitative Protein-Disease Relationships (QPDRs), similar to Quant...
A new algorithm is presented for finding genotype-phenotype association rules from data related to complex diseases. The algorithm was based on Genetic Algorithms, a technique of Evolutionary Computation. The algorithm was compared to several traditional data mining techniques and it was proved that it obtained similar classification scores but fou...