About
Publications: 74
Reads: 9,014
Citations: 1,113
Additional affiliations
April 2011 - December 2013
February 2008 - February 2011
September 2006 - February 2011
Publications (74)
The human microbiome has become an area of intense research due to its potential impact on human health. However, the analysis and interpretation of this data have proven to be challenging due to its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationship...
Harmful algal blooms are natural phenomena that cause shellfish contamination due to the rapid accumulation of marine biotoxins. To prevent public health risks, the Portuguese Institute of the Ocean and the Atmosphere (IPMA) regularly monitors toxic phytoplankton in shellfish production areas and temporarily closes shellfish production when biotoxi...
Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preproce...
Gliomas are primary malignant brain tumors with poor survival and high resistance to available treatments. Improving the molecular understanding of glioma and disclosing novel biomarkers of tumor development and progression could help to find novel targeted therapies for this type of cancer. Public databases such as The Cancer Genome Atlas (TCGA) p...
The rapid development of machine learning (ML) techniques has opened up the data-dense field of microbiome research for novel therapeutic, diagnostic, and prognostic applications targeting a wide range of disorders, which could substantially improve healthcare practices in the era of precision medicine. However, several challenges must be addressed...
Microbiome data predictive analysis within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommenda...
Recent studies have shown that the gut microbiome is associated with colorectal cancer (CRC) progression and anti-cancer therapy efficacy. This study aims to optimize the ridge, elastic net, and lasso regularized generalized linear models (GLM), widely used for supervised machine learning, for multiclass classification tasks (healthy/adenoma/carcinoma)...
Tumor heterogeneity is a challenge to designing effective and targeted therapies. Glioma-type identification depends on specific molecular and histological features, which are defined by the official WHO classification CNS. These guidelines are constantly updated to support the diagnosis process, which affects all the successive clinical decisions....
The understanding of glioma disease has been evolving drastically with the dedicated research into the genetic and molecular profiling of glioma tumour tissue. Molecular biomarkers have gained progressive and substantial importance in providing diagnostic information, leading to groundbreaking changes in the tumour classification system and taxonom...
Colorectal cancer (CRC) is the third most common cancer and the second most deadly worldwide. It is a very heterogeneous disease that can develop via distinct pathways where metastasis is the primary cause of death. Therefore, it is crucial to understand the molecular mechanisms underlying metastasis. RNA-sequencing is an essential tool used for s...
Effective diagnosis and treatment in cancer is a barrier for the development of personalized medicine, mostly due to tumor heterogeneity. In the particular case of gliomas, highly heterogeneous brain tumors at the histological, cellular and molecular levels, and exhibiting poor prognosis, the mechanisms behind tumor heterogeneity and progression re...
Diarrhetic Shellfish Poisoning (DSP) is an acute intoxication caused by the consumption of contaminated shellfish, which is common in many regions of the world. To safeguard human health, most countries implement programs focused on the surveillance of toxic phytoplankton abundance and shellfish toxicity levels, an effort that can be complemented b...
Harmful algal blooms (HABs) and the consequent contamination of shellfish are complex processes depending on several biotic and abiotic variables, turning prediction of shellfish contamination into a challenging task. Not only is the information of interest dispersed among multiple sources, but also the complex temporal relationships between the ti...
The extraction of novel information from omics data is a challenging task, in particular, since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can largely impact class...
Precision medicine has emerged to tailor clinical decisions based on patient genetic features in a personalized healthcare perspective. The ultimate goal is to drive disease diagnosis and treatment selection based on the patient molecular profiles, usually given by large volumes of data, which is intrinsically high-dimensional, heterogeneous, noisy...
Harmful algal blooms (HABs) are among the most severe ecological marine problems worldwide. Under favorable climate and oceanographic conditions, toxin-producing microalgae species may proliferate, reach increasingly high cell concentrations in seawater, accumulate in shellfish, and threaten the health of seafood consumers. There is an urgent need...
Network science has long been recognized as a well-established discipline across many biological domains. In the particular case of cancer genomics, network discovery is challenged by the multitude of available high-dimensional heterogeneous views of data. Glioblastoma (GBM) is an example of such a complex and heterogeneous disease that can be tack...
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to other high-throughput studies; the quantitative analyses need to address...
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed...
Random sample consensus (Ransac) is a technique that has been widely used for modeling data with a large amount of noise. Although successfully employed in areas such as computer vision, extensive testing and applications to clinical data, particularly in oncology, are still lacking. We applied this technique to synthetic and biomedical datasets, p...
Colorectal cancer (CRC) is one of the leading causes of mortality and morbidity in the world. Being a heterogeneous disease, cancer therapy and prognosis represent a significant challenge to medical care. The molecular information improves the accuracy with which patients are classified and treated since similar pathologies may show different clini...
Microarray and RNA-sequencing (RNA-seq) gene expression data alongside machine learning algorithms are promising in the discovery of new cancer biomarkers. However, even though they are similar in purpose, there are some fundamental differences between the two techniques. We propose a methodology for cross-platform integration, and biomarker discov...
Access to “big data” poses an ambitious challenge in the medical field, especially in personalized medicine, where gene expression data are increasingly being used to establish a diagnosis and optimize treatment of oncological patients. However, the high-dimensional nature of the data brings many constraints, for which several app...
Background: Understanding cellular and molecular heterogeneity in glioblastoma (GBM), the most common and aggressive primary brain malignancy, is a crucial step towards the development of effective therapies. Besides the inter-patient variability, the presence of multiple cell populations within tumors calls for the need to develop modeling strate...
Breast invasive carcinoma (BRCA) and prostate adenocarcinoma (PRAD) are two of the most common types of cancer in women and men, respectively. As hormone-dependent tumours, BRCA and PRAD share considerable underlying biological similarities worth being exploited. The disclosure of gene networks regulating both types of cancers would potentially all...
Background: Breast and prostate cancers are typical examples of hormone-dependent cancers, showing remarkable similarities at the hormone-related signaling pathways level, and exhibiting a high tropism to bone. While the identification of genes playing a specific role in each cancer type brings invaluable insights for gene therapy research by target...
Network information is gaining importance in the generation of predictive models in cancer genomics, with the premise that prior biological knowledge offers the models interpretability and reproducibility, an invaluable contribution in precision medicine. This work evaluates the usefulness of accounting for gene network information provided by the...
Harmful algal blooms are responsible worldwide for the contamination of fishery resources, with potential impacts on seafood safety and public health. Most coastal countries rely on an intense monitoring program for the surveillance of toxic algae occurrence and shellfish contamination. The present study investigates the use of near infrared (NIR)...
Data availability by modern sequencing technologies represents a major challenge in oncological survival analysis, as the increasing amount of molecular data hampers the generation of models that are both accurate and interpretable. To tackle this problem, this work evaluates the introduction of graph centrality measures in classical sparse surviva...
Correct classification of breast cancer subtypes is of high importance as it directly affects the therapeutic options. We focus on triple-negative breast cancer which has the worst prognosis among breast cancer types. Using cutting edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly av...
(Statistical Methods in Medical Research, in press.) Correct classification of breast cancer sub-types is of high importance as it directly affects the therapeutic options. We focus on triple-negative breast cancer (TNBC) which has the worst prognosis among breast cancer types. Using cutting edge methods from the field of robust statistics, we anal...
Background: Learning accurate models from 'omics data is bringing many challenges due to their inherent high-dimensionality, e.g. the number of gene expression variables, and comparatively lower sample sizes, which leads to ill-posed inverse problems. Furthermore, the presence of outliers, either experimental errors or interesting abnormal clinica...
Background: A considerable effort has been put forth by the biopharmaceutical industry to guarantee high‐quality products and patient safety. Ensuring the reproducibility of microbial cultures is essential to achieve such high standards. Reproducibility is usually assessed by offline, costly and time‐consuming analyses of the final product, disregar...
Background: Survival analysis is a statistical technique widely used in many fields of science, in particular in the medical area, and which studies the time until an event of interest occurs. Outlier detection in this context has gained great importance due to the fact that the identification of long or short-term survivors may lead to the detectio...
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid....
To increase the knowledge of the recombinant cyprosin production process in Saccharomyces cerevisiae cultures, it is relevant to implement efficient bioprocess monitoring techniques. The present work focuses on the implementation of a mid-infrared (MIR) spectroscopy-based tool for monitoring the recombinant culture in a rapid, economic, and high-th...
Escherichia coli is one of the most used host microorganisms for the production of recombinant products, such as heterologous proteins and plasmids. However, genetic, physiological and environmental factors influence the plasmid replication and cloned gene expression in a highly complex way. To control and optimize the recombinant expression system p...
Human mesenchymal stem/stromal cells (MSCs) have received considerable attention in the field of cell-based therapies due to their high differentiation potential and ability to modulate immune responses. However, since these cells can only be isolated in very low quantities, successful realization of these therapies requires MSCs ex-vivo expansion...
Reporter genes are routinely used in every laboratory for molecular and cellular biology for studying heterologous gene expression and general cellular biological mechanisms, such as transfection processes. Although well characterized and broadly implemented, reporter genes present serious limitations, either by involving time-consuming procedures...
The development of biopharmaceutical manufacturing processes presents critical constraints, with the major constraint being that living cells synthesize these molecules, presenting inherent behavior variability due to their high sensitivity to small fluctuations in the cultivation environment. To speed up the development process and to control this...
Helicobacter pylori infection represents a serious health problem, given its association with serious gastric diseases such as gastric ulcers, cancer and MALT lymphoma. Currently no vaccine exists and antibiotic-based eradication therapy is already failing in more than 20% of cases. To increase the knowledge on the infection process, diverse gastric cell...
Background: While the pharmaceutical industry keeps an eye on plasmid DNA production for new generation gene therapies, real-time monitoring techniques for plasmid bioproduction are yet unavailable. This work shows the possibility of in-situ monitoring of plasmid production in Escherichia coli cultures using a near infrared (NIR) fiber optic probe. R...
Near Infrared (NIR) spectroscopy was used for in situ monitoring of the cultivation of two recombinant Saccharomyces cerevisiae strains producing heterologous cyprosin B. NIR spectroscopy is a fast and non-destructive technique that, being based on overtones and combinations of molecular vibrations, requires chemometrics tools, such as partial least...
The need for economical, high-yield plasmid production in Escherichia coli cultures is emerging as a result of the latest advances in DNA vaccination and gene therapy. To contribute to that goal, a model describing the kinetics involved in the bioproduction of plasmid by recombinant E. coli DH5α is presented, as an attempt to...
The development of in-situ monitoring techniques enabling the real-time acquisition of information concerning the key variables over different Escherichia coli cultivation conditions and strategies is a crucial step towards the optimization of a plasmid bioproduction process. This work shows the use of a Near-InfraRed (NIR) fiber optic probe immers...
An integrated approach for modelling, monitoring and controlling plasmid bioproduction in Escherichia coli cultures is presented. In a first stage, by the implementation of a kinetic model for E. coli cultures, a better bioprocess understanding was reached, concerning the availability of nutrients and products along the bioprocess, and their effect...
Methods for fast and accurate identification and quantification of the composition of pharmaceutical mixtures are important in many scientific and industrial applications. When this goal is approached via hyperspectral data analysis, the problem becomes one of hyperspectral unmixing, where the goal is to identify the pure materials (also called end...
Many pharmaceutical problems require chemical identification of the ingredients present in a drug product, e.g., a tablet. Examples include the identification of the compounds present in many steps of the manufacturing process and the chemical characterization of counterfeit and third-party tablets. Hyperspectral unmixing of near-infrared images is...
The adequacy of quantification of the components in non-homogeneous pharmaceutical tablets using near infrared (NIR) linear hyperspectral unmixing has been studied with and without the presence of the tablet coating and an inefficient blending process. NIR images of six coated tablets of different formulations and of sections thereof, extracted at...
A rapid detection of the nonauthenticity of suspect tablets is a key first step in the fight against pharmaceutical counterfeiting. The chemical characterization of these tablets is the logical next step to evaluate their impact on patient health and help authorities in tracking their source. Hyperspectral unmixing of near-infrared (NIR) image data...
Counterfeit pharmaceutical products pose a serious public health problem. It is thus important not only to detect them, but also to identify their composition and assess the risk for the patient. Identifying the spectral signatures of the pure compounds present in a (maybe counterfeit) tablet of unknown origin is clearly a hyperspectral unmixing pr...
According to the WHO definition for counterfeit medicines, several categories can be established, e.g., medicines containing the correct active pharmaceutical ingredient (API) but different excipients, medicines containing low levels of API, no API or even a substitute API. Obviously, these different scenarios will have different detrimental effect...
Near infrared chemical imaging (NIR-CI) analysis was performed on 55 counterfeit Heptodin tablets obtained from a market survey and an additional 11 authentic Heptodin tablets for comparison. The aim of the study was to investigate whether NIR-CI can be used to detect the counterfeit tablets and to classify/source them so as to understand the possi...
Bootstrap-based methods have been applied for spectral variable selection in near (NIR) and mid-infrared (MIR) spectroscopy applications. In this paper, an extension of those methods for the selection of spectral intervals instead of single spectral variables is proposed. This approach, interval partial least square (PLS)-Bootstrap (iPLS-Bootstrap)...
The relative importance of nursery areas and their relationships with several environmental variables were evaluated in nine estuarine systems along the Portuguese coast based on trawl surveys. Historical data were used to outline changes and trends in the nursery function of some of these estuaries over the past decades. The dominant flatfish spec...
The diets of slender snipefish Macroramphosus gracilis, longspine snipefish Macroramphosus scolopax and boarfish Capros aper, three very abundant species on the Portuguese coast, were studied from samples collected between July 2002 and April 2003. Variations in the diet with fish size, season and area, as well as diet overlap and diversity, are ex...
The existence of two species of the genus Macroramphosus Lacepède 1803, has been discussed based on morphometric characters, diet composition and depth distribution. Another species, the boarfish Capros aper (Linnaeus 1758), caught along the Portuguese coast, shows two different morphotypes, one type with smaller eyes and a deeper body than the oth...
The diets and the trophic niche overlap between seven flatfish species were studied in a coastal nursery adjoining the Tagus estuary (Portugal). Fish were sampled monthly, from March to November 1999, using a beach seine. Arnoglossus imperialis (Rafinesque, 1810), Arnoglossus laterna (Walbaum, 1792) and Arnoglossus thori Kyle, 1913, fed mainly o...