## About

43 Publications

3,848 Reads

328 Citations

## Publications

Publications (43)

The k-Means algorithm is one of the most popular choices for clustering data but is well-known to be sensitive to the initialization process. There is a substantial number of methods that aim at finding optimal initial seeds for k-Means, though none of them is universally valid. This paper presents an extension to longitudinal data of one of such...

The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the...

The $k$-Means algorithm is one of the most popular choices for clustering data but is well-known to be sensitive to the initialization process. There is a substantial number of methods that aim at finding optimal initial seeds for $k$-Means, though none of them are universally valid. This paper presents an extension to longitudinal data of one of s...

We study a granular gas of viscoelastic particles (kinetic energy loss upon collision is a function of the particles' relative velocities at impact) subject to a stochastic thermostat. We show that the system displays anomalous cooling and heating rates during thermal relaxation processes, which causes the emergence of thermal memory. In particular...

The k-means algorithm is widely used in various research fields because of its fast convergence to the cost function minima; however, it frequently gets stuck in local optima as it is sensitive to initial conditions. This paper explores a simple, computationally feasible method, which provides k-means with a set of initial seeds to cluster datasets...

We study a granular gas of viscoelastic particles, i.e., one in which the kinetic energy loss upon collision, characteristic of granular materials, is a function of the particles' relative velocities at impact. In order to characterize thermal memory in this system, we study the temperature relaxation curves when the granular gas is subject to sudden thermostat c...

A system of smooth “frozen” Janus-type disks is studied. Such disks cannot rotate and are divided by their diameter into two sides of different inelasticities. Taking as a reference a system of colored elastic disks, we find differences in the behavior of the collisions once the anisotropy is included. A homogeneous state, akin to the homogeneous c...

We report the emergence of a giant Mpemba effect in the uniformly heated gas of inelastic rough hard spheres: The initially hotter sample may cool sooner than the colder one, even when the initial temperatures differ by more than one order of magnitude. In order to understand this behavior, it suffices to consider the simplest Maxwellian approximat...

We report the emergence of a giant Mpemba effect in the uniformly heated gas of inelastic rough hard spheres. For this purpose, it suffices to consider the simplest Maxwellian approximation for the velocity distribution. Within this framework, the rotational and translational granular temperatures obey two coupled evolution equations, which predict...

clustComp is an open source Bioconductor package that implements different techniques for the comparison of two gene expression clustering results. These include flat versus flat and hierarchical versus flat comparisons. The visualisation of the similarities is provided by means of a bipartite graph, whose layout is heuristically optimised. Its fle...

Rapid accumulation and availability of gene expression datasets in public repositories have enabled large-scale meta-analyses of combined data. The richness of cross-experiment data has provided new biological insights, including identification of new cancer genes. In this study, we compiled a human gene expression dataset from ∼40,000 publicly ava...

Heatmap; solid groups vs 1,000 most variable probesets.
Heatmap for the expression level of the 1,000 most variable probesets averaged over the samples included in each biological group with at least 20 observations. The range for this similarity measure is (2.6984, 14.4581). The colour labels display the same clusters as those in S11 Fig. The prob...

Significant probesets.
The list of 1,835 significant probes, for which there is a significant effect of the disease status, along with the corrected p-values, and the genes or set of genes they are mapping to. They are ordered according to increasing p-values. There are 1,285 unique genes and 97 multiple matchings.
(TXT)

Heatmap; all biological groups and 10,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two biological groups with at least 20 observations. Only the 10,000 most variable probesets are accounted for in the computation of the correlations. The range for the similarity measure is (0.1352, 0.9938). The...

Heatmap; all biological groups and 1,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two biological groups with at least 20 observations. Only the 1,000 most variable probesets are accounted for in the computation of the correlations. The range for the similarity measure is (−0.3591, 0.9960). The...

BGV for all probesets across paired tissues.
The BGV ranges from 0.051 to 2,047.575, but only 10.85% of the probesets show a really high BGV (greater than 128.1, the ‘maximum’ whisker).
(PDF)

a) Permutation test QQ-plot. Quantiles of the adjusted permutation and observed p-values in log10 scale. Except for very extreme results observed due to resolution of attainable p-values in the permutation test, the observed p-values are larger than those obtained with the permutation test. b) QQ-plot of correct vs shuffled disease labels. After ra...

Genes found in the Atlas.
The 135 unique genes found in the list L1 from the Atlas of Genetics and Cytogenetics in Oncology and Haematology are alphabetically ordered and displayed in bold-face. The types of cancer they have been related to are also shown.
(XLS)

Genes overexpressed in cancer.
The 210 unique genes found in the list L3 identified in [34] are alphabetically ordered and displayed in bold-face. The types of cancer they have been related to are also shown.
(XLS)

Samples and Biological groups.
Collection of 27,887 annotated samples retrieved from ArrayExpress along with the biological group; the original experiments and assay names are given in the format ‘Experiment_CELfile’.
(XLS)

Heatmap; all biological groups and 5,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two biological groups with at least 20 observations. Only the 5,000 most variable probesets are accounted for in the computation of the correlations. The range for the similarity measure is (−0.0288, 0.9940). The...

Heatmap; all biological groups and 500 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two biological groups with at least 20 observations. Only the 500 most variable probesets are accounted for in the computation of the correlations. The range for the similarity measure is (−0.4359, 0.9965). The colo...

Heatmap; solid groups and 20,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two solid groups with at least 20 observations, accounting for the 20,000 most variable probesets in the computation of the correlations. The range for the similarity measure is (0.3869, 0.9907). The colour labels display...

Heatmap; solid groups and 10,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two solid groups with at least 20 observations, accounting for the 10,000 most variable probesets in the computation of the correlations. The range for the similarity measure is (0.1704, 0.9896). The colour labels display...

Heatmap; solid groups and 5,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two solid groups with at least 20 observations, accounting for the 5,000 most variable probesets in the computation of the correlations. The range for the similarity measure is (0.0247, 0.9893). The colour labels display s...

Heatmap; solid groups and 500 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two solid groups with at least 20 observations, accounting for the 500 most variable probesets in the computation of the correlations. The range for the similarity measure is (−0.2424, 0.9955). The colour labels display smal...

Disease effect volcano plot.
Plot of the disease effect, irrespective of the tissue type, versus the negative log10-transformed p-values.
(PDF)

Data pre-processing and quality control.
Description of the pre-processing and quality control steps and parameters.
(PDF)

Heatmap; all biological groups and all probesets.
Heatmaps for the average pairwise correlations between samples from any two biological groups with at least 20 observations. All probesets are accounted for in the computation of the correlations. The range for the similarity measure is (0.6317, 0.9953). The colour labels display smaller clusters in...

Heatmap; all biological groups and 20,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two biological groups with at least 20 observations. Only the 20,000 most variable probesets are accounted for in the computation of the correlations. The range for the similarity measure is (0.3012, 0.9946). The...

Heatmap; solid groups and all probesets.
Heatmap for the average pairwise correlations between samples from any two solid groups with at least 20 observations, accounting for all the probesets in the computation of the correlations. The range for the similarity measure is (0.7164, 0.9920). The colour labels display smaller clusters in the hierarchi...

Heatmap; solid groups and 1,000 most variable probesets.
Heatmap for the average pairwise correlations between samples from any two solid groups with at least 20 observations, accounting for the 1,000 most variable probesets in the computation of the correlations. The range for the similarity measure is (−0.1312, 0.9938). The colour labels display...

The use of DNA microarrays and oligonucleotide chips of high density in modern biomedical research provides complex, high dimensional data which have been proven to convey crucial information about gene expression levels and to play an important role in disease diagnosis. Therefore, there is a need for developing new, robust statistical techniques...

Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a ma...

Microwave tomographic imaging is an inexpensive, noninvasive modality for reconstructing the dielectric properties of media, which can be used as a screening method in clinical applications such as breast cancer and brain stroke detection. For breast cancer detection, the iterative algorithm of structural inversion with level sets provides well-defined...

Microarray experiments provide data on the expression levels of thousands of genes and, therefore, statistical methods applicable to the analysis of such high-dimensional data are needed. In this paper, we propose robust nonparametric tools for the description and analysis of microarray data based on the concept of functional depth, which measures...

Clustering is one of the most widely used methods in unsupervised gene expression data analysis. The use of different clustering algorithms or different parameters often produces rather different results on the same data. Biological interpretation of multiple clustering results requires understanding how different clusters relate to each other. It...

Expression Profiler (EP, http://www.ebi.ac.uk/expressionprofiler) is a web-based platform for microarray gene expression and other functional genomics-related data analysis. The new architecture, Expression Profiler: next generation (EP:NG), modularizes the original design and allows individual analysis-task-related components to be developed by di...

Many iterative techniques are sensitive to the initial conditions, thus getting stuck in local optima. This paper explores two simple, computationally fast methods that allow the refinement of the initial points of k-means to cluster a given data set. They are based on alternating k-means and the search of the deepest (most representative) point of...

## Projects

Project (1)