Effective similarity measures for expression profiles.

Department of Computer Science, Cornell University, NY, USA.
Bioinformatics (Impact Factor: 4.62). 08/2006; 22(13):1616-22. DOI: 10.1093/bioinformatics/btl127
Source: PubMed

ABSTRACT It is commonly accepted that genes with similar expression profiles are functionally related. However, there are many ways to measure the similarity of expression profiles, and it is not clear a priori which is the most effective. Moreover, no clear distinction has so far been made as to the type of functional link between genes that microarray data suggest. Similarly expressed genes can be part of the same complex as interacting partners; they can participate in the same pathway without interacting directly; they can perform similar functions; or they can simply have similar regulatory sequences. Here we conduct a study of the notion of functional link as implied by expression data. We analyze different similarity measures of gene expression profiles and assess their usefulness and robustness in detecting biological relationships by comparing the similarity scores with results obtained from databases of interacting proteins, promoter signals and cellular pathways, as well as through sequence comparisons. We also introduce variations on similarity measures that are based on statistical analysis and better discriminate between genes that are functionally close and genes that are functionally distant. Our tools can be used to assess other similarity measures for expression profiles, and are accessible at
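
The abstract does not name the similarity measures it compares. As a minimal sketch of what "measuring the similarity of expression profiles" can look like, the Python below computes two widely used candidates, Pearson and Spearman correlation, on toy data. The choice of these two measures, the function names, and the profile values are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical profiles: expression of three genes across five conditions.
gene_a = [0.1, 0.5, 1.2, 2.0, 3.1]
gene_b = [0.2, 0.9, 2.5, 4.1, 9.0]  # rises with gene_a, but not linearly
gene_c = [3.0, 2.2, 1.1, 0.4, 0.1]  # falls as gene_a rises

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name, round(pearson(gene_a, other), 3), round(spearman(gene_a, other), 3))
```

On these toy values the rank-based measure treats gene_a and gene_b as perfectly similar while Pearson does not, which illustrates why the choice of measure can change which gene pairs look related.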

  • Source
    ABSTRACT: An algorithm for identifying differentially expressed genes in data from time course experiments is introduced. We evaluate our algorithm on a real microarray data set, in which 6% of genes have been identified as non-random based on expert knowledge. Our algorithm is related to subspace clustering based on axis-parallel projections. In contrast to existing subspace clustering techniques, we sum over all possible 2^d combinations of dimensions rather than identifying the locally most relevant dimensions. We scale attribute values based on rank order to avoid making assumptions on the distribution of random data. Information on the overall expression of a gene contributes towards the final result. We demonstrate that our algorithm consistently outperforms a conventional outlier detection algorithm.
  • Source
    ABSTRACT: Although a growing body of research has examined issues related to individuality in music performance, few studies have attempted to quantify markers of individuality that transcend pieces and musical styles. This study aims to identify such meta-markers by discriminating between influences linked to specific pieces or interpretive goals and performer-specific playing styles, using two complementary statistical approaches: linear mixed models (LMMs) to estimate fixed (piece and interpretation) and random (performer) effects, and similarity analyses to compare expressive profiles on a note-by-note basis across pieces and expressive parameters. Twelve professional harpsichordists recorded three pieces representative of the Baroque harpsichord repertoire, including three interpretations of one of these pieces, each emphasizing a different melodic line, on an instrument equipped with a MIDI console. Four expressive parameters were analyzed: articulation, note onset asynchrony, timing, and velocity. LMMs showed that piece-specific influences were much larger for articulation than for other parameters, for which performer-specific effects were predominant, and that piece-specific influences were generally larger than effects associated with interpretive goals. Some performers consistently deviated from the mean values for articulation and velocity across pieces and interpretations, suggesting that global measures of expressivity may in some cases constitute valid markers of artistic individuality. Similarity analyses detected significant associations among the magnitudes of the correlations between the expressive profiles of different performers. These associations were found both when comparing across parameters and within the same piece or interpretation, or on the same parameter and across pieces or interpretations. These findings suggest the existence of expressive meta-strategies that can manifest themselves across pieces, interpretive goals, or expressive devices.
    Frontiers in Psychology 01/2013; 4:895. · 2.80 Impact Factor
  • Source
    ABSTRACT: Analysis of genetic interaction networks often involves identifying genes with similar profiles, which is typically indicative of a common function. While several profile similarity measures have been applied in this context, they have never been systematically benchmarked. We compared a diverse set of correlation measures, including measures commonly used by the genetic interaction community as well as several other candidate measures, by assessing their utility in extracting functional information from genetic interaction data. We find that the dot product, one of the simplest vector operations, outperforms most other measures over a large range of gene pairs. More generally, linear similarity measures such as the dot product, Pearson correlation or cosine similarity perform better than set overlap measures such as Jaccard coefficient. Similarity measures that involve L2-normalization of the profiles tend to perform better for the top-most similar pairs but perform less favorably when a larger set of gene pairs is considered or when the genetic interaction data is thresholded. Such measures are also less robust to the presence of noise and batch effects in the genetic interaction data. Overall, the dot product measure performs consistently among the best measures under a variety of different conditions and genetic interaction datasets.
    PLoS ONE 07/2013; 8(7):e68664. · 3.53 Impact Factor
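
The measures named in the preceding abstract (dot product, cosine similarity, Pearson correlation, Jaccard coefficient) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the benchmark code from that study: the profile values are invented, and the `threshold` used to turn a profile into a set for the Jaccard coefficient is a hypothetical parameter.

```python
import math

def dot(x, y):
    """Dot product: no normalization, so profile magnitude matters."""
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Dot product of the L2-normalized profiles."""
    norm = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    return dot(x, y) / norm if norm else 0.0

def pearson(x, y):
    """Cosine similarity of the mean-centered profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

def jaccard(x, y, threshold):
    """Set overlap of positions whose absolute score reaches the threshold.

    Assumption: thresholding on absolute value is one simple way to turn
    a real-valued profile into a set; other conventions are possible.
    """
    sx = {i for i, a in enumerate(x) if abs(a) >= threshold}
    sy = {i for i, b in enumerate(y) if abs(b) >= threshold}
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0

# Hypothetical genetic interaction profiles; q is p scaled by a factor of 2.
p = [-0.8, 0.0, 0.3, -0.5]
q = [-1.6, 0.0, 0.6, -1.0]

print("dot    ", round(dot(p, q), 3))
print("cosine ", round(cosine(p, q), 3))
print("pearson", round(pearson(p, q), 3))
print("jaccard", round(jaccard(p, q, threshold=0.4), 3))
```

Because q is a scaled copy of p, the L2-normalized measures (cosine, Pearson) see the two profiles as identical, whereas the dot product grows with the magnitude of the scores and the Jaccard coefficient depends on where the threshold falls.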
