Magnus Fontes

Lund University, Lund, Skåne, Sweden


Publications (10) · 48.15 Total impact

  • Article: A method for visual identification of small sample subgroups and potential biomarkers
    Charlotte Soneson, Magnus Fontes
    ABSTRACT: In order to find previously unknown subgroups in biomedical data and generate testable hypotheses, visually guided exploratory analysis can be of tremendous importance. In this paper we propose a new dissimilarity measure that can be used within the Multidimensional Scaling framework to obtain a joint low-dimensional representation of both the samples and variables of a multivariate data set, thereby providing an alternative to conventional biplots. In comparison with biplots, the representations obtained by our approach are particularly useful for exploratory analysis of data sets where there are small groups of variables sharing unusually high or low values for a small group of samples.
    11/2011;
  • Article: A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities.
    Charlotte Soneson, Magnus Fontes
    ABSTRACT: Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.
    Biostatistics 09/2011; 13(1):129-41. · 2.14 Impact Factor
  • Article: The projection score--an evaluation criterion for variable subset selection in PCA visualization.
    Magnus Fontes, Charlotte Soneson
    ABSTRACT: In many scientific domains, it is becoming increasingly common to collect high-dimensional data sets, often with an exploratory aim, to generate new and relevant hypotheses. The exploratory perspective often makes statistically guided visualization methods, such as Principal Component Analysis (PCA), the methods of choice. However, the clarity of the obtained visualizations, and thereby the potential to use them to formulate relevant hypotheses, may be confounded by the presence of the many non-informative variables. For microarray data, more easily interpretable visualizations are often obtained by filtering the variable set, for example by removing the variables with the smallest variances or by only including the variables most highly related to a specific response. The resulting visualization may depend heavily on the inclusion criterion, that is, effectively the number of retained variables. To our knowledge, there exists no objective method for determining the optimal inclusion criterion in the context of visualization. We present the projection score, which is a straightforward, intuitively appealing measure of the informativeness of a variable subset with respect to PCA visualization. This measure can be universally applied to find suitable inclusion criteria for any type of variable filtering. We apply the presented measure to find optimal variable subsets for different filtering methods in both microarray data sets and synthetic data sets. We note also that the projection score can be applied in general contexts, to compare the informativeness of any variable subsets with respect to visualization by PCA. We conclude that the projection score provides an easily interpretable and universally applicable measure of the informativeness of a variable subset with respect to visualization by PCA, that can be used to systematically find the most interpretable PCA visualization in practical exploratory analysis.
    BMC Bioinformatics 07/2011; 12:307. · 2.75 Impact Factor
  • Article: Early changes in the hypothalamic region in prodromal Huntington disease revealed by MRI analysis.
    ABSTRACT: Huntington disease (HD) is a fatal neurodegenerative disorder caused by an expanded CAG repeat. Its length can be used to estimate the time of clinical diagnosis, which is defined by overt motor symptoms. Non-motor symptoms begin before motor onset, and involve changes in hypothalamus-regulated functions such as sleep, emotion and metabolism. Therefore we hypothesized that hypothalamic changes occur already prior to the clinical diagnosis. We performed voxel-based morphometry and logistic regression analyses of cross-sectional MR images from 220 HD gene carriers and 75 controls in the Predict-HD study. We show that changes in the hypothalamic region are detectable before clinical diagnosis and that its grey matter contents alone are sufficient to distinguish HD gene carriers from control cases. In conclusion, our study shows, for the first time, that alterations in grey matter contents in the hypothalamic region occur at least a decade before clinical diagnosis in HD using MRI.
    Neurobiology of Disease 12/2010; 40(3):531-43. · 5.40 Impact Factor
  • Article: The correlation pattern of acquired copy number changes in 164 ETV6/RUNX1-positive childhood acute lymphoblastic leukemias.
    ABSTRACT: The ETV6/RUNX1 fusion gene, present in 25% of B-lineage childhood acute lymphoblastic leukemia (ALL), is thought to represent an initiating event, which requires additional genetic changes for leukemia development. To identify additional genetic alterations, 24 ETV6/RUNX1-positive ALLs were analyzed using 500K single nucleotide polymorphism arrays. The results were combined with previously published data sets, allowing us to ascertain genomic copy number aberrations (CNAs) in 164 cases. In total, 45 recurrent CNAs were identified with an average number of 3.5 recurrent changes per case (range 0-13). Twenty-six percent of cases displayed a set of recurrent CNAs identical to that of other cases in the data set. The majority (74%), however, displayed a unique pattern of recurrent CNAs, indicating a large heterogeneity within this ALL subtype. As previously demonstrated, alterations targeting genes involved in B-cell development were common (present in 28% of cases). However, the combined analysis also identified alterations affecting nuclear hormone response (24%) to be a characteristic feature of ETV6/RUNX1-positive ALL. Studying the correlation pattern of the CNAs allowed us to highlight significant positive and negative correlations between specific aberrations. Furthermore, oncogenetic tree models identified ETV6, CDKN2A/B, PAX5, del(6q) and +16 as possible early events in the leukemogenic process.
    Human Molecular Genetics 08/2010; 19(16):3150-8. · 7.64 Impact Factor
  • Article: Transcriptional regulation of aquaporins in accessions of Arabidopsis in response to drought stress
    ABSTRACT: Aquaporins facilitate water transport over cellular membranes, and are therefore believed to play an important role in water homeostasis. In higher plants aquaporin-like proteins, also called major intrinsic proteins (MIPs), are divided into five subfamilies. We have previously shown that MIP transcription in Arabidopsis thaliana is generally downregulated in leaves upon drought stress, apart from two members of the plasma membrane intrinsic protein (PIP) subfamily, AtPIP1;4 and AtPIP2;5, which are upregulated. In order to assess whether this regulation is general or accession-specific we monitored the gene expression of all PIPs in five Arabidopsis accessions. The overall drought regulation of PIPs was well conserved for all five accessions tested, suggesting a general and fundamental physiological role of this drought response. In addition, significant differences among accessions were identified for transcripts of three PIP genes. Principal component analysis showed that most of the PIP transcriptional variation during drought stress could be explained by one variable linked to leaf water content. Promoter-GUS constructs of AtPIP1;4, AtPIP2;5 and also AtPIP2;6, which is unresponsive to drought stress, had distinct expression patterns concentrated in the base of the leaf petioles and parts of the flowers. The presence of drought stress response elements within the 1.6-kb promoter regions of AtPIP1;4 and AtPIP2;5 was demonstrated by comparing transcription of the promoter reporter construct and the endogenous gene upon drought stress. Analysis by ATTED-II and other web-based bioinformatical tools showed that several of the MIPs downregulated upon drought are strongly co-expressed, whereas AtPIP1;4, AtPIP2;5 and AtPIP2;6 are not co-expressed.
    The Plant Journal 01/2010; 61(4):650-60. · 6.16 Impact Factor
  • Article: Integrative analysis of gene expression and copy number alterations using canonical correlation analysis.
    ABSTRACT: With the rapid development of new genetic measurement methods, several types of genetic alterations can be quantified in a high-throughput manner. While the initial focus has been on investigating each data set separately, there is an increasing interest in studying the correlation structure between two or more data sets. Multivariate methods based on Canonical Correlation Analysis (CCA) have been proposed for integrating paired genetic data sets. The high dimensionality of microarray data imposes computational difficulties, which have been addressed for instance by studying the covariance structure of the data, or by reducing the number of variables prior to applying the CCA. In this work, we propose a new method for analyzing high-dimensional paired genetic data sets, which mainly emphasizes the correlation structure and still permits efficient application to very large data sets. The method is implemented by translating a regularized CCA to its dual form, where the computational complexity depends mainly on the number of samples instead of the number of variables. The optimal regularization parameters are chosen by cross-validation. We apply the regularized dual CCA, as well as a classical CCA preceded by a dimension-reducing Principal Components Analysis (PCA), to a paired data set of gene expression changes and copy number alterations in leukemia. Using the correlation-maximizing methods, regularized dual CCA and PCA+CCA, we show that without pre-selection of known disease-relevant genes, and without using information about clinical class membership, an exploratory analysis singles out two patient groups, corresponding to well-known leukemia subtypes. Furthermore, the variables showing the highest relevance to the extracted features agree with previous biological knowledge concerning copy number alterations and gene expression changes in these subtypes. Finally, the correlation-maximizing methods are shown to yield results which are more biologically interpretable than those resulting from a covariance-maximizing method, and provide different insight compared to when each variable set is studied separately using PCA. We conclude that regularized dual CCA as well as PCA+CCA are useful methods for exploratory analysis of paired genetic data sets, and can be efficiently implemented also when the number of variables is very large.
    BMC Bioinformatics 01/2010; 11:191. · 2.75 Impact Factor
  • Article: Transcriptional regulation of aquaporins in accessions of Arabidopsis in response to drought stress.
    The Plant Journal 11/2009; 61(4):650-60. · 6.16 Impact Factor
  • Article: Molecular signatures in childhood acute leukemia and their correlations to expression patterns in normal hematopoietic subpopulations.
    ABSTRACT: Global expression profiles of a consecutive series of 121 childhood acute leukemias (87 B lineage acute lymphoblastic leukemias, 11 T cell acute lymphoblastic leukemias, and 23 acute myeloid leukemias), six normal bone marrows, and 10 normal hematopoietic subpopulations of different lineages and maturations were ascertained by using 27K cDNA microarrays. Unsupervised analyses revealed segregation according to lineages and primary genetic changes, i.e., TCF3(E2A)/PBX1, IGH@/MYC, ETV6(TEL)/RUNX1(AML1), 11q23/MLL, and hyperdiploidy (>50 chromosomes). Supervised discriminatory analyses were used to identify differentially expressed genes correlating with lineage and primary genetic change. The gene-expression profiles of normal hematopoietic cells were also studied. By using principal component analyses (PCA), a differentiation axis was exposed, reflecting lineages and maturation stages of normal hematopoietic cells. By applying the three principal components obtained from PCA of the normal cells on the leukemic samples, similarities between malignant and normal cell lineages and maturations were investigated. Apart from showing that leukemias segregate according to lineage and genetic subtype, we provide an extensive study of the genes correlating with primary genetic changes. We also investigated the expression pattern of these genes in normal hematopoietic cells of different lineages and maturations, identifying genes preferentially expressed by the leukemic cells, suggesting an ectopic activation of a large number of genes, likely to reflect regulatory networks of pathogenetic importance that also may provide attractive targets for future directed therapies.
    Proceedings of the National Academy of Sciences 12/2005; 102(52):19069-74. · 9.68 Impact Factor
  • Article: Approximate geodesic distances reveal biologically relevant structures in microarray data.
    ABSTRACT: MOTIVATION: Genome-wide gene expression measurements, as currently determined by the microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, restricting the cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure would be the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance. RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive, biologically relevant, visualizations when applying multidimensional scaling. This suggests the Isomap algorithm as a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
    Bioinformatics 05/2004; 20(6):874-80. · 5.47 Impact Factor
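
The last abstract above contrasts the Euclidean distance with an approximate geodesic distance of the kind Isomap computes. As a rough illustration only (not the authors' code), the sketch below builds a k-nearest-neighbour graph and takes shortest-path lengths in that graph as the approximate geodesic distances. The function names, the choice `k=2`, the Floyd-Warshall shortest-path step and the half-circle toy data are all assumptions made for this example. The multidimensional scaling step that Isomap applies afterwards is omitted.

```python
import math


def euclidean(p, q):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approximate_geodesic_distances(points, k=2):
    """Shortest-path distances in a symmetric k-nearest-neighbour graph.

    Hypothetical helper for illustration: each point is linked to its k
    nearest neighbours, and the length of the shortest path through the
    graph approximates the distance along the underlying manifold.
    """
    n = len(points)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        # Indices of the k points closest to point i (excluding i itself).
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )[:k]
        for j in neighbours:
            d = euclidean(points[i], points[j])
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)  # keep the graph undirected
    # Floyd-Warshall: relax every pair through every intermediate point.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Toy data (an assumption): 21 evenly spaced points on a unit half circle.
points = [
    (math.cos(math.pi * t / 20), math.sin(math.pi * t / 20)) for t in range(21)
]
geo = approximate_geodesic_distances(points, k=2)

straight = euclidean(points[0], points[-1])  # chord between the two end points
along = geo[0][-1]                           # path that follows the curve
print(f"Euclidean: {straight:.3f}, approximate geodesic: {along:.3f}")
```

On this toy curve the Euclidean distance between the two end points is the chord of length 2, while the graph distance follows the arc and comes out close to π. Pairs of samples that are close in the ambient space but far apart along a curved manifold are treated differently by the two measures in the same way.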