## About

62 publications · 2,280 reads · 599 citations

Introduction

Marta Casanellas currently works at the Departament de Matemàtiques, Universitat Politècnica de Catalunya. Marta does research in Evolutionary Biology, Algebra, and Algebraic geometry.

## Publications

Publications (62)

In recent years, algebraic tools have proven useful in phylogenetic reconstruction and model selection through the study of phylogenetic invariants. However, up to now, the models studied from an algebraic viewpoint are either too general or too restrictive (such as group-based models with a uniform stationary distribution) to be used in...

Homogeneity across lineages is a general assumption in phylogenetics according to which nucleotide substitution rates are common to all lineages. Many phylogenetic methods relax this hypothesis but keep a simple enough model to make the process of sequence evolution more tractable. On the other hand, dealing successfully with the general case (hete...

Homogeneity across lineages is a common assumption in phylogenetics according to which nucleotide substitution rates remain constant in time and do not depend on lineages. This is a simplifying hypothesis which is often adopted to make the process of sequence evolution more tractable. However, its validity has been explored and put into question in...

We present the phylogenetic quartet reconstruction method SAQ (Semi-Algebraic Quartet reconstruction). SAQ is consistent with the most general Markov model of nucleotide substitution and, in particular, it allows for rate heterogeneity across lineages. Based on the algebraic and semi-algebraic description of distributions that arise from the genera...

Modelling the substitution of nucleotides along a phylogenetic tree is usually done by a hidden Markov process. This allows one to define a distribution of characters at the leaves of the trees and one might be able to obtain polynomial relationships among the probabilities of different characters. The study of these polynomials and the geometry of the...

Consider the problem of learning undirected graphical models on trees from corrupted data. Recently Katiyar et al. showed that it is possible to recover trees from noisy binary data up to a small equivalence class of possible trees. Their other paper on the Gaussian case follows a similar pattern. By framing this as a special phylogenetic recovery...

A Markov matrix is embeddable if it can represent a homogeneous continuous-time Markov process. It is well known that if a Markov matrix has real and pairwise-different eigenvalues, then the embeddability can be determined by checking whether its principal logarithm is a rate matrix or not. The same holds for Markov matrices that are close enough t...
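The principal-logarithm criterion described in this abstract can be illustrated on the smallest possible case. The sketch below is a minimal example restricted to $2\times 2$ Markov matrices, where the principal logarithm has a closed form; the function names and the restriction to $2\times 2$ matrices are assumptions made for illustration, not the authors' method for larger matrices.

```python
import math

def principal_log_2x2(P):
    """Principal logarithm of a 2x2 Markov matrix P = [[1-a, a], [b, 1-b]].

    The eigenvalues of P are 1 and lam = 1 - a - b. When lam > 0 the
    principal logarithm equals (log(lam) / (lam - 1)) * (P - I).
    Returns None when lam <= 0, since then there is no real principal
    logarithm.
    """
    a, b = P[0][1], P[1][0]
    lam = 1.0 - a - b
    if lam <= 0.0:
        return None
    if lam == 1.0:
        # P is the identity matrix; its logarithm is the zero matrix.
        return [[0.0, 0.0], [0.0, 0.0]]
    scale = math.log(lam) / (lam - 1.0)
    return [[-scale * a, scale * a], [scale * b, -scale * b]]

def is_rate_matrix(Q, tol=1e-12):
    """A rate matrix has non-negative off-diagonal entries and zero row sums."""
    n = len(Q)
    off_diagonal_ok = all(
        Q[i][j] >= -tol for i in range(n) for j in range(n) if i != j
    )
    row_sums_ok = all(abs(sum(row)) <= tol for row in Q)
    return off_diagonal_ok and row_sums_ok

def is_embeddable_2x2(P):
    """Check embeddability by testing whether the principal log is a rate matrix."""
    Q = principal_log_2x2(P)
    return Q is not None and is_rate_matrix(Q)

print(is_embeddable_2x2([[0.9, 0.1], [0.2, 0.8]]))  # second eigenvalue is positive
print(is_embeddable_2x2([[0.3, 0.7], [0.8, 0.2]]))  # second eigenvalue is negative
```

In the $2\times 2$ case the two eigenvalues are always real, and they are distinct unless the matrix is the identity, so the criterion applies directly; the difficulty addressed in the abstract arises for larger matrices.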

We present the phylogenetic quartet reconstruction method SAQ (Semi-algebraic quartet reconstruction). SAQ is consistent with the most general Markov model of nucleotide substitution and, in particular, it allows for rate heterogeneity across lineages. Based on the algebraic and semi-algebraic description of distributions that arise from the genera...

Characterizing whether a Markov process of discrete random variables has a homogeneous continuous-time realization is a hard problem. In practice, this problem reduces to deciding when a given Markov matrix can be written as the exponential of some rate matrix (a Markov generator). This is an old question known in the literature as the embedding p...

A Markov matrix is embeddable if it can represent a homogeneous continuous-time Markov process. It is well known that if a Markov matrix has real and pairwise-different eigenvalues, then the embeddability can be determined by checking whether its principal logarithm is a rate matrix or not. The same holds for Markov matrices close enough to the ide...

Less rigid than phylogenetic trees, phylogenetic networks allow the description of a wider range of evolutionary events. In this note, we explain how to extend the rank invariants from phylogenetic trees to phylogenetic networks evolving under the general Markov model and the equivariant models.
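For readers unfamiliar with rank invariants on trees, the following sketch shows the basic phenomenon in the simplest setting. It assumes a quartet tree 12|34 with binary states under the general Markov model, and all parameter values are hypothetical, chosen only for illustration. The flattening of the joint leaf distribution along the true split has rank at most the number of states, while the flattenings along the other splits generically have full rank. The extension to networks discussed in the note is not covered here.

```python
from fractions import Fraction as F
from itertools import product

def joint_distribution(pi, M1, M2, E, M3, M4):
    """Joint leaf distribution on the quartet tree 12|34 (binary states).

    The internal node adjacent to leaves 1 and 2 is taken as the root with
    distribution pi; E is the transition matrix on the internal edge, and
    M1..M4 are the transition matrices on the pendant edges.
    """
    p = {}
    for i, j, k, l in product(range(2), repeat=4):
        p[i, j, k, l] = sum(
            pi[a] * M1[a][i] * M2[a][j] * E[a][b] * M3[b][k] * M4[b][l]
            for a in range(2) for b in range(2)
        )
    return p

def flattening(p, split):
    """4x4 matrix with rows indexed by the states of the two leaves in
    `split` (0-based positions) and columns by the states of the other two."""
    others = [x for x in range(4) if x not in split]
    matrix = []
    for r in product(range(2), repeat=2):
        row = []
        for c in product(range(2), repeat=2):
            states = [None] * 4
            states[split[0]], states[split[1]] = r
            states[others[0]], states[others[1]] = c
            row.append(p[tuple(states)])
        matrix.append(row)
    return matrix

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

# Hypothetical parameters: all matrices are invertible with positive entries.
pi = [F(2, 5), F(3, 5)]
M1 = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
M2 = [[F(4, 5), F(1, 5)], [F(3, 10), F(7, 10)]]
E = [[F(7, 10), F(3, 10)], [F(1, 5), F(4, 5)]]
M3 = [[F(3, 4), F(1, 4)], [F(1, 10), F(9, 10)]]
M4 = [[F(17, 20), F(3, 20)], [F(1, 4), F(3, 4)]]

p = joint_distribution(pi, M1, M2, E, M3, M4)
rank_12_34 = rank(flattening(p, [0, 1]))  # split matching the tree
rank_13_24 = rank(flattening(p, [0, 2]))  # split not in the tree
print(rank_12_34, rank_13_24)
```

Exact rational arithmetic is used so that the rank is computed without numerical tolerance; with empirical frequencies one would instead measure how far a flattening is from low rank.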

Deciding whether a substitution matrix is embeddable (i.e. the corresponding Markov process has a continuous-time realization) is an open problem even for \(4\times 4\) matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these \(4\times 4\) matrices, we fully characterize the set of em...

Modelling the substitution of nucleotides along a phylogenetic tree is usually done by a hidden Markov process. This allows one to define a distribution of characters at the leaves of the trees and one might be able to obtain polynomial relationships among the probabilities of different characters. The study of these polynomials and the geometry of the...

Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra and computational algebra), geometry and combinatorics to provide insight into knotty problems in mathematical statistics. In this survey we illustrate this on three problems related to networks, namely network models for relational data, causal s...

Deciding whether a Markov matrix is embeddable (i.e. can be written as the exponential of a rate matrix) is an open problem even for $4\times 4$ matrices. We study the embedding problem and rate identifiability for the K80 model of nucleotide substitution. For these $4\times 4$ matrices, we fully characterize the set of embeddable K80 Markov matric...

We present an algorithm for the unsupervised learning of latent variable models based on the method of moments. We give efficient estimates of the moments for two models that are well known, e.g., in text mining, the single-topic model and latent Dirichlet allocation, and we provide a tensor decomposition algorithm for the moments that proves to be...

In many areas of applied linear algebra, it is necessary to work with matrix approximations. A common situation occurs when a matrix obtained from experimental or simulated data must be approximated by a matrix that lies in a corresponding statistical model and satisfies some specific properties. In this short note, we focus on symmetric and...

Phylogenetic varieties related to equivariant substitution models have been studied extensively in recent years. One of the main objectives has been finding a set of generators of the ideal of these varieties, but this has not yet been achieved in some cases (for example, for the general Markov model this involves the open “salmon conjecture”, see [2...

The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Mark...

Algebraic statistics uses tools from algebra (especially from multilinear algebra, commutative algebra, and computational algebra), geometry, and combinatorics to provide insight into knotty problems in mathematical statistics. In this review, we illustrate this on three problems related to networks: network models for relational data, causal struc...

This paper presents an algorithm for the unsupervised learning of latent variable models from unlabeled sets of data. We base our technique on spectral decomposition, providing a technique that proves to be robust both in theory and in practice. We also describe how to use this algorithm to learn the parameters of two well known text mining models:...

Phylogenetic varieties related to equivariant substitution models have been studied extensively in recent years. One of the main objectives has been finding a set of generators of the ideal of these varieties, but this has not yet been achieved in some cases (for example, for the general Markov model this involves the open "salmon conjecture") and it...

One reason why classical phylogenetic reconstruction methods fail to correctly infer the underlying topology is that they assume oversimplified models. In this paper we propose a quartet reconstruction method consistent with the most general Markov model of nucleotide substitution, which can also deal with data coming from mixtures on the same t...

Motivated by phylogenetics, our aim is to obtain a system of equations that define a phylogenetic variety on an open set containing the biologically meaningful points. In this paper we consider phylogenetic varieties defined via group-based models. For any finite abelian group $G$, we provide an explicit construction of $\operatorname{codim} X$ phylogenetic invar...

Background
The reconstruction of the phylogenetic tree topology of four taxa remains one of the main challenges in phylogenetics. Its difficulties lie in considering evolutionary models that are not too restrictive and in dealing correctly with the long-branch attraction problem. The correct reconstruction of 4-taxon trees is crucial for making qu...

Background
The selection of an evolutionary model to best fit given molecular data is usually a heuristic choice. In his seminal book, J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. It remained an open questio...

Background
A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software restricts to the time-reversible models...

A zipped (.zip) file containing the C++ implementation of GenNon-h.

The goal of branch length estimation in phylogenetic inference is to estimate the divergence time between a set of sequences based on compositional differences between them. A number of software packages are currently available that facilitate branch length estimation for homogeneous and stationary evolutionary models. Homogeneity of the evolutionary process i...

In phylogenetic inference, an evolutionary model describes the substitution processes along each edge of a phylogenetic tree.
Misspecification of the model has important implications for the analysis of phylogenetic data. Conventionally, however, the
selection of a suitable evolutionary model is based on heuristics or relies on the choice of an app...

Under a Markovian evolutionary process, the expected number of substitutions per site (also called branch length) that have occurred when a sequence has evolved from another according to a transition matrix $P$ can be approximated by $-\frac{1}{4}\log\det P$. When the Markov process is assumed to be continuous in time, i.e. $P=\exp(Qt)$, it is easy to simulat...
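The log-determinant approximation can be checked numerically in a case where everything is known in closed form. The sketch below is a minimal example assuming the Jukes-Cantor model, which is a choice made here for illustration and not the setting of the paper. In that model every off-diagonal rate equals a hypothetical value `alpha`, the transition matrix $P=\exp(Qt)$ has an explicit formula, and the expected number of substitutions per site is $3\alpha t$.

```python
import math

def jc69_transition_matrix(alpha, t):
    """Closed form of P = exp(Qt) for the Jukes-Cantor rate matrix Q
    (off-diagonal entries alpha, diagonal entries -3 * alpha)."""
    e = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * e   # probability of observing the same nucleotide
    diff = 0.25 - 0.25 * e   # probability of each specific change
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in M]
    n = len(m)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return det

def logdet_branch_length(P):
    """Branch length approximation -1/4 * log(det P)."""
    return -0.25 * math.log(determinant(P))

alpha, t = 0.1, 2.0  # hypothetical rate and time
P = jc69_transition_matrix(alpha, t)
estimate = logdet_branch_length(P)
expected = 3.0 * alpha * t  # expected substitutions per site under JC69
print(estimate, expected)
```

Under Jukes-Cantor the two values agree exactly, because $\det P = e^{-12\alpha t}$; for models with a non-uniform stationary distribution the formula is only an approximation, as the abstract states.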

Recently there have been several attempts to provide a whole set of generators of the ideal of the algebraic variety associated to a phylogenetic tree evolving under an algebraic model. These algebraic varieties have been proven to be useful in phylogenetics. In this paper we prove that, for phylogenetic reconstruction purposes, it is enough to con...

A new approach to phylogenetic reconstruction has been emerging in recent years. Given an evolutionary model, the joint probability distribution of the nucleotides for these species satisfies some algebraic constraints called invariants. These invariants have theoretical and practical interest, since they can be used to infer phylogenies. In this p...

In this paper we characterize non-connected Buchsbaum curves C in P^n and we give a sharp bound for the number of disjoint connected components of C.

We prove that, for every $r \geq 2$, the moduli space $M^s_X(r; c_1, c_2)$ of rank $r$ stable vector bundles with Chern classes $c_1 = rH$ and $c_2 = \frac{1}{2}(3r^2 - r)$ on a nonsingular cubic surface $X \subset \mathbb{P}^3$ contains a nonempty smooth open subset formed by ACM bundles, i.e. vector bundles with no intermediate cohomology. The bundles we consider for this study are extremal for...

The Kimura 3-parameter model on a tree of n leaves is one of the most used in phylogenetics. The affine algebraic variety W associated to it is a toric variety. We study its geometry and we prove that it is isomorphic to a geometric quotient of the affine space by a finite group, which is completely described. As a consequence, we are able to study...

An attempt to use phylogenetic invariants for tree reconstruction was made at the end of the 80s and the beginning of the 90s by several authors (the initial idea is due to Lake and to Cavender and Felsenstein in 1987). However, the efficiency of methods based on invariants is still in doubt, probably because these methods only used few generators of the...

In this paper we prove that the generalized version of the Minimal Resolution Conjecture stated by Mustata holds for certain general sets of points on a smooth cubic surface $X \subset \mathbb{P}^3$. The main tool used is Gorenstein liaison theory and, more precisely, the relationship between the free resolutions of two linked schemes.

For a finite set of points $X \subseteq \mathbb{P}^n$ and a given point $P \in X$, the notions of a separator of $P$ in $X$ (a hypersurface containing all the points of $X$ except $P$) and of the degree of $P$ in $X$ (the minimum degree of these separators) have been studied extensively. In this paper we extend these notions to a set of points $X$ on a projectively normal surface $S \subseteq \mathbb{P}^n$, cons...

This chapter is concerned with the description of the Small Trees website which can be found at the following web address: The goal of the website is to make available in a unified format various algebraic features of different phylogenetic models on small trees. By “small” we mean trees with at most 5 taxa. In the first two sections, we describe a...

This chapter is devoted to the study of strand symmetric Markov models on trees from the standpoint of algebraic statistics. A strand symmetric Markov model is one whose mutation probabilities reflect the symmetry induced by the double-stranded structure of DNA (see Chapter 4). In particular, a strand symmetric model for DNA must have the following...

In this article we give an introduction to the applications of algebraic geometry in phylogenetics. Because a large part of the evolutionary models used in phylogenetics correspond to algebraic varieties, the ideal associated with these varieties can be used to give a new approach to phylogenetic inference. Peer reviewed

Let X be a normal arithmetically Gorenstein scheme in projective space. We give a criterion for all codimension two ACM subschemes of X to be in the same Gorenstein biliaison class on X, in terms of the category of ACM sheaves on X. These are sheaves that correspond to the graded maximal Cohen–Macaulay modules on the homogeneous coordinate ring of X. Using known r...

We prove that if $X \subset \mathbb{P}^N$ has dimension k and it is r-Buchsbaum with r > max (codim X - k, 0), then X is contained in at most one variety of minimal degree and dimension k + 1.

In this paper we compute the Hilbert functions of irreducible (or smooth) and reduced arithmetically Gorenstein schemes that are twisted anti-canonical divisors on arithmetically Cohen–Macaulay schemes. We also prove some folklore results characterizing the Hilbert functions of irreducible standard determinantal schemes, and we use them to produce...

We study Gorenstein liaison of codimension two subschemes of an arithmetically Gorenstein scheme X. Our main result is a criterion for two such subschemes to be in the same Gorenstein liaison class, in terms of the category of ACM sheaves on X. As a consequence we obtain a criterion for X to have the property that every codimension 2 arithmetically...

The theory of Gorenstein liaison has been developed during the last 3 years to generalize liaison theory of codimension 2 schemes to schemes of codimension ≥ 3 in a projective space. One of the main open questions in Gorenstein liaison theory is whether any arithmetically Cohen-Macaulay subscheme of $\mathbb{P}^n$ is in the Gorenstein liaison class of a comp...

We answer a question proposed by Hartshorne about the Lazarsfeld–Rao property for even Gorenstein liaison classes.

Liaison theory has been extensively studied during the past
decades. In codimension 2, the theory has reached a very satisfactory
state, but in higher codimensions there are still many open
problems. In this paper we prove that two unions $V= \bigcup_{i=1}^k
L_i$ and $V'= \bigcup_{i=1}^{k'} L'_i$ of independent linear varieties
of dimension $d \geq...

Let C be an arithmetically Cohen–Macaulay subscheme. In terms of Gorenstein liaison it is natural to ask whether C is in the Gorenstein liaison class of a complete intersection. In this paper, we study the Gorenstein liaison classes of arithmetically Cohen–Macaulay divisors on standard determinantal schemes and on rational normal scrolls. As main res...

We discuss the problem of whether arithmetically Gorenstein schemes are in the Gorenstein liaison class of a complete intersection. We present some examples of arithmetically Gorenstein schemes that are indeed in the Gorenstein liaison class of a complete intersection. In the recent research on Gorenstein liaison theory, the question whether any a...

In this paper we characterize non-connected Buchsbaum curves C in P^n and we give a sharp bound for the number of disjoint connected components of C.

An attempt to use phylogenetic invariants for tree reconstruction was made at the end of the 80s and the beginning of the 90s by several authors (the initial idea due to Lake [Lake, 1987] and Cavender and Felsenstein [Cavender and Felsenstein, 1987]). However, the efficiency of methods based on invariants is still in doubt ([Huelsenbeck, 1995], [...

"... Algebraic varieties appear naturally when considering statistical models used in genomics and phylogenetics. We will explain the relationship between these statistical models and algebraic geometry. We will also see how to use these algebraic varieties to recover the ancestral relationships among species, that is, ...