Article

ICLR 2021 Challenge for Computational Geometry & Topology: Design and Results

Abstract

This paper presents the computational challenge on differential geometry and topology held within the ICLR 2021 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide creative contributions to the fields of computational geometry and topology through the open-source repositories Geomstats and Giotto-TDA.
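The abstract mentions Geomstats and Giotto-TDA, open-source Python libraries for differential geometry and topological data analysis. Purely as an illustration of the kind of primitive such libraries expose, here is a library-free numpy sketch of geodesic distance and the exponential map on the unit sphere (the function names are illustrative, not the Geomstats API):

```python
import numpy as np

def sphere_dist(x, y):
    """Geodesic (great-circle) distance between unit vectors x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Clip to guard against rounding slightly outside [-1, 1].
    cos_angle = np.clip(np.dot(x, y), -1.0, 1.0)
    return np.arccos(cos_angle)

def sphere_exp(x, v):
    """Riemannian exponential map at x applied to tangent vector v."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x.copy()
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)
```

For instance, the distance from the north pole to a point on the equator is pi/2, and following a tangent vector of length pi/2 from the pole lands on the equator.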

Article
Full-text available
We present an algorithm for the computation of Vietoris–Rips persistence barcodes and describe its implementation in the software Ripser. The method relies on implicit representations of the coboundary operator and the filtration order of the simplices, avoiding the explicit construction and storage of the filtration coboundary matrix. Moreover, it makes use of apparent pairs, a simple but powerful method for constructing a discrete gradient field from a total order on the simplices of a simplicial complex, which is also of independent interest. Our implementation shows substantial improvements over previous software both in time and memory usage.
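Ripser's actual engine relies on cohomology, implicit coboundaries, and apparent pairs, none of which this sketch attempts. As a toy illustration of the simplest piece of its output only, here is the 0-dimensional Vietoris–Rips barcode computed with a union–find over edges in filtration order (function name and structure are mine, not Ripser's):

```python
import numpy as np

def h0_barcode(points):
    """0-dimensional Vietoris-Rips persistence: every point is born at 0;
    a component dies when an edge first merges it into another."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # All pairwise edges, sorted by length: the filtration order.
    edges = sorted(
        (np.linalg.norm(points[i] - points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, float(length)))  # a component dies at this scale
    bars.append((0.0, float("inf")))           # the last component never dies
    return bars
```

On three collinear points at x = 0, 1, 3, the finite bars die at scales 1 and 2, and one infinite bar remains.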
Article
Full-text available
The chemical synthesis of polypeptides involves stepwise formation of amide bonds on an immobilized solid support. The high yields required for efficient incorporation of each individual amino acid in the growing chain are often impacted by sequence-dependent events such as aggregation. Here, we apply deep learning over ultraviolet–visible (UV–vis) analytical data collected from 35 427 individual fluorenylmethyloxycarbonyl (Fmoc) deprotection reactions performed with an automated fast-flow peptide synthesizer. The integral, height, and width of these time-resolved UV–vis deprotection traces indirectly allow for analysis of the iterative amide coupling cycles on resin. The computational model maps structural representations of amino acids and peptide sequences to experimental synthesis parameters and predicts the outcome of deprotection reactions with less than 6% error. Our deep-learning approach enables experimentally aware computational design for prediction of Fmoc deprotection efficiency and minimization of aggregation events, building the foundation for real-time optimization of peptide synthesis in flow.
Conference Paper
Full-text available
Computational notebooks are gaining widespread acceptance as a paradigm for storage, dissemination, and reproduction of experimental results. In this paper, we define the computational notebook paradigm (CNP) consisting of entities and processes and discuss how the reproducibility of the experimental process and results is enhanced by each element. This paper also details the interactions of CNP and multi-paradigm modeling (MPM), with an aim of understanding how to support MPM within the CNP, and improve the reproducibility aspects of both the CNP and MPM. Presentation available at: https://mybinder.org/v2/git/http%3A%2F%2Fmsdl.uantwerpen.be%2Fgit%2Fbentley%2FNotebookParadigmPaper2019-presentation.git/master
Article
Full-text available
A number of recent studies have shown that cell shape and cytoskeletal texture can be used as sensitive readouts of the physiological state of the cell. However, utilization of this information requires the development of quantitative measures that can describe relevant aspects of cell shape. In this paper we develop a toolbox, TISMorph, that calculates a set of quantitative measures to address this need. Some of the measures introduced here have been used previously, while others are new and have desirable properties for shape and texture quantification of cells. These measures, broadly classifiable into the categories of textural, irregularity and spreading measures, are tested by using them to discriminate between osteosarcoma cell lines treated with different cytoskeletal drugs. We find that even though specific classification tasks often rely on a few measures, these are not the same between all classification tasks, thus requiring the use of the entire suite of measures for classification and discrimination. We provide detailed descriptions of the measures, as well as the TISMorph package to implement them. Quantitative morphological measures that capture different aspects of cell morphology will help enhance large-scale image-based quantitative analysis, which is emerging as a new field of biological data.
Article
Full-text available
Most individuals exposed to hepatitis C virus (HCV) become persistently infected while a minority spontaneously eliminate the virus. Although early immune events influence infection outcome, the cellular composition, molecular effectors, and timeframe of the host response active shortly after viral exposure remain incompletely understood. Employing specimens collected from people who inject drugs (PWID) with high risk of HCV exposure, we utilized RNA-Seq and blood transcriptome module (BTM) analysis to characterize immune function in peripheral blood mononuclear cells (PBMC) before, during, and after acute HCV infection resulting in spontaneous resolution. Our results provide a detailed description of innate immune programs active in peripheral blood during acute HCV infection, which include prominent type I interferon and inflammatory signatures. Innate immune gene expression rapidly returns to pre-infection levels upon viral clearance. Comparative analyses using peripheral blood gene expression profiles from other viral and vaccine studies demonstrate similarities in the immune responses to acute HCV and flaviviruses. Of note, both acute dengue virus (DENV) infection and acute HCV infection elicit similar innate antiviral signatures. However, while transient in DENV infection, this signature was sustained for many weeks in the response to HCV. These results represent the first longitudinal transcriptomic characterization of human immune function in PBMC during acute HCV infection and identify several dynamically regulated features of the complex response to natural HCV exposure.
Article
Full-text available
A structural database of peptide-protein interactions is important for drug discovery targeting peptide-mediated interactions. Although some peptide databases, especially for special types of peptides, have been developed, a comprehensive database of cleaned peptide-protein complex structures is still not available. Such cleaned structures are valuable for docking and scoring studies in structure-based drug design. Here, we have developed PepBDB, a curated Peptide Binding DataBase of biological complex structures from the Protein Data Bank (PDB). PepBDB presents not only cleaned structures but also extensive information about biological peptide-protein interactions, and allows users to search the database with a variety of options and interactively visualize the search results. Availability and implementation: PepBDB is available at http://huanglab.phys.hust.edu.cn/pepbdb/.
Article
Full-text available
We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist.
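Because Fashion-MNIST shares MNIST's on-disk format, any MNIST loader works unchanged. The files use the IDX binary layout: a big-endian header (magic number 2051 for image files, then counts and dimensions) followed by raw unsigned bytes. A minimal parser sketch (the function name is mine; pass it any binary stream, e.g. a `gzip.open(...)` handle for the distributed `.gz` files):

```python
import struct
import numpy as np

def load_idx_images(f):
    """Parse an IDX3 image stream as used by MNIST/Fashion-MNIST.
    Returns a (count, rows, cols) uint8 array."""
    # Header: four big-endian 32-bit unsigned ints.
    magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
    assert magic == 2051, f"not an IDX3 image file (magic={magic})"
    # Remainder is raw pixel bytes, row-major.
    data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(count, rows, cols)
```

A two-image synthetic buffer with a valid header parses to shape `(2, 28, 28)`.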
Conference Paper
Full-text available
To model deformation of anatomical shapes, non-linear statistics are required to take into account the non-linear structure of the data space. Computer implementations of non-linear statistics and differential geometry algorithms often lead to long and complex code sequences. The aim of this paper is to show that the Theano framework enables simple and concise implementation of complex differential geometry algorithms while being able to handle complex and high-dimensional data structures. The framework provides a symbolic language that allows mathematical equations to be directly translated into Theano code, and it is able to perform both fast CPU and GPU computations on high-dimensional data. We show how different concepts from non-linear statistics and differential geometry can be implemented in Theano, and give examples of the implemented theory visualized on landmark representations of Corpus Callosum shapes.
Article
Full-text available
Based on data from the NIH-funded Human Connectome Project, we have computed structural connectomes of 426 human subjects in five different resolutions of 83, 129, 234, 463 and 1015 nodes, with several edge weights. The graphs are given in anatomically annotated GraphML format, which facilitates further processing and visualization. For 96 subjects, anatomically classified sub-graphs can also be accessed, formed from the vertices corresponding to distinct lobes or even smaller regions of interest of the brain. For example, one can easily download and study the connectomes restricted to the frontal lobes, or just to the left precuneus, of 96 subjects. Partially directed connectomes of 423 subjects are also available for download. We also present a GitHub-deposited set of tools, called the Brain Graph Tools, for several processing tasks on the connectomes, on the site http://braingraph.org.
Article
Full-text available
Infected hosts differ in their responses to pathogens; some hosts are resilient and recover their original health, whereas others follow a divergent path and die. To quantitate these differences, we propose mapping the routes infected individuals take through "disease space." We find that when plotting physiological parameters against each other, many pairs have hysteretic relationships that identify the current location of the host and predict the future route of the infection. These maps can readily be constructed from experimental longitudinal data, and we provide two methods to generate the maps from the cross-sectional data that is commonly gathered in field trials. We hypothesize that resilient hosts tend to take small loops through disease space, whereas nonresilient individuals take large loops. We support this hypothesis with experimental data in mice infected with Plasmodium chabaudi, finding that dying mice trace a large arc in red blood cells (RBCs) by reticulocyte space as compared to surviving mice. We find that human malaria patients who are heterozygous for sickle cell hemoglobin occupy a small area of RBCs by reticulocyte space, suggesting this approach can be used to distinguish resilience in human populations. This technique should be broadly useful in describing the in-host dynamics of infections in both model hosts and patients at both population and individual levels.
Article
Full-text available
Most studies of the human microbiome have focused on westernized people with life-style practices that decrease microbial survival and transmission, or on traditional societies that are currently in transition to westernization. We characterize the fecal, oral, and skin bacterial microbiome and resistome of members of an isolated Yanomami Amerindian village with no documented previous contact with Western people. These Yanomami harbor a microbiome with the highest diversity of bacteria and genetic functions ever reported in a human group. Despite their isolation, presumably for >11,000 years since their ancestors arrived in South America, and no known exposure to antibiotics, they harbor bacteria that carry functional antibiotic resistance (AR) genes, including those that confer resistance to synthetic antibiotics and are syntenic with mobilization elements. These results suggest that westernization significantly affects human microbiome diversity and that functional AR genes appear to be a feature of the human microbiome even in the absence of exposure to commercial antibiotics. AR genes are likely poised for mobilization and enrichment upon exposure to pharmacological levels of antibiotics. Our findings emphasize the need for extensive characterization of the function of the microbiome and resistome in remote nonwesternized populations before globalization of modern practices affects potentially beneficial bacteria harbored in the human body.
Article
Full-text available
This article gives a formal definition of a lognormal family of probability distributions on the set of symmetric positive definite (PD) matrices, seen as a matrix-variate extension of the univariate lognormal family of distributions. Two forms of this distribution are obtained as the large sample limiting distribution via the central limit theorem of two types of geometric averages of i.i.d. PD matrices: the log-Euclidean average and the canonical geometric average. These averages correspond to two different geometries imposed on the set of PD matrices. The limiting distributions of these averages are used to provide large-sample confidence regions for the corresponding population means. The methods are illustrated on a voxelwise analysis of diffusion tensor imaging data, permitting a comparison between the various average types from the point of view of their sampling variability.
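The log-Euclidean average mentioned above is the exponential of the arithmetic mean of matrix logarithms. For symmetric positive definite matrices both maps reduce to transforming the eigenvalues, so a short numpy sketch suffices (function names are mine):

```python
import numpy as np

def logm_spd(S):
    """Matrix logarithm of a symmetric positive definite matrix,
    via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.log(w)) @ V.T

def expm_sym(S):
    """Matrix exponential of a symmetric matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.exp(w)) @ V.T

def log_euclidean_mean(mats):
    """Log-Euclidean average: exp of the arithmetic mean of matrix logs."""
    return expm_sym(sum(logm_spd(M) for M in mats) / len(mats))
```

For commuting matrices this reduces to the elementwise geometric mean of eigenvalues: the average of diag(1, 4) and diag(4, 1) is diag(2, 2).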
Conference Paper
Full-text available
Reproducibility of experiments is the pillar of a rigorous scientific approach. However, simulation-based experiments often fail to meet this fundamental requirement. In this paper, we first revisit the definition of reproducibility in the context of simulation. Then, we give a comprehensive review of issues that make this highly desirable feature so difficult to obtain. Given that experimental (in-silico) science is only one of the many applications of simulation, our analysis also explores the needs and benefits of providing the simulation reproducibility property for other kinds of applications. Coming back to scientific applications, we give a few examples of solutions proposed for solving the above issues. Finally, going one step beyond reproducibility, we also discuss in our conclusion the notion of traceability and its potential use in order to improve the simulation methodology.
Article
Full-text available
Persistent homology is a widely used tool in Topological Data Analysis that encodes multiscale topological information as a multi-set of points in the plane called a persistence diagram. It is difficult to apply statistical theory directly to a random sample of diagrams. Instead, we can summarize the persistent homology with the persistence landscape, introduced by Bubenik, which converts a diagram into a well-behaved real-valued function. We investigate the statistical properties of landscapes, such as weak convergence of the average landscapes and convergence of the bootstrap. In addition, we introduce an alternate functional summary of persistent homology, which we call the silhouette, and derive an analogous statistical theory.
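The persistence landscape described above replaces each diagram point (b, d) with a "tent" function max(0, min(t - b, d - t)) and takes, at each t, the k-th largest tent value. A small numpy sketch (function name mine):

```python
import numpy as np

def landscape(diagram, ts, k=1):
    """k-th persistence landscape function sampled at times ts.
    diagram: list of (birth, death) pairs; k is 1-indexed."""
    ts = np.asarray(ts, dtype=float)
    if k > len(diagram):
        return np.zeros_like(ts)
    # Tent function of each diagram point: max(0, min(t - b, d - t)).
    tents = np.array([np.minimum(ts - b, d - ts) for b, d in diagram])
    tents = np.maximum(tents, 0.0)
    # Sort each column in descending order and take the k-th largest.
    tents = np.sort(tents, axis=0)[::-1]
    return tents[k - 1]
```

A single bar (0, 2) yields the triangle peaking at height 1 over t = 1; adding a nested bar (0.5, 1.5) makes the second landscape nonzero there. The silhouette mentioned in the abstract is, analogously, a weighted average of the tents rather than an order statistic.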
Article
Full-text available
Optimization on manifolds is a rapidly developing branch of nonlinear optimization. Its focus is on problems where the smooth geometry of the search space can be leveraged to design efficient numerical algorithms. In particular, optimization on manifolds is well-suited to deal with rank and orthogonality constraints. Such structured constraints appear pervasively in machine learning applications, including low-rank matrix completion, sensor network localization, camera network registration, independent component analysis, metric learning, dimensionality reduction and so on. The Manopt toolbox, available at www.manopt.org, is a user-friendly, documented piece of software dedicated to simplifying experimentation with state-of-the-art Riemannian optimization algorithms. We aim particularly at reaching practitioners outside our field.
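Manopt itself is a MATLAB toolbox; purely to illustrate the project-step-retract pattern such toolboxes implement, here is a hedged numpy sketch (not Manopt code) of Riemannian gradient ascent on the unit sphere, maximizing the Rayleigh quotient to recover a dominant eigenvector:

```python
import numpy as np

def top_eigvec_sphere(A, x0, steps=500, lr=0.1):
    """Maximize x^T A x over the unit sphere by Riemannian gradient ascent:
    project the Euclidean gradient onto the tangent space, take a step,
    then retract back onto the manifold by normalizing."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(steps):
        egrad = 2.0 * A @ x               # Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x   # tangent-space projection
        x = x + lr * rgrad                # gradient step
        x = x / np.linalg.norm(x)         # retraction to the sphere
    return x
```

For A = diag(3, 1, 0.5) the iterates converge to the first coordinate axis, with Rayleigh quotient 3.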
Article
Full-text available
A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number of real datasets, an apparent one-dimensional mode of variation curving through more than one geodesic component is captured in the one-dimensional component of principal nested spheres. While analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis. An adaptation of our method to Kendall’s shape space is discussed, and a computational algorithm for fitting principal nested spheres is proposed. The result provides a coordinate system to visualize the data structure and an intuitive summary of principal modes of variation, as exemplified by several datasets.
Conference Paper
Full-text available
NetworkX is a Python language package for the exploration and analysis of networks and network algorithms. The core package provides data structures for representing many types of networks, or graphs, including simple graphs, directed graphs, and graphs with parallel edges and self-loops. The nodes in NetworkX graphs can be any (hashable) Python object and edges can contain arbitrary data; this flexibility makes NetworkX ideal for representing networks found in many different scientific fields. In addition to the basic data structures, many graph algorithms are implemented for calculating network properties and structural measures: shortest paths, betweenness centrality, clustering, degree distribution and many more. NetworkX can read and write various graph formats for easy exchange with existing data, and provides generators for many classic graphs and popular graph models, such as the Erdős–Rényi, small-world, and Barabási–Albert models. The ease of use and flexibility of the Python programming language, together with connection to the SciPy tools, make NetworkX a powerful tool for scientific computation. We discuss some of our recent work studying synchronization of coupled oscillators to demonstrate how NetworkX enables research in the field of computational networks.
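The "any hashable node, arbitrary edge data" design described above rests on a dict-of-dicts adjacency structure. As a stdlib-only sketch of that data model and one of the listed algorithms, unweighted shortest paths via breadth-first search (this mirrors the idea, not NetworkX's actual code):

```python
from collections import deque

class TinyGraph:
    """Dict-of-dicts adjacency: any hashable node, arbitrary edge data."""

    def __init__(self):
        self.adj = {}

    def add_edge(self, u, v, **data):
        # Undirected: store the edge-data dict under both endpoints.
        self.adj.setdefault(u, {})[v] = data
        self.adj.setdefault(v, {})[u] = data

    def shortest_path(self, source, target):
        """Unweighted shortest path via breadth-first search."""
        prev = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if u == target:
                path = []
                while u is not None:   # walk predecessors back to source
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for v in self.adj.get(u, {}):
                if v not in prev:
                    prev[v] = u
                    queue.append(v)
        raise ValueError("no path between source and target")
```

On a square a-b-c-d-a, the shortest path from a to c has length two in either direction; BFS returns the one discovered first.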
Conference Paper
Full-text available
This paper is a report on the 3D Shape Retrieval Contest 2010 (SHREC'10) track on large-scale retrieval. This benchmark evaluates how well retrieval algorithms scale up to large collections of 3D models. The task was to perform 40 queries in a dataset of 10,000 shapes. We describe the methods used and discuss the results and significance analysis.
Conference Paper
Full-text available
Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel to the sets of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases.
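The Projection kernel referred to above has a compact closed form: for orthonormal bases U and V of two subspaces, k(U, V) = ||U^T V||_F^2. A numpy sketch (function name mine), which orthonormalizes arbitrary basis matrices first:

```python
import numpy as np

def projection_kernel(A, B):
    """Projection kernel between the subspaces spanned by the columns of
    A and B: k = ||U^T V||_F^2 with U, V orthonormal bases."""
    U, _ = np.linalg.qr(A)   # orthonormal basis for span(A)
    V, _ = np.linalg.qr(B)   # orthonormal basis for span(B)
    return float(np.sum((U.T @ V) ** 2))
```

The kernel equals the subspace dimension p when the spans coincide and 0 when they are orthogonal, so it is invariant to the choice of basis.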
Article
Full-text available
We present an algorithm for determining the Morse complex of a 2- or 3-dimensional grayscale digital image. Each cell in the Morse complex corresponds to a topological change in the level sets (i.e. a critical point) of the grayscale image. Since more than one critical point may be associated with a single image voxel, we model digital images by cubical complexes. A new homotopic algorithm is used to construct a discrete Morse function on the cubical complex that agrees with the digital image and has exactly the number and type of critical cells necessary to characterize the topological changes in the level sets. We make use of discrete Morse theory and simple homotopy theory to prove correctness of this algorithm. The resulting Morse complex is considerably simpler than the cubical complex originally used to represent the image and may be used to compute persistent homology.
Article
Full-text available
This paper introduces a square-root velocity (SRV) representation for analyzing shapes of curves in Euclidean spaces using an elastic metric. Under this SRV representation the elastic metric simplifies to the L2 metric, the re-parameterization group acts by isometries, and the space of unit-length curves becomes the unit sphere. The shape space of closed curves is the quotient space of (a submanifold of) the unit sphere, modulo the rotation and re-parameterization groups, and one finds geodesics in that space using a path-straightening approach. These geodesics and geodesic distances provide a framework for optimally matching, deforming and comparing shapes. These ideas are demonstrated using: (i) shape analysis of cylindrical helices for studying protein structures, (ii) shape analysis of facial curves for face recognition, (iii) a wrapped probability distribution to capture shapes of planar closed curves, and (iv) parallel transport of deformations for predicting shapes from novel poses.
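The SRV map itself is q = f' / sqrt(|f'|). A hedged discrete sketch with finite differences (function names mine; this toy distance omits the rotation and re-parameterization alignment that the full framework performs):

```python
import numpy as np

def srv_transform(curve, dt):
    """Discrete square-root velocity representation of a sampled curve:
    q = f' / sqrt(|f'|), using finite-difference derivatives.
    curve: (N, d) array of samples at spacing dt."""
    deriv = np.gradient(curve, dt, axis=0)
    speed = np.linalg.norm(deriv, axis=1)
    speed = np.maximum(speed, 1e-12)          # avoid division by zero
    return deriv / np.sqrt(speed)[:, None]

def srv_distance(c1, c2, dt):
    """L2 distance between SRV representations (no reparameterization or
    rotation alignment in this sketch)."""
    q1, q2 = srv_transform(c1, dt), srv_transform(c2, dt)
    return float(np.sqrt(np.sum((q1 - q2) ** 2) * dt))
```

For two unit-speed straight segments along orthogonal axes, the SRVs are the constant vectors (1, 0) and (0, 1), so the distance approximates sqrt(2).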
Article
Full-text available
In this paper we develop new Newton and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These manifolds represent the constraints that arise in such areas as the symmetric eigenvalue problem, nonlinear eigenvalue problems, electronic structure computations, and signal processing. In addition to the new algorithms, we show how the geometrical framework gives penetrating new insights, allowing us to create, understand, and compare algorithms. The theory proposed here provides a taxonomy for numerical linear algebra algorithms, offering a top-level mathematical view of previously unrelated algorithms. It is our hope that developers of new algorithms and perturbation theories will benefit from the theory, methods, and examples in this paper. Comment: The condensed-matter interest is as new methods for minimizing Kohn-Sham orbitals under the constraints of orthonormality and as "geometrically correct" generalizations and extensions of the analytically continued functional approach, Phys. Rev. Lett. 69, 1077 (1992). The problem of orthonormality constraints is quite general and the methods discussed are also applicable in a wide range of fields. To appear in SIAM Journal on Matrix Analysis and Applications, in press for sometime in August-October 1998; 52 pages, 8 figures.
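The orthonormality constraints discussed above define the Stiefel manifold of n x p matrices with X^T X = I. As a hedged numpy sketch of first-order optimization under that constraint (simple projected gradient with a QR retraction, not the Newton or conjugate gradient methods of the paper; names are mine), here is maximization of trace(X^T A X), whose optimum is the dominant invariant subspace:

```python
import numpy as np

def top_subspace_stiefel(A, X0, steps=2000, lr=0.05):
    """Maximize trace(X^T A X) over orthonormal n x p matrices:
    project the Euclidean gradient onto the tangent space of the Stiefel
    manifold, take a step, and retract via a QR factorization."""
    X, _ = np.linalg.qr(np.asarray(X0, dtype=float))
    for _ in range(steps):
        G = 2.0 * A @ X                          # Euclidean gradient
        XtG = X.T @ G
        rgrad = G - X @ ((XtG + XtG.T) / 2.0)    # tangent-space projection
        X, _ = np.linalg.qr(X + lr * rgrad)      # QR retraction
    return X
```

For A = diag(5, 4, 1, 0.5) with p = 2, the iterates approach the span of the first two coordinate axes, where trace(X^T A X) = 9.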
Article
Persistent homology is a fundamental tool in topological data analysis, used in the most diverse applications. Information captured by persistent homology is commonly visualized using scatter plot representations. Despite being widely adopted, such a visualization technique limits user understanding and is prone to misinterpretation. This paper proposes a new approach for the efficient computation of persistence cycles, a geometric representation of the features captured by persistent homology. We illustrate the importance of rendering persistence cycles when analyzing scalar fields, and we discuss the advantages that our approach provides compared to other techniques in topology-based visualization. We provide an efficient implementation of our approach based on discrete Morse theory, as a new module for the Topology Toolkit. We show that our implementation has comparable performance with respect to state-of-the-art toolboxes while providing a better framework for visually analyzing persistent homology information.
Article
Representing an object by a skeletal structure can be powerful for statistical shape analysis if there is good correspondence of the representations within a population. Many anatomic objects have a genus-zero boundary and can be represented by a smooth unbranching skeletal structure that can be discretely approximated. We describe how to compute such a discrete skeletal structure (“d-s-rep”) for an individual 3D shape with the desired correspondence across cases. The method involves fitting a d-s-rep to an input representation of an object’s boundary. A good fit is taken to be one whose skeletally implied boundary well approximates the target surface in terms of low order geometric boundary properties: 1) positions, 2) tangent fields, 3) various curvatures. Our method involves a two-stage framework that first, roughly yet consistently fits a skeletal structure to each object and second, refines the skeletal structure such that the shape of the implied boundary well approximates that of the object. The first stage uses a stratified diffeomorphism to produce topologically non-self-overlapping, smooth and unbranching skeletal structures for each object of a population. The second stage uses loss terms that measure geometric disagreement between the skeletally implied boundary and the target boundary and avoid self-overlaps in the boundary. By minimizing the total loss, we end up with a good d-s-rep for each individual shape. We demonstrate such d-s-reps for various human brain structures. The framework is accessible and extensible by clinical users, researchers and developers as an extension of SlicerSALT, which is based on 3D Slicer.
Chapter
The James-Stein shrinkage estimator was proposed in the field of statistics as an estimator of the mean for samples drawn from a Gaussian distribution, and was shown to dominate the maximum likelihood estimator (MLE) in terms of risk. This seminal work led to a flurry of activity in the field of shrinkage estimation. However, there has been very little work on shrinkage estimation for data samples that reside on manifolds. In this paper, we present a novel shrinkage estimator of the Fréchet mean (FM) of manifold-valued data for the manifold P_n of symmetric positive definite matrices of size n. We choose to endow P_n with the well-known Log-Euclidean metric for its simplicity and ease of computation. With this choice of metric, we show that the shrinkage estimator can be derived in analytic form. Further, we prove that the shrinkage estimate of the FM dominates the MLE of the FM in terms of risk. We present several synthetic data examples with noise, along with performance comparisons to the FM estimated using other non-shrinkage estimators. As an application of shrinkage FM estimation to real data, we compute the average motor sensory area (M1) tract from diffusion MR brain scans of controls and patients with Parkinson's disease (PD). We first show the dominance of the shrinkage FM estimator over the MLE of the FM in this setting and then perform group testing to show differences between PD patients and controls based on the M1 tracts.
Article
Motivation: Sequence analysis is arguably a foundation of modern biology. Classic approaches to sequence analysis are based on sequence alignment, which is limited when dealing with large-scale sequence data. A dozen alignment-free approaches have been developed to provide computationally efficient alternatives to alignment-based approaches. However, existing methods define sequence similarity based on various heuristics and can only provide rough approximations to alignment distances. Results: In this paper, we developed a new approach, referred to as SENSE (SiamEse Neural network for Sequence Embedding), for efficient and accurate alignment-free sequence comparison. The basic idea is to use a deep neural network to learn an explicit embedding function based on a small training dataset to project sequences into an embedding space, so that the mean square error between alignment distances and pairwise distances defined in the embedding space is minimized. To the best of our knowledge, this is the first attempt to use deep learning for alignment-free sequence analysis. A large-scale experiment demonstrated that our method significantly outperforms the state-of-the-art alignment-free methods in terms of both efficiency and accuracy. Availability and implementation: Open-source software for the proposed method is developed and freely available at https://www.acsu.buffalo.edu/~yijunsun/lab/SENSE.html. Supplementary information: Supplementary data are available at Bioinformatics online.
Article
The nature of scientific research in mathematical and computational biology allows editors and reviewers to evaluate the findings of a scientific paper. Replication of a research study should be the minimum standard for judging its scientific claims and considering it for publication. This requires changes in the current peer review practice and a strict adoption of a replication policy similar to those adopted in experimental fields such as organic synthesis. In the future, the culture of replication can be easily adopted by publishing papers through dynamic computational notebooks combining formatted text, equations, computer algebra and computer code.
Article
This system paper presents the Topology ToolKit (TTK), a software platform designed for the topological analysis of scalar data in scientific visualization. While topological data analysis has gained in popularity over the last two decades, it has not yet been widely adopted as a standard data analysis tool for end users or developers. TTK aims at addressing this problem by providing a unified, generic, efficient, and robust implementation of key algorithms for the topological analysis of scalar data, including: critical points, integral lines, persistence diagrams, persistence curves, merge trees, contour trees, Morse-Smale complexes, fiber surfaces, continuous scatterplots, Jacobi sets, Reeb spaces, and more. TTK is easily accessible to end users due to a tight integration with ParaView. It is also easily accessible to developers through a variety of bindings (Python, VTK/C++) for fast prototyping, or through direct, dependency-free C++ to ease integration into pre-existing complex systems. While developing TTK, we faced several algorithmic and software engineering challenges, which we document in this paper. In particular, we present an algorithm for the construction of a discrete gradient that complies with the critical points extracted in the piecewise-linear setting. This algorithm guarantees combinatorial consistency across the topological abstractions supported by TTK and, importantly, a unified implementation of topological data simplification for multi-scale exploration and analysis. We also present a cached triangulation data structure that supports time-efficient and generic traversals, self-adjusts its memory usage on demand for input simplicial meshes, and implicitly emulates a triangulation for regular grids with no memory overhead. Finally, we describe an original software architecture, which guarantees memory-efficient and direct access to TTK features, while still offering researchers powerful and easy bindings and extensions.
TTK is open source (BSD license) and its code, online documentation and video tutorials are available on TTK's website [108].
Book
A thoroughly revised and updated edition of this introduction to modern statistical methods for shape analysis. Shape analysis is an important tool in the many disciplines where objects are compared using geometrical features. Examples include comparing brain shape in schizophrenia; investigating protein molecules in bioinformatics; and describing growth of organisms in biology. This book is a significant update of the highly regarded `Statistical Shape Analysis' by the same authors. The new edition lays the foundations of landmark shape analysis, including geometrical concepts and statistical techniques, and extends to include analysis of curves, surfaces, images and other types of object data. Key definitions and concepts are discussed throughout, and the relative merits of different approaches are presented. The authors have included substantial new material on recent statistical developments and offer numerous examples throughout the text. Concepts are introduced in an accessible manner, while retaining sufficient detail for more specialist statisticians to appreciate the challenges and opportunities of this new field. Computer code has been included for instructional use, along with exercises to enable readers to implement the applications themselves in R and to follow the key ideas by hands-on analysis. Statistical Shape Analysis: with Applications in R will offer a valuable introduction to this fast-moving research area for statisticians and other applied scientists working in diverse areas, including archaeology, bioinformatics, biology, chemistry, computer science, medicine, morphometrics and image analysis.
Article
PHAT is an open-source C++ library for the computation of persistent homology by matrix reduction, targeted towards developers of software for topological data analysis. We aim for a simple generic design that decouples algorithms from data structures without sacrificing efficiency or user-friendliness. We provide numerous different reduction strategies as well as data types to store and manipulate the boundary matrix. We compare the different combinations through extensive experimental evaluation and identify optimization techniques that work well in practical situations. We also compare our software with various other publicly available libraries for persistent homology.
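The core technique PHAT implements, persistent homology via column reduction of the boundary matrix, can be sketched in a few lines (a minimal Z/2 illustration using Python sets; PHAT's actual C++ data structures and reduction variants are far more elaborate):

```python
def reduce_boundary_matrix(columns):
    """Standard persistence column reduction over Z/2.

    columns[j] is the set of row indices holding a 1 in column j of the
    boundary matrix, with simplices ordered by filtration. Returns the
    reduced columns and the list of (birth, death) index pairs.
    """
    reduced = []
    low_to_col = {}  # lowest nonzero row index -> column that owns it
    pairs = []
    for j, col in enumerate(columns):
        col = set(col)
        # Add earlier reduced columns until the lowest 1 is unique
        # or the column vanishes (symmetric difference = Z/2 addition).
        while col and max(col) in low_to_col:
            col ^= reduced[low_to_col[max(col)]]
        reduced.append(col)
        if col:
            low = max(col)
            low_to_col[low] = j
            pairs.append((low, j))  # simplex `low` creates a class, simplex j kills it
    return reduced, pairs

# Filtered triangle: vertices 0,1,2, then edges {0,1},{1,2},{0,2}, then the 2-cell.
cols = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
reduced_cols, pairs = reduce_boundary_matrix(cols)
# pairs: (1,3) and (2,4) pair two vertices with edges; (5,6) pairs the
# 1-cycle closed by edge 5 with the triangle that fills it.
```

The "reduction strategies" compared in the paper (twist, chunk, spectral sequence, etc.) are different orderings and shortcuts for exactly this column-addition loop.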
Article
This technical report introduces a novel approach to efficient computation in homological algebra over fields, with particular emphasis on computing the persistent homology of a filtered topological cell complex. The algorithms here presented rely on a novel relationship between discrete Morse theory, matroid theory, and classical matrix factorizations. We provide background, detail the algorithms, and benchmark the software implementation in the Eirene package.
Conference Paper
The original mean shift algorithm [1] on Euclidean spaces (MS) was extended in [2] to operate on general Riemannian manifolds. This extension is extrinsic (Ext-MS), since the mode seeking is performed on tangent spaces [3], where the underlying curvature is not fully considered (tangent spaces are only valid in a small neighborhood). An intrinsic mean shift designed to operate on two particular Riemannian manifolds (IntGS-MS), the Grassmann and Stiefel manifolds, was proposed in [3], using manifold-dedicated density kernels. It is then natural to ask whether mean shift could be intrinsically extended to work on a large class of manifolds. We propose a novel paradigm that intrinsically reformulates mean shift on general Riemannian manifolds. This is accomplished by embedding the Riemannian manifold into a Reproducing Kernel Hilbert Space (RKHS) using a general and mathematically well-founded Riemannian kernel function, the heat kernel [5]. The key point is that when the data is implicitly mapped to the Hilbert space, the curvature of the manifold is taken into account (i.e., the underlying structure of the data is exploited). The inherent optimization is then performed in the embedded space. Theoretical analysis and experimental results demonstrate the promise and effectiveness of this novel paradigm.
Article
We introduce an efficient preprocessing algorithm to reduce the number of cells in a filtered cell complex while preserving its persistent homology groups. The technique is based on an extension of combinatorial Morse theory from complexes to filtrations.
Article
The shape-space Σ_m^k, whose points σ represent the shapes of not totally degenerate k-ads in ℝ^m, is introduced as a quotient space carrying the quotient metric. When m=1, we find that Σ_1^k = S^{k-2}; when m≥3, the shape-space contains singularities. This paper deals mainly with the case m=2, when the shape-space Σ_2^k can be identified with a version of ℂP^{k-2}. Of special importance are the shape-measures induced on ℂP^{k-2} by any assigned diffuse law of distribution for the k vertices. We determine several such shape-measures, we resolve some of the technical problems associated with the graphic presentation and statistical analysis of empirical shape distributions, and among applications we discuss the relevance of these ideas to testing for the presence of non-accidental multiple alignments in collections of (i) neolithic stone monuments and (ii) quasars. Finally the recently introduced Ambartzumian density is examined from the present point of view, its norming constant is found, and its connexion with random Crofton polygons is established.
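The identification of the planar shape-space with complex projective space follows the standard construction, which can be stated briefly (a sketch in the paper's notation):

```latex
\[
  \Sigma_2^k \;=\; S^{2k-3} / S^1 \;\cong\; \mathbb{CP}^{k-2},
\]
% After identifying \mathbb{R}^2 with \mathbb{C} and removing translations,
% a k-ad becomes a point z \in \mathbb{C}^{k-1}; normalising its size gives
% the preshape sphere
\[
  S^{2k-3} \;=\; \{\, z \in \mathbb{C}^{k-1} : \|z\| = 1 \,\},
\]
% and quotienting by the rotation action z \mapsto e^{i\theta} z of the
% circle S^1 yields \mathbb{CP}^{k-2}.
```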
Article
A fuzzy extension of the Rand index [Rand, W.M., 1971. Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 846–850] is introduced in this paper. The Rand index is a traditional criterion for the assessment and comparison of different results provided by classifiers and clustering algorithms. It is able to measure the quality of different hard partitions of a data set from a classification perspective, including partitions with different numbers of classes or clusters. The original Rand index is extended here so that it can evaluate a fuzzy partition of a data set (provided by a fuzzy clustering algorithm or a classifier with fuzzy-like outputs) against a reference hard partition that encodes the actual (known) data classes. A theoretical formulation based on formal concepts from fuzzy set theory is derived and used as a basis for the mathematical interpretation of the proposed Fuzzy Rand Index. The fuzzy counterparts of five other related indices, namely the Adjusted Rand Index of Hubert and Arabie, the Jaccard coefficient, the Minkowski measure, the Fowlkes–Mallows Index, and the Γ statistic, are also derived from this formulation.
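The hard Rand index that the paper extends admits a very short definition (a toy illustration; the fuzzy extension replaces these crisp pair agreements with fuzzy-set operations):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two hard partitions agree.

    A pair agrees when both partitions put it in the same cluster,
    or both put it in different clusters. Function name is ours.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Identical partitions up to relabelling score 1.0:
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because only pair relations matter, the index is well defined even when the two partitions have different numbers of clusters, which is the property the abstract highlights.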
Article
This paper shows an embedding of the manifold of multivariate normal densities with informative geometry into the manifold of positive definite matrices with the Siegel metric. This embedding allows us to obtain a general lower bound for the Rao distance, which is itself a distance, and we suggest employing it for statistical purposes, taking into account the similarity of the above related metrics. Furthermore, through this embedding, general statistical tests of hypothesis are derived, and some geometrical properties are studied as well.
Article
The original mean shift algorithm is widely applied for nonparametric clustering in vector spaces. In this paper we generalize it to data points lying on Riemannian manifolds. This allows us to extend mean shift based clustering and filtering techniques to a large class of frequently occurring non-vector spaces in vision. We present an exact algorithm and prove its convergence properties, as opposed to previous work which approximates the mean shift vector. The computational details of our algorithm are presented for frequently occurring classes of manifolds such as matrix Lie groups, Grassmann manifolds, essential matrices and symmetric positive definite matrices. Applications of the mean shift over these manifolds are shown.
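For reference, the classical Euclidean procedure that this work generalizes to manifolds is a fixed-point iteration toward kernel-weighted means (a toy sketch with a Gaussian kernel; function and parameter names are ours, and the manifold version replaces the weighted mean with an intrinsic mean via exp/log maps):

```python
import math

def mean_shift_step(points, x, bandwidth):
    """One mean shift update in Euclidean space: move x to the
    Gaussian-kernel-weighted mean of the data points."""
    num = [0.0] * len(x)
    den = 0.0
    for p in points:
        d2 = sum((a - b) ** 2 for a, b in zip(p, x))
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num = [n + w * a for n, a in zip(num, p)]
        den += w
    return tuple(n / den for n in num)

def mean_shift(points, x, bandwidth, tol=1e-8):
    """Iterate the update until x moves by less than tol (mode seeking)."""
    while True:
        y = mean_shift_step(points, x, bandwidth)
        if sum((a - b) ** 2 for a, b in zip(x, y)) < tol ** 2:
            return y
        x = y

# Two well-separated 1-D clusters; starting near the first, the iteration
# converges to that cluster's mode (around 0.1).
data = [(0.0,), (0.2,), (10.0,), (10.2,)]
mode = mean_shift(data, (0.3,), bandwidth=1.0)
```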
Seth Axen, Mateusz Baran, and Ronny Bergmann. Manifolds.jl: a library of Riemannian manifolds in Julia, 2021. URL https://github.com/JuliaManifolds/Manifolds.jl.
Alexandre Barachant. PyRiemann: Python package for covariance matrices manipulation and biosignal classification with application in brain-computer interface, 2015. URL https://github.com/alexandrebarachant/pyRiemann.
Piotr Brendel, Paweł Dłotko, Grzegorz Jabłoński, Mateusz Juda, Andrzej Krajniak, Marian Mrozek, Hubert Wagner, Przemysław Witek, and Natalia Żelazna. CAPD::RedHom - simplicial and cubical homology, 2019. URL http://capd.sourceforge.net/capdRedHom/.
Andrea Censi. PyGeometry: library for handling various differentiable manifolds, 2012. URL https://github.com/AndreaCensi/geometry.
Jack Cuzick. A Wilcoxon-type test for trend. Statistics in Medicine, 4(4):543-547, 1985. doi: 10.1002/sim.4780040416. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780040416.
Thomas Davies, Jack Aspinall, Bryan Wilder, and Long Tran-Thanh. Fuzzy c-means clustering for persistence diagrams. CoRR, abs/2006.02796, 2020. URL https://arxiv.org/abs/2006.02796.