Anna Kirkpatrick’s research while affiliated with Georgia Institute of Technology and other places


Publications (9)


sEBM: Scaling Event Based Models to Predict Disease Progression via Implicit Biomarker Selection and Clustering
  • Chapter

June 2023 · 5 Reads · 3 Citations

Information Processing in Medical Imaging: Proceedings of the ... Conference

Anna Kirkpatrick · Cassie S. Mitchell

The Event Based Model (EBM) is a probabilistic generative model to explore biomarker changes occurring as a disease progresses. Disease progression is hypothesized to occur through a sequence of biomarker dysregulation “events”. The EBM estimates the biomarker dysregulation event sequence. It computes the data likelihood for a given dysregulation sequence, and subsequently evaluates the posterior distribution on the dysregulation sequence. Since the posterior distribution is intractable, Markov Chain Monte-Carlo (MCMC) is employed to generate samples under the posterior distribution. However, the set of possible sequences increases as N!, where N is the number of biomarkers (data dimension), and quickly becomes prohibitively large for effective sampling via MCMC. This work proposes the “scaled EBM” (sEBM) to enable event based modeling on large biomarker sets (e.g. high-dimensional data). First, sEBM implicitly selects a subset of biomarkers useful for modeling disease progression and infers the event sequence only for that subset. Second, sEBM clusters biomarkers with similar positions in the event sequence and only orders the “clusters”, with each successive cluster corresponding to the next stage in disease progression. These two modifications used to construct the sEBM method provably reduce the possible space of event sequences by multiple orders of magnitude. The novel modifications are supported by theory, and experiments on synthetic and real clinical data validate that sEBM works in higher-dimensional settings. Results on synthetic data with known ground truth show that sEBM outperforms previous EBM variants as data dimensions increase. sEBM was successfully implemented with up to 300 biomarkers, which is a 6-fold increase over previous EBM applications.
A real-world clinical application of sEBM is performed using 119 neuroimaging markers from publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) data to stratify subjects into 6 stages of disease progression. Subjects included cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer’s Disease (AD). sEBM stage is differentiated for the 3 groups (χ² p-value < 4.6e-32). Increased sEBM stage is a strong predictor of conversion risk to AD (p-value < 2.3e-14) for MCI subjects, as verified with a Cox proportional-hazards model adjusted for age, sex, education and APOE4 status. Like EBM, sEBM does not rely on a priori defined diagnostic labels and only uses cross-sectional data.

Keywords: disease progression modeling, Bayesian learning, prognostic biomarker selection, biomarker clustering
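
To give a rough sense of the N! growth mentioned in the abstract, the sketch below compares the number of full orderings of N biomarkers with a simple upper bound on the number of ways to place each biomarker into one of 6 ordered stages. This is only an illustration: the function names are hypothetical, the bound ignores biomarker selection, and it is not the counting argument used in the paper.

```python
import math

def log10_factorial(n: int) -> float:
    """log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)

def log10_stage_assignments(n_biomarkers: int, n_stages: int) -> float:
    """log10 of n_stages ** n_biomarkers, i.e. the number of ways to assign
    every biomarker to one of n_stages ordered clusters (an assumed,
    simplified stand-in for the clustered search space)."""
    return n_biomarkers * math.log10(n_stages)

# 119 markers and 6 stages are the figures quoted in the abstract;
# the other values of N are arbitrary illustration points.
for n in (50, 119, 300):
    print(f"N={n:3d}  log10(N!) = {log10_factorial(n):7.1f}  "
          f"log10(6^N) = {log10_stage_assignments(n, 6):7.1f}")
```

For the larger values of N the gap between the two exponents spans many orders of magnitude, which is the qualitative point the abstract makes about ordering clusters instead of individual biomarkers.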


Example graph, metapath, and HeteSim computation.
Overview of SemNet version 1 HeteSim implementation. Speed ratio is computed as (SemNet1time)/(SemNet2time) and is given for source node insulin and target node Alzheimer’s disease. In SemNet 2, the approximate mean HeteSim algorithm is used with approximation parameters ϵ=0.1 and r=0.9.
Distribution of SemNet version 1 HeteSim computation times for all metapaths joining the given source node and Alzheimer’s disease. (a) Insulin; (b) Hypothyroidism; (c) Amyloid.
Distribution of Neo4j query times in SemNet version 1 HeteSim computation for all metapaths joining the given source node and Alzheimer’s disease. (a) Insulin; (b) Hypothyroidism; (c) Amyloid.
Overview of SemNet version 2 approximate mean HeteSim implementation. Speed ratio is (SemNet1time)/(SemNet2time) and is given for source node insulin and target node Alzheimer’s disease. SemNet version 2 used approximation parameters ϵ=0.1 and r=0.9.


Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0
  • Article
  • Full-text available

March 2022 · 71 Reads · 12 Citations

Anna Kirkpatrick · Chidozie Onyeze · David Kartchner · [...] · Cassie S. Mitchell

Literature-based discovery (LBD) summarizes information and generates insight from large text corpuses. The SemNet framework utilizes a large heterogeneous information network or “knowledge graph” of nodes and edges to compute relatedness and rank concepts pertinent to a user-specified target. SemNet provides a way to perform multi-factorial and multi-scalar analysis of complex disease etiology and therapeutic identification using the 33+ million articles in PubMed. The present work improves the efficacy and efficiency of LBD for end users by augmenting SemNet to create SemNet 2.0. A custom Python data structure replaced reliance on Neo4j to improve knowledge graph query times by several orders of magnitude. Additionally, two randomized algorithms were built to optimize the HeteSim metric calculation for computing metapath similarity. The unsupervised learning algorithm for rank aggregation (ULARA), which ranks concepts with respect to the user-specified target, was reconstructed using derived mathematical proofs of correctness and probabilistic performance guarantees for optimization. The upgraded ULARA is generalizable to other rank aggregation problems outside of SemNet. In summary, SemNet 2.0 is a comprehensive open-source software for significantly faster, more effective, and user-friendly means of automated biomedical LBD. An example case is performed to rank relationships between Alzheimer’s disease and metabolic co-morbidities.
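
As a rough illustration of the HeteSim metric named in the abstract, the sketch below computes an exact HeteSim score on a toy graph for an even-length metapath: walk forward from the source and backward from the target to the middle node type, then take the cosine of the two reachability distributions. The graph, the intermediate node names, and the function names are hypothetical, and this is not the randomized approximation or the data structure used in SemNet 2.0.

```python
from math import sqrt

def step(dist, edges):
    """Advance a probability distribution one hop; each node splits its
    mass evenly among its neighbours along this edge type."""
    out = {}
    for node, p in dist.items():
        nbrs = edges.get(node, [])
        for n in nbrs:
            out[n] = out.get(n, 0.0) + p / len(nbrs)
    return out

def walk(start, hops):
    """Reachability distribution after following a list of edge maps."""
    dist = {start: 1.0}
    for edges in hops:
        dist = step(dist, edges)
    return dist

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(p * v.get(k, 0.0) for k, p in u.items())
    nu = sqrt(sum(p * p for p in u.values()))
    nv = sqrt(sum(p * p for p in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hetesim(source, target, left_hops, right_hops_reversed):
    """HeteSim for an even-length metapath split at its middle node type."""
    return cosine(walk(source, left_hops), walk(target, right_hops_reversed))

# Toy metapath: substance -> gene <- disease (gene names are made up).
substance_to_gene = {"insulin": ["gene_1", "gene_2"]}
disease_to_gene = {"Alzheimer's disease": ["gene_2", "gene_3"]}
score = hetesim("insulin", "Alzheimer's disease",
                [substance_to_gene], [disease_to_gene])
print(score)
```

Odd-length metapaths need an extra step that splits the middle edge, which this sketch leaves out.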


RNAStructViz: Graphical base pairing analysis

April 2021 · 12 Reads · 1 Citation

Bioinformatics

We present a new graphical tool for RNA secondary structure analysis. The central feature is the ability to visually compare/contrast up to three base pairing configurations for a given sequence in a compact, standardized circular arc diagram layout. This is complemented by a built-in CT-style file viewer and radial layout substructure viewer which are directly linked to the arc diagram window via the zoom selection tool. Additional functionality includes the computation of some numerical information, and the ability to export images and data for later use. This tool should be of use to researchers seeking to better understand similarities and differences between structural alternatives for an RNA sequence. Availability and implementation: https://github.com/gtDMMB/RNAStructViz/wiki.


RNAStructViz: Graphical base pairing analysis

January 2021 · 50 Reads

We present a new graphical tool for RNA secondary structure analysis. The central feature is the ability to visually compare/contrast up to three base pairing configurations for a given sequence in a compact, standardized circular arc diagram layout. This is complemented by a built-in CT-style file viewer and radial layout substructure viewer which are directly linked to the arc diagram window via the zoom selection tool. Additional functionality includes the computation of some numerical information, and the ability to export images and data for later use. This tool should be of use to researchers seeking to better understand similarities and differences between structural alternatives for an RNA sequence. Availability and implementation: https://github.com/gtDMMB/RNAStructViz/wiki. Author contacts: mschmidt34@gatech.edu, akirkpatrick3@gatech.edu, and heitsch@math.gatech.edu.


On the Asymptotic Distributions of Classes of Subtree Additive Properties of Plane Trees under the Nearest Neighbor Thermodynamic Model

January 2021 · 8 Reads

We define a class of properties on random plane trees, which we call subtree additive properties, inspired by the combinatorics of certain biologically-interesting properties in a plane tree model of RNA secondary structure. The class of subtree additive properties includes the Wiener index and path length (total ladder distance and total ladder contact distance, respectively, in the biological context). We then investigate the asymptotic distribution of these subtree additive properties on a random plane tree distributed according to a Gibbs distribution arising from the Nearest Neighbor Thermodynamic Model for RNA secondary structure. We show that for any property in the class considered, there is a constant that translates the uniformly weighted random variable to the Gibbs distribution weighted random variable (and we provide the constant). We also relate the asymptotic distribution of another class of properties, which we call simple subtree additive properties, to the asymptotic distribution of the path length, both in the uniformly weighted case. The primary proof techniques in this paper come from analytic combinatorics, and most of our results follow from relating the moments of known and unknown distributions and showing that this is sufficient for convergence.
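
The two named examples, path length and the Wiener index, can be computed on a concrete plane tree as in the sketch below. It assumes the standard definitions (path length as the sum of node depths, Wiener index as the sum of distances over all unordered vertex pairs) and a hypothetical nested-list encoding in which each node is the ordered list of its children; neither the encoding nor the function name comes from the paper.

```python
def tree_stats(tree):
    """Return (n_nodes, path_length, wiener_index) for a plane tree given
    as nested lists: a node is the ordered list of its children."""
    subtree_sizes = []  # size of the subtree hanging below each edge

    def visit(node, depth):
        size, depth_sum = 1, depth
        for child in node:
            child_size, child_depth_sum = visit(child, depth + 1)
            subtree_sizes.append(child_size)
            size += child_size
            depth_sum += child_depth_sum
        return size, depth_sum

    n, path_length = visit(tree, 0)
    # Each edge lies on exactly s * (n - s) vertex-to-vertex paths,
    # where s is the number of nodes below that edge.
    wiener = sum(s * (n - s) for s in subtree_sizes)
    return n, path_length, wiener

# Root with two children; the first child has one child of its own.
example = [[[]], []]
print(tree_stats(example))
```

Both quantities are accumulated from per-subtree contributions in a single traversal, which is in the spirit of the "subtree additive" terminology, although the paper's formal definition is not reproduced here.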


Nearest Neighbor Thermodynamic Model (NNTM) parameters and resulting energy functions. Energy functions are of the form αd₀ + βd₁ + γr.
Markov Chain-Based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model and Extended Applications

October 2020 · 105 Reads · 3 Citations

Mathematical and Computational Applications

Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications.
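
The general idea of sampling lattice paths from a Gibbs distribution with a Markov chain can be sketched as follows. This is a simplified stand-in, not the chain from the paper: it runs a Metropolis chain on Dyck paths (the paper works with 2-Motzkin paths), proposes adjacent-step swaps, and uses a made-up energy equal to `weight_per_peak` times the number of peaks. All names and parameter values are assumptions for illustration, and nothing here says anything about mixing time.

```python
import math
import random

def is_dyck(path):
    """A Dyck path never dips below height 0 and ends at height 0."""
    height = 0
    for s in path:
        height += s
        if height < 0:
            return False
    return height == 0

def peaks(path):
    """Number of up-step/down-step pairs (+1 followed by -1)."""
    return sum(1 for a, b in zip(path, path[1:]) if a == 1 and b == -1)

def metropolis_step(path, weight_per_peak, rng):
    """One Metropolis move: propose swapping two adjacent steps and accept
    with probability min(1, exp(-(E_new - E_old))) under the toy energy
    E(path) = weight_per_peak * peaks(path)."""
    i = rng.randrange(len(path) - 1)
    if path[i] == path[i + 1]:
        return path  # swapping equal steps changes nothing
    proposal = path[:i] + [path[i + 1], path[i]] + path[i + 2:]
    if not is_dyck(proposal):
        return path  # reject moves that leave the state space
    delta = weight_per_peak * (peaks(proposal) - peaks(path))
    if rng.random() < min(1.0, math.exp(-delta)):
        return proposal
    return path

rng = random.Random(0)
state = [1] * 5 + [-1] * 5  # start from the single-peak path
for _ in range(2000):
    state = metropolis_step(state, 0.5, rng)
print(state, peaks(state))
```

Because the swap proposal is symmetric, the Metropolis acceptance rule makes the chain reversible with respect to the distribution proportional to exp(-E). Under the usual bijection between Dyck paths and plane trees, peaks correspond to leaves, which is why a peak count was chosen as the toy energy.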


Markov Chain-based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model

April 2020 · 18 Reads

We study plane trees as a model for RNA secondary structure, assigning energy to each tree based on the Nearest Neighbor Thermodynamic Model, and defining a corresponding Gibbs distribution on the trees. Through a bijection between plane trees and 2-Motzkin paths, we design a Markov chain converging to the Gibbs distribution, and establish fast mixing time results by estimating the spectral gap of the chain. The spectral gap estimate is established through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. In addition to the mathematical aspects of the result, the resulting algorithm can be used as a tool for exploring the branching structure of RNA and its dependence on energy model parameters. The pseudocode implementing the Markov chain is provided in an appendix.


The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization

February 2020 · 17 Reads · 11 Citations

Journal of Structural Biology

Prediction of RNA base pairings yields insight into molecular structure, and therefore function. The most common methods predict an optimal structure under the standard thermodynamic model. One component of this model is the equation which governs the cost of branching, where three or more helical "arms" radiate out from a multiloop (also known as a junction). The multiloop initiation equation has three parameters; changing those values can significantly alter the predicted structure. We give a complete analysis of the prediction accuracy, stability, and robustness for all possible parameter combinations for a diverse set of tRNA sequences, and also for 5S rRNA. We find that the accuracy can often be substantially improved on a per sequence basis. However, simultaneous improvement within families, and most especially between families, remains a challenge.


MFE prediction accuracy comparison
Improved parameters from branching polytopes
Polytope computation time and structural complexity for tRNA and 5S rRNA
Average d = 1 bounded region dimensions in (a, b, c)
The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization

January 2020 · 37 Reads

Prediction of RNA base pairings yields insight into molecular structure, and therefore function. The most common methods predict an optimal structure under the standard thermodynamic model. One component of this model is the equation which governs the cost of branching, where three or more helical "arms" radiate out from a multiloop (also known as a junction). The multiloop initiation equation has three parameters; changing those values can significantly alter the predicted structure. We give a complete analysis of the prediction accuracy, stability, and robustness for all possible parameter combinations for a diverse set of tRNA sequences, and also for 5S rRNA. We find that the accuracy can often be substantially improved on a per sequence basis. However, simultaneous improvement within families, and most especially between families, remains a challenge.

Citations (5)


... Such models ignored the overlapping impacts between normal aging and disease progression that compromise the connectivity of brains [38,39] • Event base models (EBM)-An EBM uses cross-sectional data to compute various metrics on networks at different stages. A maximum-likelihood estimate determines the ordered sequence in which biomarkers become abnormal [38,40,41]. While an EBM can capture the dynamics of networks, the downstream usage of the sequential aggregated metrics for longitudinal network diffusion modeling is limited by granularity. ...

Reference:

Network Diffusion-Constrained Variational Generative Models for Investigating the Molecular Dynamics of Brain Connectomes Under Neurodegeneration
sEBM: Scaling Event Based Models to Predict Disease Progression via Implicit Biomarker Selection and Clustering
  • Citing Chapter
  • June 2023

Information Processing in Medical Imaging: Proceedings of the ... Conference

... Utilizing computational NLP pipelines: Four out of six studies [38][39][40]43 analyzing the literature datasets focused on a single source of publications, PubMed. While this source is valuable, there is considerable scope for broadening the range of literature sources analyzed to provide a more comprehensive view of the field. ...

Optimizations for Computing Relatedness in Biomedical Heterogeneous Information Networks: SemNet 2.0

... SHAPE reactivity values were used as constraints to model the vRNA secondary structure using RNAStructure (version 6.4) [36] with the default values of −0.6 kcal/mol and 1.8 kcal/mol [37] for intercept and slope, respectively, and secondary structure were using VARNA (version 3.93) [38]. Secondary structures predicted with RNAStructure v6.4 were compared using RNAStructViz v2.14.18 [39]. ...

RNAStructViz: Graphical base pairing analysis
  • Citing Article
  • April 2021

Bioinformatics

... Proposed in 1913 by the mathematician Markov (2006), Markov chains have allowed for the development of a new line of research in probability theory that no longer considers events as independent, but considers them as the result of a succession of linked events (result of the system state) (Hayes, 2013). Today, the principle of Markov chains can be found in many fields such as finance (Cui et al., 2019a), economics (Phelan and Eslami, 2021;Briggs and Sculpher, 1998), biology (Kirkpatrick et al., 2020), medicine (Sonnenberg and Beck, 1993;Cooper and Lipsitch, 2004), physics (Ibinson et al., 2008) as well as in many manufacturing systems (Papadopoulos et al., 2019) and agricultural problematics such as the prediction of vegetation dynamics (Balzter, 2000), the analysis of agricultural drought (Biamah et al., 2005) or the forecasting cotton yields (Matis et al., 1989). ...

Markov Chain-Based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model and Extended Applications

Mathematical and Computational Applications

... We provide here a mathematical motivation for generating alternative predictions based on a parametric analysis of RNA branching Drellich et al. (2017); Barrera-Cruz et al. (2018); Poznanović et al. (2020, 2021). Using methods from geometric combinatorics Drellich et al. (2017), it is possible to identify all optimal predictions under any parameterization of the entropic branching penalty. ...

The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization
  • Citing Article
  • February 2020

Journal of Structural Biology