Christine Heitsch’s research while affiliated with Georgia Institute of Technology and other places


Publications (56)


How Parameters Influence SHAPE-Directed Predictions
  • Article

May 2024 · 14 Reads

Methods in molecular biology (Clifton, N.J.)

Torin Greenwood · Christine E Heitsch

The structure of an RNA sequence encodes information about its biological function. Dynamic programming algorithms are often used to predict the conformation of an RNA molecule from its sequence alone, and adding experimental data as auxiliary information improves prediction accuracy. This auxiliary data is typically incorporated into the nearest neighbor thermodynamic model [22] by converting the data into pseudoenergies. Here, we examine how much of the space of possible structures auxiliary data allows prediction methods to explore. We find that for a large class of RNA sequences, auxiliary data shifts the predictions significantly. Additionally, we find that predictions are highly sensitive to the parameters that define the auxiliary data pseudoenergies. In fact, the parameter space can typically be partitioned into regions where different structural predictions predominate.
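To make the role of these parameters concrete, here is a minimal Python sketch of the widely used Deigan-style pseudoenergy, ΔG(i) = m·ln(r_i + 1) + b, where r_i is the SHAPE reactivity of nucleotide i, m is the slope and b is the intercept. It is not the method of this chapter: the toy reactivities, the two candidate structures and their energies are hypothetical, negative reactivities are clamped to 0 as an assumption, and each paired nucleotide receives a single pseudoenergy term (a simplification of how folding packages apply it to stacked pairs).

```python
import math


def shape_pseudoenergy(reactivity: float, slope: float, intercept: float) -> float:
    """Pseudoenergy (kcal/mol) m * ln(r + 1) + b for a single nucleotide."""
    r = max(reactivity, 0.0)  # assumption: negative reactivities are treated as 0
    return slope * math.log(r + 1.0) + intercept


def total_energy(base_energy, paired, reactivities, slope, intercept):
    """Hypothetical score: free energy without data plus one pseudoenergy per paired nucleotide."""
    return base_energy + sum(
        shape_pseudoenergy(reactivities[i], slope, intercept) for i in paired
    )


def preferred(candidates, reactivities, slope, intercept):
    """Name of the lowest-scoring candidate under the given (slope, intercept)."""
    return min(
        candidates,
        key=lambda name: total_energy(*candidates[name], reactivities, slope, intercept),
    )


# Hypothetical toy data: 8 nucleotides (0-based) and two candidate structures.
reactivities = [0.1, 0.1, 1.5, 1.5, 0.2, 0.2, 0.1, 0.1]
candidates = {
    "A": (-2.0, [0, 1, 6, 7]),  # (free energy without data, paired positions)
    "B": (-2.5, [2, 3, 4, 5]),
}

if __name__ == "__main__":
    # Sweep the slope with the intercept fixed and report which candidate wins.
    for slope in (0.0, 0.2, 0.4, 1.0, 1.8, 2.6):
        print(slope, preferred(candidates, reactivities, slope, -0.6))
```

With this toy data the preferred candidate switches from B to A once the slope exceeds roughly 0.28 kcal/mol; the intercept has no effect here only because both candidates pair the same number of nucleotides. Sweeping both m and b over a grid in the same way gives a small-scale picture of the parameter-space partition described in the abstract.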


Figure 1: Example of Pv2 output webpage. Left column displays an interactive summary profile graph, by default a decision tree. Nodes are clickable and labeled with the corresponding number of sampled structures. Center column has a dynamic panel (top) displaying features in arc diagram format for the chosen (grey) node from the profile graph. Decisions on the incoming edge are emphasized: positive ones in bold above the sequence line, and negative ones, denoted with ¬ in the tree, in red below. Features are labeled according to the FSC table (center, middle), which lists SC regions [i, j; k, l] with fuzzy frequencies and the SHC contained. Nontrivial FSC are denoted by letters, and trivial ones by their SHC index. All SHC are listed (center, bottom) with their maximal (i, j, k) triplet and integer indexed in decreasing exact frequency as given. Selected profiles, or groups thereof, are denoted by rectangular leaves in the decision tree and labeled by roman numerals. More than one selected profile is represented if the incoming edge is a contingency (dashed). In this case, the contingency table is given below the feature display when the leaf (dashed rectangle) is chosen. Right column shows the most frequent secondary structure for each leaf in radial (or arc) diagram format. Users can download all structures corresponding to the chosen node, or just the most frequent, for further analysis. See Section 2.4, and Supplemental Material, for further information.
RNAprofiling 2.0: Enhanced cluster analysis of structural ensembles
  • Article
  • Full-text available

March 2023 · 38 Reads

Understanding the base pairing of an RNA sequence provides insight into its molecular structure. By mining suboptimal sampling data, RNAprofiling 1.0 identifies the dominant helices in low-energy secondary structures as features, organizes them into profiles which partition the Boltzmann sample, and highlights key similarities/differences among the most informative, i.e., selected, profiles in a graphical format. Version 2.0 enhances every step of this approach. First, the featured substructures are expanded from helices to stems. Second, profile selection includes low-frequency pairings similar to featured ones. In conjunction, these updates extend the utility of the method to sequences up to length 600, as evaluated over a sizable dataset. Third, relationships are visualized in a decision tree which highlights the most important structural differences. Finally, this cluster analysis is made accessible to experimental researchers in a portable format as an interactive webpage, permitting a much greater understanding of trade-offs among different possible base pairing combinations.
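The core idea of grouping a Boltzmann sample by its high-frequency substructures can be sketched in a few lines of Python. This is a simplified illustration, not RNAprofiling's implementation: helices are encoded as hypothetical (i, j, k) triples (k stacked pairs starting at (i, j)), and a plain frequency cutoff `min_freq` stands in for the method's actual feature and profile selection.

```python
from collections import Counter

# Hypothetical encoding: (i, j, k) = k stacked pairs (i, j), (i+1, j-1), ...
Helix = tuple[int, int, int]
Structure = frozenset[Helix]


def featured_helices(sample: list[Structure], min_freq: float) -> set[Helix]:
    """Helices that occur in at least a `min_freq` fraction of the sampled structures."""
    counts = Counter(h for structure in sample for h in structure)
    return {h for h, c in counts.items() if c / len(sample) >= min_freq}


def profiles(sample: list[Structure], features: set[Helix]) -> dict[frozenset, list[int]]:
    """Group sample indices by the set of featured helices each structure contains."""
    groups: dict[frozenset, list[int]] = {}
    for idx, structure in enumerate(sample):
        groups.setdefault(frozenset(structure & features), []).append(idx)
    return groups


# Hypothetical four-structure sample.
sample = [
    frozenset({(1, 20, 4), (6, 15, 3)}),
    frozenset({(1, 20, 4)}),
    frozenset({(1, 20, 4), (6, 15, 3)}),
    frozenset({(2, 30, 5)}),
]

if __name__ == "__main__":
    features = featured_helices(sample, min_freq=0.5)
    for profile, members in profiles(sample, features).items():
        print(sorted(profile), members)
```

Because each sampled structure maps to exactly one set of featured helices, the resulting groups partition the sample, which is the property the profiles above rely on.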



Counting orbits under Kreweras complementation

March 2023 · 28 Reads

The Kreweras complementation map is an anti-isomorphism on the lattice of noncrossing partitions. We consider an analogous operation for plane trees motivated by the molecular biology problem of RNA folding. In this context, we explicitly count the orbits of Kreweras' map according to their length as the number of appropriate symmetry classes of trees in the plane. These enumeration results are consolidated into a single implicit formula under the cyclic sieving phenomenon.
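For readers unfamiliar with the map, the classical Kreweras complement on noncrossing partitions of {1, ..., n} can be computed with the permutation identity K(π) = π⁻¹ ∘ γ, where π cycles each block in increasing order and γ = (1 2 ... n); this is one common convention, and others differ by a rotation. The Python sketch below brute-forces the noncrossing partitions and tallies Kreweras orbits by length. It illustrates the classical map only; it does not implement the paper's analogous operation on plane trees or its cyclic sieving formula, and all function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def set_partitions(elements):
    """Yield every set partition of a list, as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # put `first` into each existing block, or into a new singleton block
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller


def canonical(blocks):
    """Sorted tuple of sorted tuples, so equal partitions compare equal."""
    return tuple(sorted(tuple(sorted(b)) for b in blocks))


def is_noncrossing(blocks):
    """No a < b < c < d with a, c in one block and b, d in another."""
    owner = {x: idx for idx, b in enumerate(blocks) for x in b}
    for a, b, c, d in combinations(sorted(owner), 4):
        if owner[a] == owner[c] != owner[b] == owner[d]:
            return False
    return True


def noncrossing_partitions(n):
    return [canonical(p) for p in set_partitions(list(range(1, n + 1))) if is_noncrossing(p)]


def kreweras(partition, n):
    """Kreweras complement via K(pi) = pi^{-1} o gamma."""
    succ = {}
    for block in partition:
        for i, x in enumerate(block):
            succ[x] = block[(i + 1) % len(block)]  # pi cycles each block upward
    pred = {y: x for x, y in succ.items()}  # pi^{-1}
    k = {x: pred[x % n + 1] for x in range(1, n + 1)}  # apply gamma, then pi^{-1}
    blocks, seen = [], set()
    for start in range(1, n + 1):
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = k[x]
        if cycle:
            blocks.append(cycle)
    return canonical(blocks)


def orbit_length_counts(n):
    """Map each orbit length of K to the number of orbits with that length."""
    remaining = set(noncrossing_partitions(n))
    counts = Counter()
    while remaining:
        start = remaining.pop()
        length, current = 1, kreweras(start, n)
        while current != start:
            remaining.discard(current)
            current = kreweras(current, n)
            length += 1
        counts[length] += 1
    return counts


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, dict(orbit_length_counts(n)))
```

For n = 3 the five noncrossing partitions split into one orbit of length 2 (the finest and coarsest partitions) and one orbit of length 3.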


Figure 1: A pairing exchange on unobstructed edges with indices 1 ≤ a < b < c < d ≤ 2n converts siblings (a, b), (c, d) into parent/child (a, d), (b, c), and vice versa. Note how the incident vertices split and merge. However, edges in the four subtrees with indices exclusively in A = [1, a − 1] ∪ [d + 1, 2n], B = [a + 1, b − 1], C = [b + 1, c − 1], or D = [c + 1, d − 1] are unaltered.
Figure 2: The graph $G_3$ with $L_3$ on the left, $U_3$ on the right, and the three plane trees $T \in \mathcal{T}_3$ with $c(T) = 2$ in the middle. Dashed lines are pairing exchanges. The number of odd edges increases by 1 moving left to right.

On a barrier height problem for RNA branching

March 2023 · 15 Reads · 1 Citation

The branching of an RNA molecule is an important structural characteristic yet difficult to predict correctly, especially for longer sequences. Using plane trees as a combinatorial model for RNA folding, we consider the thermodynamic cost, known as the barrier height, of transitioning between branching configurations. Using branching skew as a coarse energy approximation, we characterize various types of paths in the discrete configuration landscape. In particular, we give sufficient conditions for a path to have both minimal length and minimal branching skew. The proofs offer some biological insights, notably the potential importance of both hairpin stability and domain architecture to higher resolution RNA barrier height analyses.
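The barrier height of a transition path is the largest energy reached along the path, measured relative to the starting configuration, and the minimum barrier between two configurations is the smallest such peak over all paths. A minimal Python sketch computes it with a bottleneck variant of Dijkstra's algorithm on an explicit configuration graph. The graph and energies below are hypothetical stand-ins: branching skew is not defined in this abstract, so any per-configuration energy can be plugged in.

```python
import heapq


def min_barrier(graph, energy, source, target):
    """Smallest peak energy (relative to `source`) over all paths from source to target.

    Returns None when the target is unreachable.
    """
    best = {source: energy[source]}  # lowest peak found so far for each node
    heap = [(energy[source], source)]
    while heap:
        peak, node = heapq.heappop(heap)
        if node == target:
            return peak - energy[source]
        if peak > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in graph[node]:
            new_peak = max(peak, energy[nxt])
            if new_peak < best.get(nxt, float("inf")):
                best[nxt] = new_peak
                heapq.heappush(heap, (new_peak, nxt))
    return None


# Hypothetical configuration graph: two routes from L to U, via X or via Y.
graph = {"L": ["X", "Y"], "X": ["L", "U"], "Y": ["L", "U"], "U": ["X", "Y"]}
energy = {"L": 0.0, "X": 5.0, "Y": 3.0, "U": 1.0}

if __name__ == "__main__":
    print(min_barrier(graph, energy, "L", "U"))
```

In the toy graph the route through Y wins because its peak (3.0) is lower than the peak through X (5.0), even though both paths have the same length; the paper's question of when minimal length and minimal cost coincide is exactly about such trade-offs.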


RNAprofiling 2.0: Enhanced Cluster Analysis of Structural Ensembles

March 2023 · 20 Reads · 1 Citation

Journal of Molecular Biology

Understanding the base pairing of an RNA sequence provides insight into its molecular structure. By mining suboptimal sampling data, RNAprofiling 1.0 identifies the dominant helices in low-energy secondary structures as features, organizes them into profiles which partition the Boltzmann sample, and highlights key similarities/differences among the most informative, i.e., selected, profiles in a graphical format. Version 2.0 enhances every step of this approach. First, the featured substructures are expanded from helices to stems. Second, profile selection includes low-frequency pairings similar to featured ones. In conjunction, these updates extend the utility of the method to sequences up to length 600, as evaluated over a sizable dataset. Third, relationships are visualized in a decision tree which highlights the most important structural differences. Finally, this cluster analysis is made accessible to experimental researchers in a portable format as an interactive webpage, permitting a much greater understanding of trade-offs among different possible base pairing combinations.



RNAStructViz: Graphical base pairing analysis

April 2021 · 12 Reads · 1 Citation

Bioinformatics

We present a new graphical tool for RNA secondary structure analysis. The central feature is the ability to visually compare/contrast up to three base pairing configurations for a given sequence in a compact, standardized circular arc diagram layout. This is complemented by a built-in CT-style file viewer and radial layout substructure viewer which are directly linked to the arc diagram window via the zoom selection tool. Additional functionality includes the computation of some numerical information, and the ability to export images and data for later use. This tool should be of use to researchers seeking to better understand similarities and differences between structural alternatives for an RNA sequence. Availability and implementation: https://github.com/gtDMMB/RNAStructViz/wiki.
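As an illustration of the kind of comparison described here, the Python sketch below parses dot-bracket strings into base pair sets and reports the pairs shared by all inputs and those unique to each. It is a hypothetical helper written for this page, not RNAStructViz code; the tool itself provides a CT-style file viewer and renders arc diagrams.

```python
Pair = tuple[int, int]


def dot_bracket_pairs(structure: str) -> set[Pair]:
    """1-based base pairs (i, j) encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: set[Pair] = set()
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")" and stack:
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unbalanced or unsupported character at {pos}: {ch!r}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def compare(*structures: str) -> dict[str, set[Pair]]:
    """Pairs common to all 2 or 3 structures, plus pairs found in only one of them."""
    if not 2 <= len(structures) <= 3:
        raise ValueError("compare two or three structures")
    if len({len(s) for s in structures}) != 1:
        raise ValueError("structures must describe the same sequence length")
    pair_sets = [dot_bracket_pairs(s) for s in structures]
    result = {"common": set.intersection(*pair_sets)}
    for n, pairs in enumerate(pair_sets, start=1):
        others = set().union(*(p for m, p in enumerate(pair_sets, start=1) if m != n))
        result[f"only_{n}"] = pairs - others
    return result


if __name__ == "__main__":
    print(compare("((..))", "(....)", "......"))
```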


Figure 2. Improved merging order.
Per-family characteristics of the training and testing data sets.
MFE prediction accuracies for the testing data set from the Mathews Lab (U Rochester) with 557 tRNA and 1283 5S rRNA sequences. Table 2 parameters are repeated for completeness.
Improving RNA Branching Predictions: Advances and Limitations

March 2021 · 52 Reads · 7 Citations

Minimum free energy prediction of RNA secondary structures is based on the Nearest Neighbor Thermodynamics Model. While such predictions are typically good, the accuracy can vary widely even for short sequences, and the branching thermodynamics are an important factor in this variance. Recently, the simplest model for multiloop energetics (a linear function of the number of branches and unpaired nucleotides) was found to be the best. Subsequently, a parametric analysis demonstrated that per-family accuracy can be improved by changing the weightings in this linear function. However, the extent of improvement was not known due to the ad hoc method used to find the new parameters. Here we develop a branch-and-bound algorithm that finds the set of optimal parameters with the highest average accuracy for a given set of sequences. Our analysis shows that the previous ad hoc parameters are nearly optimal for tRNA and 5S rRNA sequences on both training and testing sets. Moreover, cross-family improvement is possible but more difficult because competing parameter regions favor different families. The results also indicate that restricting the unpaired nucleotide penalty to small values is warranted. This reduction makes analyzing longer sequences using the present techniques more feasible.
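The linear multiloop model mentioned in this abstract charges each multiloop an offset plus a per-branch term and a per-unpaired-nucleotide term. The Python sketch below uses this form to compare two hypothetical branching configurations under two hypothetical parameter triples. It isolates the multiloop contribution only and does not reproduce the paper's branch-and-bound search or its fitted parameters.

```python
Params = tuple[float, float, float]  # (offset, per_branch, per_unpaired), kcal/mol


def multiloop_energy(branches: int, unpaired: int, params: Params) -> float:
    """Linear multiloop model: offset + per_branch * branches + per_unpaired * unpaired."""
    offset, per_branch, per_unpaired = params
    return offset + per_branch * branches + per_unpaired * unpaired


def total_multiloop_energy(loops: list[tuple[int, int]], params: Params) -> float:
    """Sum over all multiloops, each given as (branches, unpaired nucleotides)."""
    return sum(multiloop_energy(b, u, params) for b, u in loops)


def preferred_config(configs: dict[str, list[tuple[int, int]]], params: Params) -> str:
    """Configuration with the lowest total multiloop energy."""
    return min(configs, key=lambda name: total_multiloop_energy(configs[name], params))


# Hypothetical configurations of the same sequence region.
configs = {
    "X": [(5, 2)],          # one multiloop: 5 branches, 2 unpaired nucleotides
    "Y": [(3, 4), (3, 4)],  # two multiloops, each with 3 branches and 4 unpaired
}

if __name__ == "__main__":
    for params in [(3.4, 0.4, 0.0), (-1.0, 0.2, 0.1)]:
        print(params, preferred_config(configs, params))
```

Under the first triple the single five-way junction (X) is cheaper; under the second, splitting into two three-way junctions (Y) wins. This is the kind of parameter-dependent shift in branching that a parametric analysis of the weightings quantifies.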


Citations (30)


... Let $\mathcal{T}_n$ denote the set of plane trees with $n$ edges, and $T \in \mathcal{T}_n$. Motivated by the molecular biology problem of RNA folding [7], label the boundary of $T$ with $[1, 2n]$ in increasing order counter-clockwise from the root. Let $(i, j)$ denote the edge in $T$ with indices $i < j$ on the left and right sides. ...
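The boundary labeling in this excerpt is easy to reproduce: reading the contour of a plane tree as a balanced parenthesis word, each edge is the pair of positions (i, j) where it is first descended and later re-ascended. The Python sketch below, assuming children are listed left to right, builds that edge set, enumerates the trees with n edges (their number is the Catalan number), and applies the pairing exchange from the barrier-height Figure 1 caption. The "unobstructed" condition, and the requirement that the two edges be true siblings or a true parent/child pair, are not checked; all function names are illustrative.

```python
from math import comb

Edge = tuple[int, int]


def edges_from_dyck(word: str) -> set[Edge]:
    """Edge set {(i, j)} of the plane tree whose contour walk is `word`.

    '(' at position i steps down an edge; the matching ')' at j steps back up.
    """
    stack: list[int] = []
    edges: set[Edge] = set()
    for pos, ch in enumerate(word, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")" and stack:
            edges.add((stack.pop(), pos))
        else:
            raise ValueError(f"not a balanced parenthesis word: {word!r}")
    if stack:
        raise ValueError(f"not a balanced parenthesis word: {word!r}")
    return edges


def plane_trees(n: int) -> list[str]:
    """All balanced words with n pairs, one per plane tree with n edges."""
    words: list[str] = []

    def extend(prefix: str, opened: int, closed: int) -> None:
        if closed == n:
            words.append(prefix)
            return
        if opened < n:
            extend(prefix + "(", opened + 1, closed)
        if closed < opened:
            extend(prefix + ")", opened, closed + 1)

    extend("", 0, 0)
    return words


def pairing_exchange(edges: set[Edge], e1: Edge, e2: Edge) -> set[Edge]:
    """Swap siblings (a,b),(c,d) with parent/child (a,d),(b,c), or vice versa."""
    if e1 not in edges or e2 not in edges:
        raise ValueError("both edges must belong to the tree")
    (i, j), (k, l) = sorted([e1, e2])  # now i < k
    if j < k:    # i < j < k < l: siblings become parent (i, l) and child (j, k)
        new = {(i, l), (j, k)}
    elif l < j:  # i < k < l < j: parent and child become siblings (i, k), (l, j)
        new = {(i, k), (l, j)}
    else:
        raise ValueError("edges cross, so they cannot come from a plane tree")
    return (edges - {e1, e2}) | new


if __name__ == "__main__":
    print(len(plane_trees(4)), comb(8, 4) // 5)
    print(pairing_exchange(edges_from_dyck("()()"), (1, 2), (3, 4)))
```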

Reference:

Counting orbits under Kreweras complementation
On a barrier height problem for RNA branching

... SHAPE reactivity values were used as constraints to model the vRNA secondary structure using RNAStructure (version 6.4) [36] with the default values of −0.6 kcal/mol and 1.8 kcal/mol [37] for intercept and slope, respectively, and secondary structures were visualized using VARNA (version 3.93) [38]. Secondary structures predicted with RNAStructure v6.4 were compared using RNAStructViz v2.14.18 [39]. ...

RNAStructViz: Graphical base pairing analysis
  • Citing Article
  • April 2021

Bioinformatics

... While the occurrence of multiloops of degree 10 or higher is not uncommon in various RNAs [5], multiloop energies are the least accurately known among the numerous energy parameters involved in RNA secondary structure prediction [50]. Most of the current energy-based structure prediction software, including ViennaRNA, assumes that the energy of a multiloop depends only on the number of enclosed base pairs (number of branches) and the number of unpaired nucleotides in it, and uses a linear model of the form ...

Improving RNA Branching Predictions: Advances and Limitations

... Our DMS-MaP (Dimethyl Sulfate Mutational Profiling) experiments on this 87-nt construct in Figure 1C indicate nucleotides with low (black), medium (yellow), and high (red) reactivity to DMS (Mustoe et al. 2019; Dey et al. 2021). Both DMS and SHAPE reactivities can be used as a pseudo-free energy term in thermodynamic folding algorithms to significantly improve structure prediction (Deigan et al. 2009; Greenwood and Heitsch 2020). Our two DMS-MaP biological replicates in supplementary Figure S1 are quantitatively reproducible (R² > 0.85). When used for structure prediction with SHAPEknots (Hajdin et al. 2013), we obtained three conformations, shown in Figures 1C, 1D and 1E. ...

On the Problem of Reconstructing a Mixture of RNA Structures
  • Citing Article
  • October 2020

Bulletin of Mathematical Biology

... stemming from the multiloop (see Ref. 24 for a comparison of multiloop energy parameters used in different energy models). Earlier versions of ViennaRNA (until v2.0), for instance, used a positive value of this parameter, penalizing high-degree nodes, while later versions of the software use a negative value, promoting high-degree nodes. Differences in the multiloop energy parameters can of course be reflected in the predicted structures of long RNAs and their topological properties [51,52]. This needs to be taken into account when comparing results obtained by existing studies on the branching properties of RNA [8,9,20,25] that use different versions of folding software and thus potentially different energy models. ...

The challenge of RNA branching prediction: a parametric analysis of multiloop initiation under thermodynamic optimization
  • Citing Article
  • February 2020

Journal of Structural Biology

Fidel Barrera-Cruz · Anna Kirkpatrick · [...] · Christine Heitsch

... By focusing on the overall arrangement of edges/helices and vertices/loops, mathematical results have provided insight into the challenge of designing RNA sequences with a particular branching structure [10], configurations which minimize loop energy costs [1,2], and a parametric analysis of the branching entropy approximation [13]. This work has led both to better understanding of RNA prediction accuracy [3,20,21] and to some new combinatorics [12]. ...

On the Structure of RNA Branching Polytopes
  • Citing Article
  • August 2017

SIAM Journal on Applied Algebra and Geometry

... several implementations of NNTM, including RNAstructure [5], the ViennaRNA Package [6], UNAFold [7], and GTfold [8]. Although very useful, the physics-rooted NNTM approach is prone to accuracy errors in its predictions [9,10]. Accuracy can be improved by incorporating additional information, such as high-throughput chemical probing data that further constrain the thermodynamic model [11,12,13]. ...

Conditioning and Robustness of RNA Boltzmann Sampling under Thermodynamic Parameter Perturbations
  • Citing Article
  • June 2017

Biophysical Journal
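Boltzmann sampling draws structures with probability proportional to exp(−E/RT). To show why small energy perturbations matter, the Python sketch below computes exact Boltzmann probabilities over a short, explicitly listed set of hypothetical structure energies (real tools instead sample the full ensemble via a partition function), then shifts one energy by 0.5 kcal/mol. The structure names and energies are illustrative only.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies: dict[str, float], temp_c: float = 37.0) -> dict[str, float]:
    """Probability of each listed structure, proportional to exp(-E / RT)."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies.values())  # shift by the minimum for numerical stability
    weights = {s: math.exp(-(e - e_min) / rt) for s, e in energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


if __name__ == "__main__":
    base = {"S1": -10.0, "S2": -9.5, "S3": -8.0}  # hypothetical energies, kcal/mol
    perturbed = dict(base, S2=base["S2"] - 0.5)
    print(boltzmann_probabilities(base))
    print(boltzmann_probabilities(perturbed))
```

Lowering S2 by 0.5 kcal/mol makes it exactly as probable as S1, so a perturbation of this size can noticeably reshape a sample.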

... Introduction. Over the past decades, analysis of transcriptome or gene expression data has been an essential component of understanding the processes involved in human development and disease [1,2,3,4], but the complex nature of tissue samples under investigation remains a major obstacle [5,6,7]. A bulk tissue sample could include many cell types, and its heterogeneous characteristics make the interpretation of gene expression (such as RNA-Seq) complicated [8,9,10]: for every gene, its measured gene expression profiles (GEPs) in a compound sample are actually tissue-averaged, i.e., the sum of expression of all cells in the sample. ...

Geometric combinatorics and computational molecular biology: Branching polytopes for RNA sequences
  • Citing Chapter
  • January 2017

... Under the energy minimization hypothesis, an RNA sequence folds into its configuration only when the loop region energies are minimized and its stacked pairs are maximized. Various algorithms and combinatorial models have been developed to solve the RNA secondary structure design problem [96,97,174]. An example of RNA secondary structure is shown in Figure 2 (C). ...

Combinatorial Insights into RNA Secondary Structure
  • Citing Chapter
  • October 2014

Natural Computing Series

... When introduced [4], it was established that RNAprofiling provides complementary information to both Sfold and RNAshapes. Moreover, a thorough analysis [9] compared the three, where Pv1 analyzed Boltzmann samples generated by GTfold [10]. It was found that all three improved over the MFE, but there was no clear advantage among cluster analysis methods in terms of base pair prediction accuracy. ...

New insights from cluster analysis methods for RNA secondary structure prediction
  • Citing Article
  • March 2016

WIREs RNA