AlignmentComparator - Comparing and annotating alternative alignments of the same data set

Authors: Ben C. Stöver (1), Kai F. Müller (1)
(1) Evolution and Biodiversity of Plants Group, Institute for Evolution and Biodiversity, WWU Münster, Hüfferstr. 1, 48149 Münster, Germany
Citations:
Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32, 1792–1797.
Morrison, D.A. (2008). A framework for phylogenetic sequence alignment. Plant Syst Evol 282, 127–149.
Prlić, A., Yates, A., Bliven, S.E., Rose, P.W., Jacobsen, J., Troshin, P.V., Chapman, M., Gao, J., Koh, C.H., Foisy, S., et al. (2012). BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics 28, 2693–2695.
Stöver, B.C., Müller, K.F. (2010). TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 11, 7.
Future development
AlignmentComparator is still under active development. The
following additional features are currently planned:
- Support for comparing more than two alternative MSAs at the same time, allowing the user to dynamically choose which MSAs are included in the current comparison view.
- Output of statistical information about the differences between alternative alignments (e.g. the proportion of columns that differ between two MSAs).
- Automatic annotation of simple types of differences, e.g. supergaps that simply originated from a different positioning of tandem repeat periods in the alternative alignments (see fig. 4).
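
One possible reading of the planned difference statistic is sketched below in Python. The poster does not define how this statistic will be computed, so everything here is an assumption: a column is identified by the residue index it holds for each sequence (None for a gap), and a column of the first alignment counts as different if no identical column occurs in the second. The function names are hypothetical and are not part of AlignmentComparator.

```python
def column_signatures(alignment):
    """Describe each column by the residue index it holds per sequence (None for a gap)."""
    names = sorted(alignment)
    counters = {name: 0 for name in names}  # residues consumed so far per sequence
    length = len(alignment[names[0]])
    signatures = []
    for col in range(length):
        signature = []
        for name in names:
            if alignment[name][col] == "-":
                signature.append(None)
            else:
                signature.append(counters[name])
                counters[name] += 1
        signatures.append(tuple(signature))
    return signatures


def differing_column_ratio(alignment_a, alignment_b):
    """Fraction of columns of alignment_a without an identical column in alignment_b."""
    columns_a = column_signatures(alignment_a)
    columns_b = set(column_signatures(alignment_b))
    differing = sum(1 for column in columns_a if column not in columns_b)
    return differing / len(columns_a)


# Two toy alignments of the same two sequences (ACGT and ACGGT).
msa_a = {"Seq1": "AC-GT", "Seq2": "ACGGT"}
msa_b = {"Seq1": "ACG-T", "Seq2": "ACGGT"}
ratio = differing_column_ratio(msa_a, msa_b)
```

In this toy case the two alignments place the gap in Seq1 at different positions, so two of the five columns of the first alignment have no counterpart in the second.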
http://www2.ieb.uni-muenster.de/EvolBiodivPlants
Poster download: http://www2.ieb.uni-muenster.de/EvolBiodivPlants/en/Publications/ConferenceContribution?id=100873
Availability
AlignmentComparator is distributed under the GNU General Public License Version 3
at the BioInfWeb software portal:
http://bioinfweb.info/AlignmentComparator
Scope of AlignmentComparator
With a growing number of alternative algorithms for automated multiple sequence alignment (MSA)
and different strategies for manual alignment corrections (see e.g. Morrison 2008) it becomes more
and more relevant to visualize the differences between alternative MSAs of the same data set.
This allows the researcher to decide which alternative alignments to take into account as the
basis of e.g. a phylogenetic study or numerous other tasks in biological research.
Manual alignment corrections (made by a single person or a team of researchers) can be
visualized and tracked.
Bioinformaticians can determine the effect of modifications they made to an MSA algorithm.
AlignmentComparator also has an application in teaching and was already used by students at
the IEB to investigate differences between automated MSA algorithms.
How it works
1) Two input alignments of the same data set that shall be compared are loaded.
AlignmentComparator is platform independent and runs on every operating system with a Java
Virtual Machine (e.g. Windows, Linux or Mac).
A full undo history is provided and every user edit (see step 5 in fig. 1) can be undone.
It is based on LibrAlign. (See other poster and http://bioinfweb.info/LibrAlign.)
Fig. 1, step 1 (schematic): two input alignments, each containing the sequences Seq1 to Seq8. Legend: gaps initially contained in the alignments.
Fig. 2 Initial dialog of AlignmentComparator, in which the input alignment files and the comparison algorithm are selected.
2) New alignment files with unique names are
generated as the source for the next step.
Fig. 1, step 2 (schematic): the sequences receive new unique names, A_Seq1 to A_Seq8 in the first alignment and B_Seq1 to B_Seq8 in the second.
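
The renaming in step 2 can be illustrated with a minimal Python sketch. It assumes an alignment is held as a dictionary mapping sequence names to aligned sequences, and it uses the prefixes "A_" and "B_" shown in fig. 1. The function name is hypothetical, and the sketch is not taken from the AlignmentComparator source code.

```python
def add_prefix(alignment, prefix):
    """Return a copy of the alignment in which every sequence name carries the prefix."""
    return {f"{prefix}{name}": sequence for name, sequence in alignment.items()}


# The same sequence names occur in both input alignments.
first = {"Seq1": "ACGT", "Seq2": "ACGT"}
second = {"Seq1": "ACGGT", "Seq2": "AC-GT"}

# After renaming, all names are unique and both alignments can be
# processed together in one profile-profile alignment.
renamed_first = add_prefix(first, "A_")
renamed_second = add_prefix(second, "B_")
```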
3) A profile-profile alignment is performed using the specified algorithm, e.g. MUSCLE (Edgar 2004).
Fig. 3 Dialog showing the output of a MUSCLE profile-profile alignment, which is used as the basis of step 4 (in fig. 1).
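
As an illustration of step 3, the following Python sketch assembles a command line for a MUSCLE profile-profile alignment. It assumes the MUSCLE 3.x command-line syntax (the -profile, -in1, -in2 and -out options); other MUSCLE versions may use different options. The poster does not state how AlignmentComparator invokes MUSCLE, and the function name and file names are hypothetical.

```python
def muscle_profile_command(profile_a, profile_b, output, executable="muscle"):
    """Build the argument list for a MUSCLE 3.x profile-profile alignment."""
    return [
        executable,
        "-profile",        # align two existing alignments to each other
        "-in1", profile_a,
        "-in2", profile_b,
        "-out", output,
    ]


# Hypothetical file names for the two renamed alignments from step 2.
command = muscle_profile_command("alignment_a.fasta", "alignment_b.fasta", "profile.fasta")
```

If MUSCLE 3.x is installed, the list could be passed to subprocess.run(command, check=True) to perform the alignment.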
4) The output of the profile-profile alignment is processed by AlignmentComparator and supergaps are identified and displayed.
5a) The user can browse through the comparison
and add comments and annotations to the
displayed differences.
5b) Manual changes to the profile alignment can
be made.
supergaps (gaps which originated from the comparison)
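
How supergaps might be found in the output of the profile-profile alignment (step 4) is sketched below in Python. The sketch assumes that neither input alignment contained columns consisting only of gaps, so that any column in which all sequences of one input alignment show a gap must have been introduced by the comparison. The function name is hypothetical, and the actual algorithm used by AlignmentComparator may differ.

```python
def find_supergap_columns(profile_alignment, prefix):
    """Return the column indices where all sequences with the given prefix have a gap."""
    rows = [seq for name, seq in profile_alignment.items() if name.startswith(prefix)]
    if not rows:
        raise ValueError(f"no sequences with prefix {prefix!r}")
    return [col for col in range(len(rows[0])) if all(row[col] == "-" for row in rows)]


# Toy profile-profile alignment: alignment A originally had four columns,
# alignment B had five, so one supergap was inserted into A.
combined = {
    "A_Seq1": "AC-GT",
    "A_Seq2": "AC-GT",
    "B_Seq1": "ACGGT",
    "B_Seq2": "AC-GT",
}
supergaps_in_a = find_supergap_columns(combined, "A_")
supergaps_in_b = find_supergap_columns(combined, "B_")
```

The gap in B_Seq2 is not reported, because B_Seq1 has a residue in the same column, so that gap was already part of the original alignment B.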
Use in ongoing research
Our group currently uses AlignmentComparator in a study investigating how different MSA algorithms,
compared to manual alignment improvements, influence numerous phylogenetic methods
for tree inference, dating or phylogenetic network reconstruction.
In addition, an analysis software has been developed that performs the following tasks:
- Realign a manual input alignment using several MSA algorithms.
- Perform the different phylogenetic methods on all these alignments.
- Compare the resulting alignments using AlignmentComparator and the resulting trees using TreeGraph 2 (Stöver and Müller 2010).
Fig. 6 Future versions of AlignmentComparator shall be able to compare more than two alternative MSAs at the same time.
Fig. 1 Schematic example illustrating how AlignmentComparator calculates and visualizes the differences between two
alternative multiple sequence alignments and how the user can edit and comment on the result.
Fig. 5 Screenshot of the analysis software described above.
Fig. 4 An example of a comparison of two alternative multiple sequence alignments of the same data set using AlignmentComparator. The light gray spaces are supergaps that result from the alignment comparison. At the bottom one can see example comments that can be added by the user.
6) The results can be
saved to an XML file.
Acknowledgements
BCS thanks the NSF (National Science Foundation) and IPAM for partly financing the presentation
of this poster at the MSA Workshop at IPAM, UCLA 2015. Furthermore, the authors are very thankful to the
developers of the other open source projects AlignmentComparator is built on (BioJava, Apache Commons).