A key role in arthropod phylogeny is played by a group of organisms that was already the focus of Charles Darwin's taxonomic research in the mid-19th century: the Crustacea. This extremely diverse group ranges from small species such as the mystacocarid Derocheilocaris typicus, with a body size of only 0.3 mm, to large representatives such as the Japanese giant crab (Macrocheira kaempferi), with a leg span of almost 4 m. Six major crustacean taxa are generally accepted: the Malacostraca (Latreille, 1802), Branchiopoda (Latreille, 1817), Remipedia (Yager, 1981), Cephalocarida (Sanders, 1955), Maxillopoda (Dahl, 1956) and Ostracoda (Latreille, 1802). The validity of the taxon Maxillopoda is still disputed. The monophyly of some crustacean groups, such as the Malacostraca and Branchiopoda, is generally accepted, but it remains unclear for several other groups. This thesis aims to resolve the internal relationships of the major crustacean groups by inferring phylogenies from molecular data. The crustaceans are, in addition, of eminent interest for illuminating the question of how arthropod taxa successfully conquered land. New molecular and neuroanatomical data support the scenario that the Hexapoda might have evolved from Crustacea. The thesis further seeks to address the possible close relationship of Crustacea and Hexapoda. This issue is closely linked to the still partly debated position of crustaceans within arthropods and to the putative sister group of the Crustacea.
Most molecular studies of crustaceans have relied on single-gene or multigene analyses, in most cases using partially sequenced rRNA genes. However, most studies do not conduct intensive data quality and alignment assessments prior to phylogenetic reconstruction. In addition, complex modeling and the implementation of compositional base heterogeneity along lineages are missing. One methodological aim of this thesis was to implement new tools to assess data quality, to improve alignment quality and to test the impact of complex modeling of the data. Two of the three phylogenetic analyses in this thesis are also based on rRNA genes.
In analysis (A), 16S rRNA, 18S rRNA and COI sequences were analyzed. To improve data quality, the COI fragment was RY coded, an alignment procedure that considers the secondary structure of RNA molecules was applied, and alignment positions of ambiguous positional homology were excluded. Nevertheless, extensive network reconstructions showed that the signal quality in the chosen and commonly used markers is not suitable for inferring crustacean phylogeny, despite the extensive data processing and optimization. This result sheds new light on previous studies relying on these markers.
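The RY coding step mentioned for the COI fragment can be illustrated with a minimal sketch. It assumes the standard definition of RY coding (purines A and G become R, pyrimidines C and T/U become Y); the function names `ry_code` and `ry_code_alignment` are hypothetical and do not refer to the tools used in the thesis.

```python
# Minimal sketch of RY recoding; names are illustrative assumptions.
# Purines (A, G) -> R, pyrimidines (C, T, U) -> Y.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(sequence):
    """Return the RY-coded version of one aligned nucleotide sequence.

    Gaps ('-') and ambiguity codes such as 'N' are left unchanged.
    """
    return sequence.upper().translate(RY_MAP)


def ry_code_alignment(alignment):
    """Apply RY coding to every sequence in a {taxon: sequence} mapping."""
    return {taxon: ry_code(seq) for taxon, seq in alignment.items()}
```

Recoding discards the distinction between transitions within each class, so only transversions remain visible, which is why it is commonly used on fast-evolving or compositionally biased fragments.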
In analysis (B), completely sequenced 18S and 28S rRNA genes were used to reconstruct the phylogeny. Based on the findings of analysis (A), base compositional heterogeneity was taken into account, in addition to secondary structure alignment optimization and alignment assessment. The complex modeling needed to compare time-heterogeneous with time-homogeneous processes, in combination with mixed models that implement secondary structures, was only possible with the Bayesian software package PHASE. The results clearly demonstrated that complex modeling matters and that ignoring time-heterogeneous processes can mislead phylogenetic reconstructions. Some results shed light on the phylogeny of crustaceans: for the first time, the Cephalocarida (Hutchinsoniella macracantha) were placed in a clade with the Branchiopoda, which is morphologically plausible. Unfortunately, the internal relationships of most crustacean groups remained poorly supported. Compared to the time-homogeneous tree, the time-heterogeneous tree gives lower support values for some nodes. It can be suggested that incorporating base compositional heterogeneity in phylogenetic analysis improves the reliability of the topology. The Pancrustacea are maximally supported in both approaches, but the internal relationships are not reliably reconstructed. One result of this analysis is that the phylogenetic signal in rRNA data might be eroded for crustaceans.
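To make the notion of base compositional heterogeneity concrete, the following sketch computes a plain chi-square statistic for the homogeneity of base frequencies across taxa. This is only a simple screening statistic under stated assumptions, not the time-heterogeneous modeling performed with PHASE; the function names are hypothetical, and the test ignores the non-independence of taxa that share ancestry.

```python
from collections import Counter

BASES = "ACGT"


def base_counts(sequence):
    """Count A, C, G, T in a sequence (U is treated as T; gaps are ignored)."""
    counts = Counter(sequence.upper().replace("U", "T"))
    return [counts[base] for base in BASES]


def composition_chi_square(alignment):
    """Chi-square statistic and degrees of freedom for base composition.

    alignment: {taxon: aligned sequence}. Expected counts come from the
    pooled base frequencies of all taxa (a contingency-table test).
    """
    table = {taxon: base_counts(seq) for taxon, seq in alignment.items()}
    col_totals = [sum(row[i] for row in table.values()) for i in range(len(BASES))]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in table.values():
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip bases that never occur
                chi2 += (observed - expected) ** 2 / expected
    used_bases = sum(1 for total in col_totals if total > 0)
    degrees_of_freedom = (len(table) - 1) * (used_bases - 1)
    return chi2, degrees_of_freedom
```

A large statistic relative to the degrees of freedom hints that a single stationary base composition is a poor description of the data, which is the situation in which time-heterogeneous models become relevant.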
Recent publications have presented analyses based on phylogenomic data, mainly to reconstruct metazoan phylogeny. Analyzing such a large number of sequences is possible with the “supertree” or the “supermatrix” method. The supermatrix method seems to outperform the supertree approach. One main advantage is the possibility of applying a model to each partition (each gene) separately. Within this thesis, crustaceans were collected to conduct EST sequencing projects and to include the resulting sequences, combined with public sequence data, in a phylogenomic analysis (C). In this analysis the supermatrix approach was applied. New and innovative reduction heuristics were used to condense the dataset. The strategy of these heuristics relies on the potential relative information content of each gene of each taxon, which provides a more objective criterion for selecting taxa and genes. Again, alignment evaluation and processing were a major aspect of the analysis design. The results showed that the matrix of the reduced dataset yields a more reliable topology in which most nodes are highly supported. In analysis (C) the Branchiopoda were positioned as sister group to the Hexapoda, a result that differs from analysis (A) but is in line with other phylogenomic studies. Unfortunately, important crustacean taxa are still missing for an extensive phylogenomic analysis. Some EST sequencing projects for the crustaceans collected for this thesis were delayed for technical reasons; for example, the ESTs for Sarsinebalia urgorrii (a non-derived malacostracan) and Speleonectes tulumensis (Remipedia) are still in progress. A preliminary result obtained with sequences isolated from remipede tissue suggests, based on homologous hemocyanin subunits, that remipedes and hexapods are closely related.
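The supermatrix idea, with one partition per gene so that each gene can receive its own model, can be sketched as follows. This is a generic illustration only: `build_supermatrix` is a hypothetical name, the use of "?" for missing genes is an assumption, and the reduction heuristics of analysis (C) are not reproduced here.

```python
def build_supermatrix(gene_alignments, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: {gene: {taxon: aligned sequence}}.
    Taxa lacking a gene are padded with `missing` characters.
    Returns (matrix, partitions), where partitions is a list of
    (gene, start, end) tuples with 1-based, inclusive coordinates.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences of gene {gene!r} differ in length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The returned partition boundaries are what allow a separate substitution model to be assigned to each gene in a downstream analysis, and the share of padded positions per taxon gives a first impression of how sparse the matrix is before any reduction.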
The conclusion of the analyses conducted in the framework of this thesis is that alignment evaluation and processing improve the resulting phylogenetic inference. Assessing the quality of the signal and potential conflicts in the dataset is extremely important, including for subsequent decisions on the selection of substitution models and for the final phylogenetic reconstructions. Complex models can further improve phylogenetic reconstruction, as was explicitly demonstrated in analysis (B). The supermatrix approach, relying on a more objective criterion than cut-off values to select genes and taxa, is very promising for future studies. However, it was also demonstrated that the Crustacea are problematic with regard to the phylogenetic signal of the analyzed single-gene data. The hope is that phylogenomic data, analyzed with complex models similar to those applied in analysis (B) and combined with a denser taxon sampling, can improve our knowledge of crustacean phylogeny in future studies. This thesis presents essential new methodological as well as phylogenetic findings for this challenging task.