Joanne L. Birch’s research while affiliated with University of Melbourne and other places


Publications (44)


nph20263-sup-0001-supinfo.pdf
  • Data
  • File available

January 2025 · 30 Reads


A nuclear phylogenomic tree of grasses (Poaceae) recovers current classification despite gene tree incongruence

January 2025 · 532 Reads · 1 Citation

Grasses (Poaceae) comprise c. 11 800 species and are central to human livelihoods and terrestrial ecosystems. Knowing their relationships and evolutionary history is key to comparative research and crop breeding. Advances in genome‐scale sequencing allow for increased breadth and depth of phylogenomic analyses, making it possible to infer a new reference species tree of the family. We inferred a comprehensive species tree of grasses by combining new and published sequences for 331 nuclear genes from genome, transcriptome, target enrichment and shotgun data. Our 1153‐tip tree covers 79% of grass genera (including 21 genera sequenced for the first time) and all but two small tribes. We compared it to a newly inferred 910‐tip plastome tree. We recovered most of the tribes and subfamilies previously established, despite pervasive incongruence among nuclear gene trees. The early diversification of the PACMAD clade could represent a hard polytomy. Gene tree–species tree reconciliation suggests that reticulation events occurred repeatedly. Nuclear–plastome incongruence is rare, with very few cases of supported conflict. We provide a robust framework for the grass tree of life to support research on grass evolution, including modes of reticulation, and genetic diversity for sustainable agriculture.


Figure 1: The Hespi pipeline. Specimen sheet available at https://online.herbarium.unimelb.edu.au/collectionobject/MELUA118997a.
Figure 6: An example screenshot of a Hespi HTML report.
Hespi: A pipeline for automatically detecting information from herbarium specimen sheets

October 2024 · 46 Reads

Specimen-associated biodiversity data are sought after for biological, environmental, climate, and conservation sciences. A rate shift is required for the extraction of data from specimen images to eliminate the bottleneck that the reliance on human-mediated transcription of these data represents. We applied advanced computer vision techniques to develop 'Hespi' (HErbarium Specimen sheet PIpeline), which extracts a pre-catalogue subset of collection data on the institutional labels on herbarium specimens from their digital images. The pipeline integrates two object detection models: the first detects bounding boxes around text-based labels and the second detects bounding boxes around text-based data fields on the primary institutional label. The pipeline classifies text-based institutional labels as printed, typed, handwritten, or a combination and applies Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) for data extraction. The recognized text is then corrected against authoritative databases of taxon names. The extracted text is also corrected with the aid of a multimodal Large Language Model (LLM). Hespi accurately detects and extracts text for test datasets including specimen sheet images from international herbaria. The components of the pipeline are modular and users can train their own models with their own data and use them in place of the models provided.
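
The step of correcting recognized text against a list of taxon names can be illustrated with a minimal sketch. This is not Hespi's implementation: the function name `correct_taxon_name`, the tiny reference list, and the 0.8 similarity cutoff are all assumptions made for illustration, and the fuzzy matching uses Python's standard `difflib` module rather than whatever matching method the pipeline itself applies.

```python
import difflib


def correct_taxon_name(recognized_text, reference_names, cutoff=0.8):
    """Return the closest reference name, or the input unchanged if none is close.

    Hypothetical helper for illustration only; the cutoff value is an assumption.
    """
    matches = difflib.get_close_matches(
        recognized_text, reference_names, n=1, cutoff=cutoff
    )
    return matches[0] if matches else recognized_text


# A stand-in for an authoritative taxon-name database (illustrative only).
reference_names = ["Lomandra longifolia", "Astelia alpina", "Zea mays"]

# An OCR-style misreading ("l" read as "i") is mapped to the reference name.
corrected = correct_taxon_name("Lomandra longifoiia", reference_names)

# Text with no sufficiently similar reference name is left as recognized.
unchanged = correct_taxon_name("Quercus robur", reference_names)
```

Leaving unmatched text unchanged, rather than forcing the nearest name, keeps the sketch from silently replacing a correctly read name that is simply absent from the reference list.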


Figure 1: Phylogeny of 1,153 Poaceae accessions inferred from 331 nuclear genes, including paralogs, using a multi-species coalescent approach. Branch colours reflect local posterior support for the quartet configuration displayed. Hollow circles indicate supported conflict among nuclear gene trees at 48 internal branches, where two alternative quartet configurations each have >1/3 local posterior support. Subfamilies and larger tribes (abbreviated) are labelled according to the most recent Poaceae classification (Soreng et al., 2022). The coloured lines link taxonomic outliers at tribe to subfamily level to their nominal taxa. Silhouettes show representatives for large subfamilies (from top): maize or corn, Zea mays (Panicoideae); Dactyloctenium radulans (Chloridoideae); oat, Avena sativa (Pooideae); Bambusa textilis (Bambusoideae); rice, Oryza sativa (Oryzoideae). See Fig. S5 for a detailed version of the tree.
Figure 3: Comparison of nuclear and plastome topologies for the Poaceae. The 1,153-tip nuclear tree is shown on the left, the 910-tip plastome tree on the right. Plastome support (transfer bootstrap expectation, TBE) was summarised for branches present in both trees (814 shared species). Grey branches in the nuclear tree had no equivalent for comparison in the plastome tree. Hollow circles indicate strong signals of conflict, i.e. high support in the nuclear tree (local posterior probability > 0.8) but poor support (TBE < 0.3) in the plastome tree. Tribes are matched between the two in both trees, and larger tribes are labelled for orientation. The inset plots plastome support against a measure of conflict between nuclear gene trees (local posterior support for the second-most supported quartet per branch), which are negatively correlated. The blue line is a simple linear trend line.
Taxonomic discrepancies in the nuclear tree at subfamily to tribe level. Taxa listed here will need follow-up studies to validate their placement. An asterisk (*) denotes genera whose type species was sampled.
Nuclear phylogenomics of grasses (Poaceae) supports current classification and reveals repeated reticulation

May 2024 · 1,041 Reads · 1 Citation

Grasses (Poaceae) comprise around 11,800 species and are central for human livelihoods and terrestrial ecosystems. Knowing their relationships and evolutionary history is key to comparative research and crop breeding. Advances in genome-scale sequencing allow for increased breadth and depth of phylogenomic analyses, making it possible to infer a new reference species tree of the family. We inferred a comprehensive species tree of grasses by combining new and published sequences for 331 nuclear genes from genome, transcriptome, target enrichment and shotgun data. Our 1,153-tip tree covers 79% of grass genera (including 21 genera sequenced for the first time) and all but two small tribes. We compared it to a 910-tip plastome tree. The nuclear phylogeny matches that of the plastome at most deep branches, with only a few instances of incongruence. Gene tree–species tree reconciliation suggests that reticulation events occurred repeatedly in the history of grasses. We provide a robust framework for the grass tree of life to support research on grass evolution, including modes of reticulation, and genetic diversity for sustainable agriculture.


Phylogenomics and the rise of the angiosperms

April 2024 · 4,041 Reads · 76 Citations

Nature

Angiosperms are the cornerstone of most terrestrial ecosystems and human livelihoods. A robust understanding of angiosperm evolution is required to explain their rise to ecological dominance. So far, the angiosperm tree of life has been determined primarily by means of analyses of the plastid genome. Many studies have drawn on this foundational work, such as classification and first insights into angiosperm diversification since their Mesozoic origins. However, the limited and biased sampling of both taxa and genomes undermines confidence in the tree and its implications. Here, we build the tree of life for almost 8,000 (about 60%) angiosperm genera using a standardized set of 353 nuclear genes. This 15-fold increase in genus-level sampling relative to comparable nuclear studies provides a critical test of earlier results and brings notable change to key groups, especially in rosids, while substantiating many previously predicted relationships. Scaling this tree to time using 200 fossils, we discovered that early angiosperm evolution was characterized by high gene tree conflict and explosive diversification, giving rise to more than 80% of extant angiosperm orders. Steady diversification ensued through the remaining Mesozoic Era until rates resurged in the Cenozoic Era, concurrent with decreasing global temperatures and tightly linked with gene tree conflict. Taken together, our extensive sampling combined with advanced phylogenomic methods shows the deep history and full complexity in the evolution of a megadiverse clade.


OrthoFlow: phylogenomic analysis and diagnostics with one command

December 2023 · 413 Reads

Species trees, which depict the evolutionary relationships among organisms, underlie many evolutionary studies. Phylogenomics, the use of genome-scale datasets for phylogenetic inference, is the current gold standard for species tree inference. The development, maintenance, and execution of phylogenomic workflows is challenging, requiring programming, data management skills, and familiarity with changing best practices. We introduce OrthoFlow, a software wherein a single command automatically conducts end-to-end phylogenomic analysis—orthology inference and identification of phylogenomic markers, quality control, data matrix construction, diagnostics, and tree inference using supermatrix and supertree methods from multiple input data formats. To demonstrate the utility of OrthoFlow, we successfully recapitulate the evolutionary relationships among 24 yeast species. OrthoFlow increases the accessibility of researchers to conduct rigorous phylogenomic analysis flexibly. OrthoFlow is freely available from PyPI (https://pypi.org/project/orthoflow/), Bioconda (https://anaconda.org/bioconda/orthoflow) and GitHub (https://github.com/rbturnbull/orthoflow) under the Apache License 2.0.


Orthoflow: phylogenomic analysis and diagnostics with one command

December 2023 · 280 Reads · 1 Citation

Species trees, which depict the evolutionary relationships among organisms, underlie many evolutionary studies. Phylogenomics, the use of genome-scale datasets for phylogenetic inference, is the current gold standard for species tree inference. The development, maintenance, and execution of phylogenomic workflows is challenging, requiring programming, data management skills, and familiarity with changing best practices. We introduce Orthoflow, a software wherein a single command automatically conducts end-to-end phylogenomic analysis—orthology inference and identification of phylogenomic markers, quality control, data matrix construction, diagnostics, and tree inference using supermatrix and supertree methods from multiple input data formats. To demonstrate the utility of Orthoflow, we successfully recapitulate the evolutionary relationships among 24 yeast species. Orthoflow increases the accessibility of researchers to conduct rigorous phylogenomic analysis flexibly. Orthoflow is freely available from PyPI (https://pypi.org/project/orthoflow/), Bioconda (https://anaconda.org/bioconda/orthoflow) and GitHub (https://github.com/rbturnbull/orthoflow) under the Apache License 2.0.


Genomic data resolve phylogenetic relationships of Australian mat-rushes, Lomandra (Asparagaceae: Lomandroideae)

September 2023 · 147 Reads · 4 Citations

Botanical Journal of the Linnean Society

Lomandra is the largest genus in Asparagaceae subfamily Lomandroideae and possesses economic, ecological, and ethnobotanical significance in Australia. Lomandra comprises four sections, L. section Capitatae, L. section Macrostachya, L. section Typhopsis and L. section Lomandra, the latter comprising series Lomandra and series Sparsiflorae, all recognized based solely on morphology. In this study, phylogenetic relationships were estimated for 79 Lomandroideae individuals, including 45 Lomandra species and subspecies (c. 63% of species and subspecies diversity). We generated genome-scale plastome sequence data and used maximum likelihood and Bayesian inference criteria for phylogenetic estimation. Lomandra was non-monophyletic, with Xerolirion divaricata nested within it. Two major clades were recovered: Capitatae–Macrostachya (CM) and Lomandra–Typhopsis (LT). The CM clade included a monophyletic Lomandra section Capitatae with a base chromosome number x = 7, and L. section Macrostachya (x = 8); the LT clade included L. sections Typhopsis and Lomandra, both x = 8. Section Lomandra series Lomandra and series Sparsiflorae were both recovered as non-monophyletic. Morphological characters were assessed to identify combinations of characters that characterize clades. A base chromosome number of x = 8 was plesiomorphic for Lomandra. The largest number of Lomandra species occupy the Mediterranean ecoregion and occupancy of sclerophyll vegetation was reconstructed as ancestral for the genus.


Mapping the Digitisation Workflow in a University Herbarium

August 2023 · 198 Reads · 3 Citations

Specimens or objects in natural history collections hold substantial research and cultural value that is enhanced where these items are made digitally available. Benefits of digitisation include increasing open access to collection-based biodiversity data, increasing productivity of scientific research, enabling novel research applications of digitally accessible data, reducing preservation requirements through reduced object handling, and expanding potential for “remote curation” in collections. However, the time available for object and data digitisation is limited for most collections. Well documented digitisation workflows can ensure that curation time is efficiently applied to achieve digitisation outputs, and that digitisation standards are consistently applied within and among projects. While this case study focused on the generation of digitisation workflows in a medium-sized Australian university-based herbarium, the findings of this study are relevant to collections globally. The curation workflows comprise a set of modular steps required for the digitisation of herbarium specimen data and images. Steps are clearly identified as requiring human-mediation versus those that can be automated, those that require on-site versus remote-access, and those that require transfer or transformation of data or files. This clarity enables consideration of the opportunities and challenges for increasing efficiencies for collection-based digitisation, data and file management. The maps provide a contextual framework for herbarium-based digitisation pathways for those who work with specimen-derived biodiversity data, and an insight into these tools for those who are not familiar with herbarium protocols.


Identification of herbarium specimen sheet components from high‐resolution images using deep learning

August 2023 · 156 Reads · 5 Citations

Advanced computer vision techniques hold the potential to mobilise vast quantities of biodiversity data by facilitating the rapid extraction of text‐ and trait‐based data from herbarium specimen digital images, and to increase the efficiency and accuracy of downstream data capture during digitisation. This investigation developed an object detection model using YOLOv5 and digitised collection images from the University of Melbourne Herbarium (MELU). The MELU‐trained ‘sheet‐component’ model (trained on 3371 annotated images, validated on 1000 annotated images, run using ‘large’ model type, at 640 pixels, for 200 epochs) successfully identified most of the 11 component types of the digital specimen images, with an overall model precision measure of 0.983, recall of 0.969 and mean average precision (mAP0.5–0.95) of 0.847. Specifically, ‘institutional’ and ‘annotation’ labels were predicted with mAP0.5–0.95 of 0.970 and 0.878 respectively. It was found that annotating at least 2000 images was required to train an adequate model, likely due to the heterogeneity of specimen sheets. The full model was then applied to selected specimens from nine global herbaria (Biodiversity Data Journal, 7, 2019), quantifying its generalisability: for example, the ‘institutional label’ was identified with mAP0.5–0.95 of between 0.68 and 0.89 across the various herbaria. Further detailed study demonstrated that starting with the MELU‐model weights and retraining for as few as 50 epochs on 30 additional annotated images was sufficient to enable the prediction of a previously unseen component. As many herbaria are resource‐constrained, the MELU‐trained ‘sheet‐component’ model weights are made available and application encouraged.
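
The mAP0.5–0.95 figures quoted above rest on the intersection over union (IoU) between a predicted and an annotated bounding box, evaluated at thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that building block and is not taken from the paper or from YOLOv5's code. The function names and the example boxes are hypothetical, and boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged over in mAP0.5-0.95: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def thresholds_passed(predicted, annotated):
    """Count the thresholds at which a prediction would count as a match."""
    overlap = iou(predicted, annotated)
    return sum(1 for t in THRESHOLDS if overlap >= t)


# Hypothetical label boxes: the prediction is slightly shorter than the annotation.
annotated_label = (0, 0, 10, 10)
predicted_label = (0, 0, 10, 8)
overlap = iou(predicted_label, annotated_label)
passed = thresholds_passed(predicted_label, annotated_label)
```

A prediction that matches at IoU 0.5 but fails at stricter thresholds lowers mAP0.5–0.95 without affecting precision and recall computed at a single threshold, which is one reason the averaged figure reported above is lower than the precision and recall values.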


Citations (15)


... Indeed, magnoliids can be traced in the fossil record to at least the Barremian (ca. 121-129 Ma; Massoni et al., 2015), and fossil-calibrated molecular dating studies have estimated the stem age of the clade to fall between 133 and 242 Ma (Magallón et al., 2015; Ramírez-Barahona et al., 2020; Zuntini et al., 2024). Species of magnoliids are found across a broad range of habitats and climates, but occur predominantly in tropical and warm temperate rain forests. ...

Reference:

Toward a phylogenomic classification of magnoliids
Phylogenomics and the rise of the angiosperms

Nature

... (Asparagaceae: Lomandroideae) includes four sections and two series, according to Lee and Macfarlane (1986), and has been studied intensively, especially in recent years (Wang 2023a, 2023b, 2023c; Gunn et al. 2024; Wang 2024; Wang & Gray 2024). To date, 67 species and ten nonautonymic subspecies are recognised (IPNI 2024; POWO 2024). ...

Genomic data resolve phylogenetic relationships of Australian mat-rushes, Lomandra (Asparagaceae: Lomandroideae)

Botanical Journal of the Linnean Society

... Historically, specimen-associated data were documented on paper [Walton et al., 2020]; recorded in field notebooks, transcribed into printed catalogs, and primary and secondary data were written on labels that were attached to the specimens. Mobilization of these specimen data is typically achieved by processing specimens through a digitization workflow, involving the production of a digital specimen image followed by the extraction of text data from that digital image either manually (i.e., via a human intermediary) or semi-automatically [Thompson and Birch, 2023, de la Hidalga et al., 2020, Kirchhoff et al., 2018, Nelson et al., 2015]. Their digitization, conforming to biological data standards [e.g., ABCD (Access to Biological Collections Data) [Holetschek et al., 2012] and DarwinCore [Wieczorek et al., 2012]], is essential for maintaining their accuracy and ensuring their availability for reuse. ...

Mapping the Digitisation Workflow in a University Herbarium

... Based on 1000 randomizations, we show that despite only having one-third as many records, herbarium specimens accumulate more taxonomic (a), phylogenetic (b), and functional (c) diversity than iNaturalist observations. ... greatly reduce both the cost and time it takes to digitize, while also increasing data standards. For example, the Smithsonian's US National Herbarium, which houses roughly 3.8M specimens, was recently completely digitized and the use of high-throughput workflows reduced the cost of digitization from $3.32 down to $1.85 per specimen and allowed for the digitization of 3000-4000 specimens daily. ...

Identification of herbarium specimen sheet components from high‐resolution images using deep learning

... Xanthodermatei and A. sect. Hondenses exhibit toxicity, such as A. xanthodermus Genev., which can induce gastrointestinal symptoms [49][50][51]. Despite a lack of research, A. daqinggouensis is unsuitable for consumption due to potential toxicity. ...

A field-based investigation of simple phenol variation in Australian Agaricus xanthodermus

... We used the script BYO_transcriptome.py to search for Rhododendron versions of the Angiosperms353 genes. The "Mega353" gene set, an expanded Angiosperms353 set with many additional taxa representing each sequence (McLay et al. 2021), was used as the reference. The three Rhododendron CDS files (see above, MarkerMiner method) were used as the input transcriptomes. ...

New targets acquired: Improving locus recovery from the Angiosperms353 probe set

... The order comprises ca. 36,265 species (Birch and Kocyan 2021). Asparagaceae is divided into seven subfamilies, viz. ...

Biogeography of the monocotyledon astelioid clade (Asparagales): A history of long-distance dispersal and diversification with emerging habitats

Molecular Phylogenetics and Evolution

... (chocolate lily). C. australasicum self-pollinates when its flowers are tripped (i.e. the lip pushed open), and can be pollinated by honey bees or native bees (Wang et al., 2010), whereas A. strictum is buzz pollinated and therefore not pollinated by honey bees (Gunn et al., 2020). Many Australian native bee species buzz pollinate (Smith & Saunders, 2019), and therefore, the use of these two species could provide some insight into the pollination services provided by native and non-native bees in the landscape. ...

Evolution of Lomandroideae: Multiple origins of polyploidy and biome occupancy in Australia
  • Citing Article
  • April 2020

Molecular Phylogenetics and Evolution

... Taxonomic species are usually described based on morphological characteristics that can easily be altered by local adaptation, phenotypic plasticity, or neutral morphological polymorphism, which may cause a single variable species to be classified as many species (e.g., Gemeinholzer & Bachmann, 2005). On the other hand, very recent divergence and little differentiation might contribute to the inability of barcoding to separate species in some cases (Birch et al., 2017). ...

Testing efficacy of distance and tree-based methods for DNA barcoding of grasses (Poaceae tribe Poeae) in Australia

... Asteliaceae contains two endemic Australian genera: Neoastelia, a monotypic genus of temperate rainforests in central eastern Australia and Milligania (5 species), a Tasmanian endemic with taxa occupying lowland, alluvial to alpine herbfield vegetation. Conversely, Astelia (30 species; Birch, 2015), the largest genus in the family, has a center of diversity in New Zealand, with secondary centers of diversity in Australia and Hawai'i, and two exceptional occurrences in Africa (La Réunion, Mauritius) and in South America (Patagonia). Blandfordiaceae and Boryaceae are Australian endemics containing a single genus Blandfordia (4 species) and Borya (13 species) and the monotypic genus Alania, respectively. ...

A revision of infrageneric classification in Astelia Banks & Sol. ex R.Br. (Asteliaceae)