Ggplot2: Elegant Graphics for Data Analysis
Chapters (9)
In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot makes it easy to produce complex plots, often requiring several lines of code using other plotting systems, in one line.
qplot() can do this because it’s based on the grammar of graphics, which allows you to create a simple, yet expressive, description
of the plot. In later chapters you’ll learn to use all of the expressive power of the grammar, but here we’ll start simple
so you can work your way up. You will also start to learn some of the ggplot2 terminology that will be used throughout the
book.
You can choose to use just qplot(), without any understanding of the underlying grammar, but if you do you will never be able to unlock the full power of ggplot2. By learning more about the grammar and its components, you will be able to create a wider range of plots, as well as being
able to combine multiple sources of data, and customise to your heart’s content. You may want to skip this chapter in a first
reading of the book, returning when you want a deeper understanding of how all the pieces fit together.
Layering is the mechanism by which additional data elements are added to a plot. Each layer can come from a different dataset and have a different aesthetic mapping, allowing us to create plots that could not be generated using qplot(), which permits only a single dataset and a single set of aesthetic mappings.
The layered structure of ggplot2 encourages you to design and construct graphics in a structured manner. You have learned what a layer is and how to add one to your graphic, but not what geoms and statistics are available to help you build revealing plots. This chapter lists some of the many geoms and stats included in ggplot2, broken down by their purpose. This chapter will provide a good overview of the available options, but it does not describe each geom and stat in detail. For more information about individual geoms, along with many more examples illustrating their use, see the online and electronic documentation. You may also want to consult the documentation to learn more about the datasets used in this chapter.
Scales control the mapping from data to aesthetics. They take your data and turn it into something that you can perceive visually:
e.g., size, colour, position or shape. Scales also provide the tools you use to read the plot: the axes and legends (collectively
known as guides). Formally, each scale is a function from a region in data space (the domain of the scale) to a region in
aesthetic space (the range of the scale). The domain of each scale corresponds to the range of the variable supplied to the
scale, and can be continuous or discrete, ordered or unordered. The range consists of the concrete aesthetics that you can
perceive and that R can understand: position, colour, shape, size and line type. If you blinked when you read that scales
map data both to position and colour, you are not alone. The notion that the same kind of object is used to map data to positions
and symbols strikes some people as unintuitive. However, you will see the logic and power of this notion as you read further
in the chapter.
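The idea of a scale as a function from a data domain to an aesthetic range can be sketched in a few lines. The sketch below is in Python rather than R, and it is not ggplot2's implementation: the names continuous_scale and discrete_scale are hypothetical, and a simple linear mapping is assumed for the continuous case.

```python
def continuous_scale(domain, range_):
    """Return a function mapping values in `domain` linearly onto `range_`.

    `domain` is (min, max) in data space; `range_` is (min, max) in
    aesthetic space, e.g. point sizes. Both names are illustrative.
    """
    d_min, d_max = domain
    r_min, r_max = range_
    span = d_max - d_min

    def scale(value):
        if span == 0:
            # Degenerate domain: every value maps to the start of the range.
            return r_min
        fraction = (value - d_min) / span
        return r_min + fraction * (r_max - r_min)

    return scale


def discrete_scale(levels, outputs):
    """Return a function mapping each discrete level to one aesthetic value."""
    if len(outputs) < len(levels):
        raise ValueError("need at least one output per level")
    lookup = dict(zip(levels, outputs))

    def scale(level):
        return lookup[level]

    return scale


# The same kind of object maps data to a size and to a colour.
size = continuous_scale(domain=(0, 100), range_=(1.0, 6.0))
colour = discrete_scale(["a", "b", "c"], ["red", "green", "blue"])

print(size(50))     # midpoint of the domain lands at the midpoint of the range
print(colour("b"))
```

A position scale would be built the same way, with the range expressed in plot coordinates instead of sizes or colours, which is why one kind of object can serve both purposes.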
This chapter discusses position, particularly how facets are laid out on a page, and how coordinate systems within a panel work. There are four components that control position.
In this chapter you will learn how to prepare polished plots for publication. Most of this chapter focusses on the theming
capability of ggplot2 which allows you to control many non-data aspects of plot appearance, but you will also learn how to adjust geom, stat and
scale defaults, and the best way to save plots for inclusion into other software packages. Together with the next chapter,
manipulating plot rendering with grid, you will learn how to control every visual aspect of the plot to get exactly the appearance
that you want.
So far this book has assumed you have your data in a nicely structured data frame ready to feed to ggplot() or qplot(). If this is not the case, then you’ll need to do some transformation.
A major requirement of a good data analysis is flexibility. If the data changes, or you discover something that makes you
rethink your basic assumptions, you need to be able to easily change many plots at once. The main inhibitor of flexibility
is duplication. If you have the same plotting statement repeated over and over again, you have to make the same change in
many different places. Often just the thought of making all those changes is exhausting!
... PCA was conducted using the prcomp() function, and the results were summarized to assess the proportion of variance explained by each component. Visualization of PCA results was achieved through biplots and scree plots, using the ggbiplot and ggplot2 packages to facilitate interpretation of the principal components' contributions (Jolliffe, 2002; Wickham, 2016) [7,15]. ...
Sorghum (Sorghum bicolor (L.) Moench) is a crucial global crop cultivated across diverse climatic conditions. This study investigates the correlation and principal component analysis (PCA) of morpho-physiological and biochemical traits in sorghum genotypes under both irrigated and rainfed conditions. Conducted at the University of Agricultural Sciences, Dharwad, the experiment involved twenty rabi sorghum genotypes planted in medium black soil, with two moisture levels: rainfed and irrigated. Correlation analysis revealed significant associations between morpho-physiological parameters and grain yield. Under drought stress, leaf chlorophyll content (SPAD) and relative water content (RWC) exhibited strong positive correlations with yield (r = 0.684 and 0.789, respectively), indicating their critical role in enhancing yield potential under water scarcity. Plant height, leaf area, and dry weight also showed positive correlations with yield, emphasizing their importance in productivity under stressed conditions. Conversely, biochemical parameters such as proline and wax content demonstrated variable correlations with yield depending on moisture availability; proline and wax content had negative correlations under irrigated conditions but positive correlations under stress, suggesting their role in drought tolerance. PCA identified the first three principal components (PC1, PC2, and PC3) accounting for significant variance in trait data. PC1, explaining 47.6% of the variance under irrigated and 56.5% under rainfed conditions, was positively associated with traits like plant height, chlorophyll content, and yield parameters, reflecting genotypic responses to productivity and drought. PC2 was linked to chlorophyll content and proline, highlighting its relevance to drought tolerance mechanisms. 
Overall, the study underscores the complex interplay between physiological and biochemical traits in determining sorghum's adaptability to different moisture conditions, providing insights for breeding programs aimed at improving drought resilience and yield stability.
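As a rough illustration of how "proportion of variance explained" figures like those quoted above are obtained, the sketch below computes them from per-component standard deviations, which is what prcomp() reports in its sdev element. It is written in Python rather than R, the function names are hypothetical, and the input values are invented for illustration; they are not the study's data.

```python
def proportion_of_variance(sdev):
    """Share of total variance carried by each principal component.

    `sdev` holds one standard deviation per component, so each variance
    is the square of the corresponding entry.
    """
    variances = [s * s for s in sdev]
    total = sum(variances)
    if total == 0:
        raise ValueError("total variance is zero")
    return [v / total for v in variances]


def cumulative(proportions):
    """Running total of the proportions, as plotted in a scree plot."""
    running = 0.0
    result = []
    for p in proportions:
        running += p
        result.append(running)
    return result


# Hypothetical standard deviations for three components.
example_sdev = [2.0, 1.0, 1.0]
shares = proportion_of_variance(example_sdev)
for index, (share, total) in enumerate(zip(shares, cumulative(shares)), start=1):
    print(f"PC{index}: {share:.1%} of variance, {total:.1%} cumulative")
```

The per-component shares are what a scree plot displays, and the cumulative column is what is usually consulted when deciding how many components to retain.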
... The four beta diversity indexes were used to ordinate the samples with the PCoA and to identify variable correlations with the PERMANOVA based on 10,000 permutations. The PCoA and histograms were drawn with the function ggplot() [138]. PCoA ellipses were drawn based on the centroid at a radius of 0.8. ...
... The stats package, built into base R, was used to perform Spearman correlation analysis using the functions cor() with BH correction and cor.test(). Then, the correlogram heatmap was drawn with the function ggplot() [138]. ...
Background
In a holobiont, the microbiota is known to play a central role in the health and immunity of its host. Understanding the microbiota, its dynamics under different environmental conditions, and its link to immunity would therefore help in responding to potential dysbiosis in aquacultured species. While the gut microbiota is widely studied, in marine invertebrates the hemolymph microbiota is often set aside even though it remains an important actor in hemolymph homeostasis. Indeed, the hemolymph harbors the immune factors that are involved in the animal's homeostasis and that interact with the microbiota. In the Southwest Pacific, the economically valuable shrimp Penaeus stylirostris is reared at two contrasting sites, in New Caledonia (NC) and in French Polynesia (FP).
Results
We characterized the active microbiota inhabiting the hemolymph of shrimps while considering its stability during two seasons and at a one-month interval and evidenced an important microbial variability between the shrimps according to the rearing conditions and the sites. We highlighted specific biomarkers along with a common core microbiota composed of 6 ASVs. Putative microbial functions were mostly associated with bacterial competition, infections and metabolism in NC, while they were highly associated with the cell metabolism in FP suggesting a rearing site discrimination. Differential relative expression of immune effectors measured in the hemolymph of two shrimp populations from NC and FP, exhibited higher level of expression in NC compared to FP. In addition, differential relative expression of immune effectors was correlated to bacterial biomarkers based on their geographical location.
Conclusions
Our data suggest that, in Pacific shrimps, both the microbiota and the expression of the immune effectors could have undergone differential immunostimulation according to the rearing site, as well as a geographical adaptive divergence of the shrimps, as a holobiont, to their rearing sites. Further, the identification of proxies such as the core microbiota and site biomarkers could be used to guide future actions to monitor the bacterial microbiota and thus preserve production.
... Normalized data were filtered to remove ASVs with abundance lower than 0.1%. Bray-Curtis community dissimilarities were also calculated with the Phyloseq package and visualized with non-metric multidimensional scaling (NMDS) using Ggplot2 (Wickham 2016). To determine whether community dissimilarity was significantly different between variables, the assumption of homogeneity was tested with the betadisper function in the Vegan package; if the assumption was met, the adonis function in the Vegan package was used to test for significance (Dixon 2003). ...
... To determine whether community dissimilarity was significantly different between variables, the assumption of homogeneity was tested with the betadisper function in the Vegan package; if the assumption was met, the adonis function in the Vegan package was used to test for significance (Dixon 2003). Heatmaps and bar plots were made with the ggplot2 package (Wickham 2016). DNA sequences were submitted to the National Center for Biotechnology Information (NCBI) under BioProject PRJNA852240. ...
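The Bray-Curtis dissimilarity mentioned in the snippets above has a short closed form: the sum of absolute abundance differences divided by the sum of all abundances in the two samples. The sketch below shows that textbook formula in Python; it is not the Phyloseq or Vegan code, the function name is hypothetical, and the abundance vectors are invented for illustration.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same taxa in the same order.
    The result is 0 for identical samples and 1 for samples
    that share no taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        # Two empty samples: the ratio is undefined, so refuse to guess.
        raise ValueError("both samples are empty")
    return difference / total


# Hypothetical ASV counts for two samples over three taxa.
site_1 = [6, 7, 4]
site_2 = [10, 0, 6]
print(round(bray_curtis(site_1, site_2), 3))
```

Computing this value for every pair of samples gives the dissimilarity matrix that an ordination such as NMDS takes as input.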
... Elevation data for each location were acquired using the elevatr package in R [67]. Data were visualized using the ggplot2 and ggpubr R packages [75,76]. ...
Human exposure to mycotoxins is common and often severe in underregulated maize-based food systems. This study explored how monitoring of these systems could help to identify when and where outbreaks occur and inform potential mitigation efforts. Within a maize smallholder system in Kongwa District, Tanzania, we performed two food surveys of mycotoxin contamination at local grain mills, documenting high levels of aflatoxins and fumonisins in maize destined for human consumption. A farmer questionnaire documented diverse pre-harvest and post-harvest practices among smallholder farmers. We modeled maize aflatoxins and fumonisins as a function of diverse indicators of mycotoxin risk based on survey data, high-resolution geospatial environmental data (normalized difference vegetation index and soil quality), and proximal near-infrared spectroscopy. Interestingly, mixed linear models revealed that all data types explained some portion of variance in aflatoxin and fumonisin concentrations. Including all covariates, 2015 models explained 27.6% and 20.6% of variation in aflatoxin and fumonisin, and 2019 models explained 39.4% and 40.0% of variation in aflatoxin and fumonisin. This study demonstrates the value of using low-cost risk factors to model mycotoxins and provides a framework for designing and implementing mycotoxin monitoring within smallholder settings.
... (Pedersen, 2023) to construct dependency networks and ggplot2 3.4.4 (Wickham, 2016) to visualize results. ...
Sentence-final particles (SFPs) are pervasive in spoken Cantonese for expressing speakers’ attitudes. This study explores the global and local features of SFPs in both spoken Cantonese and Mandarin Chinese using two part-of-speech-based (POS-based) dependency networks. Results show that (1) globally, spoken Cantonese and Mandarin Chinese networks exhibit centralization and scale-free properties, reflecting the communication efficiency of human languages. However, spoken Cantonese manifests weaker centralization properties, as demonstrated by the diversity of its edges. Moreover, SFPs in spoken Cantonese have a greater degree and in-degree than those in Mandarin Chinese, indicating a stronger ability to form syntactic connections with other POSs in the language structure. (2) locally, Cantonese SFPs display more extensive mood expression devices, notably differing in PART-NOUN (discourse:sp), PART-ADV and PART-ADJ dependencies compared to Mandarin Chinese. Additionally, specific examples illustrate how Cantonese SFP usage differs from Mandarin Chinese, showcasing their distinct discourse functions. The findings suggest that communication efficiency is a cross-lingual universal, while spoken Cantonese is distinctive in its use of diverse SFPs to express moods. This study may shed new light on adapting the complex network approach to explore the similarities and differences across human languages.
... As the experiment that certain factors influence the total amount of calories goes on [1], there are several correlations between calories and the in/dependent variables concludes protein, type, sugars and carbo [2]. The theme could be how the basic elements of cereal can make a difference to the total calories [3]. How protein, type, sugars and carbo can influence total calories [4]. ...
Cereals are a major global food source, classified into whole and refined grains. Whole grain cereals provide a higher nutritional value due to the inclusion of the bran, germ, and endosperm, making them a rich source of dietary fiber, vitamins, and minerals. Conversely, refined grains undergo processing that eliminates the bran and germ components, resulting in a loss of valuable nutrients. In recent years, there has been a growing interest in exploring the health benefits of whole grain cereals. Research has established a connection between whole grain consumption and reduced risks of chronic diseases such as heart disease, diabetes, and certain cancers. Additionally, whole grains are recognized for their role in maintaining digestive health and promoting satiety, which can aid in weight management. Therefore, this paper aims to investigate how various elements influence total calorie content using RStudio. The findings reveal that calories serve as the dependent variable while protein content, type, sugars, and carbohydrates act as independent variables influencing calorie levels. Cereal type plays a negative role by increasing calories whereas other factors contribute positively towards increased caloric intake levels.
... [19,20]. We used the following R packages: dplyr [21], lubridate [22], stringr [23], tidyr [24], ggplot2 [25], and reshape2 [26]. ...
Background
Vaccination of farmed salmonids has been an integral part of preventing infectious diseases in Norway’s aquaculture industry. In Norway, vaccine usage is regulated by the government. There is a need to monitor vaccine usage for both regulatory and research purposes, at local and national scales. The Norwegian Veterinary Prescription Register (VetReg) is a national database that includes all prescriptions of medicines to animals dispensed by pharmacies and all medicines used for food producing animals by veterinarians. This study aimed to evaluate the quality of fish vaccination data reported to VetReg in 2016–2022. We considered the following attributes: completeness, validity, and timeliness. For external validation, we compared the data in VetReg to wholesaler statistics.
Results
Pharmacies reported fish vaccines to VetReg in a variety of quantity units, including doses and volumes, which required us to harmonize the data to a single unit. It was not possible to harmonize the quantity units for nine percent of the records, which were mainly bath vaccines reported in doses. We identified specific issues that required manual editing of the units of 1 percent of the records. We validated individual variables such as product codes and location identifiers using external registers. The ‘number of animals’ variable was inconsistent for 31 percent of the records. The coverage of vaccine data in VetReg ranged from 81 to 113 percent for the ten most sold vaccines in 2020–2022, as compared to wholesales statistics. For the timeliness, we found that 75 percent of the records were submitted within 25 days for all years.
Conclusions
Overall, we found that the fish vaccination data in VetReg was of sufficient quality to monitor injectable vaccine usage at hatcheries after 2020. We identified issues at the product level, with bath vaccines, and with single variables (number of animals, weight, and species). We recommend that quality can be improved by reporting all vaccines in volume rather than doses, reporting a single vaccine prescription per report, and including a deadline for pharmacies to report in the legislation.
... In this way, the correlation structure of public money flows and the groups that form within them can be illustrated. The results of the report were mainly visualized with the ggplot2 package of the R statistical software (Wickham 2019). The webr package (Keon-Woong 2020) was also used in describing the data. ...
... All results were obtained using the R system for statistical computing (R Core Team, 2023), version 4.4.2, making use of the lavaan (Rosseel, 2012), ggplot2 (Wickham, 2016), and xtable (Dahl, Scott, Roosen, Magnusson, & Swinton, 2019) packages. R is freely available under the General Public License 2 from the Comprehensive R Archive Network at http://CRAN.R-project.org/. ...
Social science researchers are generally accustomed to treating ordinal variables as though they are continuous. In this paper, we consider how identification constraints in ordinal factor analysis can mimic the treatment of ordinal variables as continuous. We describe model constraints that lead to latent variable predictions equaling the average of ordinal variables. This result leads us to propose minimal identification constraints, which we call "integer constraints," that center the latent variables around the scale of the observed, integer-coded ordinal variables. The integer constraints lead to intuitive model parameterizations because researchers are already accustomed to thinking about ordinal variables as though they are continuous. We provide a proof that our proposed integer constraints are indeed minimal identification constraints, as well as an illustration of how integer constraints work with real data. We also provide simulation results indicating that integer constraints are similar to other identification constraints in terms of estimation convergence and admissibility.
... Furthermore, we introduce a set of packages offering streamlined data manipulation and advanced visualization, facilitating alternative multiple placement filtering techniques within R. Unlike BoSSA, these packages leverage ggplot2's graphics grammar, providing an elegant and flexible approach to data visualization [24]. With treeio and tidytree's capabilities for data parsing, manipulation, and integration, metadata, and other associated data can be seamlessly incorporated into phylogenetic placement analyses. ...
In metabarcoding research, such as taxon identification, phylogenetic placement plays a critical role. However, many existing phylogenetic placement methods lack comprehensive features for downstream analysis and visualization. Visualization tools often ignore placement uncertainty, making it difficult to explore and interpret placement data effectively. To overcome these limitations, we introduce a scalable approach using treeio and ggtree for parsing and visualizing phylogenetic placement data. The treeio‐ggtree method supports placement filtration, uncertainty exploration, and customized visualization. It enhances scalability for large analyses by enabling users to extract subtrees from the full reference tree, focusing on specific samples within a clade. Additionally, this approach provides a clearer representation of phylogenetic placement uncertainty by visualizing associated placement information on the final placement tree.
... We performed all data analysis in RStudio 4.3.1 (R Core Team, 2020). All graphs were created using functions from the ggplot2 package (Wickham, 2016). The PCA was carried out using the pca function, and the verification of the most explanatory variables for each axis was carried out using the dimdesc function, present in the FactoMineR package (Lê et al., 2008). ...
Cascade reservoirs consist of a series of dams built along a watercourse or within a watershed. Successive reservoirs, which regulate the flow of matter, form hydrological gradients. Consequently, they create limnological gradients resulting from altered biogeochemical cycles along the longitudinal axis of the river. Our objective was to examine the dynamics of cascade reservoirs in a large neotropical river, considering their hydrological and limnological characteristics, and the responses to biomass, richness and functional diversity, as well as species composition and functional traits. In this study, we assessed seven hydropower reservoirs installed in a cascade along a stretch of 1,500 km in a large neotropical river. Our results demonstrated that water residence time and distances upstream were important hydrological variables in determining limnological and hydrological longitudinal gradients, leading to a decrease in conductivity and total phosphorus and an increase in water temperature. We observed the selection of species and taxonomic groups of Cyanobacteria such as Raphidiopsis raciborskii upstream and Microcystis sp. downstream, and species with unicellular and mucilaginous characteristics throughout the cascade. Our work demonstrated that the dynamics between cascade reservoirs interfered with the spatial distribution of species and the selection of functional traits of phytoplankton.
... Acoustic detection data were analysed using R software [59] and the packages tidyverse [81], sp [9,54], and sf [53]. Plots were created with ggplot2 [80] and ggspatial [33]. ...
Grey mullets (family Mugilidae) are widespread across coastal, brackish, and freshwater habitats, and have supported fisheries for millennia. Despite their global distribution and commercial value, little is known about their movement ecology and its role in the co-existence of sympatric mullet species. Gaps in knowledge about migratory behaviour, seasonal occurrence, and movement scales have also impeded effective management, highlighting the need for further research. This study aimed to identify key habitats and timing of grey mullet presence across the Dutch Wadden Sea, North Sea, and freshwater areas, and to explore potential behavioral differences between two grey mullet species: thicklip mullet (Chelon labrosus) and thinlip mullet (Chelon ramada). Using acoustic telemetry, we tracked 86 tagged grey mullet over three years (thicklip mullet, N = 74; thinlip mullet, N = 12), combining data from 100 local acoustic receivers and the European Tracking Network. Both species were detected in the Wadden Sea from April to November, however, thinlip mullet arrived in the Wadden Sea earlier than thicklip mullet (median date = May 16 vs. June 7). Individual residency in the Wadden Sea lasted a median 97 days for thicklip mullet and 94 days for thinlip mullet. Thinlip mullet were also detected by more receivers and over a larger area than thicklip mullet, indicating differences in movement behaviour. Both species showed an affinity for receivers near major harbours, with thinlip mullet more often detected near fresh water outflows. Seasonal migrations between coastal and offshore waters were also observed, with one thinlip mullet returning to freshwater across consecutive years. North Sea detections spanned ten months, with a gap during the presumed spawning period (Jan–Feb). Our data suggest that thinlip mullet show a preference for deeper gullies while thicklip mullet may spend more time in shallow areas and flooded tidal flats. 
These findings highlight the importance of the Wadden Sea as a seasonal foraging ground and provide insights into the migratory patterns of grey mullets.
... All analyses were done in Program R (R Core Team 2022) and mixed models were analyzed in package lme4 (Bates et al. 2015). All visualizations were created using the "ggplot2" package in R (Wickham 2016). ...
Warming associated with climate change is driving poleward shifts in the marine habitat of anadromous Pacific salmon (Oncorhynchus spp.). Yet the spawning locations for salmon to establish self‐sustaining populations and the consequences for the ecosystem if they should do so are unclear. Here, we explore the role of temperature‐dependent incubation survival and developmental phenology of coho salmon (Oncorhynchus kisutch) as a potential early life history barrier to establishment in an Arctic stream. We exposed embryos to temperatures previously recorded in the substrate of an Arctic groundwater spring‐fed spawning environment. Using a common garden experimental design, coho salmon embryos were exposed to treatments that thermally mimicked four spawning dates from August 1 to October 1 (AUG1, SEPT1, SEPT15, and OCT1). Spawning temperatures were 6°C at the warmest (AUG1) and 1.25°C at the coldest (OCT1). We observed low survival rates in SEPT1 (41%) and OCT1 (34%) and near complete mortality in the other treatments. While far below what is considered normal in benign hatchery‐like conditions, these rates suggest that temperatures experienced at these spawning dates are survivable. We detected differences in developmental rates across treatments; embryos developed 1.9 times faster in the warmest treatment (AUG1, 120 days) compared to the coldest (OCT1, 231 days). Differences in accumulated thermal units (ATUs) needed for hatching ranged from 392 ATUs in AUG1 to 270 ATUs in OCT1, revealing compensation in developmental requirements. Given these findings, the most thermally suitable spawning dates within our study are between September 15 and October 1, which facilitates hatching and projected nest emergence to occur in spring warming conditions (March–September). Broadly, our findings suggest that spawning sites within thermal tolerances that can support the survival and development of coho salmon exist in the North American Arctic. 
Whether the habitat is otherwise suitable for transitions through other life stages remains unknown.
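The figures reported in the abstract above can be checked with a little arithmetic. The sketch below, in Python, assumes the common definition of accumulated thermal units as the sum of daily mean temperatures in °C; the function name and the three-day temperature list are hypothetical, while the day counts and ATU totals are the ones quoted in the abstract.

```python
def accumulated_thermal_units(daily_mean_temps_c):
    """Sum of daily mean temperatures (degree-days), assuming that definition of ATU."""
    return sum(daily_mean_temps_c)


# Values quoted in the abstract: days to hatch and ATUs at hatch.
treatments = {
    "AUG1": {"days": 120, "atus": 392},
    "OCT1": {"days": 231, "atus": 270},
}

# Development-time ratio between the coldest and the warmest treatment.
rate_ratio = treatments["OCT1"]["days"] / treatments["AUG1"]["days"]

# Mean incubation temperature implied by ATUs divided by days.
mean_temps = {
    name: values["atus"] / values["days"] for name, values in treatments.items()
}

print(f"OCT1 took {rate_ratio:.1f} times as long as AUG1")
for name, temp in mean_temps.items():
    print(f"{name}: implied mean incubation temperature {temp:.2f} °C")

# Hypothetical three-day temperature record, only to show the summation.
print(accumulated_thermal_units([6.0, 5.5, 5.0]))
```

The ratio of 231 to 120 days rounds to the 1.9-fold difference stated in the abstract, and the lower ATU total at hatch in the colder treatment is what the authors describe as compensation in developmental requirements.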
... [113], and genes with a P-value of less than 0.05 and a fold change greater than 2 were designated as mitochondrial differential expression genes in this study. Finally, principal component analysis and volcano plotting were performed using FactoMineR [114] and ggplot2 [115]. ...
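The selection rule in the snippet above (P-value below 0.05 and fold change above 2) can be written as a small filter. The sketch below is in Python and is not the authors' pipeline: the gene names and values are invented, the function names are hypothetical, and the optional symmetric flag, which also accepts fold changes below 1/2, is an assumption that the snippet does not confirm. The volcano coordinates follow the usual convention of log2 fold change against negative log10 P-value.

```python
import math


def is_differentially_expressed(p_value, fold_change,
                                p_cutoff=0.05, fc_cutoff=2.0,
                                symmetric=False):
    """Apply the P-value and fold-change thresholds to one gene.

    With symmetric=True (an assumption, not stated in the snippet),
    down-regulation by the same factor also passes the filter.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    magnitude = max(fold_change, 1.0 / fold_change) if symmetric else fold_change
    return p_value < p_cutoff and magnitude > fc_cutoff


def volcano_coordinates(p_value, fold_change):
    """Return (x, y) for a volcano plot: log2 fold change and -log10 P-value."""
    return math.log2(fold_change), -math.log10(p_value)


# Hypothetical genes, for illustration only.
genes = [
    {"gene": "gene_a", "p_value": 0.01, "fold_change": 4.0},
    {"gene": "gene_b", "p_value": 0.20, "fold_change": 3.0},
    {"gene": "gene_c", "p_value": 0.001, "fold_change": 0.25},
]

selected = [
    g["gene"] for g in genes
    if is_differentially_expressed(g["p_value"], g["fold_change"])
]
selected_symmetric = [
    g["gene"] for g in genes
    if is_differentially_expressed(g["p_value"], g["fold_change"], symmetric=True)
]

print(selected)
print(selected_symmetric)
```

Under the literal reading only up-regulated genes pass; the symmetric variant is included because fold-change thresholds are often applied in both directions, but which reading the study used is not stated.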
As a globally distributed perennial Gramineae, Phragmites australis can adapt to harsh ecological environments and has significant economic and environmental values. Here, we performed a complete assembly and annotation of the mitogenome of P. australis using genomic data from the PacBio and BGI platforms. The P. australis mitogenome is a multibranched structure of 501,134 bp, divided into two circular chromosomes of 325,493 bp and 175,641 bp, respectively. A sequence-simplified succinate dehydrogenase 4 gene was identified in this mitogenome, which is often translocated to the nuclear genome in the mitogenomes of gramineous species. We also identified tissue-specific mitochondrial differentially expressed genes using RNAseq data, providing new insights into understanding energy allocation and gene regulatory strategies in the long-term adaptive evolution of P. australis mitochondria. In addition, we studied the mitogenome features of P. australis in more detail, including repetitive sequences, gene Ka/Ks analyses, codon preferences, intracellular gene transfer, RNA editing, and multispecies phylogenetic analyses. Our results provide an essential molecular resource for understanding the genetic characterisation of the mitogenome of P. australis and provide a research basis for population genetics and species evolution in Arundiaceae.
... After discussing some initial trends and examples, we focused on relationships and data we thought would illuminate the care provider's experience as a remote worker. Frequency counts and other data manipulation were conducted in R with the dplyr, tidyr, ggplot2, irr, stringr, tibble, and svglite packages [25,45,58-62]. We discuss these descriptive statistics along with important examples from our diary study and interviews in Section 4 below. ...
The upsurge in remote and hybrid work practices has prompted researchers to explore the technological, organizational, and psychological dimensions of remote work. However, the nuanced dynamics of balancing familial duties, especially care work for older adults, and professional work is often overlooked in the literature. This balancing act introduces unique stressors, blurring work and personal life boundaries, potentially causing physical stress or prompting care providers to leave their jobs. The inherent nature of remote work executed within the familial sphere underscores the importance of understanding how care responsibilities impact the remote work experience. This study addresses this gap by focusing on informal care providers, an understudied population in the CSCW remote work literature. Through a diary study and interviews, we investigate challenges remote workers face and the role of technology in their work. Findings highlight the prevalence of care work, emphasizing the need for targeted technological interventions to support the well-being and productivity of remote workers managing care duties. Critical challenges include familial responsibilities on higher-stress days, lack of communication regarding availability, personal time sacrifices for productivity, coordination in place making among care providers, and multitasking on days with familial responsibilities or distractions. This exploratory study underscores the importance of assisting care providers in a way that embraces their (possible) role as remote workers, offering insights for future research and technological interventions to support remote workers navigating the complexities of care work.
... We then implemented a hierarchical Bayesian clustering algorithm in Fastbaps v1.0 (Fast Hierarchical Bayesian Analysis of Population Structure) (Tonkin-Hill et al., 2019) in R v3.5.3 using the ape v5.3, ggplot2 v3.1.1, and ggtree v2.4.1 packages to cluster the alignment sequences (Paradis & Schliep, 2019; R Core Team, 2021; Wickham, 2016; Yu et al., 2017). We visualized and annotated the phylogenetic trees using iTOL v6 (Letunic & Bork, 2021). ...
Background
Hepatitis E virus (HEV) genotype 1 is a major cause of acute jaundice in Bangladesh, yet the transmission dynamics and genetic diversity of this virus remain inadequately characterized. This study aims to elucidate the genetic landscape and transmission patterns of HEV infection in Bangladesh through phylogenetic analysis of viral sequences obtained from a nation-wide surveillance program.
Methodology/Principal Findings
We analyzed 104 partial HEV open reading frame 1 (ORF-1) sequences collected from acute jaundice patients admitted to six tertiary hospitals across Bangladesh during December 2014–September 2017. Phylogenetic trees were constructed using maximum likelihood methods, and Bayesian clustering was employed to assess genetic diversity and transmission patterns. All sequences were identified as HEV genotype 1 (HEV-1), with 10 sequences predominantly collected in 2017 classified as subtype 1g, forming a distinct cluster. A lack of geographic clustering across the sequences suggests widespread transmission across the country rather than geographically distinct transmission networks. Of the 104 sequenced cases, 5 (5%) were associated with fatal outcomes, although these sequences did not cluster phylogenetically.
Conclusions/Significance
This phylogenetic analysis provides evidence of widespread transmission of HEV-1 across Bangladesh, with a reduction in genetic diversity in 2017 suggesting the potential emergence of a dominant viral cluster around that time. Given the paucity of clinical surveillance of HEV, genomics may provide new insights into unobserved aspects of the transmission of the virus locally and globally.
... We ran simple linear regression analyses to visualize the relationships between significant continuous variables and foraging success. Plots were generated with a 95% confidence interval with the "ggplot2" package in R [38, 42]. Additionally, we created box plots to illustrate the variations in foraging success across different categorical variables, using the same R package. Due to the presence of unequal sample sizes in various categories, we applied the paired t-test to assess variations in foraging success between two categories and the non-parametric PERMANOVA (permutational multivariate analysis of variance) test for comparisons involving three categories. ...
It has been proposed that the coexistence of similar species is facilitated by differentiation in their foraging habits. We sought to test this hypothesis by evaluating the foraging behavior and factors influencing the foraging success of three coexisting ibis species, the Black-headed Ibis (Threskiornis melanocephalus), Red-naped Ibis (Pseudibis papillosa), and Glossy Ibis (Plegadis falcinellus), in the semi-arid landscape of western India from January 2020 to April 2022. Overall, foraging parameters were similar among the species, except for inter-individual parameters (P < 0.05) and for the number of locomotion turns each species performed. Probing and the number of nearby wading birds significantly and positively influenced the foraging success of all the ibis species studied. Seasonal variations affected foraging success only for the Red-naped Ibis, and within a season, foraging success was significantly different between species. All species showed different water depth utilization for foraging. The Red-naped Ibis used habitats other than wetlands without impacting its foraging success. Also, foraging success differed between adults and juveniles of the Red-naped Ibis but remained consistent for the Black-headed Ibis. These findings can aid the future development of hypotheses related to how similar species coexist and help in management and conservation efforts for these species.
... , employing the rstatix package (version 0.7.0) [72]. Plots were generated using the ggplot2 package [73]. The comparison of normal and cilia misdirection phenotypes among mutants was performed using Fisher's exact tests, while comparisons of cilia lengths, dendrite lengths, and other metrics were conducted using the Wilcoxon ... time-lapse recordings of IFT movements (endogenous IFT-74::GFP), comprising 90 frames with a frequency of one frame per 0.333 seconds. ...
... Wilcoxon signed-rank test [23] were used to compare the L*a*b* color measurement between two paired test series (time 1: before treatment, time 2: after treatment) for each irradiation source. Figures were created with the R packages ggplot2 [25] and tidyverse [26]. Statistical analyses were performed with R version 4.1.2 ...
Irradiation with UV-C is a non-thermal decontamination treatment for food surfaces. It can be of particular interest for foods which are not usually heat treated, such as fermented, dried or cured meat products. One example of such a food is dry-cured and smoked raw ham, which was treated with UV-C for short periods of 5 to 60 s. The objective of this study was to determine the surface decontamination effect of short-term UV-C treatment, as higher treatment times and doses were usually applied in other studies. Quality parameters such as lipid oxidation and color were also evaluated. Raw ham samples were inoculated (Escherichia coli, Staphylococcus aureus, Latilactobacillus sakei, Debaryomyces hansenii) and treated with a conventional low-pressure mercury vapor (Hg-LP) lamp (mean intensity = 4.5 mW/cm²) and a UV-C LED module (mean intensity = 4.2 mW/cm²). Overall, the UV-C treatment resulted in a reduction of all inoculated microorganisms on raw ham, without affecting quality parameters. The antimicrobial effect was different for different microorganisms and UV-C applications. The highest reduction effect after 60 s was observed for E. coli with the Hg-LP lamp (1.4 log10 cfu/cm²). The microbial inactivation effect with the Hg-LP lamp was larger, even with a lower treatment dose, than treatment with the LED module. Higher treatment doses did not result in significantly larger reductions of colony counts. The effect on surface decontamination was rather low as a single treatment. However, it could be an additional measure as part of a multiple hurdle concept to reduce microbial load and improve food safety.
... SAMtools v1.10 [26] was used to calculate the mapping rate and genome coverage. Data visualization was performed using the R package ggplot2 [27]. ...
Background
Identification of global transcriptional events is crucial for genome annotation, as accurate annotation enhances the efficiency and comparability of genomic information across species. However, the annotation of transcripts in the cucumber genome remains to be improved, and many transcriptional events have not been well studied.
Results
We collected 1,904 high-quality public cucumber transcriptome samples from the National Center for Biotechnology Information (NCBI) to identify and annotate transcript isoforms in the cucumber genome. Over 44.26 billion Q30 clean reads were mapped to the cucumber genome with an average mapping rate of 92.75%. Transcriptome assembly identified 151,453 transcripts spanning 20,442 loci. Among these, 12.7% of transcripts exactly matched annotated genes in the cucumber reference genome. More than 80% of the transcripts were classified as novel isoforms. Approximately 96.6% of these isoforms originated from known gene loci, while around 3.3% were derived from novel gene loci. Coding potential prediction identified 4,543 long non-coding RNAs (lncRNAs) across 3,376 loci. Building on these results, we identified tissue-specific transcripts in 10 tissues. Among these, 1,655 annotated genes and 4,214 predicted transcripts were considered tissue-specific. The root exhibited the highest number of tissue-specific transcripts, followed by the shoot apex. Subsequent selective pressure analysis revealed that tissue-specific regions experienced stronger directional selection compared to non-specific regions.
Conclusions
By analyzing thousands of published transcriptome datasets, we identified abundant transcriptional events and tissue-specific transcripts in cucumbers. The study presented here adds great value to the public data and offers insights for further exploration of a more comprehensive tissue regulatory network in cucumber.
... The SDMs were made using the R packages: "lidR," "Hmisc," "raster," "tidyverse," "sdm," "rgeos," "dismo," and "corrplot." Lastly, the figures were made using the R packages "ggplot2" (Wickham 2016), "gridExtra," and "sdm." First, we tested for multicollinearity between variables using a Spearman rank correlation. ...
The Eurasian Water Shrew (Neomys fodiens) is one of the largest shrew (Soricidae) species in Eurasia. In Western Europe, this semiaquatic species often occurs in riparian and marshland habitats that have a high degree of naturalness, but is being threatened by habitat degradation and other anthropogenic factors. The species mostly occurs in low abundance and is elusive. Therefore, understanding its habitat use is challenging, yet imperative for establishing species-specific conservation measures. Technological developments in radio tracking and high-resolution remote sensing such as Light Detection And Ranging (LiDAR) now enable the quantification of ecological niches and provide insight into habitat requirements for a species. Here, we combined radio tracking and LiDAR to quantify habitat use by Eurasian Water Shrews. Alongside a lowland brook in the Netherlands, 20 individuals were tracked between September and October 2022, resulting in 332 unique locations of Eurasian Water Shrews. For each of these locations, 11 LiDAR-derived variables were calculated and subsequently analyzed in a species distribution model (SDM). The SDM yielded a model with a high accuracy (predictive performance AUC = 0.93). The variable of highest importance was dense and relatively short vegetation <1 m, which had a positive effect on Eurasian Water Shrew occurrence. Open areas seem to be avoided. Vegetation of heights between 1 and 15 m was found to be less important for the occurrence. The probability of occurrence decreased with increasing distance to water, indicating that the species occurs in the proximity of water, although vegetation-related variables were more important. The obtained detailed knowledge of fine-scale habitat use can be used to improve habitat conservation, restoration, and management for the species.
Combining radiotelemetry data with LiDAR data is a promising approach to identifying species–habitat relationships of elusive species such as the Eurasian Water Shrew.
... Most figure panels were generated programmatically in R using the package ggplot2 (ref. 56) or with BioRender (full license) (Fig. 1). ...
Gene expression quantitative trait loci are widely used to infer relationships between genes and central nervous system (CNS) phenotypes; however, the effect of brain disease on these inferences is unclear. Using 2,348,438 single-nuclei profiles from 391 disease-case and control brains, we report 13,939 genes whose expression correlated with genetic variation, of which 16.7–40.8% (depending on cell type) showed disease-dependent allelic effects. Across 501 colocalizations for 30 CNS traits, 23.6% had a disease dependency, even after adjusting for disease status. To estimate the unconfounded effect of genes on outcomes, we repeated the analysis using nondiseased brains (n = 183) and reported an additional 91 colocalizations not present in the larger mixed disease and control dataset, demonstrating enhanced interpretation of disease-associated variants. Principled implementation of single-cell Mendelian randomization in control-only brains identified 140 putatively causal gene–trait associations, of which 11 were replicated in the UK Biobank, prioritizing candidate peripheral biomarkers predictive of CNS outcomes.
... For performing the LightGBM, XGBoost and SVR algorithms, the 'lightgbm', 'xgboost' and 'e1071' packages were used, respectively (Shi et al. 2023;Chen et al. 2023;Meyer et al. 2023). To visualise the feature importance of all models, the 'ggplot2' package was used (Wickham 2016). Descriptive statistics of the explanatory and response variables. ...
Prediction of body weight (BW) using biometric measurements is an important tool, especially for animal welfare and for automatic phenotyping tools that need mathematical models. This study aimed to predict BW using body length (BL), chest girth (CG) and width of the waist (WW) for rabbits of the maternal form of Hyla NG. Standard rabbit‐raising practices were applied for the animals. A highly efficient gradient‐boosting decision tree (LightGBM), eXtreme gradient‐boosting (XGBoost) and support vector machine (SVM) algorithms were evaluated and compared for the prediction of BW. The coefficient of determination, root mean square error and mean absolute error values were used as comparison criteria. The results showed that the LightGBM, XGBoost and SVM algorithms fit BW well using the biometric measures, with over 95% accuracy for both train and test sets. BL was determined to be the most explanatory variable for body weight.
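The three comparison criteria named in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the R packages the study cites, and it assumes the common definition R² = 1 − SS_res/SS_tot; the abstract does not say which definition the authors used. The function name `regression_metrics` and the example weights are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values.

    Assumption: R^2 is computed as 1 - SS_res / SS_tot, one common
    definition; the cited study may have used a different one.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical body weights (kg): observed vs. predicted
r2, rmse, mae = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(r2, rmse, mae)
```

A lower RMSE and MAE together with a higher R² on the test set would favour one model over another under these criteria.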
... Linear and polynomial models, Mann-Whitney-U-Test, and Welch's t-test implemented using basic R and ggplot2 (Wickham, 2016) were used to determine the leaf-level physiological and NSC temperature response. The final assembly of the graphs was done using the R package patchwork (Pedersen, 2022). ...
Accurate predictions of vegetation responses to global warming require a precise understanding of physiological temperature responses. We investigated the effects of air temperature (10°C to 40°C) under constant low vapour pressure deficit and sufficient water supply on leaf-level gas exchange, chlorophyll fluorescence, non-structural carbohydrate (NSC) concentrations, and the hydrogen (δ2H) and oxygen (δ18O) isotopic composition of leaf water and leaf sugar in C3 trees, forbs, grasses, and one C4 grass.
Rising temperatures significantly altered leaf physiology, NSC composition, and the leaf sugar isotopic composition. We observed a shift from starch to sugar above 30°C, indicating a preference for a more readily available carbohydrate, with a concomitant shift in the hydrogen isotopic composition of leaf sugar.
Furthermore, we demonstrate for the first time the close relationship between carbohydrate metabolism and stable isotope fractionation, with 2H enrichment in leaf sugar with increasing temperature. Our results suggest that C3 plants may experience shifts in their carbon metabolism at temperatures above 30°C, which can be detected by δ2H of leaf sugar. Such carbon imbalances may reduce the resilience of C3 plants in an increasingly warming world.
... All analyses were carried out in R version 4.4.0 (R Core Team, 2024) using the packages psych (Revelle, 2024), dplyr (Wickham et al., 2023), tidyr (Wickham et al., 2024), lavaan (Rosseel, 2012), EFAtools (Steiner & Grieder, 2020), car (Fox & Weisberg, 2018), mice (van Buuren & Groothuis-Oudshoorn, 2011), naniar (Tierney & Cook, 2023), ggplot2 (Wickham, 2016), semPower (Moshagen & Bader, 2024), and qgraph (Epskamp et al., 2012). Prior to the main analyses, missing data were analysed to see any patterns that might distort the results. ...
Cambodia has a high potential for the use of residential photovoltaics (RPV), a promising approach to mitigate climate change, but the country is lagging behind in realising this potential. This paper attempts to empirically investigate what motivates and hinders Cambodians' intentions to adopt RPV from a psychological perspective. To answer this research question, an integrative theoretical framework based on the value−belief−norm (VBN) theory and the theory of planned behaviour (TPB) was used. Data was collected by means of a survey, distributed among individuals belonging to the urban middle and upper classes of Cambodia's capital, Phnom Penh. The data of N = 272 participants was then analysed using structural equation modelling and Gaussian graphical modelling. The results revealed that participants' intention to adopt RPV is associated both with the motivation to protect the environment and with the motivation to make a reasoned decision within the role of consumer. The study's results are discussed with particular regard to practical implications that can be derived from them, e.g., the design of potential communicative strategies that can be used to foster the intention to adopt RPV in the future.
... All analyses were performed using the R language version 4.3.1 (R Core Team 2023) through RStudio version 023.9.1.494 (Posit team 2023), and the "geomorph" (Adams et al. 2022; Baken et al. 2021; Adams 2021, 2018), "pataqu" (Bonhomme and Evin 2023), and "ggplot2" (Wickham 2016) packages. ...
In North-western Mediterranean basin, from Southern France to North-eastern Iberia, the transition from the Iron Age to Antiquity is marked by significant political, economic, and cultural changes, as well as a major shift in the body size of livestock, particularly cattle. However, the evolution of suids and caprines during this period has been less thoroughly investigated in the area. This study aims to investigate the morphological variation of sheep, goats, and pigs from the Rhône to the Ebro rivers, from the First Iron Age to Late Antiquity (eighth century BCE to sixth century CE). To this end, 1,099 caprine and 384 suid third lower molars from 96 archaeological sites were analysed using a 2D landmark and sliding semi-landmark based geometric morphometric approach. The impact of a series of socio-economic and environmental factors on the morphometric variation was tested considering time, geography, altitude, topography and urban/rural categorisation of the sites. The results indicate that while sheep teeth increased in size and differ in shape between the Second Iron Age and the end of the Roman Empire, no variation was observed in goat teeth measurements, suggesting different selection patterns for the two species over time. For suids, no differences in teeth size were detected, but differences in shape were observed throughout the chronology, possibly reflecting zootechnical improvements. While little, or no effect of different factors was found for the teeth of suids and goats, the shape of sheep teeth exhibits clear geographical structuring, along with effects of altitude, topography and site type. Thus, changes in tooth shape and size in domestic species are not the result of a single explanatory factor, but rather reflect multifactorial influences including both environmental and anthropological factors. The importance of these influences may vary over time and between species.
... 2.1, using the metafor, lme4 and ggplot2 packages [36-39]. ...
Objective
Selective serotonin reuptake inhibitors (SSRIs) are the first choice in pharmacotherapy for children and adolescents with obsessive-compulsive disorder (OCD). SSRI-trials for pediatric OCD have never been investigated using individual participant data (IPD), which is crucial for detecting patient-level effect modifiers. Here, we performed an IPD meta-analysis on the efficacy of SSRIs compared to placebo, and a meta-regression on baseline patient characteristics which might modify efficacy.
Method
We used crude participant data from short-term, randomized, placebo-controlled SSRI trials for pediatric OCD, available from the registry of the Dutch regulatory authority. We also performed a systematic literature search and approached the authors to provide IPD. We performed a one- and two-stage analysis, with change on the Children’s Yale Brown Obsessive-Compulsive Scale (CY-BOCS) as the primary outcome. We used Odds Ratio (OR) with ≥ 35% CY-BOCS-reduction as the responder outcome measure. We examined modifying effect of age, sex, weight, duration of illness, family history and baseline symptom severity. We used the Cochrane Risk of Bias 2.0 tool to examine methodological rigor, and used the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach in order to examine certainty of evidence.
Results
We obtained data from 4 studies with a total of 614 patients. Our sample represented 86% of all participants ever included in double-blind, placebo-controlled SSRI trials for pediatric OCD. Meta-analysis showed a reduction of 3.0 CY-BOCS points compared to placebo (95% CI 2.5 – 3.5), corresponding to a small effect size (Hedges’ g = 0.38). Analysis of response showed an OR of 1.89 (95% CI 1.45 – 2.45). Of all possible modifiers, severity was correlated negatively with OR for response (beta = -0.92, p = 0.0074). Risk of bias was generally low. All studies were performed on the North American continent with an overrepresentation of White participants. Our findings were limited by the inability to include data on additional variables such as socio-economic status and comorbidities.
Conclusion
Our IPD meta-analysis showed a small effect size of SSRIs in pediatric OCD, with baseline severity as a negative modifier of response. Generalizability of findings might be limited by selective inclusion of White, North American participants.
... Software implementation. We use R [48] with libraries: ggplot2 [49], ggpubr [50], pheatmap [51], ggplotify [52], igraph [53], ggraph [54], and latticeExtra [55] for visualisation; cluster [56], factoextra [57], NbClust [29] for clustering and multivariate analysis; phytools [58], phangorn [59], [60], ape [61] for processing and manipulating phylogenies; and parameters [62], R.utils [63], stringr [64], and git2r ...
Accumulation modelling uses machine learning to discover the dynamics by which systems acquire discrete features over time. Many systems of biomedical interest show such dynamics: from bacteria acquiring resistances to sets of drugs, to patients acquiring symptoms during the course of progressive disease. Existing approaches for accumulation modelling are typically limited either in the number of features they consider or their ability to characterise interactions between these features – a limitation for the large-scale genetic and/or phenotypic datasets often found in modern biomedical applications. Here, we demonstrate how clustering can make such large-scale datasets tractable for powerful accumulation modelling approaches. Clustering resolves issues of sparsity and high dimensionality in datasets, but complicates the interpretation of the inferred dynamics, especially if observations are not independent. Focussing on hypercubic hidden Markov models (HyperHMM), we introduce several approaches for interpreting, estimating, and bounding the results of the dynamics in these cases and discuss how biomedical insight could be gained from such analyses. We demonstrate this ‘Cluster-based HyperHMM’ (CHyperHMM) pipeline for synthetic data, clinical data on disease progression in severe malaria, and genomic data for anti-microbial resistance evolution in Klebsiella pneumoniae, reflecting two global health threats.
Studying complexes of cryptic or pseudocryptic species opens new horizons for the understanding of speciation processes, an important yet vague issue for the digeneans. We investigated a hemiuroidean trematode Lecithaster salmonis across a wide geographic range including the northern European seas (White, Barents, and Pechora), East Siberian Sea, and the Pacific Northwest (Sea of Okhotsk and Sea of Japan). The goals were to explore the genetic diversity within L. salmonis through mitochondrial (cox1 and nad5 genes) and ribosomal (ITS1, ITS2, 28S rDNA) marker sequences, to study morphometry of maritae, and to revise the life cycle data. Mitochondrial markers showed that L. salmonis is likely divided into six lineages (referred to as operational taxonomic units, OTUs), which often occur in sympatry, sometimes in a single host specimen. Variation in rDNA was not consistent with that in the mitochondrial markers. Morphometric analysis of maritae was performed for four out of six OTUs; it showed that some OTUs had significant differences from the others, but some did not. The effect of host species on the morphometric characteristics cannot be excluded. Intramolluscan stages were identified for two OTUs; they differed clearly by the structure of cercariae and also by the species of the first intermediate host. The case of L. salmonis is instructive in how different criteria for species delimitation can contradict each other. We regard this as a sign of recent or ongoing speciation and suggest using the name Lecithaster cf. salmonis. The most promising criteria to differentiate genetic lineages within L. cf. salmonis are first intermediate hosts and morphological characteristics of the cercariae: shape of the delivery tube and caudal cyst, and length of the filamentous appendage.
This paper proposes a Multimarginal Optimal Transport (MOT) approach for simultaneously comparing measures supported on finite subsets of , . We derive asymptotic distributions of the optimal value of the empirical MOT program under the null hypothesis that all k measures are the same, and under the alternative hypothesis that at least two measures are different. We use these results to construct a test of the null hypothesis and provide consistency and power guarantees for this k-sample test. We consistently estimate the asymptotic distributions using the bootstrap, and propose a low-complexity linear program to approximate the test cut-off. We demonstrate the advantages of our approach on synthetic and real datasets, including real data on cancers in the United States in 2004 - 2020.
Topic modeling using Latent Dirichlet Allocation (LDA) is a type of text mining approach. Text mining encompasses a range of techniques and processes for extracting information and knowledge from large collections of textual data. LDA, as a specific method within text mining, helps achieve this goal by uncovering underlying topics or themes present in a corpus of text documents. There is emerging interest in the use of approaches such as topic modeling in the field of educational research for such tasks which may require analyzing vast amounts of unstructured textual data, including open-ended responses to surveys or assessments, student essays, or other unstructured forms of text data. Topic modeling provides a means to identify themes (i.e., topics) within such data, helping researchers identify underlying structures and patterns that may be used to complement or serve the function of more time-consuming conventional methods of qualitative analysis. Particularly when studying novel phenomena at scale, topic modeling could be an especially useful method. At the same time, there are limitations to using topic modeling in educational research. In the present chapter, we introduce topic modeling by describing past educational research that has used this method, provide general steps for conducting topic modeling, and discuss the limitations of using topic modeling in educational research.
Florida and Caribbean coral reefs have been in decline for decades due to pollution, overfishing, climate change and disease. In 2014, stony coral tissue loss disease (SCTLD) emerged in Florida and has since spread throughout much of the Caribbean. SCTLD triggers rapid tissue loss and mortality across > 20 coral species, dramatically impacting ecosystem function. Little is known about how SCTLD impacts early life stage corals and whether interventions can aid recruits. Therefore, the goals of this study were to (1) assess if exposure to SCTLD impacts newly settled Montastraea cavernosa and Dendrogyra cylindrus recruits and (2) determine if probiotic treatment could enhance recruit survival. Our results show that M. cavernosa recruits exposed to disease experienced lower survivorship. D. cylindrus recruits were not significantly impacted by disease, but disease severity of donor corals, as assessed by lesion progression, was minimal in the D. cylindrus experiment, which could explain the results. Treatment of recruits with the probiotic Pseudoalteromonas sp. McH1-7 did not have a significant effect on recruit survivorship for either species, although this could be due to interactions with the settlement inducer tetrabromopyrrole, which we used to induce larval settlement, but can also have antibiotic properties. Given the myriad of challenges coral reefs are facing, understanding how SCTLD impacts early life stages and developing methods to enhance recruit survival would be invaluable.
In recent years, Oxford Nanopore Technologies (ONT) has gained substantial attention across various domains of nucleic acids’ research, owing to its unique advantages over other sequencing platforms. Originally developed for long-read sequencing, ONT technology has evolved, with recent advancements enhancing its applicability beyond long reads to include short, synthetic DNA-based applications. However, sequencing short DNA fragments with nanopore technology often results in lower data quality, likely due to a lack of protocols optimised for these fragment sizes. To address this challenge, we refined the standard ONT library preparation protocol to improve its performance for ultra-short DNA targets. Utilising the same core reagents required for conventional ONT workflows, we introduced targeted alterations to enhance compatibility with shorter fragment lengths. We then benchmarked these adjustments against libraries prepared using the standard ONT protocol. Here, we present a comprehensive, step-by-step protocol that is accessible to researchers of varied technical expertise, facilitating high-quality sequencing of ultra-short DNA fragments. This protocol represents a significant improvement in sequencing quality for short DNA fragments using ONT technology, broadening the range of possible applications.
Insects can survive harsh conditions, including Arctic winters, by entering a hormonally induced state of dormancy, known as diapause. Diapause is triggered by environmental cues such as shortening of the photoperiod (lengthening of the night). The time of entry into diapause depends on the latitude of the insects' habitat, and this applies even within a species: populations living at higher latitudes enter diapause earlier in the year than populations living at lower latitudes. A long-standing question in biology is whether the internal circadian clock, which governs daily behaviour and serves as a reference clock to measure night length, shows similar latitudinal adaptations. To address this question, we examined the onset of diapause and various behavioural and molecular parameters of the circadian clock in the cosmopolitan fly, Drosophila littoralis, a species distributed throughout Europe from the Black Sea (41 degrees N) to arctic regions (69 degrees N). We found that all clock parameters examined showed the same correlation with latitude as the critical night length for diapause induction. We conclude that the circadian clock has adapted to the latitude and that this may result in the observed latitudinal differences in the onset of diapause.
This chapter will guide you through the steps to reproduce the results of a published corpus linguistics study (Van Hulle & Enghels 2024a) using R.
The chapter will walk you through how to:
Download the authors’ original data (Van Hulle & Enghels 2024b) and load it in R
Understand the structure of the data
Wrangle the data to reproduce Tables 5 and 8 from Van Hulle & Enghels (2024a)
Calculate the normalized frequencies as reported in Van Hulle & Enghels (2024a)
Calculate the type/token ratios as reported in Van Hulle & Enghels (2024a)
Compare our results with those printed in Van Hulle & Enghels (2024a)
Visualize our results as line plots using {ggplot2} to facilitate the interpretation of the results
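The two corpus measures named in this list can be sketched briefly. The chapter itself works in R; the Python below is only an illustration, and the per-million normalization base, the function names, and the example tokens are assumptions rather than values taken from Van Hulle & Enghels (2024a).

```python
def normalized_frequency(count, corpus_size, per=1_000_000):
    """Occurrences per `per` tokens (the per-million base is an assumption)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * per


def type_token_ratio(tokens):
    """Number of distinct forms (types) divided by total occurrences (tokens)."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return len(set(tokens)) / len(tokens)


# Hypothetical example: 50 hits in a 2-million-token subcorpus
print(normalized_frequency(50, 2_000_000))

# Hypothetical example: four tokens, three distinct types
print(type_token_ratio(["a", "b", "a", "c"]))
```

Normalizing makes counts from subcorpora of different sizes comparable, while the type/token ratio indicates how varied the attested forms are relative to how often they occur.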
Wildfires are particularly prevalent in the Mediterranean and are expected to increase in frequency owing to projected increases in regional temperatures and decreases in precipitation. Effectively suppressing large wildfires requires a thorough understanding of containment opportunities across landscapes, to which empirical spatial modelling can contribute substantially. The previous containment model in Catalonia failed to account for the crucial role of weather conditions, lacked temporal prediction and could not forecast windows for containment opportunities, prompting this research. We employed a detailed geospatial approach to assess the spatial-temporal variations in containment probability for escaped wildfires in Catalonia. Using machine learning algorithms, geospatial data, and 124 historical wildfire perimeters from 2000 to 2015, we developed a predictive model with high accuracy (Area Under the Receiver Operating Characteristics Curve = 0.81 ± 0.03) over 32,108 km² at a 30-meter resolution. Our analysis identified agricultural plains near non-burnable barriers, such as major road corridors, as having the highest containment probability. Conversely, steep mountainous regions with limited accessibility exhibited lower containment success rates. We also found temperature and windspeed to be critical factors influencing containment success. These findings inform optimal firefighting resource allocation and contribute to strategic fuel management initiatives to enhance firefighting operations.
Terrestrial molluscs living in temperate and polar environments must contend with cold winter temperatures. However, the physiological mechanisms underlying the survival of terrestrial molluscs in cold environments and the strategies employed by them are poorly understood. Here we investigated the cold tolerance of Ambigolimax valentianus, an invasive, terrestrial slug that has established populations in Japan, Canada, and Europe. To do this, we acclimated A. valentianus to different environmental conditions (differing day lengths and temperatures), then exposed them to sub-zero temperatures and measured overall survival. Then, we measured low molecular weight metabolites using ¹H NMR to see if they play a role in their cold tolerance as they do in other invertebrate species. We found that A. valentianus is not strongly freeze tolerant but does become more cold-hardy after acclimation to shorter day lengths. We also found that no metabolites were strongly upregulated in response to winter conditions despite the change in cold hardiness, and instead saw evidence of metabolic suppression leading up to winter such as formate and L-glutamine being suppressed in winter conditions.
Shotgun and proximity‐ligation metagenomic sequencing were used to generate thousands of metagenome assembled genomes (MAGs) from the untreated wastewater, activated sludge bioreactors, and anaerobic digesters from two full‐scale municipal wastewater treatment facilities. Analysis of the antibiotic resistance genes (ARGs) in the pool of contigs from the shotgun metagenomic sequences revealed significantly different relative abundances and types of ARGs in the untreated wastewater compared to the activated sludge bioreactors or the anaerobic digesters (p < 0.05). In contrast, these results were statistically similar when comparing the ARGs in the pool of MAGs, suggesting that proximity‐ligation metagenomic sequencing is particularly useful for pairing ARGs with their hosts but less adept at discerning quantitative differences in ARG types and relative abundances. For example, numerous MAGs of the genera Acinetobacter, Enterococcus, Klebsiella and Pseudomonas were identified in the untreated wastewater, many of which harboured plasmid‐borne and/or chromosomal‐borne ARGs; none of these MAGs, however, were detected in the activated sludge bioreactors or anaerobic digesters. In conclusion, this research demonstrates that the antibiotic resistome undergoes significant transitions in both the relative abundance and the host organisms during the municipal wastewater treatment process.
Cover crops could provide numerous benefits on cocoa farms, including promoting nutrient cycling, carbon sequestration, and active soil microbial communities. Despite growing interest in cover crops for cocoa, many knowledge gaps remain, particularly detailed species and management recommendations to maximize ecosystem services and optimize the soil microbiome in different geographies and production contexts. A field experiment was conducted in South Sulawesi, Indonesia to investigate the suitability of two potential cover crops, tropical kudzu (Pueraria javanica) and fodder sweet potato (Ipomoea batatas L. Lam), for cocoa agroforestry systems. Cover crops were terminated after 6 months due to leaf chlorosis and declining yields in 2-year-old cocoa trees, leading to an analysis of tradeoffs among supporting, regulating, and provisioning services and impacts on diversity and community composition of soil prokaryotes and fungi. Kudzu had a slight positive impact on N cycling, but both cover crops appeared to compete with cocoa for K, with lower yields in sweet potato plots. Among regulating services, cover crops tended to increase C sequestration but did not affect pest and disease incidence. Cover crop treatment accounted for a small but significant percentage of soil microbiome variation, likely driven by effects on soil pH and C, and altered the relative abundance of 155 microbial taxa. Functional-trait-based species selection and optimized management could help maximize the ecosystem services delivered by cover crops, including those mediated by the microbiome, and minimize negative impacts on cocoa productivity.
This paper is the first report of symptoms caused by Alternaria alternata on aboveground organs of olive plants in Italy. On leaves, symptoms included spots and necroses frequently associated with damage caused by the olive thrips (Liothrips oleae). On fruits, symptoms included browning and necroses of pedicels, necroses of fruitlets soon after fruit set, and rot and mummification of mature fruit. Several isolates of A. alternata with identical morphological features and DNA sequences were associated with all the different symptoms. The impacts of A. alternata on olive production can be severe, and infections to fruit pedicels are particularly relevant as they cause severe fruit fall soon after fruit set.
To assess genetic diversity and strategize germplasm conservation for posterity, a total of 478 oil palm (Elaeis guineensis) accessions from 11 origins in Africa were analysed via genotyping-by-sequencing (GBS). The GBS revealed 7048 high-quality single nucleotide polymorphism (SNP) markers distributed across the 16 oil palm chromosomes. Polymorphic information content (PIC) and the genetic diversity parameters estimated using the SNPs revealed higher diversity for palms from the Nigerian collection, compared to other origins. Furthermore, only the Nigerian population possessed a private allele, with a frequency of 0.016. Analysis of molecular variance (AMOVA) showed that most of the variation occurred within populations (74%). We also found high gene flow and low Fst between the Angola and Zaire populations. Population structure analysis revealed that the germplasm palms were stratified into six subpopulations with admixture in palms from Nigeria, Cameroon, Ghana, Sierra Leone and Guinea Conakry. Rapid linkage disequilibrium decay was observed, at 1.69 kb for r² = 0.1. A core collection could be established by conserving 96 palms, which together preserved all the alleles present in the germplasm collection.
Chlorinated paraffins (CPs) are environmental pollutants used extensively in industry. While the use of short-chain chlorinated paraffins (SCCPs) has been restricted since 2017, the use of medium-chain chlorinated paraffins (MCCPs) has risen as their replacement. Given their lipophilic character, CPs can be expected to enter cells; however, the in vitro accumulation potential of CPs remains poorly understood. In this study, we aimed to explore the ability of SCCPs and MCCPs to accumulate in fat cells. We utilized an in vitro model of mouse 3T3-L1 preadipocytes and adipocytes. Using gas chromatography coupled with high-resolution mass spectrometry operated in negative chemical ionization mode, we determined the intracellular amounts of CPs. These compounds accumulated at rates of 8.5 ± 0.1 µg/g cells/h for SCCPs and 7.8 ± 0.3 µg/g cells/h for MCCPs when an initial concentration of 120 ng/ml was present in the medium. This rate increased approximately tenfold when the concentration of CPs was raised to 1200 ng/ml. CP content in adipocytes steadily increased over 5 days, whereas preadipocytes accumulated 15–20 times less CPs. This highlights the importance of cellular lipid content, which was about 12 times higher in adipocytes. Furthermore, we found that the chlorine content of the CP molecules significantly influenced their accumulation. Our results demonstrate that MCCPs exhibit a similar accumulation potential to SCCPs, with lipid content playing a crucial role. As with SCCPs, restrictions on the use of MCCPs in industry should be considered to mitigate their environmental and health impacts.
Ectotherms from highly seasonal habitats should have enhanced potential for physiological plasticity to cope with climatic variability. However, whether this pattern is applicable to fossorial ectotherms, who are potentially buffered from thermal variability, is still unclear. Here, we evaluated how seasonal acclimatisation (spring vs. autumn) affected the thermal sensitivity of standard metabolic rates (SMR), rates of evaporative water loss (EWL), and skin resistance to water loss (R_s) in the spotted salamander (Ambystoma maculatum). We hypothesised that temperature would have both short- and long-term effects on traits (i.e., acute exposure to test temperatures and seasonal acclimatisation, respectively). After accounting for body mass and sex, we found that short-term changes in temperature led to an increase in SMR, EWL, and R_s. Additionally, SMR and R_s differed between seasons, but EWL did not. Sustaining low SMR and high R_s in the spring may allow salamanders to allocate energy toward overwintering emergence and breeding while simultaneously maximising water conservation. By contrast, maintaining high SMR and low R_s in the autumn may allow salamanders to forage aboveground on rainy nights to replenish energy reserves in preparation for the winter. Despite the common assumption that fossorial ectotherms are buffered from thermal effects, our study shows that functional differences between seasons (i.e., breeding in the spring and provisioning in the autumn) are accompanied by seasonal changes in energetic and hydroregulatory requirements.
Asset return volatility is important for decision making in different areas of econometric finance. In the statistical modeling literature, the cDCC (corrected dynamic conditional correlation) model is frequently used for applications of this nature. As the number of assets increases, techniques such as composite likelihood are used to enable estimation with high-dimensional data. However, in the presence of additive outliers this estimator becomes severely biased. This dissertation presents a robust high-dimensional method resulting from the combination of different methodologies present in the literature. The results of the simulation study showed that the proposed estimator produces less biased estimates and generates minimum variance portfolios with lower variances. This was confirmed in the application, which involves returns from the S&P500, in which the estimators were compared considering daily and monthly rebalancing. In general, the robust method produced portfolios with lower variance in most years, including crisis years.