Article

Individual Comparisons by Ranking Methods

Authors: Frank Wilcoxon

Abstract

The comparison of two treatments generally falls into one of the following two categories: (a) we may have a number of replications for each of the two treatments, which are unpaired, or (b) we may have a number of paired comparisons leading to a series of differences, some of which may be positive and some negative. The appropriate methods for testing the significance of the differences of the means in these two cases are described in most of the textbooks on statistical methods.
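As a minimal illustration of the two settings the abstract distinguishes, the R sketch below runs both tests on simulated data; the sample sizes and values are purely illustrative.

```r
set.seed(42)

# Case (a): unpaired replications for each of the two treatments -> rank-sum test
treatment_a <- rnorm(8, mean = 10, sd = 2)
treatment_b <- rnorm(10, mean = 12, sd = 2)
wilcox.test(treatment_a, treatment_b)            # unpaired by default

# Case (b): paired observations on the same units -> signed-rank test on the differences
before <- rnorm(9, mean = 10, sd = 2)
after  <- before + rnorm(9, mean = 1, sd = 1)    # some differences positive, some negative
wilcox.test(after, before, paired = TRUE)
```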


... The focus is on the CAARs at the end of each EW [67]. To ensure the robustness of the results through nonparametric testing, the Wilcoxon signed-rank test is applied [68]. ...
Article
Full-text available
Climate change has heightened the need to understand physical climate risks, such as the increasing frequency and severity of heat waves, for informed financial decision-making. This study investigates the financial implications of extreme heat waves on stock returns in Europe and the United States. Accordingly, the study combines meteorological and stock market data by integrating methodologies from both climate science and finance. The authors use meteorological data to ascertain the five strongest heat waves since 1979 in Europe and the United States, respectively, and event study analyses to capture their effects on stock prices across firms with varying levels of environmental performance. The findings reveal a marked increase in the frequency of heat waves in the 21st century, reflecting global warming trends, and that European heat waves generally have a higher intensity and longer duration than those in the United States. This study provides evidence that extreme heat waves reduce stock values in both regions, with portfolio declines of up to 3.1%. However, there are marked transnational differences in investor reactions. Stocks listed in the United States appear more affected by the most recent heat waves compared to those further in the past, whereas the effect on European stock prices is more closely tied to event intensity and duration. For the United States sample only, the analysis reveals a mitigating effect of high corporate environmental performance against heat risk. This study introduces an innovative interdisciplinary methodology, merging meteorological precision with financial analytics to provide deeper insights into climate-related risks.
... To test the statistical significance of the values of the different climatological indices during the selected events, as well as their impact on the daily averages of the pollutants, the Wilcoxon test was chosen (Wilcoxon, 1945). This test is used to determine whether there is a statistically significant difference between the medians of two independent groups and is appropriate when the distributions of both groups are not normal and may include outliers, a situation that characterises most of the variables. ...
Article
Full-text available
Thermal inversions are a frequent meteorological phenomenon in mountain areas, in which a mass of warm air overlies a mass of cold air. The cold air at the surface (cold-air pool, CAP) causes, among other impacts, the accumulation of pollutants near their emission sources. This work analyses the characteristics of these events in Campoo, a mountain valley in northern Spain, and evaluates their influence on local air quality. These events occur throughout the year, being stronger and more persistent in the winter months, and are characterised by a pronounced daily cycle of temperature and humidity, clear skies and a boundary layer of reduced depth. The winds show a weak circulation, upslope during the day and downslope at night. As a consequence of the strong atmospheric stability, NO2 and PM10 levels increase, while the reduction in O3 is offset, as the warm season progresses, by photochemical processes. Superimposed on this natural dynamic is the local origin of the pollutant compounds, which determines a temporal evolution subject to variations in human activity, as demonstrated by a "weekend effect" and an improvement in air quality after the COVID-19 pandemic.
... The R package cluster v 0.4.4 (26) was used to select the optimal resolution of 0.5, ultimately allowing the selection of a resolution that provides stable, resolved clusters. Marker genes were identified for each cluster using the Wilcoxon rank sum test (27), as implemented within Seurat. These markers were considered statistically significant at a 1% false discovery rate (FDR). ...
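The per-gene testing and FDR control described in that snippet can be sketched outside of Seurat roughly as follows; the matrices and the 1% threshold are illustrative stand-ins, not the Seurat implementation.

```r
# Wilcoxon rank-sum p-value per gene (cluster cells vs all other cells),
# then Benjamini-Hochberg adjustment and a 1% FDR cut-off.
set.seed(1)
n_genes <- 200
expr_cluster <- matrix(rnorm(n_genes * 30), nrow = n_genes)  # cells in the cluster
expr_rest    <- matrix(rnorm(n_genes * 70), nrow = n_genes)  # all remaining cells
expr_cluster[1:10, ] <- expr_cluster[1:10, ] + 2             # plant a few "marker" genes

p_raw <- sapply(seq_len(n_genes), function(g) {
  wilcox.test(expr_cluster[g, ], expr_rest[g, ])$p.value
})
fdr     <- p.adjust(p_raw, method = "BH")  # false discovery rate per gene
markers <- which(fdr < 0.01)               # genes significant at 1% FDR
markers
```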
Article
Full-text available
Introduction Macrophages exhibit marked phenotypic heterogeneity within and across disease states, with lipid metabolic reprogramming contributing to macrophage activation and heterogeneity. Chronic inflammation has been observed in human benign prostatic hyperplasia (BPH) tissues, however macrophage activation states and their contributions to this hyperplastic disease have not been defined. We postulated that a shift in macrophage phenotypes with increasing prostate size could involve metabolic alterations resulting in prostatic epithelial or stromal hyperplasia. Methods Single-cell RNA-seq of CD45⁺ transition zone leukocytes from 10 large (>90 grams) and 10 small (<40 grams) human prostates was conducted. Macrophage subpopulations were defined using marker genes and evaluated by flow cytometry. Results BPH macrophages do not distinctly categorize into M1 and M2 phenotypes. Instead, macrophages with neither polarization signature preferentially accumulate in large versus small prostates. Specifically, macrophage subpopulations with altered lipid metabolism pathways, demarcated by TREM2 and MARCO expression, accumulate with increased prostate volume. TREM2 high and MARCO high macrophage abundance positively correlates with patient body mass index and urinary symptom scores. TREM2high macrophages have a statistically significant increase in neutral lipid compared to TREM2low macrophages from BPH tissues. Lipid-rich macrophages were observed to localize within the stroma in BPH tissues. In vitro studies indicate that lipid-loaded macrophages increase prostate epithelial and stromal cell proliferation compared to control macrophages. Discussion These data define two new BPH immune subpopulations, TREM2high and MARCOhigh macrophages, and suggest that lipid-rich macrophages may exacerbate lower urinary tract symptoms in patients with large prostates. Further investigation is needed to evaluate the therapeutic benefit of targeting these cells in BPH.
... We show the distribution of such similarities, comparing them between links confirmed by the annotators and links classified as "no". We complemented this analysis with the (non-parametric) Wilcoxon rank sum test [68] (the data were not normally distributed, as the Shapiro-Wilk test indicated) and Cliff's delta effect size [6], considering a significance level of 95%. Due to multiple tests, p-values are adjusted using Holm's correction [25]. ...
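A minimal sketch of that analysis, with a hand-rolled Cliff's delta and Holm adjustment; the similarity values are simulated and the helper cliffs_delta is purely illustrative.

```r
# Rank-sum test plus Cliff's delta effect size for one comparison,
# then Holm's correction across several such comparisons.
cliffs_delta <- function(x, y) {
  # proportion of (x, y) pairs with x > y minus proportion with x < y
  (sum(outer(x, y, ">")) - sum(outer(x, y, "<"))) / (length(x) * length(y))
}

set.seed(7)
sim_confirmed <- runif(40, 0.4, 1.0)  # similarities of links confirmed by annotators
sim_rejected  <- runif(60, 0.0, 0.8)  # similarities of links classified as "no"

wilcox.test(sim_confirmed, sim_rejected)   # two-sided rank-sum test
cliffs_delta(sim_confirmed, sim_rejected)  # effect size

p_raw <- c(0.004, 0.030, 0.200)            # illustrative p-values from three tests
p.adjust(p_raw, method = "holm")           # Holm's correction for multiplicity
```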
Preprint
Full-text available
Large Language Models (LLMs) are currently used for various software development tasks, including generating code snippets to solve specific problems. Unlike reuse from the Web, LLMs are limited in providing provenance information about the generated code, which may have important trustworthiness and legal consequences. While LLM-based assistants may provide external links that are "related" to the generated code, we do not know how relevant such links are. This paper presents the findings of an empirical study assessing the extent to which 243 and 194 code snippets, across six programming languages, generated by Bing CoPilot and Google Gemini, likely originate from the links provided by these two LLM-based assistants. The study leverages automated code similarity assessments with thorough manual analysis. The study's findings indicate that the LLM-based assistants provide a mix of relevant and irrelevant links having a different nature. Specifically, although 66% of the links from Bing CoPilot and 28% from Google Gemini are relevant, LLM-based assistants still suffer from serious "provenance debt".
... Microbial alpha diversity (Chao1 richness and Shannon diversity) was calculated using the vegan package, with the effects of population and environment assessed through linear models. Paired comparisons were performed using the nonparametric Wilcoxon rank sum test (Wilcoxon, 1945). For beta diversity, the Bray-Curtis distance was calculated, and permutational multivariate analysis of variance (PERMANOVA) was used to evaluate the influence of population and environment on microbial community composition. ...
... As a result, the ANOVA plot provides valuable information about the variability in the selected feature across different models and highlights the influence of the models on the average. According to Table 9, Wilcoxon rank-sum tests are used to compare two samples in a nonparametric way [38]. In Table 9, it can be seen that SKEW-FS produced the most favorable results in 7 out of 11 datasets. ...
Article
Full-text available
It is crucial to select the most relevant and informative features in a dataset to perform data analysis. Machine learning algorithms perform better when features are selected correctly. Feature selection is not solvable in polynomial time. The exact method takes exponential time, so the researchers used approximate algorithms to reach semi-optimal solutions. It is impossible to explore and exploit the search space in a balanced manner when using heuristic algorithms and metaheuristic methods. To solve this problem, the proposed method replaces meta-heuristic algorithms with the linear time SKEW algorithm in bioinformatics. First, each feature is ranked using the Pearson correlation criterion. Each feature is labeled A, C, G, or T according to its rank. The best feature is A, and the worst feature is T. The dataset can now be viewed as Deoxyribonucleic Acid (DNA). In the second step, the SKEW algorithm is used to determine the lexico-graphical order of suffixes. Suffixes are considered and checked as selected features. The third step involves permuting the features, and the first and second steps are repeated. The best suffix with the lowest cost function is selected after multiple iterations (e.g., ten). As compared to Simulated Annealing (SA), Genetic Algorithm (GA), Gray Wolf Optimizer (GWO), Grasshopper Optimization Algorithm (GOA), Ant Colony Optimization (ACO), Greedy, Gravitational Search Algorithm (GSA), and Pyramid Gravitational Search Algorithm (PGSA), the proposed algorithm improves the objective function by 19.3%, 7.6%, 80.6%, 102.2%, 39.7%, 105.6%, 38.1%, and 14.2% respectively.
... The Wilcoxon rank-sum test was originally proposed by Frank Wilcoxon in a very brief paper, along with the similarly named one-sample signed-rank test (Wilcoxon 1945), before the Mann-Whitney test was proposed. Though the test is often presented in the context of the location problem, the rank-sum test was originally proposed without an explicit alternative, following Fisher's significance testing paradigm. ...
Preprint
Full-text available
Statistically equivalent blocks are not frequently considered in the context of nonparametric two-sample hypothesis testing. Despite the limited exposure, this paper shows that a number of classical nonparametric hypothesis tests can be derived on the basis of statistically equivalent blocks and their frequencies. Far from a moot historical point, this allows for a more unified approach in considering the many two-sample nonparametric tests based on ranks, signs, placements, order statistics, and runs. Perhaps more importantly, this approach also allows for the easy extension of many univariate nonparametric tests into arbitrarily high dimensions that retain all null properties regardless of dimensionality and are invariant to the scaling of the observations. These generalizations do not require depth functions or the explicit use of spatial signs or ranks and may be of use in various areas such as life-testing and quality control. In the manuscript, an overview of statistically equivalent blocks and tests based on these blocks are provided. This is followed by reformulations of some popular univariate tests and generalizations to higher dimensions. Comments comparing proposed methods to those based on spatial signs and ranks are offered along with some conclusions.
... In order to determine whether there are significant differences in model performance, we first apply the Friedman test (Friedman, 1940). Following the recommendations of Benavoli et al. (2016), we then conduct a pairwise post-hoc analysis using the Wilcoxon signed-rank test (Wilcoxon, 1945), coupled with Holm's alpha correction (Holm, 1979) to adjust for multiple comparisons. ...
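That testing pipeline (omnibus Friedman test, then pairwise Wilcoxon signed-rank tests with Holm's correction) can be sketched in R as follows; the score matrix is simulated and the model names are placeholders.

```r
# Rows = datasets (blocks), columns = models compared on the same datasets
set.seed(3)
scores <- cbind(model_a = rnorm(20, 0.80, 0.05),
                model_b = rnorm(20, 0.78, 0.05),
                model_c = rnorm(20, 0.70, 0.05))

friedman.test(scores)   # omnibus test: do the models differ at all?

# Pairwise post-hoc signed-rank tests, p-values adjusted with Holm's method
pairs <- combn(colnames(scores), 2)
p_raw <- apply(pairs, 2, function(pr) {
  wilcox.test(scores[, pr[1]], scores[, pr[2]], paired = TRUE)$p.value
})
setNames(p.adjust(p_raw, method = "holm"),
         apply(pairs, 2, paste, collapse = " vs "))
```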
Preprint
Full-text available
Quantifying uncertainty in multivariate regression is essential in many real-world applications, yet existing methods for constructing prediction regions often face limitations such as the inability to capture complex dependencies, lack of coverage guarantees, or high computational cost. Conformal prediction provides a robust framework for producing distribution-free prediction regions with finite-sample coverage guarantees. In this work, we present a unified comparative study of multi-output conformal methods, exploring their properties and interconnections. Based on our findings, we introduce two classes of conformity scores that achieve asymptotic conditional coverage: one is compatible with any generative model, and the other offers low computational cost by leveraging invertible generative models. Finally, we conduct a comprehensive empirical study across 32 tabular datasets to compare all the multi-output conformal methods considered in this work. All methods are implemented within a unified code base to ensure a fair and consistent comparison.
... We calculated the means and standard deviations for the resilient outcomes and resilience factors in our sample. To assess how these values compared to those reported in the respective validation studies, we conducted one-sample Wilcoxon tests (Wilcoxon, 1945). This non-parametric test allowed us to evaluate whether the mean values in our sample significantly differed from the mean values reported in the validation studies (Siegel, 1956). ...
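A minimal sketch of such a one-sample Wilcoxon test in R; the scores and the reference value are invented for illustration.

```r
# Does the sample differ from the value reported in a validation study?
set.seed(5)
sample_scores   <- rnorm(202, mean = 3.4, sd = 0.8)  # illustrative questionnaire scores
reference_value <- 3.1                               # value from the validation study

wilcox.test(sample_scores, mu = reference_value)     # one-sample signed-rank test
```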
Article
Full-text available
Parents of children in need of care, such as those caring for chronically ill and disabled children, are exposed to significant stress associated with caregiving, placing them at risk for mental disorders. Resilience factors, as psychological resources, can help mitigate the negative effects of stress for both parents and their children, ultimately promoting resilient outcomes. However, little is known about the relationship between resilience factors and resilient outcomes in this highly stressor-exposed population. The aim of this study was to investigate the relationship between resilience factors and resilient outcomes in parents of children in need of care, thereby contributing to a better understanding of how these factors can influence parents’ quality of life. A sample of 202 German-speaking parents of children in need of care from a non-randomized controlled trial (ID: NCT05418205) completed measures assessing resilience-related outcomes, including indicators of mental distress, well-being, perceived stress, and the ability to recover from stressors. Using k-means cluster analysis, two clusters were identified, differentiating burdened and unburdened individuals based on their responses. Logistic regression was subsequently conducted to examine the predictive role of psychological resilience factors—self-efficacy, social support, optimism, internal locus of control, and family cohesion—in distinguishing between the two groups. Results from the logistic regression analysis revealed that self-efficacy, social support, optimism, and family cohesion were significant predictors of cluster membership. These findings contribute to the understanding of the influence of resilience factors on resilient outcomes in parents of children in need of care.
... The Shapiro-Wilk normality test indicated that the data from the line transects and the RECCE surveys did not follow a normal distribution. The Wilcoxon-Mann-Whitney test (Wilcoxon, 1945; Mann and Whitney, 1947) was used to compare the data obtained across seasons and collection methods. ...
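The sequence described there (a normality check, then a Wilcoxon-Mann-Whitney comparison) looks roughly like this in R, with simulated encounter-rate data standing in for the survey data.

```r
set.seed(11)
dry_season <- rexp(25, rate = 2)   # skewed, non-normal data for one season
wet_season <- rexp(25, rate = 3)   # and for the other

shapiro.test(dry_season)           # Shapiro-Wilk: small p-value -> not normal
shapiro.test(wet_season)

wilcox.test(dry_season, wet_season)  # Wilcoxon-Mann-Whitney on two independent samples
```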
Article
Full-text available
Only a few residual forests persist in the southeast of Côte d'Ivoire and their mammal diversity is poorly documented. To respond to this lack of information, our study focused on the species richness, relative abundance, and spatial distribution of medium and large mammals in three classified forests (Comoé 1, N'ganda N'ganda and Soumié), between 2019 and 2021. Several methods were used, including line transects, reconnaissance walking, and camera trapping. In total, 17 species of medium and large mammals were accurately identified across the three sites. Species such as Tragelaphus eurycerus and Nandinia binotata could only be observed within Comoé 1 classified forest, and Philantomba maxwellii was the only duiker species that was accurately identified. The seasonal spatial distribution of identified species showed that those sites were entirely occupied. However, the low relative abundance (Kilometric Abundance Index < 1 index/km) of so-called common species indicates a rarefaction of species, which should be a call to improve the mechanisms for preserving classified forests in Côte d'Ivoire.
... Significant inter-group differences identified by the Kruskal-Wallis test (Kruskal and Wallis, 1952) were investigated using the two-sided Wilcoxon rank sum test (Wilcoxon, 1945). This test is nonparametric and evaluates the null hypothesis that two random variables have equal medians (α = 0.05). ...
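A rough R sketch of that two-stage procedure (Kruskal-Wallis across groups, then two-sided rank-sum tests for the pairs of interest); the groups and values are illustrative.

```r
set.seed(2)
nse <- data.frame(
  value = c(rnorm(30, 0.55, 0.10), rnorm(30, 0.45, 0.10), rnorm(30, 0.40, 0.10)),
  group = rep(c("subdaily", "daily", "monthly"), each = 30)
)

kruskal.test(value ~ group, data = nse)   # omnibus inter-group test

# Follow-up two-sided rank-sum tests for each pair of groups
# (an adjustment method can be supplied when many pairs are compared)
pairwise.wilcox.test(nse$value, nse$group, p.adjust.method = "none")
```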
Article
Full-text available
Watershed water quality models are mathematical tools used to simulate processes related to water, sediment, and nutrients. These models provide a framework that can be used to inform decision-making and the allocation of resources for watershed management. Therefore, it is critical to answer the question “when is a model good enough?” Established performance evaluation criteria, or thresholds for what is considered a ‘good’ model, provide common benchmarks against which model performance can be compared. Since the publication of prior meta-analyses on this topic, developments in the last decade necessitate further investigation, such as the advancement in high performance computing, the proliferation of aquatic sensors, and the development of machine learning algorithms. We surveyed the literature for quantitative model performance measures, including the Nash-Sutcliffe efficiency (NSE), with a particular focus on process-based models operating at fine temporal scales as their performance evaluation criteria are presently underdeveloped. The synthesis dataset was used to assess the influence of temporal resolution (sub-daily, daily, and monthly), calibration duration (< 3 years, 3 to 8 years, and > 8 years), and constituent target units (concentration, load, and yield) on model performance. The synthesis dataset includes 229 model applications, from which we use bootstrapping and personal modeling experience to establish sub-daily and daily performance evaluation criteria for flow, sediment, total nutrient, and dissolved nutrient models. For daily model evaluation, the NSE for sediment, total nutrient, and dissolved nutrient models should exceed 0.45, 0.30, and 0.35, respectively, for ‘satisfactory’ performance. Model performance generally improved when transitioning from short (< 3 years) to medium (3 to 8 years) calibration durations, but no additional gain was observed with longer (> 8 years) calibration. Performance was not significantly influenced by the selection of concentration (e.g. mg/L) or load (e.g. kg/s) as the target units for sediment or total nutrient models but was for dissolved nutrient models. We recommend the use of concentration rather than load as a water quality modeling target, as load may be biased by strong flow model performance whereas concentration provides a flow-independent measure of performance. Although the performance criteria developed herein are based on process-based models, they may be useful in assessing machine learning model performance and we demonstrate one such assessment on a recent deep learning model of daily nitrate prediction across the United States. The guidance presented here is intended to be used alongside, rather than to replace, the experience and modeling judgement of engineers and scientist who work to maintain our collective water resources.
... (R Core Team 2023). The Wilcoxon signed-rank test (Wilcoxon 1945) (wilcox.test function, paired = TRUE) and the Kruskal-Wallis rank-sum test (Kruskal and Wallis 1952) (kruskal.test) ...
Article
Full-text available
Plant detritus is abundant in grasslands but decomposes slowly and is relatively nutrient‐poor, whereas animal carcasses are labile and nutrient‐rich. Recent studies have demonstrated that labile nutrients from carcasses can significantly alter the long‐term soil microbial function at an ecosystem scale. However, there is a paucity of knowledge on the functional and structural response and temporal scale of soil microbiomes beneath large herbivore carcasses. This study compared microbiome functions and structures of soil beneath Connochaetes taurinus (hereafter ‘wildebeest’) carcasses at various postmortem intervals of decomposition to matched control samples over 18 months. Microbial functions were compared by their community‐level physiological profiles determined by sole‐carbon substrate utilisation and structures by metagenomic sequences using 16S rRNA gene markers. Overall metabolism and metabolic diversity remained increased and functionally dissimilar to control soils throughout the experimental period, with successive sole‐carbon substrate utilisation observed. Conversely, diversity was initially reduced and structurally dissimilar from the control soil but recovered within the experimental period. The study contributes to the knowledge of carcass decomposition by investigating the long‐term soil microbiome dynamics resulting from large herbivore carcasses decomposing in a mesic grassland. Microbial functional succession and ecologically relevant bacterial biomarkers of soil beneath the decomposing carcasses were identified for various postmortem intervals.
... The conservative estimate from over land is used throughout the analysis. To identify significant differences between the model experiments, we use the Wilcoxon-Mann-Whitney test (Mann and Whitney, 1947; Wilcoxon, 1945; Wilks, 2006). The Wilcoxon-Mann-Whitney test ranks all of the data before comparing the sum of the ranks for the two distributions. ...
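The rank-sum mechanics summarised there can be reproduced by hand in a few lines of R and checked against wilcox.test; the two samples are simulated.

```r
set.seed(9)
x <- rnorm(12, mean = 0.0)
y <- rnorm(15, mean = 0.5)

pooled_ranks <- rank(c(x, y))                                # rank all of the data together
w_x          <- sum(pooled_ranks[seq_along(x)])              # rank sum of sample x
w_expected   <- length(x) * (length(x) + length(y) + 1) / 2  # its expectation under H0

c(observed = w_x, expected = w_expected)

# Built-in check: R reports W = (rank sum of x) - n_x * (n_x + 1) / 2
wilcox.test(x, y)
```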
Preprint
Full-text available
Biogenic volatile organic compounds (BVOCs), such as isoprene, impact aerosols, ozone and methane, adding uncertainty to assessments of the climate impacts of land cover change. Recent UK Earth System model (UKESM) developments allow us to study how various processes impact biosphere-atmosphere interactions and their implications for atmospheric chemistry, while advances in remote sensing provide new opportunities for assessing biases in isoprene alongside formaldehyde and aerosol optical depth (AOD). The standard setup of UKESM1.1 underestimates the regional formaldehyde column by up to 80 % seasonally, despite positive isoprene biases of over 500 %. Seasonal average AOD values are underestimated by over 60 % in parts of the Northern Hemisphere but overestimated (>180 %) in the Congo. The effects of several processes are studied to understand their impacts on satellite-model biases. Of these, changing from the default to a more detailed chemistry mechanism has the greatest impact on the simulated trace gases. Here, the isoprene lifetime decreases by 50 %, the formaldehyde column increases by >20 %, whilst reductions in upper-tropospheric oxidants decrease sulphate nucleation (-32 %). Organically-mediated boundary layer nucleation and secondary organic aerosol formation from isoprene decrease AOD values in the Northern Hemisphere, while revised BVOC emission factors and land cover representation affect the emissions of BVOCs and dust. The combination of processes substantially affects regional model-satellite biases, typically decreasing isoprene and AOD and increasing formaldehyde. We find significant differences in the aerosol direct radiative effects (+0.17 W m-2), highlighting that these processes may have substantial ramifications for impact assessments of land use change.
... In this study, the fact that the pre-intervention and postintervention analyses were conducted with 12 participants without a control group made the choice of nonparametric analysis appropriate. The Wilcoxon signed-rank test is appropriate for testing the significance of the difference between scores on two sets of related measures (Wilcoxon, 1945). The Wilcoxon signed-rank test considers both the direction and the magnitude of the difference between the scores of two sets of related measures (Büyüköztürk, 2018). ...
Article
This pilot study evaluated the effectiveness of the WomenCan Cognitive Behavioral Therapy (CBT) program on depression, anxiety, and hope among Turkish women diagnosed with cancer. Cancer, particularly breast cancer, is a prevalent condition that significantly impacts the psychological well-being of patients. The WomenCan CBT program was created to specifically address the unique psychosocial stressors women face, aiming to reduce symptoms of depression and anxiety while boosting levels of hope. The study used a quasi-experimental design with a pretest-posttest method, involving 12 participants who completed the program. Results indicated significant reductions in depression and anxiety levels, alongside notable increases in hope, particularly in the “Positive Readiness/Expectancy” dimension. The findings highlight the potential of culturally adapted CBT programs to enhance mental health and quality of life for women with cancer. This research provides valuable insights into the applicability of CBT interventions in non-Western contexts and highlights the importance of culturally sensitive approaches to psychosocial care for patients with cancer.
... The Wilcoxon signed rank test calculates the differences between two related samples and then ranks them. A positive sign is attached to each rank if the first sample has the larger value, and a negative sign if the second sample has the larger value (Wilcoxon, 1992). The ranks obtained in this way are summed and then compared with the expected value. ...
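The procedure described in that snippet corresponds to the following few lines of R, computed by hand and checked against wilcox.test; the paired samples are simulated.

```r
set.seed(4)
before <- rnorm(15, mean = 10, sd = 2)
after  <- before + rnorm(15, mean = 0.8, sd = 1)  # related second measurement

d       <- after - before
r       <- rank(abs(d))        # rank the absolute differences
v_plus  <- sum(r[d > 0])       # sum of ranks carrying a positive sign
v_minus <- sum(r[d < 0])       # sum of ranks carrying a negative sign

c(positive = v_plus, negative = v_minus,
  expected = length(d) * (length(d) + 1) / 4)  # expected rank sum under H0

# Built-in check: R's statistic V is the positive-rank sum
wilcox.test(after, before, paired = TRUE)
```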
Article
Full-text available
In our research, we examined the profitability of companies switching to IFRS and the value judgement of investors in the two accounting systems. During the examination, we established that there is no significant difference in the ROS and ROA profitability indicators in the two accounting systems. It is important to note that in the case of both indicators, for companies with a high fixed asset requirement, there is a significant difference in the two accounting systems based on the results of the Wilcoxon rank sum test. Taking into account the number of elements of the clusters, their proportion, and the value of the effect size, in our opinion, the conclusion cannot be drawn for the entire basic population that the indicators significantly differed as a result of the transition, because the difference can only be observed in the cluster with a lower number of elements, or a particularly strong relationship cannot be revealed for any of the indicators. On the other hand, for the ROE indicator, a significant difference can be clearly established in the two accounting systems, as the significant relationship can be demonstrated both in companies with low and high capital requirements. Overall, in the IFRS, the companies showed more favourable profitability with regard to the ROE indicator. The second examination of our research is related to this, which aimed to determine whether the significant deviation of the ROE indicator in the year of the transition can be attributed to the transition to IFRS. JEL classification code: M40
... The Wilcoxon test (Wilcoxon, 1992) was used to evaluate the impact of the MOOC on the first area of digital competencies called information and information literacy, in order to determine significant differences within the same group. Such a test is designed for data related to before and after measures on the same subjects, assessing whether the differences between pairs of observations (pre and post) tend to be significantly different from zero. ...
Article
Full-text available
We live in an increasingly digitalized and competitive world, so the use of technological resources in continuing education is essential. The objective was to determine the impact of the MOOC based on the flipped classroom methodology on the level of information competence and information literacy in primary school teachers. The approach was quantitative, pre-experimental design. The sample consisted of 810 teachers from the Lambayeque region. After the MOOC application, it was found that the competency of navigation, search and filtering of information, data and digital content obtained the highest score, with 2.84 in the pretest and 3.67 in the posttest on a scale of 1 to 5. The conclusion is the need to train teachers so that they are able to respond effectively and efficiently to the new challenges of this digital society.
... Image metrics were calculated only within the external mask of the planning CT. The Wilcoxon rank-sum test was used for statistical analysis (42). To qualitatively evaluate the sCT images, two certified radiation oncologists from the authors' institution rated the images using a 5-grade scale. ...
Article
Full-text available
Purpose Recent deep-learning based synthetic computed tomography (sCT) generation using magnetic resonance (MR) images have shown promising results. However, generating sCT for the abdominal region poses challenges due to the patient motion, including respiration and peristalsis. To address these challenges, this study investigated an unsupervised learning approach using a transformer-based cycle-GAN with structure-preserving loss for abdominal cancer patients. Method A total of 120 T2 MR images scanned by 1.5 T Unity MR-Linac and their corresponding CT images for abdominal cancer patient were collected. Patient data were aligned using rigid registration. The study employed a cycle-GAN architecture, incorporating the modified Swin-UNETR as a generator. Modality-independent neighborhood descriptor (MIND) loss was used for geometric consistency. Image quality was compared between sCT and planning CT, using metrics including mean absolute error (MAE), peak signal-to-noise ratio (PSNR), structure similarity index measure (SSIM) and Kullback-Leibler (KL) divergence. Dosimetric evaluation was evaluated between sCT and planning CT, using gamma analysis and relative dose volume histogram differences for each organ-at-risks, utilizing treatment plan. A comparison study was conducted between original, Swin-UNETR-only, MIND-only, and proposed cycle-GAN. Results The MAE, PSNR, SSIM and KL divergence of original cycle-GAN and proposed method were 86.1 HU, 26.48 dB, 0.828, 0.448 and 79.52 HU, 27.05 dB, 0.845, 0.230, respectively. The MAE and PSNR were statistically significant. The global gamma passing rates of the proposed method at 1%/1 mm, 2%/2 mm, and 3%/3 mm were 86.1 ± 5.9%, 97.1 ± 2.7%, and 98.9 ± 1.0%, respectively. Conclusion The proposed method significantly improves image metric of sCT for the abdomen patients than original cycle-GAN. Local gamma analysis was slightly higher for proposed method. This study showed the improvement of sCT using transformer and structure preserving loss even with the complex anatomy of the abdomen.
... The middle row contains the number of wins, ties and losses, where "win" means that SAAI achieved a higher accuracy than the respective competitor in one experiment. The bottom row shows the p-value of the Wilcoxon signed rank test (Wilcoxon 1945), a non-parametric test used to compare paired samples without assuming a normal distribution of the data. The tested null hypothesis (H0) is that the distribution of differences between the paired observations is symmetric around zero. ...
Conference Paper
Full-text available
Detecting and classifying abnormal system states is critical for condition monitoring, but supervised methods often fall short due to the rarity of anomalies and the lack of labeled data. Therefore, clustering is often used to group similar abnormal behavior. However, evaluating cluster quality without ground truth is challenging, as existing measures such as the Silhouette Score (SSC) only evaluate the cohesion and separation of clusters and ignore possible prior knowledge about the data. To address this challenge, we introduce the Synchronized Anomaly Agreement Index (SAAI), which exploits the synchronicity of anomalies across multivariate time series to assess cluster quality. We demonstrate the effectiveness of SAAI by showing that maximizing SAAI improves accuracy on the task of finding the true number of anomaly classes K in correlated time series by 0.23 compared to SSC and by 0.32 compared to X-Means. We also show that clusters obtained by maximizing SAAI are easier to interpret compared to SSC.
... For water temperatures, data were available only for the three NSW rivers, with varying ranges for both downstream and upstream sections: Gwydir (2013–2022), Peel (2011–2022) and Severn (2002–2022) (Table 1). To evaluate differences between upstream and downstream river sections, for each of the four seasons, we used the Wilcoxon rank-sum test, appropriate for two independent samples that do not follow a normal distribution (Wilcoxon 1945). To evaluate seasonal variations in discharge within each river section (downstream or upstream), we used the Kruskal-Wallis test, an extension of the Wilcoxon rank-sum test for more than two groups, which similarly does not assume a normal distribution of the data (Kruskal and Wallis 1952). ...
Article
Full-text available
Context River regulation affects freshwater species by disrupting the natural flow regime and connectivity. Aims Investigate the impact of river regulation on platypus populations on four regulated rivers within the northern Murray–Darling Basin. Methods Assessment of hydrology, live trapping downstream of large dams, multi-species environmental DNA surveys in upstream and downstream sections. Key results There were significant changes in flow seasonality and cold-water pollution as a result of river regulation. Upstream sections experienced prolonged periods of ceased flow, most recently during an extreme drought between 2017 and 2020. eDNA surveys detected platypuses downstream of all dams but failed to find evidence of them upstream in two rivers, indicating possible local extinctions. Four platypuses were captured in the Severn River and four, in very poor condition, in the Peel River; none were captured in the Gwydir River or Pike Creek–Dumaresq River. There were significant differences in macroinvertebrate communities, implying possible impacts on platypus diet. Conclusions River regulation and habitat fragmentation affect platypus populations, namely disappearance from upstream sections, low downstream capture rates and poor body condition. Implications Urgent need for catchment-scale river management strategies that preserve ecological functions and connectivity and improve resilience to protect and sustain platypus populations, indicating directions for future research and conservation efforts.
... Following the recommendations of Demšar [44], we applied the Friedman test [45] to assess and reject the null hypothesis with at least 95% confidence. After establishing a statistically significant difference in the ML model's performance, we proceeded with the pairwise posthoc analysis suggested by Benavoli et al. [46], using the Wilcoxon signed-rank test [47] along with Holm's alpha correction [48], [49]. In these analyses, a lower rank (positioned further to the left) indicates better performance of a model relative to the others, based on the gap metric. ...
Article
Full-text available
The quadratic multiple knapsack problem (QMKP) is a well-studied problem in operations research. This problem involves selecting a subset of items that maximizes the linear and quadratic profit without exceeding a set of capacities for each knapsack. While its solution using metaheuristics has been explored, exact approaches have recently been investigated. One way to improve the performance of these exact approaches is by reducing the solution space in different instances, considering the properties of the items in the context of QMKP. In this paper, machine learning (ML) models are employed to support an exact optimization solver by predicting the inclusion of items with a certain level of confidence and classifying them. This approach reduces the solution space for exact solvers, allowing them to tackle more manageable problems. The methodological process is detailed, in which ML models are generated and the best one is selected to be used as a preprocessing approach. Finally, we conduct comparison experiments, demonstrating that using a ML model is highly beneficial for reducing computing times and achieving rapid convergence.
... The non-parametric, paired Wilcoxon signed-rank test (WSR test) 73 is used to test if OBS IMO (3-year time series) is significantly different from NNE IMO (3-year time series), evaluating the significance of the aerosol signal due to IMO2020. The WSR test is selected for this task because (i) it does not assume an underlying probability distribution of the target variables, and (ii) it includes temporal information in the test (i.e., a paired test), which fits better in the context of aerosol-cloud interactions given that cloud susceptibility to aerosol perturbations varies seasonally in response to the seasonality in large-scale meteorological conditions 33,54 . ...
Article
Full-text available
Reduction in aerosol cooling unmasks greenhouse gas warming, exacerbating the rate of future warming. The strict sulfur regulation on shipping fuel implemented in 2020 (IMO2020) presents an opportunity to assess the potential impacts of such emission regulations and the detectability of deliberate aerosol perturbations for climate intervention. Here we employ machine learning to capture cloud natural variability and estimate a radiative forcing of +0.074 ±0.005 W m⁻² related to IMO2020 associated with changes in shortwave cloud radiative effect over three low-cloud regions where shipping routes prevail. We find low detectability of the cloud radiative effect of this event, attributed to strong natural variability in cloud albedo and cloud cover. Regionally, detectability is higher for the southeastern Atlantic stratocumulus deck. These results raise concerns that future reductions in aerosol emissions will accelerate warming and that proposed deliberate aerosol perturbations such as marine cloud brightening will need to be substantial in order to overcome the low detectability.
... 1. To determine whether the net Sharpe ratio achieved by the network momentum model is significantly higher than that achieved by the MACD model when both are used to construct portfolios from the same price data set. We employ a one-sided Wilcoxon signed-rank test [55], a matched-pair test, to assess whether the difference in net Sharpe ratios (network momentum model minus MACD model) is significantly greater than 0. 2. To examine whether the distributions of the net Sharpe ratios from the MACD model and a network momentum model are statistically different without considering the matched-pair nature of the data. We use the one-sided Kolmogorov-Smirnov test [56] to determine whether the cumulative distribution function of the MACD model's net Sharpe ratios is stochastically greater than that of the network momentum model; if it is, this indicates that the MACD model generally yields lower Sharpe ratios than the network momentum model. ...
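A minimal sketch of those two one-sided comparisons in R, using simulated net Sharpe ratios in place of the backtest results.

```r
set.seed(8)
sharpe_network <- rnorm(50, mean = 0.9, sd = 0.3)                   # network momentum model
sharpe_macd    <- sharpe_network - rnorm(50, mean = 0.1, sd = 0.2)  # paired MACD baseline

# 1. Matched-pair test: are the (network - MACD) differences greater than 0?
wilcox.test(sharpe_network, sharpe_macd, paired = TRUE, alternative = "greater")

# 2. One-sided Kolmogorov-Smirnov test: does the CDF of the MACD Sharpe ratios
#    lie above that of the network momentum model (i.e. MACD tends to be lower)?
ks.test(sharpe_macd, sharpe_network, alternative = "greater")
```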
Preprint
Full-text available
We present a systematic, trend-following strategy, applied to commodity futures markets, that combines univariate trend indicators with cross-sectional trend indicators that capture so-called momentum spillover, which can occur when there is a lead-lag relationship between the trending behaviour of different markets. Our strategy utilises two methods for detecting lead-lag relationships, with a method for computing network momentum, to produce a novel trend-following indicator. We use our new trend indicator to construct a portfolio whose performance we compare to a baseline model which uses only univariate indicators, and demonstrate statistically significant improvements in Sharpe ratio, skewness of returns, and downside performance, using synthetic bootstrapped data samples taken from time-series of actual prices.
... Analysis & Results. We conducted a Shapiro-Wilk test [38] to assess normality and applied the Wilcoxon Signed Rank test [46] to evaluate the significance of all the results. The results showed that our method outperformed the baseline in all aspects (Table 3) and showed a statistical significance regarding V/A Ranking, V/A Error, Consistency, and Smoothness ( Figure 6). ...
Preprint
Recent research shows that emotions can enhance users' cognition and influence information communication. While research on visual emotion analysis is extensive, limited work has been done on helping users generate emotionally rich image content. Existing work on emotional image generation relies on discrete emotion categories, making it challenging to capture complex and subtle emotional nuances accurately. Additionally, these methods struggle to control the specific content of generated images based on text prompts. In this work, we introduce the new task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values. Specifically, we propose a novel emotion-embedding mapping network that embeds Valence-Arousal values into textual features, enabling the capture of specific emotions in alignment with intended input prompts. Additionally, we introduce a loss function to enhance emotion expression. The experimental results show that our method effectively generates images representing specific emotions with the desired content and outperforms existing techniques.
... To statistically evaluate the wrist-worn accelerometer-based movement onset detection procedure, we assessed whether the 33 body landmarks indicated movement onset at that time (i.e., the RMS at 0 ms minus the RMS of the previous sample). The magnitude of acceleration of each body part was calculated and evaluated at the single-subject level by performing Wilcoxon signed-rank tests against zero (Wilcoxon, 1945). A correction for multiple comparisons was applied across body parts within participants. ...
Preprint
Full-text available
Advances in wireless electroencephalography (EEG) technology promise to record brain-electrical activity in everyday situations. To better understand the relationship between brain activity and natural behavior, it is necessary to monitor human movement patterns. Here, we present a pocketable setup consisting of two smartphones to simultaneously capture human posture and EEG signals. We asked 26 basketball players to shoot 120 free throws each. First, we investigated whether our setup allows us to capture the readiness potential (RP) that precedes voluntary actions. Second, we investigated whether the RP differs between successful and unsuccessful free-throw attempts. The results confirmed the presence of the RP, but the amplitude of the RP was not related to shooting success. However, offline analysis of real-time human pose signals derived from a smartphone camera revealed pose differences between successful and unsuccessful shots for some individuals. We conclude that a highly portable, low-cost and lightweight acquisition setup, consisting of two smartphones and a head-mounted wireless EEG amplifier, is sufficient to monitor complex human movement patterns and associated brain dynamics outside the laboratory.
... The Wilcoxon test ranks the differences in paired data and evaluates whether the median difference is significantly different from zero. This approach is well-documented in statistical literature [34,35]. ...
Article
Full-text available
Accurately estimating crop cultivation areas is critical for predicting yields and managing overproduction, particularly for staple crops grown in regions like Jeju Island, South Korea, where reporting cultivation areas is mandatory. This study developed a modified U-Net architecture for semantic segmentation, utilizing UAV-based high-resolution imagery in the open-source NIA AI HUB dataset. The dataset includes labeled RGB images of six winter crops—white radish, cabbage, onion, garlic, broccoli, and carrot—grown on Jeju Island, a key agricultural hub. The proposed model incorporates a ResNet-34 backbone, Attention Gates, and Residual Modules, achieving a mean F1 score of 85.4% and an intersection over union (IoU) of 74.6%, outperforming the original U-Net. This advancement significantly reduces misclassifications among visually similar crops, such as garlic and onion. Application to three unknown fields demonstrated an average prediction accuracy of 90.2%, effectively estimating cultivation areas with high precision. By leveraging public datasets and innovative AI techniques, this study highlights the scalability and practicality of the proposed model in enhancing precision agriculture. These findings demonstrate the model’s potential to improve crop yield prediction, optimize resource allocation, and support sustainable farming practices in diverse agricultural environments.
... Moreover, the proposed MROCARO is compared with variants of ARO like LARO [75], dFDBARO [112], and ADFBARO [113] over the classical test functions. Additionally, to demonstrate that a new algorithm MROCARO significantly improves upon existing algorithms, the nonparametric statistical tests such as Wilcoxon rank-sum test [114] and Friedman rank test [115,116] were carried out. The analysis procedures have been conducted on Windows 11, 11th Gen Intel(R) Core (TM) 2.42 GHz CPU, 16 GB RAM, and MATLAB R2021a. ...
Article
Full-text available
The Artificial rabbit optimization (ARO) algorithm replicates the survival skills of rabbits in the wild. However, like other metaheuristic approaches, it has significant drawbacks when solving challenging problems, including a sluggish convergence rate, poor exploration ability and a tendency to become trapped in local optima. To alleviate these shortcomings, a novel strategy, namely Modified Random Opposition (MRO), and ten chaotic maps are integrated with ARO, termed MROCARO. The MRO technique boosts population diversity and permits the population to escape from local optima, while the integration of chaotic maps enhances the exploitation capability. To estimate the effectiveness of the MROCARO method, the well-known CEC2005, CEC2017, CEC2019 and CEC2008lsgo test functions are considered. Moreover, non-parametric tests that include the Wilcoxon rank-sum and Friedman rank tests are performed to analyze the significant differences among the compared algorithms. Furthermore, the efficiency of the MROCARO algorithm has been evaluated on various structural problems and the optimal sizing of renewable energy systems. The experimental findings demonstrate that MROCARO produced the optimal solution, with 100% renewable sources and the lowest levelized cost of electricity of 0.0934 $/kWh, as compared to other methods. Also, the simulation findings reveal that MROCARO has immense potential for addressing global optimization and structural problems as contrasted with other competing algorithms.
... For categorical variables, either Pearson's Chi-squared test or Fisher's exact test of independence was chosen, while for non-parametric continuous variables, the independent Wilcoxon rank-sum test was used [56, 93]. Univariate and multivariate logistic regression with Firth's correction was used to identify variables independently predictive of central nervous system tumors. To determine significance, all tests used an alpha level of 0.05 and were two-tailed. ...
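A rough sketch of how such a test selection by variable type might look in R; the variables, group labels, and the expected-count rule of thumb used here are illustrative, not the cited study's implementation.

```r
set.seed(6)
group   <- factor(rep(c("case", "control"), times = c(40, 80)))
smoking <- factor(sample(c("yes", "no"), 120, replace = TRUE))  # categorical variable
volume  <- rexp(120, rate = 1 / 20)                             # skewed continuous variable

# Categorical: chi-squared unless expected counts are small, then Fisher's exact test
tab <- table(group, smoking)
if (any(chisq.test(tab)$expected < 5)) fisher.test(tab) else chisq.test(tab)

# Non-normal continuous: independent Wilcoxon rank-sum test
wilcox.test(volume ~ group)
```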
Article
Full-text available
Background One avenue to improve outcomes among brain tumor patients involves the mitigation of healthcare disparities. Investigating clinical differences among brain tumors across socioeconomic and demographic strata, such can aid in healthcare disparity identification and, by extension, outcome improvement. Methods Utilizing a racially diverse population from Hawaii, 323 cases of brain tumors (meningiomas, gliomas, schwannomas, pituitary adenomas, and metastases) were matched by age, sex, and race to 651 controls to investigate the associations between tumor type and various demographic, socioeconomic, and medical comorbidities. Tumor size at the time of diagnosis was also compared across demographic groups. Results At the time of diagnosis for benign meningiomas, Native Hawaiians and Pacific Islanders (NHPI; P < 0.05), Asians, and Hispanics exhibited nearly two-fold larger tumor volumes than Whites. For gliomas, NHPI similarly presented with larger tumor volumes relative to Whites ( P = 0.04) and Asians ( P = 0.02), while for vestibular schwannomas, NHPI had larger tumor sizes compared to Asians ( P < 0.05). Benign meningiomas demonstrated greater odds of diagnosis ( P < 0.05) among Native American or Alaskan Natives, patients comorbid with obesity class I, hypertension, or with a positive Alcohol Use Disorders Identification Test-Consumption (AUDIT-C). Malignant meningiomas demonstrated greater odds ( P < 0.05) among patients from higher median household income and urban geography. Gliomas overall exhibited increased odds ( P < 0.05) of diagnosis among Whites and reduced odds among Asians, with greater comorbidity with obesity class III; for glioblastoma specifically, there were reduced odds of asthma diagnosis. Patients with vestibular schwannomas were at increased odds ( P < 0.05) of being from the highest income quartile and having a positive AUDIT-C, yet reduced odds of psychiatric disorders. Pituitary adenomas exhibited reduced odds of diagnosis among Whites, yet greater odds among NHPI, military personnel, obesity class I, and psychiatric disorders. Intracranial metastases were more common in patients with pre-obesity, asthma, a positive AUDIT-C, and living in more affluent regions. Benign meningiomas are most often presented with seizures, while malignant meningiomas have the addition of cognitive difficulty. Gliomas often present with seizures, cognitive difficulty, dizziness/nausea/vomiting (DNV), vestibular schwannomas with DNV, and metastases with seizures. Conclusion Brain tumors exhibit unique sociodemographic disparities and clinical comorbidities, which may have implications for diagnosis, treatment, and healthcare policy.
Preprint
Full-text available
Establishing the reproducibility of radiomic signatures is a critical step in the path to clinical adoption of quantitative imaging biomarkers; however, radiomic signatures must also be meaningfully related to an outcome of clinical importance to be of value for personalized medicine. In this study, we analyze both the reproducibility and prognostic value of radiomic features extracted from the liver parenchyma and largest liver metastases in contrast enhanced CT scans of patients with colorectal liver metastases (CRLM). A prospective cohort of 81 patients from two major US cancer centers was used to establish the reproducibility of radiomic features extracted from images reconstructed with different slice thicknesses. A publicly available, single-center cohort of 197 preoperative scans from patients who underwent hepatic resection for treatment of CRLM was used to evaluate the prognostic value of features and models to predict overall survival. A standard set of 93 features was extracted from all images, with a set of eight different extractor settings. The feature extraction settings producing the most reproducible, as well as the most prognostically discriminative feature values were highly dependent on both the region of interest and the specific feature in question. While the best overall predictive model was produced using features extracted with a particular setting, without accounting for reproducibility (C-index = 0.630 (0.603–0.649)), an equivalent-performing model (C-index = 0.629 (0.605–0.645)) was produced by pooling features from all extraction settings and thresholding features with low reproducibility (CCC ≥ 0.85) prior to feature selection. Our findings support a data-driven approach to feature extraction and selection, preferring the inclusion of many features, and narrowing feature selection based on reproducibility when relevant data is available.
Article
A pull request (PR) is an event in Git where a contributor asks project maintainers to review code he/she wants to merge into a project. The PR mechanism greatly improves the efficiency of distributed software development in the open-source community. Nevertheless, the massive number of PRs in an open-source software (OSS) project increases the workload of developers. To reduce the burden on developers, many previous studies have investigated factors that affect the chance of a PR being accepted and built prediction models based on these factors. However, most prediction models are built on data available only after PRs have been open for a while (e.g., comments on PRs), making them of limited use in practice, because integrators still need to spend a large amount of effort inspecting PRs. In this study, we propose an approach named E-PRedictor (earlier PR predictor) to predict whether a PR will be merged at the moment it is created. E-PRedictor combines three dimensions of manually engineered features (i.e., contributor profile, specific pull request, and project profile) with deep semantic features generated by BERT models from the description and code changes of PRs. To evaluate the performance of E-PRedictor, we collected 475,192 PRs from 49 popular open-source projects on GitHub. The experimental results show that our proposed approach can effectively predict whether a PR will be merged or not. E-PRedictor significantly outperforms baseline models (e.g., Random Forest and VDCNN) built on manual features. In terms of F1@Merge, F1@Reject, and AUC (area under the receiver operating characteristic curve), the performance of E-PRedictor is 90.1%, 60.5%, and 85.4%, respectively.
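As a hedged illustration of the general recipe described above (hand-crafted PR features concatenated with BERT-derived semantic features and fed to a classifier), the sketch below uses a generic pretrained BERT and toy data; it is not the authors' E-PRedictor, and the feature names and model choice are assumptions:

```python
# Illustrative sketch (not the authors' E-PRedictor): combine hand-crafted PR
# features with a BERT embedding of the PR description, then train a classifier.
# Model name, feature names, and data are assumptions for illustration only.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> np.ndarray:
    """Mean-pooled BERT embedding of a PR description."""
    inputs = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy example: two PRs with manual features [contributor_merge_rate, files_changed]
manual = np.array([[0.9, 3], [0.1, 42]], dtype=float)
texts = ["Fix typo in README", "Rewrite build system"]
X = np.hstack([manual, np.vstack([embed(t) for t in texts])])
y = np.array([1, 0])  # 1 = merged, 0 = rejected

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```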
Article
Full-text available
In clinical movement biomechanics, kinematic measurements are collected to characterise the motion of articulating joints and investigate how different factors influence movement patterns. Representative time-series signals are calculated to encapsulate (complex and multidimensional) kinematic datasets succinctly. Exacerbated by numerous difficulties to consistently define joint coordinate frames, the influence of local frame orientation and position on the characteristics of the resultant kinematic signals has been previously proven to be a major limitation. Consequently, for consistent interpretation of joint motion (especially direct comparison) to be possible, differences in local frame position and orientation must first be addressed. Here, building on previous work that introduced a frame orientation optimisation method and demonstrated its potential to induce convergence towards a consistent kinematic signal, we present the REference FRame Alignment MEthod (REFRAME) that addresses both rotational and translational kinematics, is validated here for a healthy tibiofemoral joint, and allows flexible selection of optimisation criteria to fittingly address specific research questions. While not claiming to improve the accuracy of joint kinematics or reference frame axes, REFRAME does enable a representation of knee kinematic signals that accounts for differences in local frames (regardless of how these differences were introduced, e.g. anatomical heterogeneity, use of different data capture modalities or joint axis approaches, intra- and inter-rater reliability, etc.), as evidenced by peak root-mean-square errors of 0.24° ± 0.17° and 0.03 mm ± 0.01 mm after its implementation. By using a self-contained optimisation approach to systematically re-align the position and orientation of reference frames, REFRAME allows researchers to better assess whether two kinematic signals represent fundamentally similar or different underlying knee motion. The openly available implementation of REFRAME could therefore allow the consistent interpretation and comparison of knee kinematic signals across trials, subjects, examiners, or even research institutes.
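As a rough sketch of the underlying idea (systematically re-orienting one local frame so that two rotation signals agree as closely as possible), the code below optimizes a fixed Euler-angle correction on synthetic knee-like data; it is not the published REFRAME implementation and ignores the translational component:

```python
# Minimal sketch of frame re-alignment (not the published REFRAME method):
# find a fixed re-orientation of one local frame that minimizes the RMS
# difference between two rotation signals. Data are synthetic.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation as R

# Reference rotation signal: knee-like flexion about x over a gait-like cycle.
flexion = np.deg2rad(30 * (1 - np.cos(np.linspace(0, 2 * np.pi, 101))) / 2)
ref = R.from_euler("x", flexion)

# Second signal: the same motion seen through a slightly misaligned local frame.
true_offset = R.from_euler("xyz", [4.0, -3.0, 2.0], degrees=True)
obs = true_offset * ref

def rms_cost(euler_deg):
    """RMS residual rotation (degrees) after applying a candidate correction."""
    offset = R.from_euler("xyz", euler_deg, degrees=True)
    residual = (offset * obs).inv() * ref
    return np.sqrt(np.mean(np.rad2deg(residual.magnitude()) ** 2))

result = minimize(rms_cost, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
print("estimated correction (deg):", np.round(result.x, 2))
print("post-alignment RMS error (deg):", round(rms_cost(result.x), 3))
```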
Preprint
Full-text available
Effective fall risk assessment is critical for post-stroke patients. The present study proposes a novel, data-informed fall risk assessment method based on instrumented Timed Up and Go (ITUG) test data, bringing in many mobility measures that traditional clinical scales fail to capture. IFRA, which stands for Instrumented Fall Risk Assessment, was developed using a two-step process: first, the features with the highest predictive power among those collected in an ITUG test were identified using machine learning techniques; then, a strategy was proposed to stratify patients into low-, medium-, or high-risk strata. The dataset used in our analysis consists of 142 participants, of whom 93 were used for training (15 synthetically generated), 17 for validation, and 32 to test the resulting IFRA scale (22 non-fallers and 10 fallers). Features considered in the IFRA scale include gait speed, vertical acceleration during the sit-to-walk transition, and turning angular velocity, which align well with established literature on fall risk in neurological patients. In a comparison with traditional clinical scales such as the traditional Timed Up & Go and the Mini-BESTest, IFRA demonstrates competitive performance, being the only scale to correctly assign more than half of the fallers to the high-risk stratum (Fisher's exact test, p = 0.004). Despite the dataset's limited size, this is the first proof-of-concept study to pave the way for future evidence regarding the use of the IFRA tool for continuous patient monitoring and fall prevention, both in clinical stroke rehabilitation and at home post-discharge.
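To make the reported stratification test concrete, here is a minimal sketch of a Fisher's exact test asking whether fallers are over-represented in the high-risk stratum; the counts are invented, not the study's data:

```python
# Illustrative check of risk stratification (counts are invented, not the
# study's): are fallers over-represented in the high-risk stratum?
from scipy.stats import fisher_exact

#              fallers, non-fallers
high_risk    = [6, 3]
other_strata = [4, 19]

odds_ratio, p = fisher_exact([high_risk, other_strata], alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided P = {p:.3f}")
```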
Article
Establishing the reproducibility of radiomic signatures is a critical step in the path to clinical adoption of quantitative imaging biomarkers; however, radiomic signatures must also be meaningfully related to an outcome of clinical importance to be of value for personalized medicine. In this study, we analyze both the reproducibility and prognostic value of radiomic features extracted from the liver parenchyma and largest liver metastases in contrast enhanced CT scans of patients with colorectal liver metastases (CRLM). A prospective cohort of 81 patients from two major US cancer centers was used to establish the reproducibility of radiomic features extracted from images reconstructed with different slice thicknesses. A publicly available, single-center cohort of 197 preoperative scans from patients who underwent hepatic resection for treatment of CRLM was used to evaluate the prognostic value of features and models to predict overall survival. A standard set of 93 features was extracted from all images using pyradiomics, with a set of eight different extractor settings. Our results show that the feature extraction settings producing the most reproducible, as well as the most prognostically discriminative, feature values are highly dependent on both the region of interest and the specific feature in question. While the best overall predictive model was produced using features extracted with a particular setting, without accounting for reproducibility (C-index = 0.630 (0.603–0.649)), an equivalent-performing model (C-index = 0.629 (0.605–0.645)) was produced by pooling features from all extraction settings and thresholding out features with low reproducibility (retaining only CCC ≥ 0.85) prior to feature selection. Our findings support a data-driven approach to feature extraction and selection, preferring the inclusion of many features and narrowing feature selection based on feature reproducibility when relevant reproducibility data is available. Further research is needed to determine how to select reproducible feature sets when reproducibility data is not available.
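A minimal sketch of the reproducibility screening step described above: Lin's concordance correlation coefficient (CCC) is computed per feature between two reconstructions of the same scans, and features below the 0.85 threshold are dropped. The arrays here are synthetic stand-ins for pyradiomics output, not the study's data:

```python
# Minimal sketch of reproducibility screening: compute Lin's concordance
# correlation coefficient (CCC) per feature between two reconstructions and
# keep only features with CCC >= 0.85. Data are synthetic placeholders.
import numpy as np

def ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(1)
n_patients, n_features = 81, 93
thin = rng.normal(size=(n_patients, n_features))          # e.g. thin-slice reconstruction
thick = thin + rng.normal(scale=0.3, size=thin.shape)     # e.g. thick-slice reconstruction

cccs = np.array([ccc(thin[:, j], thick[:, j]) for j in range(n_features)])
reproducible = cccs >= 0.85
print(f"{reproducible.sum()} of {n_features} features pass the CCC threshold")
```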
Article
Chickens are one of the most economically important poultry species, and their egg-laying performance is a crucial economic trait. The intestinal microbiome plays a significant role in egg-laying performance. To clarify the diversity of chicken intestinal microbiota and its connection to egg-laying performance, this study utilized 16S rRNA sequencing technology to characterize the intestinal microbiomes of 101 chickens from 13 breeds with varying levels of egg production. The results reveal significant differences in gut microbiota structure among chicken groups with varying egg production levels. High egg-producing chickens showed significantly higher abundances of Firmicutes, Proteobacteria, and Lactobacillus, while low egg-producing chickens displayed greater microbial α-diversity and more complex community structures. These differences in gut microbiota influence key physiological functions, including nutrient absorption and hormone regulation through metabolic pathways, and directly affect egg production performance. The low- and medium-production groups partially overlapped on the principal coordinates analysis plot, whereas the high-production group was distinctly separate. This study provides a scientific basis and intestinal microbiome data for selecting probiotics related to high egg production in chickens.
IMPORTANCE This study elucidates the critical role of gut microbiota in the egg-laying performance of chickens, a key economic indicator in the poultry industry. By employing 16S rRNA sequencing, we uncovered distinct microbial profiles associated with varying levels of egg production. High egg-producing chickens exhibit a higher abundance of specific bacterial taxa, such as Firmicutes and Proteobacteria, which are linked to enhanced nutrient absorption and metabolic efficiency. Conversely, lower and medium egg-producing chickens display greater microbial diversity, suggesting a more complex but less efficient gut ecosystem. Our findings provide valuable insights into the relationship between gut microbiota and egg production, offering a scientific foundation for the selection of probiotics that could potentially improve the egg-laying performance of chickens. This research not only advances our understanding of avian gut microbiology but also has practical implications for optimizing poultry farming practices and enhancing economic outcomes.
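As a hedged illustration of the kind of analysis summarized above, the sketch below computes Shannon α-diversity per sample and a metric-MDS ordination of Bray-Curtis dissimilarities (a stand-in for the principal coordinates analysis used in the study); the count table is invented:

```python
# Illustrative sketch (invented counts, not the study's 16S data): Shannon
# alpha-diversity per sample and a metric-MDS ordination of Bray-Curtis
# dissimilarities between samples, as a stand-in for PCoA.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

counts = np.array([      # rows = samples, columns = taxa (hypothetical)
    [120, 30, 5, 0, 2],
    [100, 45, 8, 1, 0],
    [20, 60, 40, 30, 25],
    [15, 55, 50, 35, 20],
])

rel = counts / counts.sum(axis=1, keepdims=True)
shannon = -np.sum(rel * np.log(np.where(rel > 0, rel, 1.0)), axis=1)
print("Shannon diversity per sample:", np.round(shannon, 3))

bray_curtis = squareform(pdist(rel, metric="braycurtis"))
ordination = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = ordination.fit_transform(bray_curtis)
print("2-D ordination coordinates:\n", np.round(coords, 3))
```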
Preprint
Full-text available
With the rapid development of Artificial Neural Network based visual models, many studies have shown that these models exhibit unprecedented power in predicting neural responses to images in visual cortex. Lately, advances in computer vision have introduced self-supervised models, in which a model is trained using supervision derived from natural properties of the training set. This has led to examination of their neural prediction performance, which revealed better prediction by self-supervised than by supervised models, both for models trained with language supervision and for those trained with image-only supervision. In this work, we delve deeper into the models' ability to explain neural representations of object categories. We compare models that differ in their training objectives to examine where they diverge in their ability to predict fMRI and MEG recordings while participants are presented with images of different object categories. Results from both fMRI and MEG show that self-supervision was advantageous in comparison to classification training. In addition, language supervision is a better predictor for later stages of visual perception, while image-only supervision shows a consistent advantage over a longer duration, beginning from 80 ms after exposure. Examination of the effect of training data size revealed that larger datasets did not necessarily improve neural predictions, in particular for visual self-supervised models. Finally, examination of the correspondence of each model's hierarchy to visual cortex showed that image-only self-supervision led to better correspondence than supervised image-only models. We conclude that while self-supervision shows consistently better prediction of fMRI and MEG recordings, each type of supervision reveals a different property of neural activity, with language supervision explaining later onsets and image-only self-supervision explaining long and very early latencies of the neural response, with the model hierarchy naturally sharing a corresponding hierarchical structure with the brain.
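As a hedged sketch of how such neural-prediction scores are commonly obtained (a cross-validated ridge "encoding model" mapping network features to measured responses), the example below uses synthetic data; it is not the authors' pipeline:

```python
# Illustrative encoding-model sketch (synthetic data, not the study's): predict
# voxel responses from a model's image features with cross-validated ridge
# regression and score each voxel by held-out correlation.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 200, 512, 50

features = rng.normal(size=(n_images, n_features))        # model activations
weights = rng.normal(size=(n_features, n_voxels))
voxels = features @ weights + rng.normal(scale=5.0, size=(n_images, n_voxels))

ridge = RidgeCV(alphas=np.logspace(-2, 4, 13))
pred = cross_val_predict(ridge, features, voxels, cv=5)

# Prediction score per voxel: Pearson correlation between held-out predictions
# and measured responses.
scores = [np.corrcoef(pred[:, v], voxels[:, v])[0, 1] for v in range(n_voxels)]
print(f"median voxel-wise r = {np.median(scores):.2f}")
```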
Article
Full-text available
Scapular morphological attributes show promise as prognostic indicators of retear following rotator cuff repair. Current evaluation techniques using single-slice magnetic resonance imaging (MRI) are, however, prone to error, while more accurate computed tomography (CT)-based three-dimensional techniques are limited by cost and radiation exposure. In this study we propose deep learning-based methods that enable automatic scapular morphological analysis from diagnostic MRI despite the anisotropic resolution and reduced field of view compared to CT. A deep learning-based segmentation network was trained with paired CT-derived scapula segmentations. An algorithm to fuse multi-plane segmentations was developed to generate high-resolution 3D models of the scapula, on which morphological landmarks and axes were predicted using a second deep learning network for morphological analysis. Using the proposed methods, the critical shoulder angle, glenoid inclination, and glenoid version were measured from MRI with accuracies of −1.3 ± 1.7 degrees, 1.3 ± 2.1 degrees, and −1.4 ± 3.4 degrees, respectively, compared to CT. Inter-class correlation between MRI- and CT-derived metrics was substantial for the glenoid version and almost perfect for the other metrics. This study demonstrates how deep learning can overcome reduced resolution, bone border contrast, and field of view to enable 3D scapular morphology analysis on MRI.
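As a simple illustration of the final measurement step (angular metrics derived from predicted landmarks and axes), the sketch below computes the angle between two landmark-derived vectors; the landmark coordinates are invented and the vectors do not follow the exact clinical definitions of the metrics above:

```python
# Illustrative sketch (invented landmark coordinates): once landmarks and axes
# are available from a 3-D scapula model, angular metrics reduce to angles
# between landmark-derived vectors, as below.
import numpy as np

def angle_deg(u, v):
    """Angle between two 3-D vectors in degrees."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Hypothetical landmarks (mm): superior/inferior glenoid rim and a scapular axis.
glenoid_sup = np.array([10.0, 4.0, 30.0])
glenoid_inf = np.array([12.0, 2.0, -5.0])
scapular_axis = np.array([0.0, 1.0, 0.0])

glenoid_line = glenoid_sup - glenoid_inf
print(f"glenoid line vs. scapular axis: {angle_deg(glenoid_line, scapular_axis):.1f} deg")
```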
Article
Full-text available
The exponential growth of scientific articles has presented challenges in information organization and extraction. Automation is urgently needed to streamline literature reviews and enhance insight extraction. We explore the potential of Large Language Models (LLMs) in key-insight extraction from scientific articles, including OpenAI’s GPT-4.0, MistralAI’s Mixtral 8×7B, 01AI’s Yi, and InternLM’s InternLM2. We have developed an article-level key-insight extraction system based on LLMs, which we call ArticleLLM. After evaluating the LLMs against manual benchmarks, we enhanced their performance through fine-tuning. We propose a multi-actor LLM approach, merging the strengths of multiple fine-tuned LLMs to improve overall key-insight extraction performance. This work demonstrates not only the feasibility of LLMs for key-insight extraction, but also the effectiveness of cooperation among multiple fine-tuned LLMs, leading to efficient academic literature surveys and knowledge discovery.
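As a hedged sketch of the multi-actor idea (several models each extract insights, whose outputs are then merged), the code below unions per-model insight lists while dropping near-duplicates; `call_llm` is a hypothetical placeholder, not a real API, and the merging rule is an assumption rather than the authors' method:

```python
# Illustrative multi-actor sketch (not ArticleLLM itself): several models each
# extract key insights, and the near-duplicate-free union is kept.
# `call_llm` is a hypothetical stand-in for whichever LLM API is available.
from difflib import SequenceMatcher

def call_llm(model_name: str, article_text: str) -> list[str]:
    """Hypothetical: query `model_name` for a list of key insights."""
    raise NotImplementedError("wire this to your LLM provider of choice")

def merge_insights(per_model_insights: list[list[str]], threshold: float = 0.8) -> list[str]:
    """Union of insights across models, dropping near-duplicates."""
    merged: list[str] = []
    for insights in per_model_insights:
        for candidate in insights:
            if all(SequenceMatcher(None, candidate.lower(), kept.lower()).ratio() < threshold
                   for kept in merged):
                merged.append(candidate)
    return merged

# Usage (assuming call_llm is implemented):
# outputs = [call_llm(m, article) for m in ["gpt-4", "mixtral-8x7b", "yi", "internlm2"]]
# print(merge_insights(outputs))
```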
Article
Background
Robot‐assisted vitreoretinal surgery makes it easier for surgeons to perform the precise and dexterous manipulations required in vitreoretinal procedures.
Methods
We systematically evaluated manual surgery, conventional two‐hand teleoperation, a novel one‐hand teleoperation, and automation in a needle positioning task using a realistic surgical eye model, measuring expert surgeons' performance and novices' learning curves.
Results
The proposed one‐hand teleoperation improved the positioning accuracy of expert surgeons, enabled novices to achieve a consistent accuracy more quickly, decreased the novices' workload more quickly, and made it easier for novices to learn to conduct the task quickly. Moreover, our autonomous positioning achieved an accuracy equivalent to that of the surgeons.
Conclusions
The benefits and potential of task autonomy were shown. Further work is needed to evaluate the proposed methods in a more complex task.
Article
Full-text available
Software defect prediction (SDP) models rely on various software metrics and defect data to identify potential defects in new software modules. However, the performance of these predictive models can be negatively impacted by irrelevant or redundant metrics and by the imbalanced nature of defect datasets. Additionally, previous studies mainly use conventional machine learning (ML) techniques, whose predictive performance leaves room for improvement. Addressing these issues is crucial to improving the accuracy and effectiveness of SDP models. This study presents a novel approach to SDP using a multi-filter wrapper feature selection technique (MFWFS). To identify a subset of relevant and informative features, we leverage a combination of filter techniques, namely information gain (IG), chi-square (CS), and Relief-F (RF), and a wrapper technique, the Opposition-Based Whale Optimization Algorithm (OBWOA). A one-dimensional convolutional neural network (CNN) with an attention mechanism is employed to enhance the classification performance of the predictive model by efficiently integrating the selected characteristics into abstract deep semantic features. We undertake experiments on seventeen open-source software datasets using four performance measures (AUC, G-mean, F-measure, and MCC) and compare the obtained results with existing state-of-the-art ML and hybrid algorithms. The experimental findings demonstrate the greater efficiency of our approach, highlighting the usefulness of the multi-filter wrapper feature selection technique and the attention-based 1D-CNN for SDP.
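As a partial, hedged sketch of the multi-filter stage (information gain and chi-square only; Relief-F and the OBWOA wrapper stage are omitted), the example below keeps features ranked highly by both filters on synthetic data:

```python
# Partial sketch of the multi-filter idea on synthetic data (Relief-F and the
# whale-optimization wrapper stage are omitted): rank features by information
# gain and chi-square, then keep those ranked highly by both filters.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, chi2
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)
X_nonneg = MinMaxScaler().fit_transform(X)   # chi2 requires non-negative inputs

ig_scores = mutual_info_classif(X_nonneg, y, random_state=0)  # information gain
chi_scores, _ = chi2(X_nonneg, y)

def top_k(scores, k=15):
    """Indices of the k highest-scoring features."""
    return set(np.argsort(scores)[::-1][:k])

selected = sorted(top_k(ig_scores) & top_k(chi_scores))
print(f"{len(selected)} features kept by both filters:", selected)
```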