# Statistical Methods for Research Workers

... For the four enrichment analyses, contingency tables were constructed by counting amino acid sites with the different classifications (see Supplementary Methods: OR tests for enrichment for further details), and the ORs were calculated using two-tailed and one-tailed Fisher's exact tests [70] to obtain the corresponding P-values and 95% confidence intervals (95% CI). It is worth clarifying that when counting residues we did not require exclusivity in the intersections, i.e. a residue with a given CCRpct can simultaneously be in DOMAIN, DISORDER_MOBILE and DNA-RNA_BIND, and hence will contribute to cells in the three corresponding contingency tables. ...
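The excerpt above builds one 2×2 contingency table per annotation and reads off an odds ratio together with one- and two-tailed Fisher's exact tests. A minimal sketch of that calculation with SciPy, using made-up counts (the table values below are hypothetical, not taken from the paper):

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table: rows = residue inside / outside a constrained
# (high-CCRpct) region; columns = residue has / lacks the annotation.
table = [[120, 380],
         [60, 940]]

odds_ratio, p_two_sided = fisher_exact(table, alternative="two-sided")
_, p_one_sided = fisher_exact(table, alternative="greater")

# scipy.stats.contingency.odds_ratio (SciPy >= 1.10) can additionally
# produce the 95% confidence interval reported alongside the OR.
print(f"OR = {odds_ratio:.2f}, two-tailed p = {p_two_sided:.2g}, one-tailed p = {p_one_sided:.2g}")
```

Because the excerpt does not require exclusivity between annotations, the same residue would be counted in several such tables.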
Article
Full-text available
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
... Next, the number of datasets in which a gene is detected as DE would be considered as that gene's vote. The direct data-merging approach combines all datasets into one single dataset and uses this merged dataset for the analysis; given the corresponding p-values for each input set for a specific pathway, the p-values will be combined using Fisher's [18] or Stouffer's method. ...
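Fisher's and Stouffer's methods mentioned in the excerpt are both available through SciPy's `combine_pvalues`; a small sketch with hypothetical per-dataset p-values for a single pathway:

```python
from scipy.stats import combine_pvalues

# Hypothetical p-values for one pathway, one per input dataset.
pvals = [0.04, 0.10, 0.03, 0.20]

stat_fisher, p_fisher = combine_pvalues(pvals, method="fisher")        # -2*sum(ln p), chi-squared with 2k df
stat_stouffer, p_stouffer = combine_pvalues(pvals, method="stouffer")  # sum of z-scores / sqrt(k)

print(f"Fisher: stat = {stat_fisher:.3f}, p = {p_fisher:.4f}")
print(f"Stouffer: stat = {stat_stouffer:.3f}, p = {p_stouffer:.4f}")
```

Fisher's method is sensitive to a single very small p-value, while Stouffer's weighs all inputs more evenly, which is one reason multi-omics pipelines offer both.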
Article
Full-text available
Pathway analysis has been widely used to detect pathways and functions associated with complex disease phenotypes. The proliferation of this approach is due to better interpretability of its results and its higher statistical power compared with the gene-level statistics. A plethora of pathway analysis methods that utilize multi-omics setup, rather than just transcriptomics or proteomics, have recently been developed to discover novel pathways and biomarkers. Since multi-omics gives multiple views into the same problem, different approaches are employed in aggregating these views into a comprehensive biological context. As a result, a variety of novel hypotheses regarding disease ideation and treatment targets can be formulated. In this article, we review 32 such pathway analysis methods developed for multi-omics and multi-cohort data. We discuss their availability and implementation, assumptions, supported omics types and databases, pathway analysis techniques and integration strategies. A comprehensive assessment of each method's practicality, and a thorough discussion of the strengths and drawbacks of each technique will be provided. The main objective of this survey is to provide a thorough examination of existing methods to assist potential users and researchers in selecting suitable tools for their data and analysis purposes, while highlighting outstanding challenges in the field that remain to be addressed for future development.
... The data obtained were subjected to the analysis of variance technique [96] for comparing the differences among the elevational ranges with respect to the characteristics under study. A multiple comparison test, i.e., the least significant difference (LSD) test, was performed for pairwise comparison of the elevational ranges. ...
Article
Full-text available
Juniperus macropoda is the only tree species of a cold desert ecosystem that is experiencing high anthropogenic pressure and has a poor regeneration status due to harsh environmental conditions. Because of the limited distribution of Juniperus macropoda in this region, the species has remained largely unexplored in terms of understanding the distribution pattern along the elevation and soil fertility gradients. Therefore, the current research was carried out along the elevational gradient, starting from the baseline at 3000 m above sea level (m asl) with an elevational plot distance of 180 m. The study revealed that the average density of J. macropoda declined gradually from the first elevation range, i.e., 3000–3180 m asl, onward, and extended up to the elevation range of 3900–4080 m asl. However, the average seedling and sapling densities were highest at mid-elevation and extended up to an elevation range of 4080–4260 m asl. The J. macropoda population formed a reverse J-shaped structure only up to 3540–3720 m asl. The maximum total biomass and carbon density were recorded in the lowest elevational range, and decreased subsequently. The primary soil nutrients under study decreased sharply along the elevational gradient. Seedling, sapling and tree distributions had a significantly positive relationship (p < 0.05) with available N, P, K, SOC, silt and clay contents and were negatively correlated (p < 0.05) with sand contents. The outcome of the study will form the basis for devising a plan for the management and conservation of J. macropoda forests.
... P(EG), P(JOH), P(BO) and P(BDM) show the probability values of the various individual cointegration tests of Engle and Granger (1987), Johansen and Juselius (1990), Boswijk (1994) and Banerjee et al. (1998). We use Fisher's (1971) critical statistic values to determine whether or not cointegration occurs between the variables. When the critical values provided by Bayer and Hanck (2013) are smaller than the computed Fisher (1971) statistics, we can infer in favor of cointegration by rejecting the null hypothesis of no cointegration. ...
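The Bayer-Hanck procedure described above combines the p-values of the individual cointegration tests through Fisher's formula F = -2 Σ ln(p_i) and compares the result against tabulated critical values. A sketch of the arithmetic with hypothetical p-values (note that Bayer and Hanck supply their own critical values for the cointegration setting; the chi-squared quantile below is only the classical Fisher reference point):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical p-values from the four component cointegration tests.
p = {"EG": 0.03, "JOH": 0.01, "BO": 0.08, "BDM": 0.05}

# Fisher's combined statistic: F = -2 * sum(ln p_i)
fisher_stat = -2.0 * np.sum(np.log(list(p.values())))

# Classical Fisher reference distribution (chi-squared, 2k df); Bayer and
# Hanck (2013) tabulate dedicated critical values for the cointegration case.
crit = chi2.ppf(0.95, df=2 * len(p))
reject = fisher_stat > crit  # reject the null of no cointegration
print(fisher_stat, crit, reject)
```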
Article
Full-text available
Purpose The main purpose of the present research is to explore the possible effectiveness of information and communication technology (ICT), infrastructure development, exchange rate and governance on inbound tourism demand using time series data in India. Design/methodology/approach The stationarity of the variables is checked by using the ADF, PP and KPSS unit root tests. The paper uses the Bayer-Hanck and auto-regressive distributed lag (ARDL) bounds testing approach to cointegration to examine the existence of long-run relationships; the error-correction mechanism for the short-run dynamics and the vector error correction method (VECM) to test the direction of causality. Findings The findings of the research indicate the presence of cointegration among the variables. Further, long-run results indicate infrastructure development, word-of-mouth and ICT have a positive and significant linkage with international tourist arrivals in India. However, ICT has a positive and significant effect on tourist arrivals in the short run as well. The VECM results indicate long-run unidirectional causality from infrastructure, ICT, governance and exchange rate to tourist arrivals. Research limitations/implications This study implies that inbound tourism demand in India can be augmented by improving infrastructure, governance quality and ICT penetration. For an emerging country like India, this may have far-reaching implications for sustaining and improving tourism sector growth. Originality/value This paper is the first of its kind to empirically examine the impact of ICT, infrastructure and governance quality in India using modern econometric techniques. Inbound tourism demand research aids government and policymakers in developing effective public policies that would reposition India to gain from a highly competitive global tourism industry.
... Rather than simply referring to an individual's total available evidence, allowing for a socially-extended and conventionally-restricted evidence base better characterises inquiry in science and law. J. Neyman and E. S. Pearson were the first to modify Fisher's hypothesis testing to include reference to a class of possible alternative hypotheses (Lehmann, 1993; Fisher, 1925). Mayo's work extends Neyman and Pearson's hypothesis testing by quantifying the extent to which specific alternative hypotheses are severely tested, based on observed data (Neyman & Pearson, 1967). ...
Article
Full-text available
This essay presents a unified account of safety, sensitivity, and severe testing. S’s belief is safe iff, roughly, S could not easily have falsely believed p, and S’s belief is sensitive iff were p false S would not believe p. These two conditions are typically viewed as rivals but, we argue, they instead play symbiotic roles. Safety and sensitivity are both valuable epistemic conditions, and the relevant alternatives framework provides the scaffolding for their mutually supportive roles. The relevant alternatives condition holds that a belief is warranted only if the evidence rules out relevant error possibilities. The safety condition helps categorise relevant from irrelevant possibilities. The sensitivity condition captures ‘ruling out’. Safety, sensitivity, and the relevant alternatives condition are typically presented as conditions on warranted belief or knowledge. But these properties, once generalised, help characterise other epistemic phenomena, including warranted inference, legal verdicts, scientific claims, reaching conclusions, addressing questions, warranted assertion, and the epistemic force of corroborating evidence. We introduce and explain Mayo’s severe testing account of statistical inference. A hypothesis is severely tested to the extent it passes tests that probably would have found errors, were they present. We argue Mayo’s account is fruitfully understood using the resulting relevant alternatives framework. Recasting Mayo’s condition using the conceptual framework of contemporary epistemology helps forge fruitful connections between two research areas—philosophy of statistics and the analysis of knowledge—not currently in sufficient dialogue. The resulting union benefits both research areas.
... The experiment on insect mortality and enzyme assay was conducted in three replications, and statistical analysis was carried out using a completely randomized design (Fisher and Yates, 1948). Critical differences were determined to distinguish significance among treatments (isolates) at the 0.05 level. ...
Article
Full-text available
The entomopathogenic Beauveria spp. were acquired from insect cadavers and the soil rhizosphere of cotton, groundnut, and castor. Among them, five isolates were derived from infected insects, eight were found in soil, and one strain of Beauveria bassiana was obtained from MTCC 9544. The isolates were characterized for morphology and for the cuticle-degrading enzyme activity associated with virulence against Bemisia tabaci. Colony morphology and conidial arrangement, size, and shape confirmed all isolates as Beauveria. Chitinase (EC 3.2.1.14) and lipase (EC 3.1.1.3) activities were highest in Beauveria JAU2, while higher protease (EC 3.4.21.4) activity was found in JAU4, followed by JAU2, at 240 h. The bio-efficacy assay (1 × 10⁷ conidia ml⁻¹) showed that the potent isolate JAU2 gave the highest % mortality and corrected mortality of B. tabaci at 144 h, followed by JAU1. The LC90 and LC50 were determined for the potent (JAU1 and JAU2) and weak (JAU6) isolates and were lowest for JAU2. The most potent isolate, JAU2, obtained from an insect cadaver (Helicoverpa armigera), showed higher virulence than the other isolates. JAU2 was recognized as Beauveria bassiana based on the shape and size of its conidia (2.00 to 2.09 µm dia) as examined by SEM. The study provides insight into the recognition of the potent Beauveria bassiana JAU2, whose insecticidal action was linked with cuticle-degrading enzyme activity. The JAU2 isolate showed the strongest positive correlation (r = 0.864, p < 0.01) between chitinase activity and corrected insect mortality.
... The higher the correlation between these two sets of rankings, the more we are reifying a particular hierarchy of subgroups and entrenching existing disparities in the data. We use Kendall's Tau [78] as a measure of rank correlation, and combine the p-values obtained across runs with different random seeds using Fisher's combined probability test [38]. ...
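The procedure in the excerpt, rank-correlating two subgroup orderings with Kendall's Tau per seed and then combining the per-seed p-values with Fisher's method, can be sketched with SciPy on synthetic rankings (the rankings, noise level, and number of runs are all made up):

```python
import numpy as np
from scipy.stats import kendalltau, combine_pvalues

rng = np.random.default_rng(42)
base_ranking = np.arange(12)  # hypothetical subgroup hierarchy in the data

# One model run per random seed; each run's subgroup ordering is the data
# hierarchy plus noise, so the two rankings should correlate.
pvals = []
for _ in range(5):
    run_scores = base_ranking + rng.normal(0.0, 2.0, size=12)
    tau, p = kendalltau(base_ranking, run_scores)
    pvals.append(p)

# Fisher's combined probability test across the runs.
_, p_combined = combine_pvalues(pvals, method="fisher")
print(f"combined p = {p_combined:.2g}")
```

A small combined p-value here signals that the model's subgroup ordering consistently mirrors the hierarchy in the data across seeds.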
Preprint
Full-text available
Research in machine learning fairness has historically considered a single binary demographic attribute; however, the reality is of course far more complicated. In this work, we grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes: (1) which demographic attributes to include as dataset labels, (2) how to handle the progressively smaller size of subgroups during model training, and (3) how to move beyond existing evaluation metrics when benchmarking model fairness for more subgroups. For each question, we provide thorough empirical evaluation on tabular datasets derived from the US Census, and present constructive recommendations for the machine learning community. First, we advocate for supplementing domain knowledge with empirical validation when choosing which demographic attribute labels to train on, while always evaluating on the full set of demographic attributes. Second, we warn against using data imbalance techniques without considering their normative implications and suggest an alternative using the structure in the data. Third, we introduce new evaluation metrics which are more appropriate for the intersectional setting. Overall, we provide substantive suggestions on three necessary (albeit not sufficient!) considerations when incorporating intersectionality into machine learning.
... continuous vs. categorical). Specifically, we applied Fisher's exact test [37], a statistical significance test designed for categorical data that examines each feature individually and assigns an exact significance value to it, to the 521,017 categorical genetic features, and Welch's t-test [38] to the 91 continuous w-score volume features; for each feature type, we obtained F = 10 independent sets of k = 17 features with the largest effect sizes on the outcome. We ranked feature significance by effect size rather than p-value because p-values are affected by sample size: a statistically significant p-value may merely indicate that a large sample was used rather than demonstrating an actual difference. ...
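The selection step described above, running a per-feature Welch's t-test but ranking features by effect size rather than p-value, might look like the following sketch on synthetic data (the feature count, group sizes, and the Cohen's d effect-size choice are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_features = 20  # small stand-in for the 91 continuous w-score features

effect_sizes = []
for j in range(n_features):
    shift = 0.5 if j < 3 else 0.0          # pretend the first 3 features are informative
    a = rng.normal(0.0, 1.0, size=80)      # e.g., non-progressing group
    b = rng.normal(shift, 1.2, size=60)    # e.g., progressing group
    t_stat, p_val = ttest_ind(a, b, equal_var=False)  # Welch's t-test (unequal variances)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    effect_sizes.append(abs(a.mean() - b.mean()) / pooled_sd)  # Cohen's d

k = 5
top_k = np.argsort(effect_sizes)[::-1][:k]  # keep the k features with the largest effects
print(top_k)
```

Ranking by effect size keeps the selection comparable across sample sizes, which is exactly the concern the excerpt raises about p-values.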
Article
Full-text available
Background: The increasing availability of databases containing both magnetic resonance imaging (MRI) and genetic data allows researchers to utilize multimodal data to better understand the characteristics of dementia of Alzheimer's type (DAT). Objective: The goal of this study was to develop and analyze novel biomarkers that can help predict the development and progression of DAT. Methods: We used feature selection and ensemble learning classifier to develop an image/genotype-based DAT score that represents a subject's likelihood of developing DAT in the future. Three feature types were used: MRI only, genetic only, and combined multimodal data. We used a novel data stratification method to better represent different stages of DAT. Using a pre-defined 0.5 threshold on DAT scores, we predicted whether a subject would develop DAT in the future. Results: Our results on Alzheimer's Disease Neuroimaging Initiative (ADNI) database showed that dementia scores using genetic data could better predict future DAT progression for currently normal control subjects (Accuracy = 0.857) compared to MRI (Accuracy = 0.143), while MRI can better characterize subjects with stable mild cognitive impairment (Accuracy = 0.614) compared to genetics (Accuracy = 0.356). Combining MRI and genetic data showed improved classification performance in the remaining stratified groups. Conclusion: MRI and genetic data can contribute to DAT prediction in different ways. MRI data reflects anatomical changes in the brain, while genetic data can detect the risk of DAT progression prior to the symptomatic onset. Combining information from multimodal data in the right way can improve prediction performance.
... We then compute the average Euclidean Embeddings' Distance (ED) from the corresponding centroids: ED(l, e) = (1/n) Σ_{i=1..n} ||e_i − c_i||₂. As a sanity check, we apply a significance test to the ED statistic, confirming that representations of same-class examples are close to each other. Specifically, we apply a permutation test (Fisher, 1971), with 1000 repetitions, comparing the class labels to random labels. We find that EDs for both BERT and BERT IT:CLUST are significantly different from random (p < 0.001). ...
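The sanity check described above can be reproduced in miniature: compute the average embedding-to-centroid distance, then compare it against the same statistic under shuffled labels (a permutation test). The embeddings below are synthetic stand-ins for the BERT representations:

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_centroid_distance(emb, labels):
    """Average Euclidean distance of each embedding from its class centroid:
    ED = (1/n) * sum_i ||e_i - c_i||_2."""
    dists = []
    for c in np.unique(labels):
        pts = emb[labels == c]
        centroid = pts.mean(axis=0)
        dists.extend(np.linalg.norm(pts - centroid, axis=1))
    return float(np.mean(dists))

# Synthetic "embeddings": two classes with separated means in 16 dimensions.
emb = np.vstack([rng.normal(0.0, 1.0, (50, 16)),
                 rng.normal(3.0, 1.0, (50, 16))])
labels = np.array([0] * 50 + [1] * 50)

observed = avg_centroid_distance(emb, labels)

# Permutation test with 1000 repetitions: under random labels the ED
# statistic should be larger, since "classes" no longer cluster.
null = [avg_centroid_distance(emb, rng.permutation(labels)) for _ in range(1000)]
p_value = (1 + sum(d <= observed for d in null)) / (1 + len(null))
print(f"observed ED = {observed:.2f}, p = {p_value:.3g}")
```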
Preprint
Full-text available
In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task, is prone to produce poor performance. We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task, between the pre-training and fine-tuning phases. As such an intermediate task, we perform clustering and train the pre-trained model on predicting the cluster labels. We test this hypothesis on various data sets, and show that this additional classification phase can significantly improve performance, mainly for topical classification tasks, when the number of labeled instances available for fine-tuning is only a couple of dozen to a few hundred.
... It can reduce the compounded error rate that results from running multiple pairwise tests such as the t-test. ANOVA was developed by the English statistician Ronald Fisher [76] and has been applied in various fields for data analysis. It has been applied successfully to face recognition and classification [77,78]. ...
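The point of the excerpt, that a single ANOVA avoids the error-rate inflation of many pairwise t-tests, can be illustrated with a one-way ANOVA in SciPy on three hypothetical groups of measurements:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
# Three hypothetical groups of facial-feature measurements.
g1 = rng.normal(10.0, 1.0, size=30)
g2 = rng.normal(10.5, 1.0, size=30)
g3 = rng.normal(12.0, 1.0, size=30)

# A single one-way ANOVA tests "all group means are equal" at once, instead
# of three pairwise t-tests whose type I error rates would compound.
f_stat, p_value = f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")
```

A significant F only says the means differ somewhere; locating which pairs differ then requires a post-hoc procedure with multiplicity control.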
Article
Full-text available
The striking realism of the life-sized ceramic terracotta warriors has attracted the interest of the public and of archaeologists since they were discovered in the mausoleum complex of the first Chinese Emperor, Qin Shihuang, in the 1970s. It is still debated whether the life-size models were based on individual people or were crafted from standardized models. This research examined the facial features of the terracotta warriors in a quantitative and contactless way with the support of high-precision 3D point-cloud modelling technology and anthropometric methods. The similarities and dissimilarities among the facial features of the terracotta warriors and of 29 modern Chinese ethnic groups were analyzed using mathematical statistics methods such as MDS, ANOVA, ranking analysis and cluster analysis. The results reveal that the features of the terracotta warriors highly resemble those of contemporary Chinese people and indicate that the terracotta warriors were crafted from real portraits and intended to constitute a real army to protect Emperor Qin Shihuang in the afterlife.
... The significance of variation in the research data arising from the various trial effects was analyzed, and Fisher's test [55] was used for the statistical analysis of the data. Significance was determined by critical differences at the 5% probability level. ...
Article
Full-text available
Abstract: (1) Background: Arid conditions occur due to climate abnormality in different biogeography regions of the world. The aim of this research is to investigate the stoichiometry of manure and moisture regimes on soil properties, microbial biomass C:N:P turnover, and the grain yield of mustard crops under stress in arid conditions; (2) Methods: The field experiment was carried out for 2 years at the farms of the agriculture college of SKN, Jobner (SKRAU Bikaner, Rajasthan). The effects of organic manure, moisture regimes, and saline water treatment on soil properties, such as the soil microbial biomass build-up, loss, turnover, and recycling of carbon (Cmic), nitrogen (Nmic), and phosphorus (Pmic) in the mustard crop, were investigated. The twenty-seven treatments studied are described as follows: organic manures (control, FYM @ 10 t ha−1 and vermicompost @ 5 t ha−1), moisture regimes (0.4, 0.6, and 0.8 IW/CPE ratio), and saline irrigation water (control, 6, 12 dSm−1); (3) Results: Our findings indicate that vermicompost @ 5 t ha−1 significantly increases moisture retention and the available water in soil at 33 kPa and 1500 kPa. The microbial biomass build-up of Cmic increases by 43.13% over the control and 14.36% over the FYM. Similarly, the soil microbial biomass of Nmic and Pmic also increases considerably. The SHC of the soil is enhanced by the application of farmyard manure and vermicompost. The BD and pH decrease significantly, while the SHC, OC, CEC, and ECe of the soil increase significantly. The build-up, losses, and fluxes of the soil microbial biomass of Cmic, Nmic, and Pmic increase significantly, and the turnover rate decreases under vermicompost @ 5 t ha−1. A significant increase in grain yield was observed. Irrigation with a 0.8 IW/CPE moisture regime significantly decreases the pH and the SHC; (4) Conclusions: We examined the interactive effects of the moisture regime and organic manure and found that they significantly influenced grain and stover yield. The treatments of quality irrigation water and the addition of organic manure are efficient enough to improve soil properties, water holding capacity, and soil microbial biomass C:N:P under stress climatic conditions.
Article
Background The US regulatory framework for advanced heart failure therapies (AHFT), ventricular assist devices, and heart transplants delegates eligibility decisions to multidisciplinary groups at the center level. The subjective nature of decision-making is at risk for racial, ethnic, and gender bias. We sought to determine how group dynamics impact allocation decision-making by patient gender, racial, and ethnic group. Methods and Results We performed a mixed-methods study among 4 AHFT centers. For approximately 1 month, AHFT meetings were audio recorded. Meeting transcripts were evaluated for group function scores using the de Groot Critically Reflective Diagnoses protocol (metrics: challenging groupthink, critical opinion sharing, openness to mistakes, asking/giving feedback, and experimentation; scoring: 1 to 4 [high to low quality]). The relationship between summed group function scores and AHFT allocation was assessed via hierarchical logistic regression with patients nested within meetings nested within centers, and interaction effects of group function score with gender and race, adjusting for patient age and comorbidities. Among 87 patients (24% women, 66% White race) evaluated for AHFT, 57% of women, 38% of men, 44% of White race, and 40% of patients of color were allocated to AHFT. The interaction between group function score and allocation by patient gender was statistically significant (P = 0.035); as group function scores improved, the probability of AHFT allocation increased for women and decreased for men, a pattern that was similar irrespective of racial and ethnic groups. Conclusions Women evaluated for AHFT were more likely to receive AHFT when group decision-making processes were of higher quality. Further investigation is needed to promote routine high-quality group decision-making and reduce known disparities in AHFT allocation.
Chapter
Full-text available
This chapter conveys the following learning objectives: know what is meant by qualitative data analysis and be familiar with various interpretative evaluation methods. Know what is meant by quantitative data analysis and be able to distinguish different statistical evaluation approaches. Be able to explain the logic of the classical statistical significance test for testing hypotheses. For quantitative exploratory (subject-exploring and theory-building) studies, be able to describe methods of exploratory data analysis. For quantitative descriptive (population-describing) studies, be able to explain parameter estimation by means of point and interval estimation with respect to different kinds of parameters and samples. For quantitative explanatory (hypothesis-testing) studies, be able to explain hypothesis testing by means of the classical statistical significance test with respect to different kinds of difference, association, and change hypotheses as well as single-case hypotheses.
Book
Full-text available
The self-treating patient: on the emergence of a new type of patient, using the example of diabetes therapy. Years before the hormone insulin revolutionized diabetes therapy in the early 1920s and fundamentally changed the lives of many thousands of diabetics worldwide, the renowned US diabetes specialist Elliott Proctor Joslin (1869-1962) held the view that a diabetic patient should be his own nurse, his own chemist, and the assistant of his treating physician. This was a remarkable position at a time commonly regarded as one in which the scientification of an already paternalistic medicine had led to a far-reaching marginalization of patients. In search of the reasons for this interaction between doctors and patients, unusual for the first half of the 20th century, Oliver Falk's book traces the emergence and constitution of this cooperating, active, self-treating type of patient, which, long before organized patient movements and "citizen science", was to become constitutive for modern diabetes therapy. In doing so, he shows in detail the close epistemological connection between therapeutic action and the pursuit of scientific knowledge, and makes clear that everyday therapeutic action is not merely the result of laboratory and clinical research practice, but must itself be counted among the core of medical-scientific processes of knowledge production.
Chapter
The present study inquires into the dynamicity of the river processes and associated flood vulnerabilities of the Bhagirathi-Hugli River, particularly in the upper catchment area of the river. The geospatial techniques for analyzing the meandering, sinuosity, and other geometric features of the main channel of the Bhagirathi-Hugli River and the morphometric description of the sub-basin accentuate the dynamicity of fluvial processes. Image processing techniques have been employed to measure the normalized difference vegetation index (NDVI), modified normalized difference water index (MNDWI), normalized difference built-up index (NDBI), Z-score (standard scores of annual rainfall), and rainfall erosivity index, and to analyze them as the major factors of flood vulnerability on a geospatial platform. A composite flood vulnerability index has been formulated to identify the potential flood risk zones in the study districts. The rainfall data of 2000 and 2015 for the study area have been collected to analyze the standardized rainfall. The maps of standardized rainfall have been compared with the surface flow rasters of the years 2000 and 2015. The factors of terrain and streamflow also have an impact on flood frequency and intensity; these predict the normalized difference flood index through a multiple linear regression model. The major consequences of flood hazards in the Bhagirathi-Hugli sub-basin have been identified in the northeastern and southwestern portions of Murshidabad district, the northern and middle portions of Nadia district, and the eastern part of Purba Bardhaman district. Moreover, the diverse factors and mechanisms controlling floods have been analyzed with the aim of formulating integrated flood management plans.
To mitigate severe flood hazard risks and minimize vulnerability, an integrated flood management program needs to be implemented with community participation, preparedness, resilience, and capacity building in the study area. Keywords: Flood hazard, Fluvial processes, Flood vulnerability, Bhagirathi-Hugli sub-basin, Geospatial techniques
Article
This paper focuses on nonparametric procedures for testing conditional independence between random vectors using the Möbius transformation. We derive a method predicated on general empirical processes indexed by a specific class of functions. Conditional half-space and conditional empirical characteristic processes are used to demonstrate two abstract approximation theorems and their applications in real-world situations. We conclude by describing the limiting behavior of the Möbius transformation of the empirical conditional processes indexed by functions under contiguous sequences of alternatives. Our results are proved under some standard structural conditions on the Vapnik-Chervonenkis classes of functions and some mild conditions on the model. Monte Carlo simulation results indicate that the suggested statistical test for independence behaves reasonably well in finite samples.
Chapter
Article
Full-text available
This paper estimates the price and GDP/income elasticities of residential-sector gas demand in a number of gas-exporting countries over 1990–2019 by applying the homogenous OLS, two-stage least squares (2SLS) and GMM methods to a panel data set. Energy demand is specified by a simple partial adjustment model. The study finds that gas-exporting countries are nonresponsive to price changes in both the short and the long run. Although the results for income elasticity are not conclusive in terms of magnitude and sign, they show that short-run income elasticity is inelastic and smaller than its long-run counterpart. The study also provides results of heterogeneous 2SLS estimators for individual countries. Comparing these results with those of a previous similar study using the ARDL bounds testing approach shows that, while there is wide variability between individual estimations, both studies found almost the same long-run income elasticity on average. For the long-run price elasticity, however, the ARDL model seems to give more intuitive results in terms of sign and magnitude.
Article
Full-text available
Sample size and statistical power are often limited in pediatric cardiology studies due to the relative infrequency of specific congenital malformations of the heart and specific circulatory physiologies. The primary aim of this study was to determine what proportion of pediatric cardiology randomized controlled trials achieve 80% statistical power. Secondary aims included characterizing reporting habits in these studies. A systematic review was performed to identify pertinent pediatric cardiology randomized controlled trials. The following data were collected: publication year, journal, whether “power” or “sample size” was mentioned, and whether a discrete, primary endpoint was identified. Power analyses were conducted to assess whether the sample size was adequate to demonstrate results at 80% power with a p-value of less than 0.05. A total of 83 pediatric cardiology randomized controlled trials were included. Of these studies, 48% mentioned “power” or “sample size” in the methods, 49% mentioned either in the results, 12% mentioned either in the discussion, and 66% mentioned either at any point in the manuscript; 63% defined a discrete, primary endpoint. Thirty-eight studies (45%) had an adequate sample size to demonstrate differences with 80% power at a p-value of less than 0.05. A majority of these trials are thus not powered to reach the conventionally accepted 80% power target. Adequately powered studies were found to be more likely to report “power” or “sample size” and to have a discrete, primary endpoint.
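The power calculation described above can be sketched with the standard normal-approximation sample-size formula; the "medium" effect size and the function name below are illustrative assumptions, not taken from the review.

```python
# Sketch: required per-group sample size for a two-sample comparison at 80%
# power and two-sided alpha = 0.05, via the normal approximation.
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """d is the standardized effect size (difference in SD units)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

print(n_per_group(0.5))  # 63 per group for a medium effect (t-based exact: 64)
```

Smaller effects demand far larger samples (`n_per_group(0.2)` gives 393 per group), which is why rare-disease trials so often fall short of 80% power.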
Article
Full-text available
A common complication that can arise with analyses of high-dimensional data is the repeated use of hypothesis tests. A second complication, especially with small samples, is the reliance on asymptotic p-values. Our proposed approach for addressing both complications uses a scientifically motivated scalar summary statistic, and although not entirely novel, seems rarely used. The method is illustrated using a crossover study of seventeen participants examining the effect of exposure to ozone versus clean air on the DNA methylome, where the multivariate outcome involved 484,531 genomic locations. Our proposed test yields a single null randomization distribution, and thus a single Fisher-exact p-value that is statistically valid whatever the structure of the data. However, the relevance and power of the resultant test requires the careful a priori selection of a single test statistic. The common practice using asymptotic p-values or meaningless thresholds for “significance” is inapposite in general.
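The single-statistic randomization test described above can be sketched for a small crossover design: under the sharp null, each within-participant difference is exchangeable in sign, so an exact p-value is obtained by enumerating all sign assignments. The choice of the mean difference as the scalar summary and the data below are illustrative assumptions, not the study's.

```python
# Sketch: exact Fisher randomization test for a paired/crossover design,
# using one scalar summary statistic (the mean within-participant difference).
from itertools import product

def randomization_p(diffs):
    """Exact two-sided p: share of sign assignments whose |mean difference|
    is at least as extreme as the observed one."""
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    count = 0
    for signs in product([1, -1], repeat=n):      # all 2^n null re-labelings
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed:
            count += 1
    return count / 2 ** n

diffs = [0.8, 1.1, 0.4, 0.9, 1.3, 0.7, 0.5, 1.0]  # hypothetical differences
print(randomization_p(diffs))  # 2/256 = 0.0078125: only the all-+/- relabelings tie
```

With 484,531 outcomes the same idea applies once the multivariate outcome is reduced a priori to one scalar per participant; the validity of the p-value never depends on asymptotics.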
Article
The purpose of this work is to improve the efficiency in estimating the average causal effect (ACE) on the survival scale where right censoring exists and high-dimensional covariate information is available. We propose new estimators using regularized survival regression and survival Random Forest (RF) to adjust for the high-dimensional covariates to improve efficiency. We study the behavior of the adjusted estimators under mild assumptions and show theoretical guarantees that the proposed estimators are asymptotically more efficient than the unadjusted ones when using RF for the adjustment. In addition, these adjusted estimators are √n-consistent and asymptotically normally distributed. The finite-sample behavior of our methods is studied by simulation, and the simulation results are in agreement with the theoretical results. We also illustrate our methods by analyzing real data from transplant research to identify the relative effectiveness of identical sibling donors compared to unrelated donors, with adjustment for cytogenetic abnormalities.
Article
Single-task assessments may not identify lingering effects following a concussion that may be detected under dual-task (DT) paradigms. The purpose of this study was to determine the effects of a novel DT paradigm and concussion history on gait and cognitive performance. Hockey and rugby club college athletes (n = 26) completed a box drill and the color and word Stroop test under single-task and DT conditions. Distance ambulated around the box, response rate, and accuracy were recorded to calculate dual-task cost. Mean comparisons and linear mixed-effects regression models were performed. Compared to athletes with no concussion history, those with a history had a greater motor than cognitive dual-task cost and were 3.15% less accurate in Stroop responses (p < .01). Athletes walked a 0.72 m shorter distance under DT compared to single task (p = .04). A multidirectional, low-tech DT assessment may highlight long-term motor and cognitive deficits among athletes with a concussion history, which will provide valuable information to prepare and track performance within an athletic season.
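Dual-task cost is conventionally computed as the percentage decline from single-task to dual-task performance. A minimal sketch of one common formulation (for measures where a higher raw score is better); the numbers are illustrative, not the study's data:

```python
# Sketch of the dual-task cost (DTC) computation; one common formulation,
# assuming higher raw score = better performance.
def dual_task_cost(single, dual):
    """Percent decline from single-task to dual-task performance."""
    return (single - dual) / single * 100.0

# e.g. a participant ambulates 25.0 m single-task but only 22.5 m dual-task:
print(dual_task_cost(25.0, 22.5))  # → 10.0 (% motor cost)
```

Computing the motor and cognitive costs on the same percentage scale is what lets the two be compared directly, as in the "greater motor than cognitive dual-task cost" finding above.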
Chapter
Social capital has declined in both developed and developing economies, while income inequality has tended to increase. Recent studies show a correlation between social capital and income inequality, while few analyse the direction of causality at the macro level. This paper investigates the causal relationship between generalised trust and income inequality in 23 economies belonging to the Organisation for Economic Co-operation and Development (OECD) from 2000 to 2019. In this study, we apply a fully modified ordinary least squares (FMOLS) model and the canonical cointegrating regression (CCR) estimator. In addition, unit root and cointegration tests are applied before the Granger causality test. The findings show that there is a bidirectional relationship between social capital and income inequality.
Article
Full-text available
The occurrence, abundance, and distribution of phytoplankton have been investigated upstream and downstream of three barrages on the river Ganga at Bijnor, Narora, and Kanpur in Uttar Pradesh, India. A total of 104 phytoplankton species belonging to eight phyla (Bacillariophyta, Charophyta, Chlorophyta, Cryptophyta, Cyanophyta, Euglenophyta, Miozoa, and Ochrophyta) were identified during the sampling period. During the summer, monsoon, and post-monsoon seasons, the density of phytoplankton (Ind. L⁻¹) ranged from 9.6 × 10⁴ to 2.03 × 10⁷, 9.6 × 10⁴ to 4.5 × 10⁵, and 2.2 × 10⁵ to 2.17 × 10⁶, respectively. The species abundance and the relative abundance showed an increasing trend from the first (Bijnor) to the third (Kanpur) barrage, suggesting a gradual decrease in river flow and an increase in residence time. Phytoplankton cell density in Kanpur, however, was unexpectedly higher and indicated eutrophic conditions attributable to elevated organic load and surplus nutrients from land runoff. One-way ANOVA (post-hoc Tukey test) showed statistically significant (p < 0.05) seasonal variation in temperature, transparency, free CO2, PO4³⁻, and dissolved organic matter. Analysis of Pearson’s correlation coefficient suggested a statistically significant correlation (p < 0.05) of most phytoplankton groups with free CO2, CO3²⁻, HCO3⁻, Cl⁻, specific conductivity, total dissolved solids, total hardness, Mg²⁺, PO4³⁻, and SiO4⁴⁻. The minimum species diversity was recorded during the monsoon season, while the maximum diversity was recorded during the post-monsoon season, which might be due to the high nutrient load and high concentration of PO4³⁻ after the monsoon. We conclude that aquatic biodiversity and ecological structure can be adversely influenced by a series of obstructing barrages and dams, which influenced the assemblage pattern of phytoplankton communities.
Article
In the present paper, a sensitivity analysis of pollutants and pattern factor in a model combustor due to changes in the geometrical characteristics of stabilizing jets has been carried out. The exhaust pollutants including NOx, CO and soot have been chosen due to their harmful effect on the environment. The pattern factor has also been considered owing to its impact on turbine blades. The geometrical characteristics comprise the diameter, angle and position of the stabilizing jets. An Eulerian-Lagrangian approach has been employed to model liquid fuel injection and distribution, and the breakup and evaporation of droplets. For the analysis of reactive-spray flow characteristics, a RANS approach, the realizable k-ε turbulence model, the discrete ordinates radiative heat transfer model and the steady flamelet combustion model, together with the chemical reaction mechanism of diesel fuel (C10H22), have been applied. NOx modeling has been performed via post-processing. The sensitivity analysis proceeds by varying the problem inputs (diameter, angle and position of jets) in an organized manner and predicting the effects on the outputs (NOx, CO, soot and pattern factor). The number and order of simulations are determined by design of experiments with a full factorial model. Results have been analyzed using analysis of variance. It has been observed that if interactions among the characteristics of the jets are considered, it is possible to analyze the exhaust pollutants more accurately. In fact, by using the interactions, it is possible to find a point where all output parameters are improved. Results show that by considering interactions of stabilizing jet characteristics, the maximum values of NOx, CO, soot and pattern factor change from 13.927 ppm, 11.198% mole fraction, 2.877 ppm and 0.043 to 26.233 ppm, 14.693% mole fraction, 142.357 ppm and 0.060, respectively.
Furthermore, the minimum values change from 5.819 ppm, 7.568% mole fraction, 0.013 ppm and 0.029 to 6.098 ppm, 5.987% mole fraction, 0.002 ppm and 0.027, respectively.
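The full factorial design mentioned above can be sketched by enumerating every combination of factor levels; the factors match the abstract, but the level values below are placeholders, not the paper's.

```python
# Sketch: enumerating a full factorial design over the three jet
# characteristics. Each run is one simulation configuration.
from itertools import product

factors = {
    "diameter_mm": [4, 6, 8],            # illustrative levels
    "angle_deg":   [30, 45, 60],
    "position":    ["upstream", "downstream"],
}

runs = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(runs))   # 18 simulations (3 x 3 x 2), one per factor combination
```

The resulting run list is what the ANOVA is then applied to, which is also why interaction effects between the jet characteristics are estimable: a full factorial crosses every level of each factor with every level of the others.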
Article
Physico-mechanical rock properties are typically investigated via laboratory tests using core samples. However, sample coring is time-consuming and expensive. The longitudinal wave velocity in rock bolts embedded in a rock mass depends on the surrounding rock properties. Hence, the longitudinal wave velocity in rock bolts can be utilized to predict rock properties. This study presents the relationship between granite rock properties and the longitudinal wave velocity (vL) in rock bolts, so that rock properties can be predicted from vL. Chemical (saline solution) and mechanical (slake durability test) weathering processes are employed to diversify the properties of the rock specimens. Laboratory tests are conducted on rock specimens to measure the physico-mechanical rock properties, including the velocities (vp and vs) associated with the constrained and shear moduli, density (ρ), Young's modulus (E), Poisson's ratio (μ), porosity (η), compressive strength (fc), and slake durability index (SDI). The measured rock properties are used in the rock mass model, and variations in vL in the rock bolt with different properties are investigated via numerical simulations. Results show that vp, vs, ρ, E, and fc are correlated significantly with vL (R² > 0.9), whereas η and SDI are moderately correlated with vL (R² > 0.7). However, no meaningful correlation between μ and vL is obtained. The root mean square and mean absolute percentage errors are estimated to validate the correlation equations. Results of the t-test show that the calculated t-value is higher than the critical t-value and the p-value is smaller than the significance level of 0.05, indicating that the correlation coefficients are significant. This study shows that the velocity of longitudinal waves in a rock bolt can be a useful indicator for predicting in-situ rock properties.
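The t-test for a correlation coefficient mentioned above is conventionally based on t = r·√(n−2)/√(1−r²) with n−2 degrees of freedom. A minimal sketch with illustrative values (the r and n below are assumptions consistent with R² > 0.9, not the study's numbers):

```python
# Sketch: significance test for a correlation coefficient,
# t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2.
import math

def correlation_t(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = correlation_t(0.95, 20)   # e.g. r = 0.95 (R^2 ≈ 0.90) from 20 specimens
print(round(t, 2))            # ≈ 12.91
# With df = 18, the two-sided critical value at alpha = 0.05 is about 2.10,
# so t far exceeds it and the correlation is declared significant.
```

This is exactly the comparison "calculated t-value higher than critical t-value, p < 0.05" reported in the abstract.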
Article
Full-text available
The Jeffreys–Lindley paradox exposes a rift between Bayesian and frequentist hypothesis testing that strikes at the heart of statistical inference. Contrary to what most current literature suggests, the paradox was central to the Bayesian testing methodology developed by Sir Harold Jeffreys in the late 1930s. Jeffreys showed that the evidence for a point-null hypothesis H₀ scales with √n and repeatedly argued that it would, therefore, be mistaken to set a threshold for rejecting H₀ at a constant multiple of the standard error. Here, we summarize Jeffreys’s early work on the paradox and clarify his reasons for including the √n term. The prior distribution is seen to play a crucial role; by implicitly correcting for selection, small parameter values are identified as relatively surprising under H₁. We highlight the general nature of the paradox by presenting both a fully frequentist and a fully Bayesian version. We also demonstrate that the paradox does not depend on assigning prior mass to a point hypothesis, as is commonly believed.
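The √n scaling can be made concrete with a small numerical sketch: hold the test statistic fixed at z = 1.96 (p ≈ 0.05) and let n grow. The normal prior with τ² = σ² used below is an illustrative choice, not Jeffreys's own prior.

```python
# Sketch of the Jeffreys-Lindley paradox for a normal mean with known sigma.
# Under H1 the effect has prior N(0, tau^2); with tau = sigma the Bayes factor
# in favour of the point null H0, at fixed z = xbar / (sigma/sqrt(n)), is
#   BF01 = sqrt(1 + n) * exp(-z^2/2 * n / (1 + n)),
# which grows like sqrt(n).
import math

def bf01(z, n):
    return math.sqrt(1 + n) * math.exp(-0.5 * z * z * n / (1 + n))

for n in (10, 1000, 100000):
    print(n, round(bf01(1.96, n), 2))
# A result that stays "just significant" at p = 0.05 supports the null
# more and more strongly as n increases.
```

This is the paradox in miniature: a fixed-threshold frequentist rejection and the Bayes factor point in opposite directions at large n, which is why Jeffreys argued against rejecting H₀ at a constant multiple of the standard error.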
Chapter
This chapter relates the probabilistic basics of statistical inference to the methodological debate about p-values and statistical significance. It describes the p-value and the null-hypothesis-significance-testing (NHST) approach and identifies their drawbacks and pitfalls. NHST downgrades the two meaningful pieces of information that we can extract from a random sample—the point estimate (signal) and the uncertainty of the estimation (noise)—first into a quotient (signal-to-noise ratio), then into a p-value (based on the usually uninformative null hypothesis of zero effect), and finally into a dichotomous significance declaration (based on an arbitrary p-value threshold such as 0.05). Nothing is gained by this downgrade. On the contrary. The associated jargon that speaks of “significant” (“positive”) as opposed to “non-significant” (“negative”) results is delusive and makes not only the consumers of research but also many researchers draw rash yes/no conclusions from individual studies. Given NHST’s poor track record, the chapter also dives back into history and explains how it came that the meaningful signal and noise information that can be extracted from a random sample was distorted almost beyond recognition into statistical significance declarations. It seems that a “wrong historical bend” in the wake of semantic confusions has led to an amalgamation of two irreconcilable approaches: “significance testing” by Fisher and “hypothesis testing” by Neyman and Pearson. Getting acquainted with the two original perspectives elucidates why cobbling them together into what is today known as NHST is bound to cause inferential errors.
Article
Full-text available
This study presents the design of tetrapeptides acting as competitive inhibitors of 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase. This enzyme is studied by many researchers as a target for controlling cholesterol levels, because an elevated cholesterol level is a known risk factor for hypercholesterolemia. In previous studies, two hypocholesterolemic peptides were isolated from soybean. Based on the structural data obtained for those peptides, a β-turn conformation was modeled in the newly designed tetrapeptides as a recognized structure for the binding site. A number of tri-, tetra-, hexa- and heptapeptides were modeled in previous studies using peptide fragmentation as a design approach. In the hexa- and heptapeptides, the β-turn structure was located at the N-terminus. This work investigates the possibility of designing tetrapeptides with a different location of the β-turn structure and different amino acid sequences from the previously developed peptides, using the same design approach. The FPTA peptide was found to be the most active inhibitor among all the newly designed tetrapeptides. A kinetic study showed that this peptide is a competitive inhibitor with respect to HMG-CoA, with an inhibitor-binding equilibrium constant (Ki) of 1.2 ± 0.1 μM. Circular dichroism data confirmed the presence of the β-turn conformation in this peptide sequence. The correlation obtained between predicted and experimental peptide activities suggests that the proposed approach is suitable for designing new peptide sequences that differ from the previously developed peptides both in length and in the location of the β-turn structure.
Article
Full-text available
El Niño-Southern Oscillation (ENSO) events occasionally recur one after the other in the same polarity, called multiyear ENSO. However, the dynamical processes are not well understood. This study aims to elucidate the unified mechanisms of multiyear ENSO using observations, Coupled Model Intercomparison Project Phase 6 (CMIP6) models, and the theoretical linear recharge oscillator (RO) model. We found that multiyear El Niño and La Niña events are roughly symmetric except for cases of multiyear La Niña following strong El Niño. The composite multiyear ENSO reveals that anomalous ocean heat content (OHC) in the equatorial Pacific persists beyond the first peak, stimulating another event. This prolonged OHC anomaly is caused by meridional Ekman heat transport counteracting geostrophic transport-induced recharge-discharge process that otherwise acts to change the OHC anomaly. A meridionally wide pattern of sea surface temperature anomalies observed during multiyear ENSO is responsible for the Ekman heat transport and multiple factors such as decadal variability, subtropical processes, and ENSO diversity modulate the ENSO meridional structure. CMIP6 multi-model ensemble shows a significant correlation between the ENSO meridional width and the occurrence ratio of multi-year ENSO, supporting the aforementioned mechanism. A multiyear ENSO-like oscillation was simulated using the linear RO model that incorporates a seasonally varying Bjerknes growth rate and a weak recharge efficiency representing the effect of Ekman transport. When the recharge efficiency parameter was estimated using reanalysis data based on geostrophic transport alone, a multiyear ENSO rarely occurred, confirming the importance of Ekman transport in retarding the recharge-discharge process.
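The linear recharge oscillator (RO) referenced above couples the SST anomaly T to the ocean heat content anomaly h. A minimal constant-coefficient sketch in the spirit of the classic RO formulation; the equations, parameter values, and Euler integration are illustrative assumptions, not the study's seasonally varying model.

```python
# Sketch of a damped linear recharge oscillator:
#   dT/dt = R*T + F*h      (Bjerknes-type growth plus heat-content forcing)
#   dh/dt = -eps*T - r*h   (recharge-discharge with weak damping)
# Parameters are illustrative; forward-Euler integration for simplicity.
def simulate_ro(R=-0.1, F=1.0, eps=0.6, r=0.1, dt=0.01, steps=5000):
    T, h = 1.0, 0.0           # start in a warm (El Nino-like) state
    series = []
    for _ in range(steps):
        dT = R * T + F * h
        dh = -eps * T - r * h
        T, h = T + dt * dT, h + dt * dh
        series.append(T)
    return series

T = simulate_ro()
sign_changes = sum(1 for a, b in zip(T, T[1:]) if a * b < 0)
print(sign_changes)   # > 0: T oscillates between warm and cold phases
```

Weakening the recharge efficiency (eps) slows the discharge of heat content, which is the mechanism the abstract invokes: a sluggish recharge-discharge process lets an OHC anomaly persist past the first peak and seed a second event of the same sign.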
Article
Full-text available
We investigate the use of sentiment dictionaries to estimate sentiment for large document collections. Our goal in this paper is a semiautomatic method for extending a general sentiment dictionary for a specific target domain in a way that minimizes manual effort. General sentiment dictionaries may not contain terms important to the target domain or may score terms in ways that are inappropriate for the target domain. We combine statistical term identification and term evaluation using Amazon Mechanical Turk to extend the EmoLex sentiment dictionary to a domain-specific study of dengue fever. The same approach can be applied to any term-based sentiment dictionary or target domain. We explain how terms are identified for inclusion or re-evaluation and how Mechanical Turk generates scores for the identified terms. Examples are provided that compare EmoLex sentiment estimates before and after it is extended. We conclude by describing how our sentiment estimates can be integrated into an epidemiology surveillance system that includes sentiment visualization and discussing the strengths and limitations of our work.
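Term-based sentiment scoring of the kind described above reduces to dictionary lookups over the tokens of a document, with domain extension implemented as adding or overriding lexicon entries. A minimal sketch; the lexicon entries and scores below are invented for illustration, not actual EmoLex values.

```python
# Sketch: dictionary-based sentiment estimation with a domain-extended lexicon.
base_lexicon = {"good": 1.0, "bad": -1.0, "fear": -0.8}      # invented scores
domain_terms = {"outbreak": -0.9, "vaccinated": 0.6}         # dengue additions
lexicon = {**base_lexicon, **domain_terms}                   # extension step

def sentiment(text):
    """Mean score of lexicon terms in the text; 0.0 if no term matches."""
    words = text.lower().split()
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment("fear of another outbreak"))   # ≈ -0.85 with the extended lexicon
```

Without the domain terms, "outbreak" would be silently ignored and the estimate would rest on "fear" alone, which is the gap the crowd-scored extension is meant to close.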
Article
Full-text available
Previous research has documented the utility of synchronous neural interactions (SNI) in classifying women veterans with and without posttraumatic stress disorder (PTSD) and other trauma-related outcomes based on functional connectivity using magnetoencephalography (MEG). Here, we extend that line of research to evaluate trauma-specific PTSD neural signatures with MEG in women veterans. Participants completed diagnostic interviews and underwent a task-free MEG scan from which SNI was computed. Thirty-five women veterans were diagnosed with PTSD due to sexual trauma and sixteen with PTSD due to non-sexual trauma. Strength of SNI was compared in women with and without sexual trauma, and linear discriminant analysis was used to classify the brain patterns of women with PTSD due to sexual trauma and non-sexual trauma. Comparison of SNI strength between the two groups revealed widespread hypercorrelation in women with sexual trauma relative to those without sexual trauma. Furthermore, using SNI, the brains of participants were classified as sexual trauma or non-sexual trauma with 100% accuracy. These findings bolster evidence supporting the utility of task-free SNI and suggest that neural signatures of PTSD are trauma-specific.
Article
The prevalence of big data has raised significant epistemological concerns in information systems research. This study addresses two of them—the deflated p-value problem and the role of explanation and prediction. To address the deflated p-value problem, we propose a multivariate effect size method that uses the log-likelihood ratio test. This method measures the joint effect of all variables used to operationalize one factor, thus overcoming the drawback of the traditional effect size method (θ), which can only be applied at the single-variable level. However, because factors can be operationalized as different numbers of variables, direct comparison of multivariate effect sizes is not possible. A quantile-matching method is proposed to address this issue. This method provides comparison results consistent with the classic quantile method, but it is more flexible and can be applied in scenarios where the quantile method fails. Furthermore, an absolute multivariate effect size statistic is developed to facilitate drawing conclusions without comparison. We have tested our method using three different datasets and have found that it can effectively differentiate factors with various effect sizes. We have also compared it with prediction analysis and found consistent results: explanatorily influential factors are usually also predictively influential in a large-sample scenario.
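A joint test of all variables operationalizing one factor is conventionally built on the log-likelihood ratio, LR = −2(ℓ_reduced − ℓ_full), which is asymptotically χ²_k when the k variables are dropped under the null. A minimal sketch; to stay stdlib-only it uses k = 2, where the χ² survival function has the closed form exp(−LR/2), and the log-likelihood values are invented for illustration.

```python
# Sketch: joint log-likelihood-ratio test for a factor operationalized as
# two variables. LR = -2*(ll_reduced - ll_full) ~ chi^2_2 under the null,
# and P(chi^2_2 >= x) = exp(-x/2) in closed form.
import math

def lr_pvalue_df2(ll_reduced, ll_full):
    lr = -2.0 * (ll_reduced - ll_full)
    return math.exp(-lr / 2.0)

# Dropping both variables of the factor costs 6.5 log-likelihood units:
print(lr_pvalue_df2(-520.0, -513.5))   # ≈ 0.0015
```

The LR statistic itself (or a transform of it) is what serves as the factor-level effect size: it aggregates the contribution of all k variables instead of reporting k separate single-variable effects.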
Article
Full-text available
Functional annotations have the potential to increase power of genome-wide association studies (GWAS) by prioritizing variants according to their biological function, but this potential has not been well studied. We comprehensively evaluated all 1132 traits in the UK Biobank whose SNP-heritability estimates were given “medium” or “high” labels by Neale’s lab. For each trait, we integrated GWAS summary statistics of close to 8 million common variants (minor allele frequency >1%) with either their 75 individual functional scores or their meta-scores, using three different data-integration methods. Overall, the number of new genome-wide significant findings after data-integration increases as a trait SNP-heritability estimate increases. However, there is a trade-off between new findings and loss of baseline GWAS findings, resulting in similar total numbers of significant findings between using GWAS alone and integrating GWAS with functional scores, across all 1132 traits analyzed and all three data-integration methods considered. Our findings suggest that, even with the current biobank-level sample size, more informative functional scores and/or new data-integration methods are needed to further improve the power of GWAS of common variants. For example, studying variants in coding sequence and obtaining cell-type-specific scores are potential future directions.
Article
The prevailing view in the current replication-crisis literature is that the non-replicability of published empirical studies (a) confirms their untrustworthiness, and (b) stems primarily from the abuse of frequentist testing in general, and the p-value in particular. The main objective of the paper is to challenge both of these claims and make the case that (a) non-replicability does not necessarily imply untrustworthiness and (b) the abuses of frequentist testing are only symptomatic of a much broader problem relating to the uninformed and recipe-like implementation of statistical modeling and inference that contributes significantly to untrustworthy evidence. It is argued that the crucial contributors to the untrustworthiness relate (directly or indirectly) to the inadequate understanding and implementation of the stipulations required for model-based statistical induction to give rise to trustworthy evidence. These preconditions relate to securing reliable ‘learning from data’ about phenomena of interest and pertain to the nature, origin, and justification of genuine empirical knowledge, as opposed to beliefs, conjectures, and opinions.
Article
Full-text available
Freezing of gait (FoG) is a common gait disorder among patients with advanced Parkinson’s disease (PD) and is associated with falls. This paper designed experimental procedures to obtain FoG signals from PD patients. Accelerometers, gyroscopes, and force-sensing resistor sensors were placed on the lower body of patients. On this basis, research was carried out on the optimal feature extraction method, sensor configuration, and number of features for FoG detection. Thirteen typical features, comprising time-domain, frequency-domain and statistical features, were extracted from the sensor signals. First, we used analysis of variance (ANOVA) to select features, comparing the effectiveness of two feature selection methods. Second, we evaluated detection performance with different combinations of sensors to determine the best sensor configuration. Finally, we selected the optimal features to construct a FoG recognition model based on a random forest. After comprehensive consideration of factors such as detection performance, cost, and actual deployment requirements, the 35 features obtained from the left-shank gyroscope and accelerometer were selected, achieving 78.39% sensitivity, 91.66% specificity, 88.09% accuracy, 77.58% precision and a 77.98% F-score. This objective FoG recognition method has high recognition accuracy, which will be helpful for early screening and treatment of FoG symptoms.
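ANOVA-based feature selection of the kind described above ranks each candidate feature by its one-way F-statistic between classes (here, FoG vs. normal-gait windows). A minimal stdlib sketch; the feature values are synthetic and the variable names are illustrative.

```python
# Sketch: one-way ANOVA F-statistic used to rank candidate features by how
# well they separate FoG windows from normal-gait windows.
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F for a sequence of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Synthetic feature values for (FoG windows, normal windows):
feature_a = ([2.1, 2.4, 2.0, 2.6, 2.3], [1.0, 1.2, 0.9, 1.1, 1.0])  # separates well
feature_b = ([1.5, 1.7, 1.4, 1.6, 1.5], [1.5, 1.6, 1.4, 1.7, 1.5])  # barely separates
print(anova_f(feature_a) > anova_f(feature_b))  # rank feature_a higher
```

Keeping the top-ranked features (35 in the study) before training the random forest reduces both sensor cost and model complexity while preserving the discriminative signal.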
Article
Full-text available
Background Environmental concerns are growing globally. The world has suffered severe environmental deterioration over the years. Undeniably, the impact of environmental degradation on the earth’s geographical space is alarming, worrying environmental stakeholders. Existing literature has examined several factors affecting the environment, but the focus has now shifted to education and the need to maximize its potential. Although studies have examined the direct impacts of education on the environment, those investigating its moderating role are relatively new and scarce, particularly across income groups. Understanding the channel through which education might affect the environment requires knowledge of its moderating role. Therefore, this study employs FMOLS, DOLS, ARDL-PMG, CCEMG and heterogeneous panel causality test methodologies to investigate the direct and moderating effects of education in the growth-energy-environment linkages in heterogeneous income groups of 92 countries from 1985 to 2018. Results The findings of this study indicate that economic growth is a long-term solution to environmental deterioration in high and upper-middle-income countries, while the opposite holds for lower-middle-income and low-income countries. In addition, energy consumption is linked with environmental degradation across all income groups. Also, the study finds that education’s direct effects aggravate environmental degradation across all income groups. Moreover, its moderating role ameliorates the adverse effects of energy consumption on environmental degradation in the high and upper-middle-income groups but worsens them in the lower-middle-income and low-income groups. Conclusion This study examines the role of education in the economic growth, energy consumption and environmental degradation nexus.
The study concludes that education is important for environmental sustainability as it encourages pro-environmental behaviors and attitudes and supports energy-efficient products and investments in green technologies. However, education may also aid energy-intensive activities and dirty technology by supporting lifestyles that are not eco-friendly. It is important, therefore, to provide education that promotes better environmental quality.
Article
Full-text available
Background The APOBEC3 (apolipoprotein B mRNA editing enzyme catalytic polypeptide 3) family of cytidine deaminases is responsible for two mutational signatures (SBS2 and SBS13) found in cancer genomes. APOBEC3 enzymes are activated in response to viral infection, and have been associated with increased mutation burden and TP53 mutation. In addition to this, it has been suggested that APOBEC3 activity may be responsible for mutations that do not fall into the classical APOBEC3 signatures (SBS2 and SBS13), through generation of double strand breaks. Previous work has mainly focused on the effects of APOBEC3 within individual tumour types using exome sequencing data. Here, we use whole genome sequencing data from 2451 primary tumours from 39 different tumour types in the Pan-Cancer Analysis of Whole Genomes (PCAWG) data set to investigate the relationship between APOBEC3 and genomic instability (GI). Results and conclusions We found that the number of classical APOBEC3 signature mutations correlates with increased mutation burden across different tumour types. In addition, the number of APOBEC3 mutations is a significant predictor for six different measures of GI. Two GI measures (INDELs attributed to INDEL signatures ID6 and ID8) strongly suggest the occurrence and error-prone repair of double strand breaks, and the relationship between APOBEC3 mutations and GI remains when SNVs attributed to kataegis are excluded. We provide evidence that supports a model of cancer genome evolution in which APOBEC3 acts as a causative factor in the development of diverse and widespread genomic instability through the generation of double strand breaks. This has important implications for treatment approaches for cancers that carry APOBEC3 mutations, and challenges the view that APOBECs only act opportunistically at sites of single stranded DNA.
Article
Full-text available
Background: Longevity-related genes have been found in several animal species as well as in humans. The goal of this study was to perform genetic analysis of long-lived Cane corso dogs with the aim of finding genes that are associated with longevity. Results: SNPs with particular nucleotides were significantly overrepresented in long-lived dogs in four genes: TDRP, MC2R, FBXO25 and FBXL21. In FBXL21, the longevity-associated SNP localises to an exon. In the FBXL21 protein, tryptophan in long-lived dogs replaced the arginine present in reference dogs. Conclusions: Four SNPs associated with longevity in dogs were identified using GWAS and validated by DNA sequencing. We conclude that the genes TDRP, MC2R, FBXO25 and FBXL21 are associated with longevity in Cane corso dogs.
Article
Full-text available
ANOVA—the workhorse of experimental psychology—seems well understood in that the behavioral sciences have agreed-upon contrasts and reporting conventions. Yet, we argue, this consensus hides considerable flaws in common ANOVA procedures, and these flaws become especially salient in the within-subject and mixed-model cases. The main thesis is that these flaws are in model specification. The specifications underlying common use are deficient from a substantive perspective, that is, they do not match reality in behavioral experiments. The problem, in particular, is that the specifications rely on coincidental rather than robust statements about reality. We provide specifications that avoid making arguments based on coincidences, and note that Bayes factor model comparisons among these specifications are already convenient in the BayesFactor package. Finally, we argue that model specification necessarily and critically reflects substantive concerns and, consequently, is ultimately the responsibility of substantive researchers. Source code for this project is at github/PerceptionAndCognitionLab/stat_aov2.
Article
Full-text available
Environmental problems are deeply concerning, particularly in many cities of developing countries, because they obstruct the creation of a sustainable urban environment. Dhaka, Bangladesh, was chosen as the study area because Bangladesh is a developing country facing severe pollution. The residents' level of environmental perception was assessed, and their environmental attitudes and awareness were examined in relation to their demographic characteristics. A face-to-face questionnaire survey involving 400 respondents was conducted across various zones of the study area. The mean score, standard deviation, and p value of each respondent's answer were calculated separately using a one-way analysis of variance (ANOVA). Then a grand mean, average standard deviation, and combined p values for the environmental perception and attitude themes were computed theme-wise. Descriptive statistics were produced to illustrate the respondents' level of environmental awareness. The study results revealed that the respondents had a moderate to high level of perceived knowledge about the causes and effects of environmental pollution. They also intended to reduce the environmental pollution in their surroundings. The score differences (p < .05) across age groups, education levels, occupation types, and income groups were nearly all significant, except for those pertaining to the gender of the respondents. Surprisingly, only 18% of the respondents were aware of their home's and neighborhood's garbage management procedures. It is urgent to influence citizens' environmental behaviors to ensure the city's long-term sustainability. This study's findings can be used in decision-making processes regarding sustainable urban environments worldwide.
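The group comparisons described above are one-way ANOVAs of mean scores across demographic groups. A minimal sketch with simulated Likert-style scores (the group labels, means, and sample sizes are invented, not the survey's data):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

# Hypothetical perception scores (e.g. on a 1-5 scale) for three
# education-level groups; means and spreads are illustrative only.
primary   = rng.normal(3.2, 0.6, 120)
secondary = rng.normal(3.5, 0.6, 150)
tertiary  = rng.normal(3.8, 0.6, 130)

# One-way ANOVA: does mean perception differ across the groups?
f_stat, p_value = f_oneway(primary, secondary, tertiary)
```

A small p value, as in the study's education-level comparison, indicates that at least one group mean differs; it does not say which, so post hoc pairwise comparisons would follow in practice.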
Article
This paper reviews likelihood ratio tests for covariance structures and equality of mean vectors, from the earliest developments through to tests for elaborate covariance structures. Relations are established among several covariance structures, taking the more elaborate ones as umbrella structures and then examining their particular cases of interest. References are made to the literature where the corresponding likelihood ratio tests are developed and the distributions of the corresponding statistics addressed. Most of the likelihood ratio test statistics for one-way MANOVA models in which the covariance matrices have elaborate structures were developed quite recently. A similar approach is taken for these likelihood ratio tests: although we start with the common test that uses unstructured covariance matrices, we then consider tests with more elaborate covariance structures, and subsequently specify them to their particular cases of interest. Some special attention is also given to the so-called Wilks Λ statistics.
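For the one-way MANOVA case, the Wilks Λ statistic mentioned above is the determinant ratio Λ = det(E) / det(E + H), where E is the within-group (error) SSCP matrix and H the between-group (hypothesis) SSCP matrix. A minimal numerical sketch on simulated two-group bivariate data (group means and sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical groups with bivariate responses and shifted means.
g1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))
g2 = rng.normal([1.0, 0.5], 1.0, size=(40, 2))
X = np.vstack([g1, g2])
grand = X.mean(axis=0)

# Within-group (error) SSCP matrix E: deviations from each group mean.
E = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in (g1, g2))

# Between-group (hypothesis) SSCP matrix H: group means vs. grand mean.
H = sum(len(g) * np.outer(g.mean(0) - grand, g.mean(0) - grand)
        for g in (g1, g2))

# Wilks Lambda lies in (0, 1); smaller values mean stronger group separation.
wilks_lambda = np.linalg.det(E) / np.linalg.det(E + H)
```

This is the unstructured-covariance starting point the review describes; the more recent tests it surveys replace E with estimates constrained to an elaborate covariance structure.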
Article
Full-text available
In general linear modeling (GLM), eta squared (η²) is the dominant statistic for the explaining power of an independent variable. This article discusses a less studied deficiency in η²: its values are seriously deflated because the underlying estimates by the coefficient eta (η) are themselves seriously deflated. Numerical examples show that the deflation in η may be as high as 0.50–0.60 units of correlation and, in η², as high as 0.70–0.80 units of explaining power. A simple mechanism to evaluate and correct the artificial attenuation is proposed. Because the formulae for η and the point-biserial correlation are equal, η can also take negative values. While the traditional formulae give only the magnitude of a nonlinear association, a reconsidered formula for η gives estimates with both magnitude and direction in binary cases, and a short-cut option is offered for polytomous ones. Although the negative values of η are not relevant when η² is of interest, they may provide valuable additional information when η is used with non-nominal variables.
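The claimed equality of η and the point-biserial correlation in the binary case can be checked numerically: with a two-level factor, η² = SS_between / SS_total coincides with the squared point-biserial correlation. A sketch on simulated data (the group effect of 0.8 is invented for illustration):

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(3)

# Binary grouping variable and a continuous outcome with a group shift.
group = np.repeat([0, 1], 50)
y = rng.normal(0.0, 1.0, 100) + 0.8 * group

# Eta squared from ANOVA sums of squares.
grand = y.mean()
ss_total = ((y - grand) ** 2).sum()
ss_between = sum(len(y[group == g]) * (y[group == g].mean() - grand) ** 2
                 for g in (0, 1))
eta_squared = ss_between / ss_total
eta = np.sqrt(eta_squared)            # traditional eta: magnitude only

# Point-biserial correlation: same magnitude, but carries a sign.
r_pb, _ = pointbiserialr(group, y)
```

Here |r_pb| equals η to machine precision, which is the identity the abstract builds on: the sign of r_pb is the directional information the traditional η formula discards.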
Article
Full-text available
We report on the functional connectivity (FC), its intraclass correlation (ICC), and heritability among 70 areas of the human cerebral cortex. FC was estimated as the Pearson correlation between averaged prewhitened Blood Oxygenation Level-Dependent time series of cortical areas in 988 young adult participants in the Human Connectome Project. Pairs of areas were assigned to three groups, namely homotopic (same area in the two hemispheres), ipsilateral (both areas in the same hemisphere), and heterotopic (nonhomotopic areas in different hemispheres). ICC for each pair of areas was computed for six genetic groups, namely monozygotic (MZ) twins, dizygotic (DZ) twins, singleton siblings of MZ twins (MZsb), singleton siblings of DZ twins (DZsb), non-twin siblings (SB), and unrelated individuals (UNR). With respect to FC, we found the following. (a) Homotopic FC was stronger than ipsilateral and heterotopic FC; (b) average FCs of left and right cortical areas were highly and positively correlated; and (c) FC varied in a systematic fashion along the anterior–posterior and inferior–superior dimensions, such that it increased from anterior to posterior and from inferior to superior. With respect to ICC, we found the following. (a) Homotopic ICC was significantly higher than ipsilateral and heterotopic ICC, but the latter two did not differ significantly from each other; (b) ICC was highest for MZ twins; (c) ICC of DZ twins was significantly lower than that of the MZ twins and higher than that of the three sibling groups (MZsb, DZsb, SB); and (d) ICC was close to zero for UNR. Finally, with respect to heritability, it was highest for homotopic areas, followed by ipsilateral and heterotopic areas; however, these heritability estimates did not differ statistically significantly from each other.
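The per-pair ICC described above can be sketched for paired measurements (e.g. an FC value measured in both members of each twin pair) with a one-way random-effects ICC(1,1). The data below are simulated, not HCP data, and the specific ICC variant is an assumption:

```python
import numpy as np

def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for an n-by-k array of paired
    measurements: (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    pairs = np.asarray(pairs, dtype=float)
    n, k = pairs.shape
    grand = pairs.mean()
    ms_between = k * ((pairs.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_within = (((pairs - pairs.mean(axis=1, keepdims=True)) ** 2).sum()
                 / (n * (k - 1)))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(4)

# Hypothetical FC values for 60 pairs: a shared pair effect plus small
# individual noise, so the ICC should come out clearly positive.
pair_effect = rng.normal(0.5, 0.15, 60)
fc = np.column_stack([pair_effect + rng.normal(0, 0.05, 60),
                      pair_effect + rng.normal(0, 0.05, 60)])
icc = icc_oneway(fc)
```

Run separately per genetic group (MZ, DZ, siblings, unrelated), this is the kind of quantity whose ordering (MZ > DZ > siblings > unrelated) underlies the heritability comparison.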
Article
Full-text available
As machine learning has gradually entered into ever more sectors of public and private life, there has been a growing demand for algorithmic explainability. How can we make the predictions of complex statistical models more intelligible to end users? A subdiscipline of computer science known as interpretable machine learning (IML) has emerged to address this urgent question. Numerous influential methods have been proposed, from local linear approximations to rule lists and counterfactuals. In this article, I highlight three conceptual challenges that are largely overlooked by authors in this area. I argue that the vast majority of IML algorithms are plagued by (1) ambiguity with respect to their true target; (2) a disregard for error rates and severe testing; and (3) an emphasis on product over process. Each point is developed at length, drawing on relevant debates in epistemology and philosophy of science. Examples and counterexamples from IML are considered, demonstrating how failure to acknowledge these problems can result in counterintuitive and potentially misleading explanations. Without greater care for the conceptual foundations of IML, future work in this area is doomed to repeat the same mistakes.
Article
We clarify fundamental aspects of end-user elicitation, enabling such studies to be run and analyzed with confidence, correctness, and scientific rigor. To this end, our contributions are multifold. We introduce a formal model of end-user elicitation in HCI and identify three types of agreement analysis: expert, codebook, and computer. We show that agreement is a mathematical tolerance relation generating a tolerance space over the set of elicited proposals. We review current measures of agreement and show that all can be computed from an agreement graph. In response to recent criticisms, we show that chance agreement represents an issue solely for inter-rater reliability studies and not for end-user elicitation, where it is opposed by chance disagreement. We conduct extensive simulations of 16 statistical tests for agreement rates, and report Type I errors and power. Based on our findings, we provide recommendations for practitioners and introduce a five-level hierarchy for elicitation studies.
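One widely used agreement measure for elicitation data is the pairwise agreement rate: the fraction of participant pairs whose proposals for a referent fall in the same equivalence class (a clique-counting computation on the agreement graph). Whether this matches the paper's preferred measure is an assumption; a minimal sketch:

```python
from collections import Counter
from math import comb

def agreement_rate(proposals):
    """Pairwise agreement rate: fraction of participant pairs whose
    proposals for one referent are equivalent (same label)."""
    counts = Counter(proposals).values()
    n = len(proposals)
    if n < 2:
        return 1.0
    return sum(comb(c, 2) for c in counts) / comb(n, 2)

# Hypothetical proposals from 10 participants for a single referent.
props = ["swipe", "swipe", "swipe", "swipe", "tap", "tap", "pinch",
         "swipe", "tap", "swipe"]
ar = agreement_rate(props)   # (C(6,2) + C(3,2)) / C(10,2) = 18/45 = 0.4
```

Each label class is a clique in the agreement graph, so the numerator is just the graph's edge count, which is why all such measures can be computed from that graph.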
Article
Full-text available
Is science in the midst of a crisis of replicability and false discoveries? In a recent article, Alexander Bird offers an explanation for the apparent lack of replicability in the biomedical sciences. Bird argues that the surprise at the failure to replicate biomedical research is a result of the fallacy of neglecting the base rate. The base-rate fallacy arises in situations in which one ignores the base rate—or prior probability—of an event when assessing the probability of this event in the light of some observed evidence. By extension, the replication crisis would result from ignoring the low prior probability of biomedical hypotheses. In this paper, my response to Bird’s claim is twofold. First, I show that the argument according to which the replication crisis is due to the low prior of biomedical hypotheses is incomplete. Second, I claim that a simple base-rate fallacy model does not account for some important methodological insights that have emerged in discussions of the replication crisis.
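The base-rate argument can be made concrete with Bayes' theorem: the probability that a "significant" finding reflects a true hypothesis depends strongly on the prior probability (base rate) of true hypotheses in the field. A sketch with illustrative power and α values (the numbers are not from Bird's paper):

```python
def replication_ppv(prior, power=0.8, alpha=0.05):
    """Positive predictive value of a significant result via Bayes' theorem:
    P(true | significant) = power*prior / (power*prior + alpha*(1 - prior))."""
    return (power * prior) / (power * prior + alpha * (1 - prior))

# Low base rate, as Bird suggests for exploratory biomedical hypotheses:
low = replication_ppv(0.1)    # 0.08 / 0.125 = 0.64
# Higher base rate, e.g. well-grounded confirmatory hypotheses:
high = replication_ppv(0.5)
```

On this simple model, a low base rate alone drags the PPV down and thus "explains" replication failures; the paper's point is precisely that such a model omits the methodological factors (bias, flexibility in analysis) central to the replication-crisis debate.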
Article
Full-text available
Background: A new technology for a self-powered acoustic tag (SPT) was developed for active tracking of juvenile fish, intended to avoid the typical battery-life constraints associated with active telemetry technology. We performed a laboratory study to evaluate a subdermal tagging technique for the SPT and the tag's effects on survival, tag retention, and growth in juvenile white sturgeon (Acipenser transmontanus). Results: Survival was associated with tag retention. White sturgeon implanted with the SPT (n = 30) had 93% survival and tag retention by day 28, 67% by day 101, and 38% by day 595 post-tagging. Sturgeon implanted with a passive integrated transponder (PIT) tag only (control group) had 96% survival and tag retention by day 28 and through day 101 post-tagging. Fish in the PIT group were repurposed after day 101, so no comparisons with this group were made at day 595 post-tagging. Specific growth rate (SGR) for fork length was a median of 0.25% day⁻¹ by day 28 for the SPT group, which was significantly lower than for the PIT group (median: 0.42% day⁻¹; n = 27). The SPT and PIT groups had similar SGR for fork length by day 101 post-tagging (0.22 and 0.25% day⁻¹, respectively). SGR for weight was also lower for the SPT group than for the PIT group on day 28 (1.39 and 2.11% day⁻¹, respectively), but the difference again dissipated by day 101 (0.79 and 0.88% day⁻¹, respectively). Conclusion: The tagging technique and placement of the SPT allowed the tag to remain upright along the flank of the sturgeon to ensure maximum battery output; however, retention rates of the SPT were not ideal. We provide suggestions to improve the tagging technique: tagging fish that are > 400 mm FL, moving the incision location to extend the cavity and create a pocket for the placement of the SPT, and performing a quantitative wound-healing evaluation. Future studies are recommended to evaluate these suggestions.
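Specific growth rate is conventionally computed from log-transformed sizes. Assuming the study used the standard formula SGR = 100 · (ln L₂ − ln L₁) / Δt (the fork lengths below are invented, not the study's measurements), a minimal sketch:

```python
import math

def specific_growth_rate(initial, final, days):
    """Specific growth rate in % per day:
    SGR = 100 * (ln(final) - ln(initial)) / days."""
    return 100.0 * (math.log(final) - math.log(initial)) / days

# Hypothetical fork lengths (mm) over a 28-day post-tagging interval.
sgr = specific_growth_rate(300.0, 321.3, 28)   # about 0.25% per day
```

The same formula applies to weight-based SGR by substituting weights for lengths, which is how the day-28 weight comparison (1.39 vs. 2.11% day⁻¹) would be computed.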