Fig 1 - uploaded by Per-Anders Esseen
Population median probabilities (dashed line), population-averaged probabilities (solid line), and ten realizations of the cluster-specific probabilities $p_{jk1}^{\mathrm{CS}}$ (dotted lines).
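The three curve types in the caption can be reproduced under a random-intercept logistic model. The sketch below assumes cluster-specific probabilities of the form expit(eta + u) with u ~ N(0, sigma^2); the linear predictor, the value of sigma, and all function names are illustrative assumptions, not taken from the source publication. The population median is expit(eta), because u has median 0 and expit is monotone, while the population-averaged probability is the expectation over u, computed here by Gauss-Hermite quadrature.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    """Inverse logit, vectorised."""
    return 1.0 / (1.0 + np.exp(-z))


def population_median_prob(eta):
    """Median over clusters of expit(eta + u): u has median 0 and expit is
    monotone, so the median curve is simply expit(eta)."""
    return expit(np.asarray(eta, dtype=float))


def population_averaged_prob(eta, sigma, n_nodes=40):
    """E_u[expit(eta + u)] with u ~ N(0, sigma^2), via Gauss-Hermite quadrature.

    hermegauss uses the weight exp(-z^2/2), whose integral is sqrt(2*pi),
    so dividing by sqrt(2*pi) turns the weighted sum into a normal expectation.
    """
    nodes, weights = hermegauss(n_nodes)
    eta = np.asarray(eta, dtype=float)[..., None]  # broadcast over quadrature nodes
    values = expit(eta + sigma * nodes)
    return (values * weights).sum(axis=-1) / np.sqrt(2.0 * np.pi)


def cluster_specific_probs(eta, sigma, n_clusters=10, seed=0):
    """One row per simulated cluster: expit(eta + u_j) with u_j ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma, size=n_clusters)
    return expit(np.asarray(eta, dtype=float)[None, :] + u[:, None])


# Hypothetical example: eta = beta0 + beta1 * x on a grid of x values.
x = np.linspace(-3.0, 3.0, 61)
eta = -0.5 + 1.2 * x
median_curve = population_median_prob(eta)
averaged_curve = population_averaged_prob(eta, sigma=1.5)
cluster_curves = cluster_specific_probs(eta, sigma=1.5)  # shape (10, 61)
```

Because the logistic curve is S-shaped, averaging over u pulls the population-averaged curve toward 0.5 relative to the median curve; the two coincide only when sigma is 0.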
Source publication
Large-scale surveys, such as national forest inventories and vegetation monitoring programs, usually have complex sampling designs that include geographical stratification and units organized in clusters. When models are developed using data from such programs, a key question is whether or not to utilize design information when analyzing the relati...
Similar publications
Biotic and abiotic forces govern the evolution of trophic niches, which profoundly impact ecological and evolutionary processes and aspects of species biology. Herbivory is a particularly interesting trophic niche because there are theorized trade-offs associated with diets composed of low-quality food that might prevent the evolution of herbivory...
Background. This prospective study was conducted to identify a suitable alternative to birth weight and establish its cutoff point to facilitate the identification of low-birth-weight (LBW) infants in Enugu, Southeast Nigeria. Methods. The study involved newborn babies within the first 48 hours of life. Five anthropometric measurements (head, chest...
Data on how many scientific findings are reproducible are generally bleak and a wealth of papers have warned against misuses of the p-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the p-value, which is still widely considered as the gold standard of statistical validity. We aim...
Introduction: The androgen receptor (AR) regulates immune-related epithelial-to-mesenchymal transition (EMT), and prostate cancer (PCa) metastasis. Primary tumor-infiltrating lymphocytes (TILs) [CD3⁺, CD4⁺, and CD8⁺ TILs] are potential prognostic indicators in PCa, and variations may contribute to racial disparities in tumor biology and PCa outcome...
Mixed effects multilevel models are often used to investigate cross-level interactions, a specific type of context effect that may be understood as an upper-level variable moderating the association between a lower-level predictor and the outcome. We argue that multilevel models involving cross-level interactions should always include random slopes...
Citations
... 6. Sensitivity tests 6.1 Clustering standard errors at the firm level According to Ekström et al. (2018), a notable limitation of the standard logistic regression model is its assumption that observations are independent. When correlations within the data are ignored, the standard errors of the logistic regression coefficient estimates can be substantially biased, usually underestimated but sometimes overestimated (Ekström et al., 2018; Hogan and Blazar, 2000; Adam et al., 2021). Ekström et al. (2018) recommend using clustered standard errors to address this issue. Clustering standard errors at the firm level adjusts for intra-firm correlation, thereby preventing the underestimation of standard errors. ...
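Firm-level clustering of this kind corresponds to a cluster-robust (sandwich) variance estimator. The following is a minimal NumPy sketch, assuming a plain logistic regression fitted by Newton-Raphson and a G/(G - 1) small-sample factor; statistical packages differ in the exact finite-sample corrections they apply, so results may not match any particular software. Function and variable names are hypothetical.

```python
import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logit_with_cluster_se(X, y, clusters):
    """Return (beta, naive SE, cluster-robust SE) for a logistic regression.

    Cluster-robust variance = c * B M B, where B is the inverse information
    (the "bread"), M sums outer products of per-cluster score totals
    (the "meat"), and c = G / (G - 1) is one common small-sample factor.
    """
    beta = fit_logit(X, y)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    naive_se = np.sqrt(np.diag(bread))                # assumes independent rows

    scores = X * (y - p)[:, None]                     # per-observation scores
    labels = np.unique(clusters)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in labels:
        s_g = scores[clusters == g].sum(axis=0)       # score total for cluster g
        meat += np.outer(s_g, s_g)
    G = labels.size
    vcov = (G / (G - 1.0)) * bread @ meat @ bread
    return beta, naive_se, np.sqrt(np.diag(vcov))
```

With a cluster-level covariate (such as a firm attribute) and strong within-cluster correlation, the clustered standard error for that covariate is typically noticeably larger than the naive one, which is the underestimation the passage refers to.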
Purpose
This paper aims to identify key determinants of ethical conduct by examining the impact of audit committee and external auditor attributes on business bribery, corruption and fraud (BCF) in Gulf Cooperation Council (GCC) countries.
Design/methodology/approach
A logistic regression model explores the relationship between the audit committee, external auditor attributes and BCF occurrences in GCC-listed firms from 2020 to 2023. Robust standard errors control for firm clustering and heteroscedasticity.
Findings
The authors found a significant positive relation between audit committee size, meetings and members’ expertise and BCF. Also, the authors found a positive relation between audit fees and BCF and a negative relation between audit firm size and BCF.
Research limitations/implications
The paper provides valuable insights for enhancing corporate governance and reducing BCF in GCC countries.
Practical implications
For auditors, establishing robust audit committees and strengthening regulatory frameworks improve BCF detection. Regulators should mandate stricter audit committee requirements and enforce internal audit regulations to combat BCF. For investors, prioritizing companies with more extensive, reputable auditors and sufficient audit fees may signal lower BCF risks, offering valuable insights for governance improvements.
Social implications
The study expands agency theory by investigating how audit committee and external auditor attributes influence BCF in GCC markets, where weak governance frameworks exacerbate corruption risks, extending the theory’s relevance to emerging markets.
Originality/value
The paper challenges traditional views on the effectiveness of audit committees, showing how specific attributes can hinder BCF detection. In addition, it highlights the critical role of large audit firms in reducing BCF risks in emerging markets.
... In simple terms, the independent variables are not on the scale of the binary outcome, so a logit link is used to map the model onto probabilities in [0, 1]; this is known as a logistic regression classifier. The log odds are the logarithm of the ratio between the probability that Y = 1 and the probability that Y = 0 [40]. LR is an approach to fitting models using logistic functions [41]. ...
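The relationship between probability, odds, and log odds can be checked numerically. This sketch uses the standard definitions; the coefficients b0 and b1 in the usage lines are hypothetical values chosen only for illustration.

```python
import math


def odds(p: float) -> float:
    """Odds of the event Y = 1: P(Y = 1) / P(Y = 0)."""
    return p / (1.0 - p)


def log_odds(p: float) -> float:
    """Logit: natural log of the odds, mapping (0, 1) onto the whole real line."""
    return math.log(odds(p))


def logistic(z: float) -> float:
    """Inverse logit: maps any real linear predictor back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for a single predictor x.
b0, b1 = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    z = b0 + b1 * x  # linear predictor on the log-odds scale
    print(x, round(logistic(z), 3))
```

At x = 2 the linear predictor is 0, i.e. even odds, so the predicted probability is exactly 0.5.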
... In general, statistical models assume that data are independent and identically distributed (Overmars et al., 2003). Latimer et al. (2006) and Ekström et al. (2018) highlighted that using models that ignore violations of these assumptions can lead to incorrect conclusions because of inaccurate parameter estimates and biased standard errors. One way to overcome this problem is to apply a logistic mixed effects model (LMEM) (Twisk, 2006). ...
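A common form of the logistic mixed effects model is the random-intercept model below, written for observation i in cluster j; the notation is an illustrative assumption rather than the cited authors' own. The second line gives the intraclass correlation on the latent (logistic) scale, whose residual variance is π²/3, and indicates how strongly observations within a cluster are correlated.

$$
\operatorname{logit}\Pr(Y_{ij}=1 \mid u_j) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j, \qquad u_j \sim N(0, \sigma_u^2)
$$

$$
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
$$

For example, σ_u² = 1 gives ρ = 1/(1 + 3.29) ≈ 0.23.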
In this study, we applied a multivariate logistic regression model to identify deforested areas and evaluate the current effects on environmental variables in the Brazilian state of Rondônia, located in the southwestern Amazon region using data from the MODIS/Terra sensor. The variables albedo, temperature, evapotranspiration, vegetation index, and gross primary productivity were analyzed from 2000 to 2022, with surface type data from the PRODES project as the dependent variable. The accuracy of the models was evaluated by the parameters area under the curve (AUC), pseudo R², and Akaike information criterion, in addition to statistical tests. The results indicated that deforested areas had higher albedo (25%) and higher surface temperatures (3.2 °C) compared to forested areas. There was a significant reduction of the EVI (16%), indicating water stress, and a decrease in GPP (18%) and ETr (23%) due to the loss of plant biomass. The most precise model (91.6%) included only surface temperature and albedo, providing important information about the environmental impacts of deforestation in humid tropical regions.
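The model-evaluation criteria named above (AUC, pseudo R², and the Akaike information criterion) can be computed from fitted probabilities and log-likelihoods. A minimal sketch follows, assuming McFadden's definition of pseudo R² (the abstract does not say which pseudo R² was used) and the rank-based (Mann-Whitney) formulation of AUC; function names are hypothetical.

```python
import math
from typing import Sequence


def bernoulli_log_lik(y: Sequence[int], p: Sequence[float]) -> float:
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def mcfadden_r2(log_lik_model: float, log_lik_null: float) -> float:
    """McFadden pseudo R^2: 1 - log L_model / log L_null (intercept-only model)."""
    return 1.0 - log_lik_model / log_lik_null


def auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```

For the null model, the fitted probability for every observation is the sample proportion of ones, so log L_null can be obtained by passing that constant to `bernoulli_log_lik`.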
... In each cluster site, the total number of streets, homes visited, homes visited that met eligibility criteria, and refusal rates and homes where no one appeared present/opened the door at time of visit were tabulated. Because of the cluster sampling design, the "logistic regression for spatially correlated data collected using complex sampling designs…[included] unweighted and weighted analyses" (Ekström et al., 2018) performed by the PI in Stata. There was no major difference between the sets of estimated coefficients. ...
In recent decades, the Government of Liberia (GOL) and international partners have prioritized combatting child sexual abuse, including illicit and harmful early sexual practices involving girls and adult men. Previous studies indicate high rape rates among Liberian female populations, yet more research on specific forms of abuse is needed to better understand the magnitude of the problem. Applying Bronfenbrenner’s ecological framework, this paper presents the results of a 2018 mixed-methods study of 719 Liberian young women (ages 18–35) and 493 of their parents, from urban/rural districts in Montserrado. The purpose is to contribute a large-scale representative study establishing the rate of female statutory rape and key correlates. The survey captures data measuring early sexual activity (ESA), education, socio-economic status, demographics, and knowledge, attitudes, and behaviors (KABs) associated with cultural ethnic customs, rural/urban settings, and gender rights. The statistical analysis indicates that 35.1% (95% CI 30.1–37.1) of Liberian women report experiencing ESA that qualifies as statutory rape under Liberian law. Age, ethnicity, location, SES, education, and most individual KABs are not correlated with lower rates (p < 0.05). The following are associated (unadjusted odds ratio [OR]): advanced education (OR 2.63, 95% CI 1.26–5.50); saying no to sex (0.57, 0.36–0.89); equitable work opportunities (2.15, 1.28–3.62); living with a man as a minor (0.47, 0.31–0.74); and early pregnancy (0.45, 0.32–0.65). Additionally, 39.7% (95% CI 31.2–44.1) of male assailants hold school-based occupations. As the ecology of girls is increasingly shifting in low-income nations, it is crucial to better understand the face of abuse to protect children’s welfare.
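Unadjusted odds ratios with 95% confidence intervals, as reported above, can be obtained directly from a 2x2 table. The sketch below uses the Woolf (log-based) interval, which is one common method; the abstract does not state which method the authors used, and the counts in the test are hypothetical.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio for a 2x2 table with a Woolf (log-scale) confidence interval.

    Table layout (hypothetical):   outcome yes   outcome no
        exposed                        a             b
        unexposed                      c             d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the Woolf interval is undefined without a correction")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level, which is how ORs such as 2.63 (1.26–5.50) are read.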
... Moreover, we compared the different methods under two common types of bias. From a causal perspective, sampling bias can be categorized into three types: 1) spatial clustering/autocorrelation of samples; 2) non-spatial clustering caused by the environment (variables); and 3) a combination of the first two (Ekström et al., 2018;Hoque et al., 2020;Tirozzi et al., 2022;Vollering et al., 2019). The observation station bias simulated in this study was similar to the first sampling bias resulting from spatial autocorrelation. ...
Correcting sampling bias in species distribution models (SDMs) is challenging. The difficulty lies in accurately identifying and quantifying bias and the scarcity of samples, which greatly impedes the implementation of bias correction. Current methods often adjust the distribution of presence or background points within geographic or environmental spaces to correct the sampling bias in probability estimation within SDMs. However, these methods may lead to information loss, rely on subjective assumptions, and often separate geography and environment when correcting for bias. This study proposes a novel and easily implementable method termed “aggregation background.” This method selects background data based on the aggregation degree of presence points in the geographic and environmental feature space, thereby approximating the representation and correction of sampling bias in the presence samples. We compared this new method with other prevalent sampling bias correction methods in the existing literature by analyzing ecological authenticity. Under varying biases and sample sizes, the aggregation background and geographic filtering methods achieved more accurate species distribution predictions than the target group background and other methods. Notably, when the sample size was small (≤70), the aggregation background method outperformed the geographic filtering method. These findings underscore the effectiveness of the aggregation background in improving bias correction using limited available presence sample data, without relying on assumptions about sampling bias. Our method provides a new approach for correcting complex unknown biases in SDMs.
... KLR is a machine-learning extension of LR [18]. It was developed to overcome overfitting and the suboptimal accuracy of LR models. ...
Stroke is the second leading cause of death in the world and a major contributor to disability. Many stroke sufferers do not recognize the symptoms of stroke or have no knowledge about stroke at all. As a result, many sufferers arrive at the hospital too late for first aid, which can lead to even greater risk. One way to address this is to identify the factors that influence stroke so that it can be prevented. This study aims to create a classification model that can be used to predict the type of stroke and to determine which factors have a significant effect on the type of stroke. The method used is Kernel Logistic Regression (KLR), which extends Logistic Regression (LR) by using a linear combination of regularized LR. Two scenarios for splitting the data into training and testing sets were also used: 7:3 and 8:2. The accuracies of the two scenarios were 75.97% for scenario 8:2 and 73.97% for scenario 7:3. The accuracy of the KLR was 92.12%, an increase of 16.15 percentage points over the LR. From the modeling results for scenario 8:2, four predictors were found to significantly affect the type of stroke: cholesterol level, temperature, length of stay, and disease history.
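One common formulation of kernel logistic regression represents the decision function as a kernel expansion over the training points, f(x) = Σ α_i k(x_i, x) + b, and minimizes the mean logistic loss plus a penalty (λ/2) αᵀKα. The NumPy sketch below assumes a Gaussian (RBF) kernel and plain gradient descent; the kernel, penalty, step size, and function names are illustrative assumptions, not the configuration used in the stroke study.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def fit_klr(X, y, gamma=1.0, lam=1e-3, lr=1.0, n_iter=5000):
    """Kernel logistic regression by gradient descent.

    Minimises mean log-loss of f = K @ alpha + b plus (lam / 2) * alpha^T K alpha.
    """
    K = rbf_kernel(X, X, gamma)
    n = y.size
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha + b)))
        r = (p - y) / n                               # d(mean log-loss) / df
        alpha -= lr * (K @ r + lam * (K @ alpha))     # gradient w.r.t. alpha
        b -= lr * r.sum()                             # gradient w.r.t. intercept
    return alpha, b


def predict_klr(X_train, alpha, b, X_new, gamma=1.0):
    """Predicted probabilities P(y = 1 | x) for new points."""
    f = rbf_kernel(X_new, X_train, gamma) @ alpha + b
    return 1.0 / (1.0 + np.exp(-f))
```

On the four XOR points used in the test, a linear LR cannot separate the classes, whereas the kernel expansion can, which illustrates why KLR can reach higher accuracy on non-linear problems. In practice, λ and γ would be tuned, for example by cross-validation.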
... A benefit of LR is that the level of influence each variable has on the classification is output as regression coefficients, providing additional insight into which AIS variables are essential in the classification of wildlife-viewing vessel behaviour. Without considering the complexity of spatial correlation between the binary classes defined in this study, regression coefficient estimates may be biased and standard errors may be underestimated (Ekström et al., 2018). It is recognized that an attempt to remedy this by including latitude and longitude as LR covariates may be insufficient. ...
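One way to check whether a fitted LR has left spatial correlation unexplained (for example, after adding latitude and longitude as covariates) is to compute Moran's I on the model residuals. The sketch below assumes a simple binary distance-band weight matrix; the helper names and the distance threshold are hypothetical, and this diagnostic is offered as an illustration rather than the approach used in the cited study.

```python
import numpy as np


def distance_band_weights(coords, max_dist):
    """Binary spatial weights: w_ij = 1 if 0 < distance(i, j) <= max_dist, else 0.

    Zero distances (the diagonal and any coincident points) get weight 0.
    """
    pts = np.asarray(coords, dtype=float)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    return ((d > 0) & (d <= max_dist)).astype(float)


def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = values - mean."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    n = z.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)
```

Values well above the no-autocorrelation expectation of -1/(n - 1) indicate that nearby residuals are similar, i.e. spatial structure the covariates have not captured.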
A continued rise in global ocean vessel activity has led to growing concerns for the health of whales around the world. Of particular interest is the increase in recreational vessels, including those related to whale-watching activities. However, there is an absence of established procedures to identify vessels engaged in whale-watching, thus limiting the ability to quantify whale-watching impacts on whales. This study evaluates three computational classification models and their ability to utilize Automatic Identification System (AIS) data to describe wildlife-viewing vessel behaviour. These models include density-based spatial clustering of applications with noise (DBSCAN), a hidden Markov model (HMM), and logistic regression (LR), all of which have been previously used to classify vessel behaviour in industries such as fishing, shipping, and marine security. The results of each model's classification were validated against observed whale sighting data using statistical performance and accuracy metrics. The findings suggest that all three classification models sufficiently detect wildlife-viewing behaviour, but the HMM and LR had preferable performance metrics compared to DBSCAN. Further, although LR provides an informative glance at which AIS variables are most important to detecting wildlife-viewing events, the HMM has comparable performance metrics and requires less data processing. Therefore, this study recommends the use of HMM due to its computational efficiency and because it provides an accurate classification of wildlife-viewing behaviour for whale-watching vessels. The results of this study can be used to support policy decisions, monitor regulation compliance, and inform marine conservation initiatives.
... Ekström used Monte Carlo simulations to compare the performance of standard logistic regression models with two approaches for modeling correlated binary responses: cluster-specific and population-averaged logistic regression models [1]. ...
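The two model types compared in that simulation estimate different parameters. Under a random-intercept logistic model with intercept variance σ², a widely used approximation relates the population-averaged (PA) coefficients to the cluster-specific (CS) ones; it is shown here as background and is not necessarily the relation used in the cited study.

$$
\boldsymbol{\beta}^{\mathrm{PA}} \approx \frac{\boldsymbol{\beta}^{\mathrm{CS}}}{\sqrt{1 + c^{2}\sigma^{2}}}, \qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588
$$

For σ² = 1, the PA coefficients are roughly 0.86 times the CS coefficients, so the two sets of estimates should not be compared as if they targeted the same quantity.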
Environmental factors have a direct impact on the development of agriculture, so it is particularly important to monitor the environmental indicators of the agroecological cycle index system. Although some environmental monitoring schemes exist, most of them monitor the environment at a broad scale, whereas this study focuses on a specific object. This article takes farmland as the research object and, based on an analysis of the research status and applications of farmland environmental monitoring systems, combines embedded technology, information monitoring technology, and network technology. It designs a real-time online remote monitoring system for the farmland ecological environment based on an embedded architecture and GPRS technology. Sensors acquire farmland information, which is displayed in real time at the monitoring center and on a mobile client, so that the farmland environment can be managed and protected proactively. This article measures and analyzes the data from the final experiment. The experimental results show that the monitoring system can accurately collect farmland data in real time, and that the data can be transmitted remotely in real time through the embedded server and the Internet and displayed at the monitoring center and in the mobile client software. The results showed that PM2.5 was 30 μg/m³ at 20:00. The experimental data are real-time and stable, meeting the requirements of real-time remote network monitoring and data transmission.
... In simple terms, the independent variables are not on the scale of the binary outcome, so a logit link is used to map the model onto probabilities in [0, 1]; this is known as a logistic regression classifier. The log odds are the logarithm of the ratio between the probability that Y = 1 and the probability that Y = 0 [30]. Logistic regression is an approach to fitting models using logistic functions [31]. ...
Drastic change in climatic conditions is a major challenge for people around the globe. Most of the biological, construction, transportation, and agricultural sectors are affected by uneven weather conditions, e.g. floods, rainfall, and drought. Rainfall is the most prominent phenomenon in the weather system, and its rate is treated as one of the most important variables. Meteorological scientists try to identify atmospheric parameters such as temperature, sunshine, cloudiness, and humidity by applying conventional techniques and developing prediction models. Machine Learning (ML) techniques are evolving rapidly and give more accurate results than the traditional approaches. ML, a subset of artificial intelligence (AI), is used in this paper to predict the next day's rainfall from a 10-year weather dataset for Australia. This paper presents ML classifiers, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Light Gradient Boosting Machine (LGBM), CatBoost (CB), and Extreme Gradient Boosting (XGB), to predict the next day's rainfall. The Python libraries Pandas, NumPy, scikit-learn, and Matplotlib are used extensively for data management, mathematical computation, ML modeling, and visualization, respectively. This is followed by sequential stages of data visualization, training, testing, modeling, and cross-validation. Evaluation metrics such as the area under the receiver operating characteristic curve (AUROC), recall, accuracy, precision, and Cohen's kappa are used to assess the performance of the ML algorithms.
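Apart from AUROC, the evaluation metrics listed above can be computed from a binary confusion matrix. The sketch below shows accuracy, precision, recall, and Cohen's kappa (observed agreement corrected for the agreement expected by chance); the class name and the counts in the test are hypothetical. In practice, scikit-learn provides equivalents such as `cohen_kappa_score`.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # rain predicted, rain observed
    fp: int  # rain predicted, no rain observed
    fn: int  # no rain predicted, rain observed
    tn: int  # no rain predicted, no rain observed

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.n

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def cohen_kappa(self) -> float:
        p_o = self.accuracy()
        # Chance agreement from the predicted and observed marginal totals
        p_e = ((self.tp + self.fp) * (self.tp + self.fn)
               + (self.fn + self.tn) * (self.fp + self.tn)) / self.n ** 2
        return (p_o - p_e) / (1.0 - p_e)
```

Kappa is useful for rainfall data because dry days usually dominate: a classifier that always predicts "no rain" can score high accuracy while its kappa stays at 0.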
... is that an individualized fitting approach, where separate binary logistic models are fitted, is useful for finding suitable non-linear transformations. Without taking the sampling design into account, we used this approach for finding preliminary main-effects models. After this stage, the preliminary main-effect models were refitted taking the complex sampling design of the NFI into account (Appendix S2; cf. Ekström et al., 2018), after which we considered possible interactions between the main effects. Any model selected was considered preliminary until we evaluated its fit. ...
Thin, hair‐like lichens (Alectoria, Bryoria, Usnea) form conspicuous epiphyte communities across the boreal biome. These poikilohydric organisms provide important ecosystem functions and are useful indicators of global change. We analyse how environmental drivers influence changes in occurrence and length of these lichens on Norway spruce (Picea abies) over 10 years in managed forests in Sweden using data from >6000 trees. Alectoria and Usnea showed strong declines in southern‐central regions, whereas Bryoria declined in northern regions. Overall, relative loss rates across the country ranged from 1.7% per year in Alectoria to 0.5% in Bryoria. These losses contrasted with increased length of Bryoria and Usnea in some regions. Occurrence trajectories (extinction, colonization, presence, absence) on remeasured trees correlated best with temperature, rain, nitrogen deposition, and stand age in multinomial logistic regression models. Our analysis strongly suggests that industrial forestry, in combination with nitrogen, is the main driver of lichen declines. Logging of forests with long continuity of tree cover, short rotation cycles, substrate limitation and low light in dense forests are harmful for lichens. Nitrogen deposition has decreased but is apparently still sufficiently high to prevent recovery. Warming correlated with occurrence trajectories of Alectoria and Bryoria, likely by altering hydration regimes and increasing respiration during autumn/winter. The large‐scale lichen decline on an important host has cascading effects on biodiversity and function of boreal forest canopies. Forest management must apply a broad spectrum of methods, including uneven‐aged continuous cover forestry and retention of large patches, to secure the ecosystem functions of these important canopy components under future climates. 
Our findings highlight interactions among drivers of lichen decline (forestry, nitrogen, climate), functional traits (dispersal, lichen colour, sensitivity to nitrogen, water storage), and population processes (extinction/colonization).