... Using the HR max values, we define the reaction to the road object as the difference between the HR at the moment of the change point and the HR max value. We use Linear Mixed Effect (LME) models to understand the variation in drivers' HR responses around each environmental attribute [46], [47]. LME models resemble simple linear regression but additionally model the variability across participants as random factors while estimating the effect of the fixed factors (each perturbation). ...
... When the dependent variable is a count, we use a generalized linear model with a negative binomial distribution [48]. The idea behind LME is described in detail in [46], [47]. The analysis above is performed with the lme4 package [49] written in the R programming language [50]. ...
... Based on the notation provided in [47], we can define an LME model as: ...
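The excerpt truncates the formula. As a hedged sketch, the scalar Laird–Ware form commonly used for LME models (which [47] may follow; the paper's exact notation could differ) is:

$$y_{ij} = \beta_1 x_{1ij} + \dots + \beta_p x_{pij} + b_{i1} z_{1ij} + \dots + b_{iq} z_{qij} + \varepsilon_{ij},$$

where $y_{ij}$ is the response for observation $j$ in group $i$, the $\beta$'s are fixed-effect coefficients, the $b_{ik} \sim N(0, \psi_k^2)$ are random effects, and $\varepsilon_{ij} \sim N(0, \sigma^2)$.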
Understanding and mitigating drivers' negative emotions, stress, and anxiety is highly important for decreasing accident rates, enhancing road safety, and supporting a healthy lifestyle for the community of drivers. While detecting drivers' stress and negative emotions can significantly help with this goal, understanding what is associated with increases in drivers' negative emotions and stress levels can better inform the planning of interventions. Although studies have provided significant insight into detecting drivers' emotions and stress levels, few have focused on the reasons behind changes in stress levels and negative emotions. In this study, using a naturalistic driving study database, we analyze changes in the driving scene, including road objects and the dynamic relationship between the ego vehicle and the lead vehicle, with respect to changes in drivers' psychophysiological metrics (i.e., heart rate (HR) and facial expressions). We find that different road objects may be associated with varying levels of increase in drivers' HR as well as different proportions of negative facial emotions detected through computer vision. Our results indicate that larger vehicles on the road, such as trucks and buses, are associated with the largest increases in drivers' HR as well as negative emotions. Additionally, we provide evidence that shorter distances to the lead vehicle in naturalistic driving, as well as a higher standard deviation in that distance, may be associated with a greater number of abrupt increases in drivers' HR, suggesting a possible increase in stress level. Lastly, our results indicate more positive emotions, lower facial engagement, and fewer abrupt increases in HR at higher driving speeds, which often occur in highway driving.
... The function used to derive the weighting function in robust regression is an objective function [8]. One weighting function that can be used is the Tukey bisquare weighting function, given below. The value c is called the tuning constant; for the Tukey bisquare weighting function in the M-estimation method, the tuning constant is c = 4.685 [8]. ...
... Because the function is not linear, the parameter estimation is solved by an iterative weighted least squares method known as Iteratively Reweighted Least Squares (IRLS) (Fox, 2002). For the parameter estimate at iteration m, starting from the initial estimate at iteration 0, the procedure is as follows: ...
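The iteration formulas are elided in the snippet. Below is a minimal R sketch of IRLS with the Tukey bisquare weight (c = 4.685) under the standard formulation, not the paper's exact code; the data in the demo are simulated.

```r
# Tukey bisquare weight function, tuning constant c = 4.685
tukey_w <- function(u, c = 4.685) ifelse(abs(u) <= c, (1 - (u / c)^2)^2, 0)

# IRLS for an M-estimator; X is a numeric model matrix, y a numeric response
irls <- function(X, y, tol = 1e-8, maxit = 50) {
  beta <- solve(t(X) %*% X, t(X) %*% y)          # OLS start (iteration 0)
  for (m in seq_len(maxit)) {
    r <- as.vector(y - X %*% beta)               # current residuals
    s <- median(abs(r - median(r))) / 0.6745     # robust scale (MAD)
    w <- tukey_w(r / s)                          # bisquare weights
    beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # weighted LS step
    done <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (done) break                              # stop when coefficients converge
  }
  beta
}

# demo with simulated data and one gross outlier
set.seed(1); n <- 100
X <- cbind(1, rnorm(n)); y <- X %*% c(2, 3) + rnorm(n); y[1] <- 50
irls(X, y)  # MASS::rlm(..., psi = psi.bisquare) implements the same scheme
```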
A spatial regression model is used to determine the relationship between response and predictor variables that carry spatial influence. If both variables have spatial influence, the model formed is the Spatial Durbin Model. One cause of inaccuracy in spatial regression prediction is the presence of outlying observations, and removing outliers in spatial analysis can change the composition of spatial effects in the data. One remedy for outliers in the spatial regression model is robust spatial regression. The M-estimator principle is applied to estimate spatial regression parameter coefficients that are robust to outliers. The resulting robust Spatial Durbin Model is expected to accommodate the presence of outliers in the spatial regression model. One example application of the robust Spatial Durbin Model is the modelling of life expectancy.
... We find a strong positive Pearson correlation of 0.65. The GVIFs are transformed to cVIFs via the equation c = g^(1/d), where g is the GVIF and d is the degrees of freedom of the covariate [12]. The degrees of freedom for a numeric covariate is one, and for a categorical covariate it is the number of categories. ...
... Specifically, the congruence principle specifies congruence between the data space and the analysis space; refer to Iacus et al. [18, Section 4.2] for further discussion. In other words, the maximum number of matchings in a stratum is the minimum of the treatment and control sample sizes. ...
Mobile users today interact with a variety of mobile device types, including smartphones, tablets, smartwatches, and others. However, research on mobile device type substitution has been limited in several respects, including a lack of detailed and robust analyses. Therefore, in this work we study mobile device type substitution through analysis of multi-device usage data from a large US-based user panel. Specifically, we use regression analysis over paired user groups to test five device type substitution hypotheses. We find that both tablets and PCs are partial substitutes for smartphones, with tablet and PC ownership decreasing smartphone usage by about 12.5 and 13 hours/month, respectively. Additionally, we find that tablets and PCs also prompt about 20 and 57 hours/month, respectively, of additional (non-substituted) usage. We also illustrate significant inter-user diversity in substituted and additional usage. Overall, our results can help in understanding the relative positioning of different mobile device types and in parameterizing higher-level mobile ecosystem models.
... A new model will be built using a similar concept but a different method, robust regression. Robust regression is a statistical alternative used when linear least-squares estimates falter in fitting a model to data with a non-normal error distribution or heavy-tailed errors [38]. Robust regression aims to provide more stable estimates of the relationship between variables by minimizing the influence of outliers through weighting functions. ...
... One commonly used weighting function in robust regression is the bisquare. The objective and weight functions for the bisquare are given as equations (2) and (3), respectively [38]: ...
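The equations themselves are truncated in the snippet. As a hedged reconstruction, the standard bisquare objective and weight functions (the paper's own equation numbering is not shown here) are:

$$\rho(u) = \begin{cases} \dfrac{c^2}{6}\left[1 - \left(1 - (u/c)^2\right)^3\right], & |u| \le c \\[4pt] \dfrac{c^2}{6}, & |u| > c \end{cases} \qquad w(u) = \begin{cases} \left[1 - (u/c)^2\right]^2, & |u| \le c \\[4pt] 0, & |u| > c \end{cases}$$

with tuning constant c = 4.685.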
Many methods address the prediction of flyrock distance in blasting operations, but none focuses specifically on flyrock distance in low-strength sedimentary rock, and no empirical, statistics-based model has been developed for this case. This study aims to obtain a formula for predicting flyrock distance due to blasting in low-strength sedimentary rock using ammonium nitrate fuel oil. A total of 196 samples were obtained from completed blasts. The variables included for building the new prediction model of flyrock distance are stemming, blast-hole height, powder factor, and average charge per blast hole. The analysis was carried out using a statistical approach based on regression and correlation. Unlike the previous model, which applied a dimensional approach, the new model gives each predictor variable its own regression coefficient in order to see how each contributes to predicting the flyrock distance. The results show that burden, stemming, blast-hole height, powder factor, and average charge per blast hole significantly affect the flyrock distance. The variance in flyrock distance is explained uniquely 3.50% by burden, 10.74% by stemming, 2.55% by blast-hole height, 2.32% by powder factor, and 2.76% by average charge per blast hole. The new proposed model is better than the previous model in terms of mean absolute percentage error and can be used to predict the flyrock distance of low-strength sedimentary rock.
... A more appropriate solution is to use robust regression (Berk, 1990; Fox, 2002; Huang et al., 2015; Rousseeuw & LeRoy, 1988; Western, 1995), which, according to Fox (2002), uses "a fitting criterion that is not as vulnerable as least squares to unusual data" (p. 1). ...
... A more appropriate solution is to use robust regression (Berk, 1990; Fox, 2002; Huang et al., 2015; Rousseeuw & LeRoy, 1988; Western, 1995), which, according to Fox (2002), uses "a fitting criterion that is not as vulnerable as least squares to unusual data" (p. 1). It is a good compromise because the outlier cases are kept in the analysis but are weighted differently. ...
Media outlets tend to cover rare events like school shootings. However, some school shootings receive more media coverage than others, and little is empirically known about why, or which school shooting characteristics attract greater media attention. This study addresses this gap and conducts a distortion analysis using data from The American School Shooting Study (TASSS), a national, open-source database. TASSS includes all publicly known shootings that resulted in at least one injury on K-12 school grounds in the United States between January 1, 1990, and December 31, 2016. The findings reveal that shootings committed by shooters with a criminal record or psychological issues, shootings that occurred post-Columbine, and shootings that injured or killed more victims received more coverage.
... The function used to derive the weighting function in robust regression is an objective function [8]. One weighting function that can be used is the Tukey bisquare weighting function, given below. The value c is called the tuning constant; for the Tukey bisquare weighting function in the M-estimation method, the tuning constant is c = 4.685 [8]. ...
Spatial regression models the relationship between response and predictor variables subject to spatial influence. If there are spatial influences on both variables, the model formed is the Spatial Durbin Model. One reason for inaccuracy of the spatial regression model in prediction is the existence of outlier observations, and removing outliers in spatial analysis can change the composition of spatial effects in the data. One way to handle outliers in the spatial regression model is robust spatial regression, applying the M-estimator to obtain spatial regression parameter coefficients that are robust against outliers. The aim of this research is to model life expectancy in Central Java Province in 2017 with data containing outliers. Applying the M-estimator to estimate the robust Spatial Durbin Model regression parameters accommodates the outliers in the spatial regression model: the change in the estimated coefficients increases the adjusted R2 to 93.69% and decreases the MSE to 0.12551. Keywords: Outliers, M-estimator, Spatial Durbin Model, Life Expectancy.
... The model for handling such cases consists of "fixed" and "random" effects. "Fixed effects" can be defined as those parameters that differ within individuals but are constant between individuals (Allison et al. [19]), while random effects are parameters that vary randomly across individuals (Verbeke et al. [20]). The linear mixed model as proposed by Laird and Ware (1982) [21] and described by Fox et al. [22] is: ...
... In equation 2.8, R is the q × q covariance matrix for the random effects and D is the n_i × n_i covariance matrix for the errors of individual i (Fox et al. [22]). ...
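Equation 2.8 itself is not shown in the snippet. In the symbols the excerpt uses, the standard matrix form would read (a hedged reconstruction, since the thesis's exact notation is unavailable):

$$y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N_q(0, R), \quad \varepsilon_i \sim N_{n_i}(0, D_i),$$

with $R$ the $q \times q$ random-effects covariance and $D_i$ the $n_i \times n_i$ error covariance for individual $i$.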
Migratory birds stop at different stopover sites during migration. The presence of resources at these stopover sites is essential for the birds to regain energy. This thesis compares resource abundance and intake at two stopover sites, the Reda and Wisla river estuaries. How a bird's mass changes during its stay at an estuary is taken as a proxy for the resource abundance of a site. The comparison is made on different subsets, including those with incomplete data, i.e., where the next capture is not exactly one day after the previous one. Multiple linear regression, generalized additive models, and linear mixed effect models are used for the analysis. Expectation maximization and an iterative predictive process are implemented to deal with the incomplete data. We found that the Reda estuary has higher resource abundance and intake than the Wisla river estuary.
... y is the value of the dependent variable, β is the vector of model parameters, x_i stands for the matrix of independent variables, and ξ_i is the random error term, assumed to be normally distributed. The parameters of the model are often estimated by maximum likelihood (ML), OLS, or other methods, using the following relationship [3][4][5]: ...
... Closer observations receive the largest weights and distant observations the lowest. Several kernel functions can be used [4,5,9], e.g., the uniform kernel function. Before estimating the polynomial models, the observations are weighted via the Nadaraya-Watson approach, in which the sample size for a stage is determined in one of two ways: (1) a fixed number of observations is taken for estimating y_i; or (2) the k nearest neighbors, a complementary approach that limits the size of the sample in the learning phase to k observations. ...
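As a hedged illustration of the kernel-weighting idea described above (the function and variable names are ours, not the paper's), here is a minimal Nadaraya-Watson smoother with a uniform kernel in R:

```r
# Nadaraya-Watson estimate at point x0: a locally weighted mean of y
nw_smooth <- function(x0, x, y, h) {
  u <- (x0 - x) / h
  w <- as.numeric(abs(u) <= 1) / 2        # uniform kernel K(u) = 1/2 on [-1, 1]
  if (sum(w) == 0) return(NA_real_)       # no observations within the window
  sum(w * y) / sum(w)                     # weighted average of nearby responses
}

set.seed(1)
x <- sort(runif(200, 0, 10)); y <- sin(x) + rnorm(200, sd = 0.3)
grid <- seq(0, 10, length.out = 101)
fit <- vapply(grid, nw_smooth, numeric(1), x = x, y = y, h = 0.8)
```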
The importance of statistics lies in describing different phenomena through models that are close to reality. These models may be causative, built on the basis of cause and effect; this is called a regression model, which has a functional form and rests on specific assumptions. Sometimes, however, a more flexible approach is needed: when knowledge of the studied phenomenon is absent, when an experiment is performed for the first time, or when the causative function between the variables cannot be specified. This type of model is called nonparametric regression. It is a type of regression in which the relationship does not take a specific functional form but is built from information in the data, which requires a larger sample than usual; in parametric regression, by contrast, the model structure is given and the data serve to estimate the parameters. Because of the importance of petroleum products in people's lives, as a source of civil and civilized development, three petroleum products are considered (white oil, diesel oil, and fuel oil) in the nonparametric regression application, and five nonparametric methods are compared (lowess, robust lowess, mean, median, and polynomial). Comparing the methods by mean square error, we conclude that robust nonparametric regression is the best.
... To this end, we use one of three M-estimators for this coefficient: the familiar least squares (LS) estimator, the Huber estimator, or the Tukey bisquare (or biweight) estimator; the latter two are robust estimators [12]. Then, for each k ≠ j, we compute the estimates â_{ij|k} according to one of these three estimators and derive the p-value p_{ij,k} from the standard significance test: ...
In this paper, we propose a novel inference method for dynamic genetic networks which makes it possible to deal with a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of a low-order conditional dependence graph, which we extend here to the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full-order conditional dependencies given the past of the process. Then, to cope with the large p and small n estimation case, we propose to approximate DAG G by considering low-order conditional independencies. We introduce partial qth-order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. Using this approximation, we set out a non-Bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data. The inference procedure is implemented in the R package 'G1DBN', freely available from the CRAN archive.
... The non-normal distributions observed in the focal and control variables make additive nonparametric regression more suitable than OLS regression for testing Hypothesis 3. We implemented this method using the R package mgcv, which provides nonparametric estimates for generalized additive models (R: MGCV, 2022). In the first column of Table 4, s(Variable) indicates the use of smoothing splines for that variable; we retained the default smoothing parameters of the s function in the mgcv package (Fox, 2002). Based on an ANOVA test of two models, one with only the control variables and the second adding knowledge intensity, we find evidence (ΔR2 = 0.14; p = 0.004) of an association between the knowledge intensity of a domain and the lognormal scale ...
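A minimal sketch of this kind of model comparison in mgcv; the data frame and variable names below are hypothetical stand-ins, not the study's actual covariates:

```r
library(mgcv)
set.seed(11)
d <- data.frame(control1 = rnorm(44), control2 = rnorm(44),
                knowledge_intensity = rnorm(44))           # simulated data, n = 44
d$scale_param <- 1 + sin(d$knowledge_intensity) + 0.3 * d$control1 + rnorm(44, sd = 0.2)

m0 <- gam(scale_param ~ s(control1) + s(control2), data = d)                 # controls only
m1 <- gam(scale_param ~ s(control1) + s(control2) + s(knowledge_intensity),  # add focal smooth
          data = d)
anova(m0, m1, test = "F")   # tests the added smooth term, cf. the reported ΔR² and p-value
```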
This study extends emerging theories of star performers to digital platforms, an increasingly prevalent entrepreneurial context. It hypothesizes that the unique characteristics of many digital platforms (e.g., low marginal costs, feedback loops, and network effects) produce heavy-tailed performance distributions, indicating the existence of star entrepreneurs. Using longitudinal data from an online learning platform, proportional differentiation is identified as the most likely generative mechanism and lognormal distribution as the most likely shape for distributions of entrepreneurial performance in digital contexts. This study contributes theory and empirical evidence for non-normal entrepreneurial performance with implications for scholars and practitioners of digital entrepreneurship. Executive summary: The performance of 'star' entrepreneurs on digital platforms can be 100- or 1000-fold that of their average competitors. When performance is plotted as a distribution, star performers reside in the tails of these distributions. The assumption of a normal distribution of performance in the bulk of entrepreneurship research implies that most performance observations are clustered around the average. Instead, most entrepreneurs on digital platforms exhibit sub-par performance, while a minority captures a major fraction of the generated value. This paper argues that the unique characteristics of digital contexts (nearly zero marginal costs, feedback loops, and network effects) drive such extreme performance. Using data from Udemy, a digital platform where independent producers (entrepreneurs) offer educational videos (digital products) to a large pool of potential customers, we provide evidence that entrepreneurial performance is lognormally rather than normally distributed. We further identify proportional differentiation as the underlying generative mechanism. Thus, star performance on digital platforms is not driven only by the rich-get-richer effect. Instead, both the initial value of performance and the rate at which it is accumulated play important roles in explaining extreme performance outcomes. This discovery has important implications for entrepreneurship theory and practice. Our findings, for example, signal that some late entrants who successfully pursue high customer accumulation rates in domains with high knowledge intensity can become star entrepreneurs.
... Additionally, the slope of change in skin conductance level signals across different conditions can be very different from one participant to another. LMM is similar to linear regression in measuring the main effects in a study, with the difference that it accounts for random variation across participants, referred to as random factors [47], [48], through a random intercept and a random slope. An LMM is defined as follows: ...
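The excerpt cuts off before the definition. A hedged sketch of an LMM with a random intercept and random slope, in generic notation rather than the paper's:

$$y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, x_{ij} + \varepsilon_{ij}, \qquad (b_{0i}, b_{1i})^\top \sim N(0, \Sigma), \quad \varepsilon_{ij} \sim N(0, \sigma^2),$$

where $b_{0i}$ and $b_{1i}$ shift the intercept and slope for participant $i$.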
The National Highway Traffic Safety Administration reported that the number of bicyclist fatalities has increased by more than 35% since 2010. One of the main reasons associated with cyclists' crashes is the adverse effect of high cognitive load due to distractions. However, very limited studies have evaluated the impact of secondary tasks on cognitive distraction during cycling. This study leverages an Immersive Virtual Environment (IVE) simulation environment to explore the effect of secondary tasks on cyclists' cognitive distraction by evaluating their behavioral and physiological responses. Specifically, by recruiting 75 participants, this study explores the effect of listening to music versus talking on the phone as standardized secondary tasks on participants' behavior (i.e., speed, lane position, input power, head movement) as well as physiological responses, including participants' heart rate variability and skin conductance metrics. Our results show that (1) listening to high-tempo music can lead to a significantly higher speed, a lower standard deviation of speed, and higher input power, a trend that is more significant for cyclists with a strong habit of daily music listening (> 4 hours/day); (2) in the high cognitive workload situation (simulated hands-free phone talking), cyclists had a lower speed with less input power and less head movement variation; and (3) participants' HRV (HF, pnni-50) and EDA features (number of SCR peaks) are sensitive to cyclists' cognitive load changes in the IVE simulator.
... Bootstrap methods avoid making distributional assumptions by resampling from the observed data over a large number of iterations (10,000 iterations in our case); from the resampled distribution, nearly any statistic (means and quantiles in our case) can be subsequently computed. Our bootstrap estimation procedure first used model-based resampling (Fox, 2002) to take into account uncertainty associated with survey sample weights. Next, respondents were selected into the bootstrapped survey sample with probability equal to their respective survey sample weights. ...
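A minimal R sketch of this weighted-resampling step (10,000 iterations); `resp` and `wt` are hypothetical stand-ins for the survey responses and sample weights, simulated here so the example runs:

```r
set.seed(42)
resp <- rnorm(500, mean = 20, sd = 6)       # hypothetical survey responses
wt   <- runif(500, 0.5, 2)                  # hypothetical survey sample weights

B <- 10000                                  # number of bootstrap iterations
boot_means <- replicate(B, {
  # respondents enter each bootstrap sample with probability proportional to weight
  idx <- sample(seq_along(resp), replace = TRUE, prob = wt / sum(wt))
  mean(resp[idx])                           # nearly any statistic could go here
})
quantile(boot_means, c(0.025, 0.5, 0.975))  # percentile interval and median
```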
Context
School-based student protests have received little scholarly attention, yet they have the potential to impact the school community, students’ civic development, and larger social movements. Principals are key actors in responding to school-based student protests. As school leaders, principals’ actions affect the outcome of student protests and shape many students’ first experiences as activists.
Purpose
This study examines U.S. public high school principals’ responses to school-based student protests in 2018, a year of heightened protest activity in response to gun violence in schools. The purpose of our study is to understand how a national sample of principals responded to student protests and to quantify general trends in their responses.
Research Design
Using a mixed methods approach, we surveyed 491 principals during the summer of 2018; follow-up interviews were conducted with 38 principals. Analyses are grounded in the Deter-Manage-Educate framework, a new conceptual framework that we develop in this paper, organized around the three broad goals principals pursue when responding to student protests. Using this framework, we determined how, and how many, principals deterred, managed, and educated.
Results
Findings show that very few principals outright deterred student protests. Nearly all principals managed by setting parameters around protests in an effort to balance students’ right to free speech with concerns for order and safety. A majority of principals also educated, using student protest as an opportunity to encourage civic development. Our findings suggest that an important distinction exists between principals who channel students toward (or away from) a particular manner of protest and principals who facilitate reflection to help students realize their own vision of civic engagement.
Implications
This study has implications not only for principals, but also for district leaders and educational leadership organizations: Although many principals receive support for managing the logistical (and legal) challenges of responding to student protests, more attention needs to be directed toward helping principals leverage the educative opportunities that student protest can provide.
... LOESS curves are a generalization of polynomial regression: they are based on fitting first- or second-degree polynomial regression models to estimate, point by point, a continuous nonlinear function describing the deterministic part of the observed variability. 10,11 Age-period-cohort models ...
The SENTIERI Project analyses the health profile of the populations residing in Italian national priority contaminated sites in specific calendar periods using a cross-sectional approach. An aspect that has not been evaluated so far is analysis over a long period, to understand how health profiles change over time, also as a function of the changes that occurred in the territories. This article studies temporal trends by birth cohort and calendar period for overall mortality and lung cancer mortality from 1980 to 2018, separately for men and women, for three sites: Priolo (Sicily Region, Southern Italy), Pitelli (Liguria Region, Northern Italy), and Terni-Papigno (Umbria Region, Central Italy). A method for selecting the temporal model that best fits the data is then proposed. General mortality presents complex temporal profiles when considering cumulative risks; usually the most important temporal axis for cumulative SMRs is the birth cohort (i.e., after adjusting for trends in the reference population). For lung cancer, the most important time axis is the birth cohort, and the age-cohort model is the most appropriate, particularly for the men of Priolo and Terni.
... (1) for LMER models, where y_ij is the value of one of our life history response variables for the jth of n_i observations on the ith tree, α is the intercept, the β values are fixed-effect coefficients to be estimated (β_1 = % bryophyte cover), the x values are the fixed-effect regressors, γ_i is the host-tree-specific error term (our random effect), and ε_ij is the plot-specific unobservable contribution (Fox, 2002; Zuur et al., 2009). GLMMs took the same form, but with a relevant link function, g, associated with y_ij. ...
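In lme4, the structure described above could be written as follows; the data frame, column names, and simulated values are hypothetical stand-ins for the study's data:

```r
library(lme4)
set.seed(7)
orchids <- data.frame(                         # simulated stand-in data
  tree       = factor(rep(1:15, each = 8)),    # host tree (random effect grouping)
  bryo_cover = runif(120, 0, 100)              # % bryophyte cover (fixed effect)
)
orchids$recruits <- rpois(120, exp(0.5 + 0.01 * orchids$bryo_cover))

# LMER analogue of Eq. (1): fixed effect of cover, random intercept per host tree
lmm <- lmer(log1p(recruits) ~ bryo_cover + (1 | tree), data = orchids)

# GLMM of the same form, with a log link g for the count response
glmm <- glmer(recruits ~ bryo_cover + (1 | tree),
              family = poisson(link = "log"), data = orchids)
```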
The co-occurrence of orchids and bryophytes at occupied sites on host trees has been documented on several occasions, particularly in the tropics, and it may represent an important symbiotic relationship that supports epiphytic orchid populations. Despite continuing interest from ecologists, the specific life history traits that are affected by associations of orchids with bryophytes, and how they are affected, remain unclear. Clarifying the nature of the association will improve our understanding of orchid ecology and have practical implications for applied conservation efforts, particularly for rare species in restricted habitats. In this study, we explored the relationship between the abundance of bryophyte cover on host trees and various life history traits related to size, survival and reproduction of a rare tropical epiphytic orchid, Lepanthes caritensis. The results demonstrated that bryophyte abundance on host trees had variable effects on individual aspects of an orchid’s life history. Orchid recruitment was positively correlated with the abundance of bryophyte cover, but survival and flower production were negatively correlated with bryophyte abundance. Our findings revealed that an apparent commensal symbiotic relationship between L. caritensis and bryophytes exists at the recruitment stage, but this is lost during later life stages, when the abundance of bryophytes appears to negatively affect this species.
... The smoothed fit used locally estimated scatterplot smoothing (LOESS). 18 A subgroup analysis was performed on groups divided into the following patient size (effective diameter) categories to parallel prior publications: 21-25 cm, 25-29 cm, 29-33 cm, 33-37 cm, and 37-41 cm. Most exams were included in one of these categories; only 29 out of 13,320 exams were excluded in this analysis. ...
Objectives
To provide our oncology-specific adult abdominal-pelvic CT reference levels for image noise and radiation dose from a high-volume, oncologic, tertiary referral center.
Methods
The portal venous phase abdomen-pelvis acquisition was assessed for image noise and radiation dose in 13,320 contrast-enhanced CT examinations. Patient size (effective diameter) and radiation dose (CTDIvol) were recorded using a commercial software system, and image noise (Global Noise metric) was quantified using a custom processing system. The reference level and range for dose and noise were calculated for the full dataset, and for examinations grouped by CT scanner model. Dose and noise reference levels were also calculated for exams grouped by five different patient size categories.
Results
The noise reference level was 11.25 HU with a reference range of 10.25–12.25 HU. The dose reference level at a median effective diameter of 30.7 cm was 26.7 mGy with a reference range of 19.6–37.0 mGy. Dose increased with patient size; however, image noise remained approximately constant within the noise reference range. The doses were 2.1–2.5 times the doses in the ACR DIR registry for corresponding patient sizes. The image noise was 0.63–0.75 times the previously published reference level for abdominal-pelvic CT examinations.
Conclusions
Our oncology-specific abdominal-pelvic CT dose reference levels are higher than in the ACR dose index registry and our oncology-specific image noise reference levels are lower than previously proposed image noise reference levels.
Advances in knowledge
This study reports reference image noise and radiation dose levels appropriate for the indication of abdomen-pelvis CT examinations for cancer diagnosis and staging. The difference of these reference levels from non-oncology-specific CT examinations highlights a need for indication-specific dose index and image quality reference registries.
... Time-series data are typically spatiotemporally autocorrelated, violating the independence assumption, which can result in over-fitted models (Arts et al. 2008). Linear mixed-effects models (LMMs) can handle correlated errors where observations are not independent, and offer an objective way of ranking model explanatory power (Fox 2002). Covariates and their combinations were selected on the basis of potential biological relevance, availability of data, and correlation between covariates. ...
Animals are rarely distributed uniformly throughout their range, rather moving throughout their habitats in response to environmental variability, life history strategy, and individual preferences. Antarctic pack-ice represents one of the most dynamic marine habitats on Earth, undergoing large seasonal changes in extent, and being driven into constant motion by underlying currents. Such habitats are crucial for numerous pinniped species, but they may prevent them from foraging from a ‘central-place’ (as many do). While leopard seals are the most widespread of the Antarctic pack-ice seals, few studies have investigated their long-term spatial behaviour, or whether they maintain a central-place. Given individual movement drives spatial structure and trophic effects on prey populations, this is a substantial gap in our knowledge.
I sought to quantify leopard seal spatial behaviour along the West Antarctic Peninsula using satellite tracking technology, to determine whether leopard seals undertake large-scale north–south migration that follows the seasonal extension of the pack-ice edge, or whether they maintain a central-place strategy. Further, I quantified small-scale haul-out and swimming behaviour to assess whether travel driven by ice-flow and travel driven by conscious movement would differ. I hypothesised that leopard seals would migrate with the pack-ice edge, and in doing so would not maintain a central-place strategy.
I used Argos location data for 24 leopard seals fitted with satellite trackers and USNIC ice extent data to determine the distances of seals from the pack-ice edge and from their capture site, as well as the distances travelled while hauled-out and swimming.
I found all response variables varied over time, with sex having a significant effect on distance from the capture site and swimming distance. Most seals did not move the substantial distances needed for a large-scale migration which follows the pack-ice edge. The results suggest intersexual differences (potentially due to sexual dimorphism) and individual variability drove long-distance and small-scale movement in leopard seals.
While the leopard seals did not maintain a central-place strategy, their movement based on a trade-off between the pack-ice edge and the coastline suggests great flexibility, and future studies across other Antarctic sectors should investigate whether leopard seals are as flexible elsewhere in their range. Since leopard seals can have substantial top-down effects on multiple trophic levels, such information about their spatial behaviour is important for ecosystem-based fisheries modelling and future operations in Antarctic waters.
... Gaussian blur is also called Gaussian smoothing. Gaussian blur [7] is generally obtained by computing a weighted average of the pixels around each point of the image [8][9][10][11]. Its blur function can be expressed as ...
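The formula is cut off in the snippet; the standard 2D Gaussian kernel it presumably refers to is:

$$G(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right),$$

where σ is the standard deviation controlling the blur radius.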
With the rapid development of modern science and technology, people gradually lose sight of the importance of traditional art and old street culture, which play a vital role in improving the knowledge and cultural level of the people. This paper aims to effectively integrate traditional art and the construction of old street culture by studying fuzzy algorithms. In an era of rapid technological development, traditional culture and art must keep pace with the times, and fuzzy algorithms offer a more scientific way to combine traditional art and old street culture in today's society. Based on fuzzy algorithms and visualization technology, this paper examines the meaning of traditional art through analysis of the fuzzy algorithm, uses experiments to verify the theory, refines cultural symbols, and proposes activation methods to integrate traditional art and old street culture as a whole. Through a defuzzification algorithm and a hierarchical-evaluation fuzzy algorithm, people's subjective evaluation of the integration of traditional art and old street culture is calculated, and the algorithm is optimized accordingly, contributing substantially to the effective integration of traditional art and old street culture. This case study can also provide new ideas for the protection of historical and cultural blocks in many cities and their integration with traditional art, helping to remedy dying traditional art and old street culture and enrich their construction.
... Several loss functions exist in the literature [26]; we focus on the Huber loss in this paper. It is a compromise between the l1-norm (less sensitive to outliers, but not differentiable at zero) and the l2-norm (differentiable everywhere, but highly sensitive to outliers) loss functions, behaving quadratically for small residuals and linearly for large ones. ...
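For reference, the Huber loss with threshold δ (the standard definition, not quoted from the paper) is:

$$L_\delta(r) = \begin{cases} \frac{1}{2} r^2, & |r| \le \delta \\[4pt] \delta\left(|r| - \frac{\delta}{2}\right), & |r| > \delta, \end{cases}$$

which is quadratic for small residuals and linear for large ones, as described above.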
Global Navigation Satellite Systems (GNSS) are the most widely used technology for outdoor positioning, but they have severe limitations for safety-critical applications involving unmanned autonomous systems: positioning performance degrades in harsh propagation environments such as urban canyons. In this paper we propose a new algorithm for GNSS navigation in challenging environments based on robust statistics. M-estimators have shown promising results in this context but are limited by some fixed hyper-parameters. Our main idea is to adapt this parameter, for the Huber cost function, to the current environment in a data-driven manner. In doing so, we also present a simple yet efficient way of learning with satellite data, whose number may vary over time. Focusing the learning problem on a single parameter enables efficient learning with a lightweight neural network. The generalization capability and the positioning performance of the proposed method are evaluated in multiple context scenarios (open-sky, trees, urban, and urban canyon), with two distinct GNSS receivers, and in an airplane ground inspection scenario. The maximum positioning error is reduced by up to 68% with respect to M-estimators.
... When an observation is omitted from the data, the resulting change in deviance is given by the likelihood residual, also named the studentized residual, externally studentized residual, or deleted studentized residual [27]. The studentized residuals are a type of standardized residual that can be used to identify outliers. ...
A common practice for obtaining reliable regression results in the generalized linear model is the detection of influential cases. For their identification, the present study empirically compares the performance of various existing residuals for the Poisson regression model. Furthermore, we computed Cook's distance for the stated residuals. To show the effectiveness of the proposed methodology, data were generated by simulation, and the applicability of the methodology is further shown with real data following a Poisson regression. The comparative analysis of the residuals is carried out for the detection of influential cases.
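A minimal R sketch of computing these diagnostics for a Poisson regression; the formula and simulated data frame are hypothetical, not the study's:

```r
set.seed(3)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$count <- rpois(100, exp(0.3 + 0.5 * df$x1))   # simulated count response

fit <- glm(count ~ x1 + x2, family = poisson(link = "log"), data = df)
rs  <- rstudent(fit)           # likelihood (deleted / externally studentized) residuals
cd  <- cooks.distance(fit)     # Cook's distance for each case
which(cd > 4 / nobs(fit))      # a common rule-of-thumb cutoff for influential cases
```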
... Denote X1, . . . , Xn as the sample from the population, and the kernel density estimate of the population density function as f̂(x); at any point x it is defined as [29]: ...
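The definition is truncated; the standard kernel density estimator it presumably refers to is:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),$$

with kernel function K and bandwidth h.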
This paper sheds light on the effect of combination modes on the evaluation of berthing capacity for Sanya Yazhou Fishing Port (SYFP) under hypothetical typhoon conditions. By statistically analysing the most probable moving speeds and directions of historical typhoons passing through the fishing port, a representative typhoon path was determined with the nonparametric regression method. Designed typhoon wind fields of levels 12–17 were generated based on Holland's parametric wind model. Then, the MIKE 21 BW model was used to obtain the high-precision wave distribution in the fishing port; its boundary conditions (significant wave height and peak period) were calculated by combining the MIKE 21 SW model with the designed typhoon wind fields. In SYFP, ships usually adopt multi-ship side-by-side and single-anchor mooring modes during typhoons. In fair weather, approximately 158 vessels can be berthed if all are large ones, while approximately 735 vessels can be moored if all are small ones. However, as typhoon level increases, the anchoring area for small vessels decreases. From the perspective of wave distribution in the fishing port, the number of moored large vessels was hardly affected by typhoons, which can be attributed to the breakwater significantly decreasing the large wave heights in the fishing port. Finally, a framework for assessing the hazard to berthing capacity from incoming typhoon-driven storm waves was established.
... Beyond norms, if b is a {−1, 1} label vector and error is measured via the logistic loss, Munteanu et al. [MSSW18] show that poly(d, μ, 1/ε) samples suffice, where μ is a complexity measure of A. This bound has recently been tightened to Õ(dμ²/ε²) [MMR21] using Lewis weight sampling. For other loss functions, such as the Tukey loss and Huber's M-estimators for robust regression [Fox02], we are not aware of any known results solving Problem 1.1. Chen and Dereziński leave extending their results on ℓp regression to such loss functions as an open question. ...
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector b and output a near minimizer of the regression loss, where A is a design matrix and the error is measured by some loss function. For ℓp norm regression, we give an algorithm based on Lewis weight sampling that outputs a (1+ε)-approximate solution using a small number of queries to b, and we show that this dependence on d is optimal, up to logarithmic factors. Our result resolves a recent open question of Chen and Dereziński, who gave near-optimal bounds for the ℓ1 norm and suboptimal bounds for ℓp regression with p ∈ (1, 2). We also provide the first total sensitivity upper bound for loss functions with at most degree-p polynomial growth, improving a recent result of Tukan, Maalouf, and Feldman. By combining this with our techniques for the ℓp regression result, we obtain an active regression algorithm for such loss functions, answering another open question of Chen and Dereziński. For the important special case of the Huber loss, we further improve our bound, in both active and non-active sample complexity, over a previous bound for Huber regression due to Clarkson and Woodruff. Our sensitivity bounds have further implications, improving a variety of previous results using sensitivity sampling, including Orlicz norm subspace embeddings and robust subspace approximation. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every ℓp norm.
... This reduced the impact of outliers on the trend results, and the iterative process was ended when the regression coefficients tended to converge [9]. The glacier elevation variation in the Yigong River Basin was fitted using the robustfit function in MATLAB with default parameters [10,11]. ...
... The coefficients of the regression equation were obtained using the HuberRegressor class in the scikit-learn library for Python [53]. To ensure maximum robustness to outliers and asymptotic efficiency of at least 95 percent, an adaptive value of the tuning constant in the Huber estimator's t function was evaluated using the method discussed in [74] and Equation (18). The mean absolute deviation of the residuals was used as the robust measure of spread instead of the standard deviation. ...
In this study, hyperspectral imaging (HSI) and chemometrics were implemented to develop prediction models for moisture, colour, chemical and structural attributes of purple-speckled cocoyam slices subjected to hot-air drying. Since HSI systems are costly and computationally demanding, the selection of a narrow band of wavelengths can enable the utilisation of simpler multispectral systems. In this study, 19 optimal wavelengths in the spectral range 400–1700 nm were selected using PLS-BETA and PLS-VIP feature selection methods. Prediction models for the studied quality attributes were developed from the 19 wavelengths. Excellent prediction performance (RMSEP < 2.0, r2P > 0.90, RPDP > 3.5) was obtained for MC, RR, VS and aw. Good prediction performance (RMSEP < 8.0, r2P = 0.70–0.90, RPDP > 2.0) was obtained for PC, BI, CIELAB b*, chroma, TFC, TAA and hue angle. Additionally, PPA and WI were also predicted successfully. An assessment of the agreement between predictions from the non-invasive hyperspectral imaging technique and experimental results from the routine laboratory methods established the potential of the HSI technique to replace or be used interchangeably with laboratory measurements. Additionally, a comparison of full-spectrum model results with the reduced models demonstrated the potential replacement of HSI with simpler imaging systems.
... Figure 1 shows the year-wise smoothed distributions of the number of studies published between 1993 and 2019 that contain each of the terms above. We smoothed the year-wise distributions using the R loess() function, which is based on local regression, a kind of nonparametric regression model [21]. In the figure, note that the distribution of the "Digital Farming" term overlaps that of "Agriculture 4.0". ...
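A minimal R sketch of this smoothing step; the yearly counts below are simulated stand-ins, not the study's data:

```r
set.seed(1)
d <- data.frame(year = 1993:2019)
d$count <- rpois(nrow(d), lambda = exp(0.15 * (d$year - 1993)))  # fake yearly publication counts

fit <- loess(count ~ year, data = d)   # local (LOESS) regression, as used for Fig. 1
d$smooth <- predict(fit)               # smoothed year-wise distribution
```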
The application of digital technologies in agriculture can improve traditional practices to adapt to climate change, reduce Greenhouse Gases (GHG) emissions, and promote a sustainable intensification for food security. Some authors argued that we are experiencing a Digital Agricultural Revolution (DAR) that will boost sustainable farming. This study aims to find evidence of the ongoing DAR process and clarify its roots, what it means, and where it is heading. We investigated the scientific literature with bibliometric analysis tools to produce an objective and reproducible literature review. We retrieved 4995 articles by querying the Web of Science database in the timespan 2012-2019, and we analyzed the obtained dataset to answer three specific research questions: i) what is the spectrum of the DAR-related terminology?; ii) what are the key articles and the most influential journals, institutions, and countries?; iii) what are the main research streams and the emerging topics? By grouping the authors’ keywords reported on publications, we identified five main research streams: Climate-Smart Agriculture (CSA), Site-Specific Management (SSM), Remote Sensing (RS), Internet of Things (IoT), and Artificial Intelligence (AI). To provide a broad overview of each of these topics, we analyzed relevant review articles, and we present here the main achievements and the ongoing challenges. Finally, we showed the trending topics of the last three years (2017, 2018, 2019).
... and a constant value [14]. [15] proposed S-estimation, regression estimates associated with M-scales. ...
Many authors have defined modified versions of the mean estimator using two auxiliary variables. These estimators depend heavily on the calculated regression coefficients and, in the presence of outliers, do not give satisfactory results. In this study, we improve the suggested estimators by using several robust regression techniques to obtain the regression coefficients. We compare the efficiencies of the suggested estimators with estimators presented in the literature, supporting the theoretical results with two numerical examples and a simulation study. Empirical results show that the modified ratio estimator performs well in the presence of outliers when robust regression techniques are adopted.
... The M-estimator has many applications in different multivariate techniques, e.g. robust regression (Fox, 2002). It has been used to develop variants of the well-known RANSAC algorithm in computer vision, e.g. ...
Laser scanning has spawned a renewed interest in automatic robust feature extraction. Three-dimensional (3D) point cloud data obtained from laser scanner based mobile mapping systems commonly contain outliers and/or noise. The presence of outliers and noise means that most of the frequently used methods for point cloud processing and feature extraction produce inaccurate and unreliable results, i.e. are termed non-robust. Dealing with the problems of outliers and noise for automatic robust feature extraction in mobile laser scanning 3D point cloud data has been the subject of this research.
This thesis develops algorithms for statistically robust planar surface fitting based on robust and/or diagnostic statistical approaches. The algorithms outperform classical methods such as least squares and principal component analysis, and show distinct advantages over current robust methods, including RANSAC and its derivations, in terms of computational speed, sensitivity to the percentage of outliers or noise, number of points in the data and surface thickness. Two highly robust outlier detection algorithms have been developed for accurate and robust estimation of local saliency features such as normal and curvature. Results for artificial and real 3D point cloud data experiments show that the methods have advantages over other existing popular techniques in that they (i) are computationally simpler, (ii) can successfully identify high percentages of uniform and clustered outliers, (iii) are more accurate, robust and faster than existing robust and diagnostic methods developed in disciplines including computer vision (RANSAC), machine learning (uLSIF) and data mining (LOF), and (iv) have the ability to denoise point cloud data. Robust segmentation algorithms have been developed for extracting multiple planar and/or non-planar complex surfaces, e.g. long cylindrical and approximately cylindrical surfaces (poles), lamps and sign posts. A region growing approach has been developed for the segmentation algorithms, and the results demonstrate that the proposed methods reduce segmentation errors and provide more robust feature extraction. The developed methods are promising for surface edge detection, surface reconstruction and fitting, sharp feature preservation, covariance statistics based point cloud processing and registration. An algorithm has also been introduced for merging several sliced segments to allow large volumes of laser scanned data to be processed seamlessly. In addition, the thesis presents a robust ground surface extraction method that has the potential to be used as a pre-processing step for large point cloud data processing tasks such as segmentation, feature extraction, classification of surface points, object detection and modelling. Identifying and removing the ground then allows more efficiency in the segmentation of above-ground objects.
... The M-estimator has many applications in different multivariate techniques, e.g. robust regression (Fox, 2002). It has been used to develop variants of the well-known RANSAC algorithm in computer vision, e.g. ...
Short Abstract: Due to the direct acquisition of high-density and accurate spatial data, laser scanning (LS) has spawned a renewed interest in feature extraction and has drawn great attention in photogrammetry, remote sensing, computer vision, robotics and reverse engineering. 3D point cloud data collected by LS systems and LiDAR contain outliers and noise. The presence of outliers and noise means most of the methods used for feature extraction produce non-robust results. We investigate the problems of outliers and noise for feature extraction. This thesis develops algorithms for outlier detection, point cloud denoising, planar surface fitting, segmentation and ground surface point filtering. The algorithms outperform classical methods, e.g. least squares and principal component analysis, and show distinct advantages over robust methods, e.g. RANSAC, and well-known data mining, computer vision, pattern recognition and machine learning techniques in terms of computational speed, sensitivity to outliers/noise, number of points in the data and surface nature.
... The literature recognizes two general approaches to bootstrapping (Eq. 1), considering the covariates as either random or fixed. The densities used in Eq. 1 are observed from the Δs computed by each equation. Since each equation is specific to a certain set of Δ, it is assumed that the Δs are fixed observations. ...
Raman spectroscopy has been used extensively to calculate CO2 fluid density in many geological environments, based on the measurement of the Fermi diad split (Δ; cm−1) in the CO2 spectrum. While recent research has allowed the calibration of several Raman CO2 densimeters, there is a limit to the inter-laboratory application of published equations. These calculate two classes of density values for the same measured Δ, with a deviation of 0.09 ± 0.02 g/cm3 on average. To elucidate the influence of experimental parameters on the calibration of Raman CO2 densimeters, we propose a bottom-up approach beginning with the calibration of a new equation, to evaluate a possible instrument-dependent variability induced by experimental conditions. Then, we develop bootstrapped confidence intervals for the density estimates of existing equations to move the statistical analysis from a sample-specific to a population level.
We find that Raman densimeter equations calibrated based on spectra acquired with similar spectral resolution calculate CO2 density values lying within the standard errors of the equations and are suitable for inter-laboratory application. The statistical analysis confirms that equations calibrated at similar spectral resolution calculate CO2 densities equivalent at 95% confidence, and that each Raman densimeter does have a limit of applicability, statistically defined by a minimum Δ value, below which the error in calculated CO2 densities is too high.
... The regression function can be written as a p-th order weighted least-squares polynomial of y on x; a larger value of p gives a more flexible smooth (Fox, 2002). ...
... However, estimation based on a linear smoother may not produce successful results for real-life effects, since most of them show a nonlinear trend. Nonparametric regression [10], one of the flexible statistical methods, should be used to characterize such a nonlinear trend. However, if the number of explanatory variables T_j (j = 1, 2, . . . ...
A useful model for data analysis is the partially nonlinear model, in which the response variable is represented as the sum of a nonparametric and a parametric component. In this study, we propose a new procedure for estimating the parameters in partially nonlinear models. We consider a penalized profile nonlinear least-squares problem in which the nonparametric component is expressed in a B-spline basis; the estimation problem is then cast as a conic quadratic program, a continuous optimization problem solved by an interior-point method. An application study is conducted to evaluate the performance of the proposed method using some well-known performance measures. The results are compared against a parametric nonlinear model.
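A rough sketch of the profiling idea under stated assumptions: the nonparametric component is expanded in a B-spline basis and, for each candidate value of the nonlinear parameters, the penalized least-squares problem in the spline coefficients is solved in closed form. The exponential parametric part, knot grid, and ridge penalty are illustrative; the paper itself solves the problem as a conic quadratic program with an interior-point method.

```python
import numpy as np
from scipy.interpolate import splev
from scipy.optimize import minimize

rng = np.random.default_rng(3)
t = rng.uniform(0, 4, 200)                      # parametric covariate
x = rng.uniform(0, 1, 200)                      # nonparametric covariate
y = 2.0 * np.exp(-0.7 * t) + np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 200)

k = 3                                           # cubic B-splines
knots = np.concatenate(([0.0] * k, np.linspace(0, 1, 8), [1.0] * k))
m = len(knots) - k - 1                          # number of basis functions
# B-spline design matrix, one basis element per column
B = np.column_stack([splev(x, (knots, np.eye(m)[j], k)) for j in range(m)])

def profile_rss(theta, lam=1e-3):
    """Profile out the spline: ridge-penalized LS for its coefficients."""
    resid = y - theta[0] * np.exp(-theta[1] * t)   # remove parametric part
    alpha = np.linalg.solve(B.T @ B + lam * np.eye(m), B.T @ resid)
    return np.sum((resid - B @ alpha) ** 2) + lam * alpha @ alpha

theta_hat = minimize(profile_rss, x0=np.array([1.0, 1.0])).x
```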
... is chosen, where Med := the median value [11]. The basic algorithm for computing the M-estimator for regression is iteratively reweighted least squares (IRLS) [15]. ...
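A compact sketch of IRLS as named in the excerpt, assuming Huber weights and a MAD scale estimate (standard choices, not necessarily those of the cited work):

```python
import numpy as np

def irls(X, y, c=1.345, n_iter=50, tol=1e-8):
    """M-estimation via iteratively reweighted least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # OLS starting values
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust (MAD) scale
        u = r / max(s, 1e-12)
        # Huber weights: 1 inside [-c, c], c/|u| outside
        w = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new
```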
Neuroscience is a combination of different scientific disciplines that investigate the nervous system to understand its biological basis. Recently, applications to the diagnosis of neurodegenerative diseases such as Parkinson’s disease have become very promising through the use of different statistical regression models. However, well-known statistical regression models may give misleading results for the diagnosis of neurodegenerative diseases when experimental data contain outlier observations that lie an abnormal distance from the other observations. The main achievement of this study is a novel mathematics-supported approach, alongside statistical regression models, to identify and treat outlier observations without directly eliminating them, for a great and emerging challenge to humankind such as neurodegenerative diseases. With this approach, a new method named CMTMSOM is proposed with the contributions of the powerful convex and continuous optimization techniques referred to as conic quadratic programming. This method, based on the mean-shift outlier regression model, is developed by combining the robustness of M-estimation with the stability of Tikhonov regularization. We apply our method and other parametric models to the Parkinson telemonitoring dataset, a real-world dataset in neuroscience. We then compare these methods using well-known method-free performance measures. The results indicate that the CMTMSOM method performs better than current parametric models.
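The following is a loose numerical sketch of a mean-shift outlier model: each observation receives its own shift parameter, and soft-thresholding keeps most shifts at zero so that only gross outliers are absorbed. This alternating scheme is an illustrative stand-in, not the CMTMSOM conic-quadratic-programming formulation.

```python
import numpy as np

def msom(X, y, lam=2.5, n_iter=100):
    """Mean-shift outlier model, fitted by simple alternating minimization."""
    n = len(y)
    gamma = np.zeros(n)                           # per-observation mean shifts
    for _ in range(n_iter):
        # Given the shifts, ordinary least squares on the corrected response
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        # Soft-threshold residuals: shifts survive only for outliers
        gamma = np.sign(r) * np.maximum(np.abs(r) - lam * s, 0.0)
    return beta, gamma                            # nonzero gamma flags outliers
```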
... Hence, for each observation, all the related sub-models' estimates are combined into one ensemble estimate. Furthermore, a robust fit of the sub-models' estimates is performed (Andrews, 1974; Meer et al., 1991; Dumouchel and O'Brien, 1991; Holland and Welsch, 1977; Fox, 2002). This method results in robust estimates of the MLR coefficients. ...
Low-flow estimation at ungauged sites is a challenging task. Ensemble-based machine learning regression has recently been utilized in modeling hydrologic phenomena and has shown improved performance compared to classical regional regression approaches. Ensemble modeling mainly revolves around developing a proper training framework for the individual learners and combiners. An ensemble framework is proposed in this study to drive the generalization ability of the sub-ensemble models and the ensemble combiners. Information mixtures between the subsamples are introduced and, unlike in common ensemble frameworks, are explicitly devoted to the ensemble members as well as the ensemble combiners. The homogeneity paradigm is developed via a two-stage resampling approach, which creates sub-samples with controlled information mixture levels for the training of the individual learners. Artificial neural networks are used as sub-ensemble members in combination with a number of ensemble integration techniques. The proposed model is applied to estimate summer and winter low-flow quantiles for catchments in the province of Québec, Canada. The results show significant improvement when compared to the other models presented in the literature. The obtained homogeneity levels from the optimum ensemble models demonstrate the importance of utilizing the diversity concept in ensemble learning applications.
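A minimal sketch of the general pattern, assuming bootstrap subsamples, small neural-network sub-models, and a median combiner as a simple robust integration rule (the paper's two-stage resampling and homogeneity control are not reproduced here):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.5, 300)

members = []
for seed in range(10):
    idx = rng.integers(0, len(y), len(y))     # bootstrap subsample
    m = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=seed).fit(X[idx], y[idx])
    members.append(m)

# Combine sub-model predictions robustly (median instead of plain mean)
preds = np.column_stack([m.predict(X) for m in members])
y_ens = np.median(preds, axis=1)
```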
... is the scaled residual of the i-th observation, and c is the tuning constant chosen in advance to set the degree of robustness [7]. Table 1 gives the tuning constant for each breakdown point [9] Table (4)
Regression analysis aims to model the relationship between one dependent variable (Y) and one or more independent variables (X) in a mathematical model. The parameter estimation method most often used is least squares. When the data contain outliers, this method is less effective, because the resulting estimator can be biased. Robust regression is one of the methods used to estimate parameters when the error distribution is not normal and/or the data contain outliers. The weighting used in this study is the Tukey Bisquare weight. The aim of this study is to estimate the parameters and demonstrate the effectiveness of the S-estimation method in robust regression analysis with Tukey Bisquare weighting. The case study concerns the influence of the Human Development Index (X1), the school participation rate (APS) for ages 16-18 (X2), and consumption (X3) on poverty (Y) in Indonesia in 2016. Based on the DFFITS test and a boxplot, the data were identified as containing outliers, so a robust regression procedure was needed to estimate the parameters of the mathematical model. The robust S-estimation regression model with Tukey Bisquare weighting yielded a mathematical model in which the independent variables have a significant effect on the dependent variable, both simultaneously and partially, with an adjusted R-squared of 0.951 and a standard error of 0.01247. Keywords: S-estimation, Robust Regression, Tukey Bisquare
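For reference, a minimal sketch of the Tukey Bisquare weight function with the common tuning constant c = 4.685 (for S-estimation, a smaller c is typically chosen to reach a high breakdown point):

```python
import numpy as np

def bisquare_weight(u, c=4.685):
    """Tukey Bisquare: weights fall smoothly to zero once |u| exceeds c."""
    return np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)

print(bisquare_weight(np.array([0.0, 2.0, 4.685, 6.0])))
# -> approximately [1.0, 0.669, 0.0, 0.0]
```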
... The function used to derive the weighting function in robust regression is an objective function (Fox, 2002). ...
Spatial regression analysis is a regression method used for data that exhibit spatial effects. A spatial regression model that captures spatial effects on the response variable (Y) is the Spatial Autoregressive (SAR) model. Outliers are often found in spatial research data; such outliers are called spatial outliers. In general, robust regression can be used to handle outliers, and several estimators are available, including the S, M, MM, and LTS estimators. To handle spatial outliers, SAR and robust regression are combined into a new method, Robust Spatial Autoregressive (Robust SAR). The estimator used in this study is the S-estimator. This study was conducted to determine the best model for a case study of life expectancy in East Java Province by comparing the SAR and Robust SAR methods. Based on the analysis, the MSE and adjusted R2 values for the SAR method are 1.7521 and 55.54%, while for the Robust SAR method they are 0.7456 and 62.30%. The Robust SAR model has a lower MSE and a higher adjusted R2 than the SAR model. Thus, the best model for modeling life expectancy in East Java is the Robust SAR model. Keywords: Spatial Autoregressive (SAR), Robust SAR, Life expectancy
... The function used to derive the weighting function in robust regression is an objective function (Fox, 2002). ...
Multiple linear regression can be estimated using Ordinary Least Squares (OLS). Some classical assumptions must be fulfilled, namely normality, homoskedasticity, non-multicollinearity, and non-autocorrelation. However, violations of these assumptions can occur due to outliers, so the estimator obtained is biased and inefficient. In statistics, robust regression is one method that can be used to deal with outliers. Robust regression has several estimators; the Scale estimator (S-estimator) is used in this research. The case study is fish production per district/city in Central Java in 2015-2016, which is influenced by the number of fishermen, number of vessels, number of trips, number of fishing units, and number of fishing households/companies. Estimation with OLS violates the assumptions of normality, autocorrelation, and homoskedasticity; this occurs because there are outliers. Based on the t-test at the 5% significance level, it can be concluded that several predictor variables, namely the number of fishermen, the number of ships, the number of trips, and the number of fishing units, have a significant effect on fish production. The predictor variables explain 88.006% of fish production, and the MSE value is 7109.519. A MATLAB GUI program for S-estimator robust regression was built to make the calculations easier for users. Keywords: Ordinary Least Squares (OLS), Outliers, Robust Regression, Fish Production, GUI Matlab.
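A hedged sketch of the effect being handled: OLS versus a robust fit on synthetic data containing gross outliers. statsmodels provides M-estimation (RLM) rather than the S-estimator used in the abstract, so Huber M-estimation stands in here to show the downweighting of outliers.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.uniform(0, 10, 100))
y = 2.0 + 1.5 * X[:, 1] + rng.normal(0, 1, 100)
y[:5] += 30                                   # inject gross outliers

ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("OLS slope:", ols.params[1], " robust slope:", rlm.params[1])
```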
... The function used to derive the weighting function in robust regression is an objective function (Fox, 2002). Differentiating the objective function with respect to the residual yields a function called the influence function, ψ(·). ...
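Sketching the chain in the excerpt numerically for the Tukey bisquare: the objective function rho, its derivative psi (the influence function), and a finite-difference check that psi = d(rho)/du; the weight function then follows as w(u) = psi(u)/u.

```python
import numpy as np

c = 4.685

def rho(u):
    """Bisquare objective: rises like a quadratic, flat beyond |u| = c."""
    a = np.minimum(np.abs(u) / c, 1.0)
    return (c ** 2 / 6) * (1 - (1 - a ** 2) ** 3)

def psi(u):
    """Influence function = derivative of rho; redescends to zero."""
    return np.where(np.abs(u) <= c, u * (1 - (u / c) ** 2) ** 2, 0.0)

u = np.linspace(-6, 6, 13)
eps = 1e-6
psi_numeric = (rho(u + eps) - rho(u - eps)) / (2 * eps)
assert np.allclose(psi_numeric, psi(u), atol=1e-5)
# The IRLS weights are then w(u) = psi(u) / u (with w(0) = 1 by continuity).
```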
Robust regression is a regression method that is robust to the effects of outliers. For regression with parameters estimated by Ordinary Least Squares (OLS), outliers can cause assumption violations, so the estimator obtained becomes biased and inefficient. As a solution, robust regression M-estimation with the Andrew, Ramsay, and Welsch weight functions can be used to overcome the presence of outliers. The aim of this study was to develop a model for a case study of poverty in Central Java in 2017, as influenced by the number of unemployed, population, school participation rate, Human Development Index (HDI), and inflation. The OLS estimation results show a violation of homoskedasticity caused by the presence of outliers. Applying robust regression to the case study proves that robust regression can handle outliers and improve parameter estimation. The best robust regression model is M-estimation with the Andrew weight function. The predictor variables explain 92.7714% of poverty, and the MSE value is 370.8817. Keywords: Outliers, Robust Regression, M-Estimator, Andrew, Ramsay, Welsch
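statsmodels ships the Andrews (wave) and Ramsay norms named above, so a minimal sketch of fitting M-estimators with them is shown below; Welsch's weight is not built in and is omitted here. The data are synthetic with heavy-tailed errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_t(df=2, size=200)

# Fit the same model under two different robust norms and compare
for norm in (sm.robust.norms.AndrewWave(), sm.robust.norms.RamsayE()):
    fit = sm.RLM(y, X, M=norm).fit()
    print(type(norm).__name__, fit.params)
```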
Survival analysis is a branch of statistics that analyzes the expected time until an event occurs. It is a widely used technique in biomedical and health services research. In this study, a sample of 300 persons from all classes of society in Kurdistan, aged 15 to 82 years, was analyzed with Cox regression and a bootstrapping test to indicate the risks of smoking. The dependent variable, the effect of smoking, is related to several diseases represented by the independent (predictor) variables: mouth odor, lung diseases, dental disease, gastrointestinal diseases, liver diseases, heart and blood system diseases, Alzheimer's disease, mental diseases, and cancer. The results of the analyses confirmed that smoking is associated with several diseases, such as cancer, lung diseases, dental diseases, and heart and blood system diseases. IBM SPSS Statistics was used for these analyses.
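A hedged sketch of the workflow on synthetic data, using lifelines in place of SPSS: a Cox proportional hazards fit followed by a nonparametric bootstrap of one coefficient. The column names and the data-generating process are assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 300
df = pd.DataFrame({
    "smoker": rng.integers(0, 2, n),
    "age": rng.integers(15, 83, n),
})
# Synthetic survival times: smokers get shorter times on average
df["time"] = rng.exponential(10 / (1 + df["smoker"]), n)
df["event"] = rng.integers(0, 2, n)

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# Nonparametric bootstrap of the smoker coefficient (log hazard ratio)
boot = [CoxPHFitter().fit(df.sample(n, replace=True),
                          duration_col="time", event_col="event")
        .params_["smoker"] for _ in range(200)]
print("95% CI:", np.percentile(boot, [2.5, 97.5]))
```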
A central issue in applied statistics is that classical methods assume the parent population (from which the sample is drawn) has a specific distributional form, yet in many cases the form of the underlying distribution is unknown. We therefore need statistical techniques that do not depend on distributional assumptions about the phenomenon under study (distribution-free methods): the nonparametric regression methods, which depend directly on the data when estimating the regression equation. This research reviews some nonparametric regression methods, such as local polynomial regression, together with several methods for estimating the smoothing parameter (one of which is proposed to find an initial value for the smoothing parameter with kernel functions). The results of the methods above are then compared among themselves using the following tests and statistical criteria: MSE, RMSE, R2, adjusted R2, and the F-statistic. The application uses real climate data, namely daily average temperatures for the period 1/1/2011 to 31/12/2013 registered with the Directorate of Meteorology and Seismology in Sulaimani, with different sample sizes (365, 730, 1095), to show which sample size is more efficient for climate and geographic data under a simple nonparametric regression model (local linear) and a multiple regression model (additive model). The analyses were carried out with the statistical program SYSTA-12.
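A short sketch of one way to choose the smoothing parameter, assuming a Gaussian-kernel (Nadaraya-Watson) estimator and leave-one-out cross-validation scored by MSE/RMSE; the kernel and bandwidth grid are illustrative, not the methods compared in the paper.

```python
import numpy as np

def nw_predict(x_train, y_train, x0, h):
    """Nadaraya-Watson estimate at x0 with Gaussian kernel, bandwidth h."""
    w = np.exp(-0.5 * ((x0 - x_train) / h) ** 2)
    return np.sum(w * y_train) / np.sum(w)

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 10, 365))            # e.g. one year of daily data
y = 15 + 10 * np.sin(2 * np.pi * x / 10) + rng.normal(0, 2, x.size)

best = None
for h in np.linspace(0.1, 2.0, 20):
    # Leave-one-out: predict each point from all the others
    loo = [nw_predict(np.delete(x, i), np.delete(y, i), x[i], h)
           for i in range(x.size)]
    mse = np.mean((y - np.array(loo)) ** 2)
    if best is None or mse < best[1]:
        best = (h, mse)

h_opt, mse_opt = best
print(f"h = {h_opt:.2f}, MSE = {mse_opt:.3f}, RMSE = {np.sqrt(mse_opt):.3f}")
```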
Swimmer’s itch is a rash caused by snail-borne parasitic flatworms, whose infectious cercariae normally target birds but sometimes try to infect humans. We know little about what environmental factors drive temporal patterns of cercaria abundance, particularly at daily time scales. I surveyed 14 sites on 8 Michigan lakes, measuring daily cercaria abundance in the water using a real-time PCR assay and collecting hourly water temperature measurements. I also measured snail densities, water nutrients, periphyton growth, and land use for each site. I hypothesized that cercaria production by infected snails would decrease following exposure to energetically stressful warmer temperatures. In support of this hypothesis, daily cercaria abundance was best predicted by a combination of past cercaria abundance, today’s minimum temperature, and a negative effect of mean temperature over the previous 5 days. Among-site variation in cercaria abundance was best predicted by snail density, which was associated with periphyton growth and urbanization.
In fuzzy inference systems, the decision-making process is based on certain rules called "if-then" rules, which are highly difficult to determine. The fuzzy regression functions approach, proposed to overcome this difficulty, is a fuzzy inference system method based on fuzzy set theory and multiple regression analysis. Even though the fuzzy regression functions approach gives successful forecasting results, its performance is affected by outliers in the data set. In this study, the parameter estimates of the regression functions are obtained by robust regression based on the Andrews, Bisquare, Talwar, Huber, Fair, Logistic, and Cauchy functions. Thus, the forecasting performance of the proposed method is not affected even if the data set contains an outlier. The forecasting performance of the proposed method, with and without an outlier in the data set, is compared with many forecasting methods in the literature; the analysis results support its superior forecasting performance.
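For reference, the weight functions named in the abstract can be written compactly as below, with the tuning constants conventionally used for roughly 95% efficiency (the values used by MATLAB's robustfit); treat the exact constants as assumptions.

```python
import numpy as np

def w_huber(u, c=1.345):
    return np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))

def w_bisquare(u, c=4.685):
    return np.where(np.abs(u) <= c, (1 - (u / c) ** 2) ** 2, 0.0)

def w_andrews(u, c=1.339):
    # w = sin(u/c)/(u/c) for |u| <= pi*c; note np.sinc(x) = sin(pi x)/(pi x)
    return np.where(np.abs(u) <= np.pi * c, np.sinc(u / (np.pi * c)), 0.0)

def w_cauchy(u, c=2.385):
    return 1.0 / (1.0 + (u / c) ** 2)

def w_fair(u, c=1.400):
    return 1.0 / (1.0 + np.abs(u) / c)

def w_logistic(u, c=1.205):
    x = np.maximum(np.abs(u) / c, 1e-12)   # weight is even in u
    return np.tanh(x) / x

def w_talwar(u, c=2.795):
    return (np.abs(u) <= c).astype(float)  # hard trimming
```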
When monitoring the breeding ecology of birds, the causes and times of nest failure can be difficult to determine. Cameras placed near nests allow for accurate monitoring of nest fate, but their presence may increase the risk of predation by attracting predators, leading to biased results. The relative influence of cameras on nest predation risk may also depend on habitat because predator numbers or behaviour can change in response to the availability or accessibility of nests. We evaluated the impact of camera presence on the predation rate of artificial nests placed within mesic tundra habitats used by Arctic-breeding shorebirds. We deployed 94 artificial nests, half with cameras and half without, during the shorebird-nesting season of 2015 in the East Bay Migratory Bird Sanctuary, Nunavut. Artificial nests were distributed evenly across sedge meadow and supratidal habitats typically used by nesting shorebirds. We used the Cox proportional hazards model to assess differential nest survival in relation to camera presence, habitat type, placement date, and all potential interactions. Artificial nests with cameras did not experience higher predation risk than those without cameras. Predation risk of artificial nests was related to an interaction between habitat type and placement date. Nests deployed in sedge meadows and in supratidal habitats later in the season were subject to a higher risk of predation than those deployed in supratidal habitats early in the season. These differences in predation risk are likely driven by the foraging behaviour of Arctic fox (Vulpes lagopus), a species that accounted for 81% of observed predation events in this study. Arctic fox prey primarily on Arvicoline prey and goose eggs at this site and take shorebird nests opportunistically, perhaps more often later in the season when their preferred prey becomes scarcer. This study demonstrates that, at this site, cameras used for nest monitoring do not influence predation risk. Evaluating the impact of cameras on predation risk is critical prior to their use, as individual study areas may differ in terms of predator species and behaviour.
Relative potency is widely used in toxicological and pharmacological studies to characterize potency of chemicals. The relative potency of a test chemical compared to a standard chemical is defined as the ratio of equally effective doses (standard divided by test). This classical concept relies on the assumption that the two chemicals are toxicologically similar—that is, they have parallel dose–response curves on log‐dose scale—and thus have constant relative potency. Nevertheless, investigators are often faced with situations where the similarity assumption is deemed unreasonable, and hence the classical idea of constant relative potency fails to hold; in such cases, simply reporting a single constant value for relative potency can produce misleading conclusions. Relative potency functions, describing relative potency as a function of the mean response (or other quantities), is seen as a useful tool for handling nonconstant relative potency in the absence of similarity. Often, investigators are interested in assessing nonconstant relative potency at a finite set of some specific response levels for various regulatory concerns, rather than the entire relative potency function; this simultaneous assessment gives rise to multiplicity, which calls for efficient statistical inference procedures with multiplicity adjusted methods. In this paper, we discuss the estimation of relative potency at multiple response levels using the relative potency function, under the log‐logistic dose–response model. We further propose and evaluate three approaches to calculating multiplicity‐adjusted confidence limits as statistical inference procedures for assessing nonconstant relative potency. Monte Carlo simulations are conducted to evaluate the characteristics of the simultaneous limits.
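A minimal sketch of a relative potency function under assumed two-parameter log-logistic curves: invert each dose-response curve at a finite set of response levels and take the ratio of equally effective doses (standard divided by test). All parameter values are illustrative, and no multiplicity adjustment is shown.

```python
import numpy as np

def inverse_loglogistic(mu, ec50, slope, top=1.0):
    """Dose producing mean response mu for f(d) = top / (1 + (ec50/d)**slope)."""
    return ec50 * (mu / (top - mu)) ** (1.0 / slope)

# Non-parallel curves (different slopes) -> nonconstant relative potency
std = dict(ec50=10.0, slope=1.0)
test = dict(ec50=25.0, slope=1.5)

for mu in (0.2, 0.5, 0.8):          # a finite set of response levels
    rp = inverse_loglogistic(mu, **std) / inverse_loglogistic(mu, **test)
    print(f"RP at mu={mu}: {rp:.3f}")
```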