
Abstract

Segmented regression plays a vital role in regression analysis when the data contain a change point. Traditional estimation methods such as those of Hudson (1966) and Muggeo (2003) are reviewed, but these methods do not account for robustness in the presence of outliers. A third, rank-based method is also considered, in which the analysis is applied to the ranks of the data rather than to the data themselves. Our contribution in this paper is to combine the M-estimator methodology, using three distinct weight functions (Huber, Tukey, and Hampel), with Muggeo's approach to gain more robustness, so that robust estimates of the change point and the regression parameters are obtained simultaneously. We call the new estimator the robust Iterative Reweighted M-estimator (IRWm-method), qualified by its weight function. Our primary interest is to estimate the change point that joins the segments of the regression curve, and our secondary interest is to estimate the parameters of the segmented regression model. A real data set was used, with bed-load transport as the dependent variable (y) and discharge as the explanatory variable (x). The methods were compared using several criteria to select the most appropriate one for estimating the change point and the regression parameters. The best results were obtained by the IRWm-estimator with the Tukey weight function.
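The three weight functions named in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the tuning constants (k = 1.345 for Huber, c = 4.685 for Tukey, and a = 2, b = 4, c = 8 for Hampel) are conventional defaults assumed here, and the input `u` is assumed to be a residual already divided by a robust scale estimate.

```python
import numpy as np

def huber_weight(u, k=1.345):
    """Huber weights: 1 inside [-k, k], k/|u| outside (k is an assumed default)."""
    au = np.abs(np.asarray(u, dtype=float))
    return np.where(au <= k, 1.0, k / np.maximum(au, k))

def tukey_weight(u, c=4.685):
    """Tukey bisquare weights: smooth descent to exactly 0 at |u| >= c."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)

def hampel_weight(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending weights (a, b, c are assumed defaults)."""
    au = np.abs(np.asarray(u, dtype=float))
    safe = np.where(au == 0.0, 1.0, au)      # avoid division by zero at u = 0
    w = np.ones_like(au)                      # |u| <= a: full weight
    mid = (au > a) & (au <= b)
    tail = (au > b) & (au <= c)
    w[mid] = a / safe[mid]                    # constant psi, so weight a/|u|
    w[tail] = a * (c - au[tail]) / (safe[tail] * (c - b))  # linear descent of psi
    w[au > c] = 0.0                           # rejected outright
    return w
```

Huber weights never reach zero, so extreme points keep some influence, whereas the Tukey and Hampel weights reject sufficiently large residuals completely.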
Preprint
This research deals with the segmented regression model with multiple change points. The maximum likelihood estimator (MLE) and two robust iteratively reweighted methods (IRWm and IRWs) are used to estimate the model parameters and the change points, and the methods are then compared to choose the best among them. A simulation study is carried out under several scenarios, with different sample sizes and contamination rates (proportions of outliers) of 0%, 5%, 10%, and 15%, to perform the comparison and identify the best estimation method. Using the mean squared error (MSE) as the comparison criterion, the simulation results showed the efficiency of the iteratively reweighted method (IRWm) when the data were contaminated, and the efficiency of the maximum likelihood method (MLE) when the data contained no contamination.
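The kind of contamination experiment described above can be sketched in a much simplified form. The code below is an assumption-laden illustration, not the study's design: it uses a straight-line model without change points, a Huber-weighted iteratively reweighted least squares fit in place of IRWm and IRWs, and hypothetical choices for the true coefficients, the outlier shift of +20, the sample size, and the number of replications.

```python
import numpy as np

def huber_irls(x, y, k=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of (intercept, slope) by iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS starting values
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
        if scale == 0.0:
            break
        u = np.abs(r / scale)
        w = np.where(u <= k, 1.0, k / np.maximum(u, k))   # Huber weights
        sw = np.sqrt(w)
        new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(new - beta)) < tol
        beta = new
        if done:
            break
    return beta

def simulate_mse(rate, n=100, reps=200, seed=0):
    """Return (MSE of OLS, MSE of Huber IRLS) for a given contamination rate."""
    rng = np.random.default_rng(seed)
    truth = np.array([1.0, 2.0])                          # hypothetical true coefficients
    se_ols = se_rob = 0.0
    for _ in range(reps):
        x = rng.uniform(0.0, 10.0, n)
        y = truth[0] + truth[1] * x + rng.normal(0.0, 1.0, n)
        idx = rng.choice(n, int(round(rate * n)), replace=False)
        y[idx] += 20.0                                    # shift a fraction of responses
        X = np.column_stack([np.ones_like(x), x])
        ols = np.linalg.lstsq(X, y, rcond=None)[0]
        se_ols += np.sum((ols - truth) ** 2)
        se_rob += np.sum((huber_irls(x, y) - truth) ** 2)
    return se_ols / reps, se_rob / reps

for rate in (0.0, 0.05, 0.10, 0.15):
    mse_ols, mse_rob = simulate_mse(rate)
    print(f"rate={rate:.2f}  MSE(OLS)={mse_ols:.4f}  MSE(Huber)={mse_rob:.4f}")
```

With clean data the two estimators behave similarly, while the gap in favour of the robust fit is expected to widen as the contamination rate increases.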
Article
Segmented regression consists of several segments joined at different points, reflecting the heterogeneity between segments within the research sample. This research is concerned with estimating the location of the change point between segments and the model parameters, and with proposing a robust estimation method and comparing it with other methods used in segmented regression. One of the traditional methods (the Muggeo method) is used to find the maximum likelihood estimator of the model and of the change point through an iterative approach. In addition, a robust estimation method (the IRWm method) is used, which applies the robust M-estimator technique with the Tukey weight function to the segmentation idea. Our contribution lies in the suggestion to use the S-estimator technique with the Tukey weight function, to obtain a method that is robust to violations of the normality assumption for the random errors and to the effect of outliers; this method is called IRWs. These methods were applied to a real data set on the bed-load of the Tigris River in Baghdad city as the response variable and the amount of water discharge as the explanatory variable. The results of the comparison showed the superiority of the proposed method.
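The idea of combining Muggeo's iterative linearization with Tukey-weighted M-estimation can be sketched for a single change point. This is one possible reading under stated assumptions, not the published IRWm algorithm: the function names, the MAD scale estimate, the tuning constant c = 4.685, and the iteration limits are all choices made here for illustration. The model is y = b0 + b1*x + b2*(x - psi)_+, and each outer step refits with the extra regressor -I(x > psi), whose coefficient gamma gives the update psi + gamma/b2.

```python
import numpy as np

def tukey_weights(u, c=4.685):
    """Tukey bisquare weights for scaled residuals u (c is an assumed default)."""
    return np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)

def robust_segmented_fit(x, y, psi0, n_outer=50, n_inner=20, tol=1e-6):
    """Estimate one change point psi and (b0, b1, b2) with Tukey-weighted refits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    psi = float(psi0)
    w = np.ones_like(y)
    beta = np.zeros(4)
    for _ in range(n_outer):
        U = np.maximum(x - psi, 0.0)              # (x - psi)_+ at the current psi
        V = -(x > psi).astype(float)              # derivative of U with respect to psi
        X = np.column_stack([np.ones_like(x), x, U, V])
        for _ in range(n_inner):                  # inner loop: reweighted least squares
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
            r = y - X @ beta
            scale = np.median(np.abs(r - np.median(r))) / 0.6745   # MAD scale
            if scale == 0.0:
                break
            w = tukey_weights(r / scale)
        b2, gamma = beta[2], beta[3]
        if abs(b2) < 1e-12:                       # no slope change: update undefined
            break
        # Muggeo-style update, kept strictly inside the observed range of x
        psi_new = float(np.clip(psi + gamma / b2, x.min() + 1e-8, x.max() - 1e-8))
        converged = abs(psi_new - psi) < tol
        psi = psi_new
        if converged:
            break
    return psi, beta[:3]

# Synthetic example (hypothetical data): true change point at x = 5, with 5% gross outliers.
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 200)
y_demo = 1.0 + x_demo + 2.0 * np.maximum(x_demo - 5.0, 0.0) + rng.normal(0.0, 0.2, 200)
y_demo[::20] += 10.0
psi_hat, coef_hat = robust_segmented_fit(x_demo, y_demo, psi0=3.0)
print(psi_hat, coef_hat)
```

As with Muggeo's original procedure, the result depends on the starting value `psi0`, so trying several starting values is a reasonable precaution.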
Article
In regression analysis, the ordinary least squares (OLS) method is not appropriate for problems containing outliers or extreme observations. A robust estimation method is therefore needed, one whose estimates are not strongly affected by such observations. In this paper, six robust regression estimation methods are compared in order to determine the best one for fitting a regression model: the M-Hampel, M-bisquare, and M-Huber estimation methods, the S-estimation method, the MM(S)-estimation method, and the MM-estimation method. We find that the three best methods in this study are the M-estimation, MM(S)-estimation, and MM-estimation methods. The M-estimation method is an extension of the maximum likelihood method, the MM-estimation method is a development of the M-estimation method, and the MM(S)-estimation method is a development of the S-estimation method. Robust regression methods can considerably improve estimation precision, but should not be applied automatically in place of the classical methods.
Article
This paper considers a robust piecewise linear regression model with an unknown number of change points. Our estimation framework mainly contains two steps: First, we combine the linearization technique with rank-based estimators to estimate the regression coefficients and the location of thresholds simultaneously, given a large number of change points. The associated inferences for all the parameters are easily derived. Second, we use the LARS algorithm via generalized BIC to refine the candidate threshold estimates and obtain the ultimate estimators. The rank-based regression guarantees that our estimators are less sensitive to outliers and heavy-tailed data, and therefore achieves robustness. Simulation studies and an empirical example on BMI and age relationship illustrate the proposed method.
Article
We introduce robust procedures for analyzing water quality data collected over time. One challenging task in analyzing such data is how to achieve robustness in the presence of outliers while maintaining high estimation efficiency, so that we can draw valid conclusions and provide useful advice in water management. The robust approach requires specification of a loss function, such as the Huber, Tukey’s bisquare, or the exponential loss function, and an associated tuning parameter determining the extent of robustness needed. High robustness comes at the cost of efficiency loss in parameter estimation. To this end, we propose a data-driven method which leads to more efficient parameter estimation. This data-dependent approach allows us to choose a regularization (tuning) parameter that depends on the proportion of “outliers” in the data so that estimation efficiency is maximized. We illustrate the proposed methods using a study on ammonium nitrogen concentrations from two sites in the Huaihe River in China, where the interest is in quantifying the trend in the most recent years while accounting for possible temporal correlations and “irregular” observations in earlier years. © 2018 Springer International Publishing AG, part of Springer Nature
Article
This paper is concerned with interval estimation for the breakpoint parameter in segmented regression. We present score-type confidence intervals derived from the score statistic itself and from the recently proposed gradient statistic. Due to lack of regularity conditions of the score, non-smoothness and non-monotonicity, naive application of the score-based statistics is unfeasible and we propose to exploit the smoothed score obtained via induced smoothing. We compare our proposals with the traditional methods based on the Wald and the likelihood ratio statistics via simulations and an analysis of a real dataset: results show that the smoothed score-like statistics perform in practice somewhat better than competitors, even when the model is not correctly specified.
Article
Change-point detection in abrupt change models is a very challenging research topic in many fields of both methodological and applied statistics. Due to strong irregularities, discontinuity and non-smoothness, likelihood-based procedures are awkward; for instance, the usual optimization methods do not work, and grid search algorithms are the most widely used approach for estimation. In this paper a heuristic, iterative algorithm for approximate maximum likelihood estimation is introduced for change-point detection in piecewise constant regression models. The algorithm is based on iterative fitting of simple linear models, and appears to extend easily to more general frameworks, such as models including continuous covariates with possible ties, distinct change-points referring to different covariates, and further covariates without change-point. In these scenarios grid search algorithms do not straightforwardly apply. The proposed algorithm is validated through some simulation studies and applied to two real datasets.
Article
High‐throughput biological experiments are essential tools for identifying biologically interesting candidates in large‐scale omics studies. The results of a high‐throughput biological experiment rely heavily on the operational factors chosen in its experimental and data‐analytic procedures. Understanding how these operational factors influence the reproducibility of the experimental outcome is critical for selecting the optimal parameter settings and designing reliable high‐throughput workflows. However, the influence of an operational factor may differ between strong and weak candidates in a high‐throughput experiment, complicating the selection of parameter settings. To address this issue, we propose a novel segmented regression model, called segmented correspondence curve regression, to assess the influence of operational factors on the reproducibility of high‐throughput experiments. Our model dissects the heterogeneous effects of operational factors on strong and weak candidates, providing a principled way to select operational parameters. Based on this framework, we also develop a sup‐likelihood ratio test for the existence of heterogeneity. Simulation studies show that our estimation and testing procedures yield well‐calibrated type I errors and are substantially more powerful in detecting and locating the differences in reproducibility across workflows than the existing method. Using this model, we investigated an important design question for ChIP‐seq experiments: how many reads should one sequence to obtain reliable results in a cost‐effective way? Our results reveal new insights into the impact of sequencing depth on the binding‐site identification reproducibility, helping biologists determine the most cost‐effective sequencing depth to achieve sufficient reproducibility for their study goals.
Article
We propose a two stage robust regression procedure to be applied when outliers and leverage points are present. It takes advantage of a high breakdown point MCD estimator and an efficient redescending M-estimator. Its performance was assessed by processing examples from the statistical literature and by carrying out a Monte Carlo simulation experiment; this procedure performs better than other robust regression methods. By pinpointing, labeling and investigating the data generation process of outlying observations, the procedure promotes an interactive attitude in the user and stimulates a thorough scrutiny of data based on subject matter knowledge.
Article
The adjustment of the precise planar networks may be subject to gross errors that occur in coordinates of control points and that substantially affect the estimated coordinates. Therefore, robust estimation methods, e.g. Huber M-estimation, are typically applied for fitting high-accuracy planar networks to the unstable points of national control networks. The classic Huber method may result in unreliable results in some cases, especially when assuming small values of initial reference coordinate errors. This paper presents a linear modification of the Huber method that overcomes this limitation. The proposed method is validated on a precise planar network consisting of 12 points, in which 3 out of 7 control points are outliers and thus demand robust estimation. The proposed linear method has a simple geometrical interpretation and concise formulae, and gives very similar results to other high-accuracy and advanced robust estimation methods.