Science topic

Least-Squares Analysis - Science topic

Least-Squares Analysis is a principle of estimation in which the estimates of a set of parameters in a statistical model are those quantities minimizing the sum of squared differences between the observed values of a dependent variable and the values predicted by the model.
Questions related to Least-Squares Analysis
  • asked a question related to Least-Squares Analysis
Question
3 answers
In Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, Ken Brewer proved not only that heteroscedasticity is the norm for business populations when using regression, but he also showed the range of values possible for the coefficient of heteroscedasticity. I discussed this in "Essential Heteroscedasticity," https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, and further developed an explanation for the upper bound.
Then in an article in the Pakistan Journal of Statistics (PJS), "When Would Heteroscedasticity in Regression Occur," https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR, I discussed why this might sometimes not seem to be the case, but argued that homoscedastic regression is artificial, as can be seen from the abstract of that article. That article was cited by other authors in another article, an extraction of which was sent to me by ResearchGate, and it seemed to me to say, incorrectly, that I supported OLS regression. However, the abstract for that paper is available on ResearchGate, and it makes clear that they are pointing out problems with OLS regression.
Notice, from "Essential Heteroscedasticity" linked above, that a larger predicted value used as a size measure (for a ratio model, simply x will do, since bx gives the same relative sizes) means a larger sigma for the residuals, and thus we have the term "essential heteroscedasticity." This is important for finite population sampling.
So, weighted least squares (WLS) regression should generally be the case, not OLS regression. Thus OLS regression really is not "ordinary." The abstract for my PJS article supports this. (Generalized least squares (GLS) regression may even be needed, especially for time series applications.)
Relevant answer
Answer
Shaban Juma Ally, that is why one should use weighted least squares (WLS) regression. When the coefficient of heteroscedasticity is zero - which should not happen - then WLS regression becomes OLS regression.
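As a minimal sketch of this point in Python with statsmodels (the data, the size measure x, and the value of the coefficient of heteroscedasticity gamma are illustrative assumptions, not taken from the thread):
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 100, 200)               # size measure (e.g., a predicted value)
gamma = 0.75                               # assumed coefficient of heteroscedasticity
y = 2.0 * x + rng.normal(0, x**gamma)      # residual sigma grows with the size measure

X = sm.add_constant(x)
w = 1.0 / x**(2 * gamma)                   # WLS weights proportional to 1/variance
wls = sm.WLS(y, X, weights=w).fit()
ols = sm.OLS(y, X).fit()                   # OLS is the special case gamma = 0 (equal weights)
print(wls.params, ols.params)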
  • asked a question related to Least-Squares Analysis
Question
6 answers
The threshold least squares regression model of Hansen (2000) divides the series into two regimes endogenously: the regime above the threshold and the regime below it, and then each regime is estimated individually by OLS. This method also involves bootstrap replication. In my case the regime above the threshold is left with only 17 observations. Does this create a loss-of-degrees-of-freedom issue in the data?
Relevant answer
Answer
It is not possible to answer that question without more information. First, you should say how many observations you have. A regression with 17 observations is questionable but it depends on the number of explanatory variables. Since these observations correspond to above the threshold, I fear that they are outliers, hence we are in the worst situation. Did you consider a transformation on the dependent variable? That would perhaps improve the situation provided the relationship with the other variables allows for it. I would say that each of the two regimes should have enough observations.
  • asked a question related to Least-Squares Analysis
Question
5 answers
Our dependent variable is stationary at level, while the independent variables are stationary at level and at first difference.
Relevant answer
Answer
Marius Ole Johansen, the threshold least squares regression model of Hansen (2000) divides the series into two regimes endogenously: the regime above the threshold and the regime below it, and then each regime is estimated individually by OLS. This method also involves bootstrap replication. In my case the regime above the threshold is left with only 17 observations. Does this create a loss-of-degrees-of-freedom issue in the data?
  • asked a question related to Least-Squares Analysis
Question
5 answers
Dear academician friends, I have a question about econometrics. I evaluated the relationship between the number of researchers in the health field and patents with a balanced panel analysis over 11 countries and 10 years. The data is regular; I evaluated the model with least squares and then performed causality and co-integration analyses. However, one peer reviewer insists that the data should be counted and recommends counted panel analysis. I looked at the subject, but there was no need for such an analysis, so I proceeded according to the suitability of econometric evaluations and diagnostic tests. How can I make such an analysis (counted data) on Eviews? Thanks.
Relevant answer
Answer
Counted panel analysis, also known as count data panel analysis, involves statistical techniques used to analyze panel data where the dependent variable represents counts or the number of occurrences of an event. This type of data is common in various fields, such as economics, and social sciences, where the interest lies in modeling the frequency of events over time for different entities (such as individuals, firms, or countries).
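Outside EViews, the same kind of count-data panel model is easy to sketch; for example, in Python with statsmodels, a Poisson regression with country and year dummies (all names and numbers below are synthetic placeholders for the patent/researcher panel described in the question):
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
countries = [f"c{i}" for i in range(11)]           # 11 countries x 10 years, as in the question
rows = []
for c in countries:
    alpha = rng.normal(0, 0.3)                     # country-specific effect
    for year in range(2010, 2020):
        researchers = rng.uniform(1, 10)
        lam = np.exp(0.5 + 0.15 * researchers + alpha)
        rows.append({"country": c, "year": year,
                     "researchers": researchers,
                     "patents": rng.poisson(lam)})
df = pd.DataFrame(rows)

# Poisson count model with country and year fixed effects (dummy variables)
fit = smf.poisson("patents ~ researchers + C(country) + C(year)", data=df).fit(disp=False)
print(fit.params["researchers"])                   # should be near the true 0.15
# smf.negativebinomial(...) is a common alternative if the counts are overdispersed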
  • asked a question related to Least-Squares Analysis
Question
1 answer
I have performed several meta-analyses using STATA, but never a dose-response analysis. I have been reading articles explaining the drmeta package, which can utilize generalized least-squares regression, but some variables such as SE and logRR seem unfamiliar. How can you calculate those variables in STATA when the provided data are OR, RR, HR, 95% CI, MD, or SMD? I need step-by-step commands for the dose-response analysis.
A comment from a previous thread said the analysis can be done through meta-regression. But dose-response analysis and meta-regression are two completely different analyses, aren't they?
Relevant answer
Answer
If the provided data include a 95% CI for a ratio measure (OR, RR, or HR), you can use the formula se = (ln(upper CI) − ln(lower CI)) / (2 × 1.96) to calculate the standard error (SE) of the log effect size; for MD or SMD, the same formula applies without the logarithms.
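In code form, the same calculation (the numbers are illustrative, not from the thread):
import math

rr, lower, upper = 1.52, 1.20, 1.93     # illustrative relative risk and its 95% CI
log_rr = math.log(rr)                   # the logRR that dose-response routines expect
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
print(log_rr, se)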
  • asked a question related to Least-Squares Analysis
Question
2 answers
Suppose I compute a least squares regression with the growth rate of y against the growth rate of x and a constant. How do I recover the elasticity of the level of y against the level of x from the estimated coefficient?
Relevant answer
Answer
The elasticity of y with respect to x is defined as the percentage change in y resulting from a one-percent change in x, holding all else constant. In the context of your regression model, where you have regressed the growth rate of y (which can be thought of as the percentage change in y) against the growth rate of x (the percentage change in x), the estimated coefficient on the growth rate of x is an estimate of this elasticity directly.
Here's why: If you run the following regression:
Δ%y=a+b(Δ%x)+ϵ
where Δ%y is the growth rate of y (dependent variable), Δ%x is the growth rate of x (independent variable), a is the constant term, b is the slope coefficient, and ϵ is the error term, the coefficient b represents the change in Δ%y for a one-unit change in Δ%x. Because Δ%y and Δ%x are already in percentage terms, the coefficient b is the elasticity of y with respect to x.
So, if you have estimated the coefficient b from this regression, you have already estimated the elasticity. There is no need to recover or transform the coefficient further; the estimated coefficient b is the elasticity of y with respect to x.
It's important to note that this interpretation assumes that the relationship between y and x is log-linear, meaning the natural logarithm of y is a linear function of the natural logarithm of x, and the model is correctly specified without omitted variable bias or other issues that could affect the estimator's consistency.
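A quick numerical check that the estimated slope is indeed the elasticity, using simulated data with a known elasticity (everything here is illustrative):
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
true_elasticity = 0.6
x = rng.lognormal(0.0, 0.5, 500)
y = x**true_elasticity * np.exp(rng.normal(0, 0.05, 500))   # log-linear relationship

dlx = np.diff(np.log(x))          # growth rates approximated by log differences
dly = np.diff(np.log(y))
fit = sm.OLS(dly, sm.add_constant(dlx)).fit()
print(fit.params)                 # the slope is close to 0.6, i.e., the elasticity itself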
  • asked a question related to Least-Squares Analysis
Question
1 answer
For example, there is no doubt that global sea level is rising, and based on the global mean sea level (GMSL) data we can calculate the trend of the GMSL. However, we all know that there must be some interannual/decadal variations in the GMSL, and even aliasing errors in our data. We can get the linear trend of the GMSL time series based on the least-squares method. However, how can we estimate the uncertainty range of this trend? 1, the GMSL time series has autocorrelation; 2, the variations of the GMSL time series are not white noise, and the standard deviation of the GMSL anomalies is not 1.
Relevant answer
Answer
I suggest that you employ an ARCH/ARMAX method where the ARCH component models the conditional variance of the error term, the ARMA component models the autoregressive nature of your data, and the X component models the effects of the exogenous variables.
Here is a link to a recent application of the method:
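The linked application is not shown above; as a rough sketch of the regression-with-autocorrelated-errors part of that idea in Python (statsmodels), with a synthetic series standing in for GMSL — the ARCH component for the conditional variance would need a separate package such as arch:
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 360                                   # 30 years of monthly values (synthetic)
t = np.arange(n) / 12.0
noise = np.zeros(n)
for i in range(1, n):                     # AR(1) noise mimicking interannual variability
    noise[i] = 0.8 * noise[i - 1] + rng.normal(0, 2.0)
gmsl = 3.0 * t + noise                    # ~3 mm/yr trend plus autocorrelated variability

# Linear trend with AR(1) errors: the residual autocorrelation is modeled explicitly,
# so the trend's standard error is not understated the way it would be with plain OLS.
model = sm.tsa.SARIMAX(gmsl, exog=sm.add_constant(t), order=(1, 0, 0), trend="n")
res = model.fit(disp=False)
print(res.params)    # trend coefficient, AR(1) coefficient, innovation variance
print(res.bse)       # standard errors, from which a confidence interval for the trend follows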
  • asked a question related to Least-Squares Analysis
Question
2 answers
I am planning to assess the effect of different income diversification strategies on rural household welfare. Considering simultaneous causality between the different livelihood strategies and the welfare indicators, the Two Stage Least Squares (2SLS) method with instrumental variables will be applied to estimate the impact of the strategies on household welfare.
Please check the attached file also. I just need to know which regression was used in table 4 of this paper and which tool (SPSS, STATA, R, etc.) I need to use to analyse the data.
Relevant answer
Answer
To perform two-stage least squares (2SLS), you can follow these steps:
  1. Identify your endogenous and exogenous variables. The endogenous regressor is the explanatory variable you suspect is correlated with the error term (here, the diversification strategy); the exogenous variables are the controls you believe influence the outcome but are uncorrelated with the error term.
  2. Find instrumental variables. Instruments are variables that are correlated with the endogenous regressor but uncorrelated with the error term of the outcome equation.
  3. Run the first-stage regression: regress the endogenous regressor on the instruments and the exogenous controls.
  4. Run the second-stage regression: regress the outcome (household welfare) on the exogenous controls and the predicted values of the endogenous regressor from the first stage.
The coefficient on the predicted endogenous regressor in the second-stage regression is the 2SLS estimate of its effect on the outcome.
Here is an example of how to perform 2SLS methods in R:
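The R example referenced above is not included in the thread. As a parallel illustration of the two stages in Python (everything below is a synthetic placeholder; also note that the naive second-stage standard errors are not correct, so a dedicated IV routine such as IV2SLS in the linearmodels package or ivregress in Stata should be preferred in practice):
import numpy as np
import statsmodels.api as sm

# Synthetic data standing in for the household survey
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                        # instrument, e.g. distance to market
u = rng.normal(size=n)                        # unobserved factor causing endogeneity
d = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor: diversification
y = 1.5 * d + 2.0 * u + rng.normal(size=n)    # outcome: welfare (true effect of d is 1.5)

# Stage 1: regress the endogenous regressor on the instrument (plus exogenous controls, if any)
stage1 = sm.OLS(d, sm.add_constant(z)).fit()
d_hat = stage1.fittedvalues

# Stage 2: regress the outcome on the predicted endogenous regressor
stage2 = sm.OLS(y, sm.add_constant(d_hat)).fit()
print(stage2.params)                               # 2SLS estimate, close to the true 1.5
print(sm.OLS(y, sm.add_constant(d)).fit().params)  # naive OLS, biased upward here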
  • asked a question related to Least-Squares Analysis
Question
3 answers
I am running FGLS using Stata and I want to know how I can get the value of Pseudo R2. Your help would be much appreciated.
Relevant answer
Answer
The method to obtain a Pseudo R2 value in Stata depends on the type of regression model used. Here are some ways to obtain it:
  1. For a feasible generalized least squares (xtgls) regression model, run the command "xtgls Y X1 X2 Xn, panels(...)" in Stata.
  2. For a generalized linear model (GLM) regression, Nagelkerke's R2 can be obtained by creating a table of predicted probabilities (e.g., with the "roctab" command) and then calculating the Pseudo R2 value from it.
  3. For a random-effects probit model (xtprobit), the Pseudo R2 value is not directly available in the output; however, it can be calculated using the formula 1 − (log-likelihood of the model / log-likelihood of the null model).
  4. If a Stata command does not supply an R-squared value, a Pseudo R2 can be calculated using one of the approximations or analogues to R-squared described in the literature of your field.
Best of luck
  • asked a question related to Least-Squares Analysis
Question
5 answers
Greetings,
I am currently in the process of conducting a Confirmatory Factor Analysis (CFA) on a dataset consisting of 658 observations, using a 4-point Likert scale. As I delve into this analysis, I have encountered an interesting dilemma related to the choice of estimation method.
Upon examining my data, I observed a slight negative kurtosis of approximately -0.0492 and a slight negative skewness of approximately -0.243 (please refer to the attached file for details). Considering these properties, I initially leaned towards utilizing the Diagonally Weighted Least Squares (DWLS) estimation method, as existing literature suggests that it takes into account the non-normal distribution of observed variables and is less sensitive to outliers.
However, to my surprise, when I applied the Unweighted Least Squares (ULS) estimation method, it yielded significantly better fit indices for all three factor solutions I am testing. In fact, it even produced a solution that seemed to align with the feedback provided by the respondents. In contrast, DWLS showed no acceptable fit for this specific solution, leaving me to question whether the assumptions of ULS are being violated.
In my quest for guidance, I came across a paper authored by Forero et al. (2009; DOI: 10.1080/10705510903203573), which suggests that if ULS provides a better fit, it may be a valid choice. However, I remain uncertain about the potential violations of assumptions associated with ULS.
I would greatly appreciate your insights, opinions, and suggestions regarding this predicament, as well as any relevant literature or references that can shed light on the suitability of ULS in this context.
Thank you in advance for your valuable contributions to this discussion.
Best regards, Matyas
Relevant answer
Answer
Thank you for your question. I have searched the web for information about the Diagonally Weighted Least Squares (DWLS) and Unweighted Least Squares (ULS) estimators, and I have found some relevant sources that may help you with your decision.
One of the factors that you should consider when choosing between DWLS and ULS is the sample size. According to Forero et al. (2009), DWLS tends to perform better than ULS when the sample size is small (less than 200), but ULS tends to perform better than DWLS when the sample size is large (more than 1000). Since your sample size is 658, it falls in the intermediate range, where both methods may provide similar results.
Another factor that you should consider is the degree of non-normality of your data. According to Finney and DiStefano (2006), DWLS is more robust to non-normality than ULS, especially when the data are highly skewed or kurtotic. However, ULS may be more efficient than DWLS when the data are moderately non-normal or close to normal. Since your data have slight negative skewness and kurtosis, it may not be a serious violation of the ULS assumptions.
A third factor that you should consider is the model fit and parameter estimates. According to Forero et al. (2009), both methods provide accurate and similar results overall, but ULS tends to provide more accurate and less variable parameter estimates, as well as more precise standard errors and better coverage rates. However, DWLS has higher convergence rates than ULS, which means that it is less likely to encounter numerical problems or estimation errors.
Based on these factors, it seems that both DWLS and ULS are reasonable choices for your data and model, but ULS may have some advantages over DWLS in terms of efficiency and accuracy. However, you should also check the sensitivity of your results to different estimation methods, and compare them with other criteria such as theoretical plausibility, parsimony, and interpretability.
I hope this answer helps you with your analysis. If you need more information, you can refer to the sources that I have cited below.
References:
Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. British Journal of Mathematical and Statistical Psychology.
Finney, S. J., & DiStefano, C. (2006). Non-normal and categorical data in structural equation modeling. In Structural Equation Modeling: A Second Course.
Good luck
  • asked a question related to Least-Squares Analysis
Question
10 answers
I am running an instrumental variable regression.
Eviews provides two different models for instrumental variables, i.e., two-stage least squares and the generalized method of moments.
How do I choose between the two models?
Thanks in advance
Relevant answer
Answer
In my experience, the least squares method is more convenient than the method of moments.
In the method of moments, you first have to derive the theoretical moments up to order p (if the regression equation contains p parameters) in order to obtain p equations, and then solve those p equations after substituting the values of the moments obtained from the sample.
Deriving the theoretical moments is more difficult than setting up the normal equations (required in the least squares method), since it depends on the nature of the probability distribution followed by the parent population of the sample.
  • asked a question related to Least-Squares Analysis
Question
2 answers
>> size(output_data(1:L-1,:))
ans =
3359 1
>> size(U1)
ans =
1000 1
>>
>> output_data(1:L-1,:) * U1;
Error using *
Inner matrix dimensions must agree.
% Load experimental input-output data
load('data.mat');
input_data = input;
output_data = Output2;
% Define the number of inputs and outputs
num_inputs = 1;
num_outputs = 1;
% Define the order of the model
n = 2; % number of states
m = 1; % number of inputs
p = 1; % number of outputs
% Construct the Hankel matrix
L = 1000; % number of rows in the Hankel matrix
H = hankel(input_data(1:L), input_data(L:end));
% Apply the least squares method to estimate the model parameters
[U,S,V] = svd(H, 'econ');
U1 = U(:, 1:n);
U2 = U(:, n+1:end);
S1 = S(1:n, 1:n);
S2 = S(n+1:end, n+1:end);
V1 = V(:, 1:n);
V2 = V(:, n+1:end);
Ahat = U2 * pinv(S2) * V2' * output_data(1:L-1,:) * U1;
% compute Ahat
Ahat = U2 * pinv(S2) * V2' * output_data(1:L-1,:) * U1';
%Ahat = U2*S2^(-1/2)*V2'*U1'*output_data(1:L-1)';
Bhat = U2*S2^(1/2)*V2(1:num_inputs, :)';
Chat = output_data(1:p)*Bhat*S2^(-1/2)*V2(:, 1:p)';
Dhat = output_data(1:p)*Bhat*S2^(1/2)*V2(1:num_inputs, 1:p);
Relevant answer
Answer
thank you, issue resolved....
  • asked a question related to Least-Squares Analysis
Question
2 answers
How does Least Squares Estimation work
Relevant answer
  • asked a question related to Least-Squares Analysis
Question
7 answers
I have a dataset which has around 20k data points (n = 20k). My research question necessitates using weighted least squares (WLS). But I am facing an autocorrelation issue (the sample has negative autocorrelation; the DW statistic ranges from 2.05 to 2.2). I use R for my analysis and it seems the Cochrane-Orcutt procedure is incompatible with WLS.
I got the following error message:
"Error in lmtest::dwtest(reg) : weighted regressions are not supported."
Changes in Version 0.9-29 o dwtest() now catches weighted regressions and throws an error because weighted regressions are not supported (yet).
Kindly let me know whether there are any ways I can handle the autocorrelation issue.
Regards,
Karthik N
Relevant answer
Answer
Why do you need autocorrelation? Because you think there is a periodicity in your data?
  • asked a question related to Least-Squares Analysis
Question
7 answers
Hi everyone,
I have a problem with crystallite size in Topas refinement. I'm using the LVol_FWHM_CS_G_L macro and the problem I'm constantly facing is that the Gaussian contribution goes to a large number (infinite crystallite size), while the Lorentzian contribution gives at least a reasonable value.
LVol_FWHM_CS_G_L(1, 5.50322798`, 0.89, 7.68859526`, csgc, 2356.57349`_LIMIT_MIN_0.3, cslc, 8.63459021`)
And this is obviously simultaneous with wrong strain value in e0_from_Strain macro where both values G and L of strain approach the min limit.
I'm dealing with highly disordered alumina materials if you'd like to know that.
Thank you in advance!
Jamal
Relevant answer
Answer
Hi Jamal,
From your volume-weighted crystallite size (referring to it here as such even if we don't yet trust the value), I assume you have relatively broad peaks and the broadening contribution from strain is likely not obvious when comparing fits with / without strain terms refined. Since you don't have complementary characterization of your samples, separating size and strain contributions is critical. As Andreas said, the LVol and e0 macros in TOPAS assume isotropic size and strain broadening (e.g., angle-dependent rather than hkl-dependent) and these assumptions do not have to be correct, especially for small (< 10 nm) nanoparticles.
Are your samples single phase? If so, an approach worth trying is using single-peak fitting to extract the peak breadths directly to make a Williamson-Hall plot. While this is a qualitative approach, here it will give you a sense for 1) if your sample has strain (proportional to slope of WH fit line) and 2) if there is some hkl-dependence indicating anisotropic size and / or strain broadening. You can also test the Stephens model for anisotropic strain broadening (Stephens 1999) and Ectors model for anisotropic size broadening (Ectors 2015, Ectors 2017) to see if these better describe your peak broadening. The anisotropic size macros are not in the TOPAS GUI and have to be implemented in launch mode - details are on the TOPAS wiki.
Single-peak fitting comes with some large warnings:
Correlation between your background and peak tails will be much higher when using single-peaks and it would be worthwhile to constrain the peak intensities (integrated area) to scale as they would when using a structural model. Likewise, the peak positions should be constrained or refined with some scale so that they are related as they would be in a Rietveld refinement. Without these constraints, your fit will necessarily improve but it won't be clear if you have resolved an issue with the peak shape, intensity, or position.
I would also suggest comparing your Rietveld refinement to a Pawley / Le Bail fit where intensity is freely refined for each peak while their shape and position is still constrained to the isotropic size-strain models and a unit cell, respectively. If you don't see an improvement, this points to a peak shape model issue over an issue with the structural model.
Best,
Adam
  • asked a question related to Least-Squares Analysis
Question
2 answers
Hello, I am studying a system and I will do a refinement (Rietveld). But my data is in cps and I want to convert to only counts.
Relevant answer
Answer
Thanks a lot for your answer. I am trying to make a Rietveld refinement using the FullProf software, but I need to convert the intensity in an XRD diffractogram from counts per second to counts. By the way, how can I convert an XY file to the .dat extension?
  • asked a question related to Least-Squares Analysis
Question
3 answers
What are the conditions for applying an iterated weighted least squares regression model to panel data?
Suppose I am estimating the effect of GDP on the ROA of a company. What will be the mathematical equation of the model if we apply iterated weighted least squares?
Relevant answer
Answer
Iterated weighted least-squares regression is more an algorithm for minimizing a selected objective function (cost function) than a statistical tool per se. So the real question is not whether IWLSR can be applied to your data, but first which objective function you want to minimize and which algorithms are available for that.
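To make the idea concrete, here is a minimal iteratively reweighted least-squares loop in Python for one common objective, a robust (Huber-type) loss; the data and the weight function are purely illustrative and are not specific to the GDP/ROA panel in the question:
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 100)
y[:5] += 8.0                                  # a few outliers

X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the OLS solution
c = 1.345                                     # Huber tuning constant

for _ in range(25):                           # iterate until the estimate stabilizes
    r = y - X @ beta
    s = np.median(np.abs(r)) / 0.6745 + 1e-12            # robust scale estimate
    w = np.minimum(1.0, c / np.abs(r / s))                # Huber weights
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]   # weighted LS step

print(beta)                                   # close to (1, 2) despite the outliers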
  • asked a question related to Least-Squares Analysis
Question
4 answers
From D.Simon book, a linear recursive estimator is defined with two equations:
y_{k} = H_{k}*x + v_{k}
x^_{k} = x^_{k-1} + K_{k}( y_{k} - H_{k}*x^_{k-1} )
The examples are for the estimate of 1 constant, but what if I need to estimate 15 constants? I mean, let's say I can measure 3 quantities at a time, the y_{k} vector is (3x1) while the vector of the unknown constants I want to estimate is (15x1). The H_{k} matrix will obviously be rank-deficient if its dimensions are (3x15). In least-squares the H matrix must be full-rank. Is it the same with recursive estimation? Am I missing something? Thank you all in advance
Relevant answer
Answer
The recursive least squares algorithm (RLS) is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real time. As with LS, there may be several correlation equations and a set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data.
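A minimal sketch of the RLS update with a forgetting factor, for a generic linear-in-parameters model (this is only an illustration of the algorithm described above, not code from the cited references):
import numpy as np

def rls_ff(regressors, observations, n_params, lam=0.98, delta=1000.0):
    """Recursive least squares with forgetting factor lam (0 < lam <= 1)."""
    theta = np.zeros(n_params)           # parameter estimate
    P = delta * np.eye(n_params)         # large initial covariance (vague prior)
    for phi, y in zip(regressors, observations):
        phi = np.asarray(phi, dtype=float)
        k = P @ phi / (lam + phi @ P @ phi)          # gain vector
        theta = theta + k * (y - phi @ theta)        # correct the estimate with the new point
        P = (P - np.outer(k, phi @ P)) / lam         # update covariance, discounting old data
    return theta

# Example: estimate a and b in y = a*u + b from a noisy data stream
rng = np.random.default_rng(3)
u = rng.uniform(0, 1, 500)
y = 2.5 * u + 1.0 + rng.normal(0, 0.1, 500)
print(rls_ff(np.column_stack([u, np.ones_like(u)]), y, 2))   # close to (2.5, 1.0)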
A particularly clear introduction to RLS is found at: Karl J. Åström, Björn Wittenmark, "Computer-Controlled Systems: Theory and Design", Prentice-Hall, 3rd ed., 1997.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I have applied the RLS-FF algorithm to estimate the parameters from the KLa correlation, used to predict the O2 gas-liquid mass-transfer, hence giving increased weight to most recent data. Estimates were improved by imposing sinusoidal disturbance to air flow and agitation speed (manipulated variables). The power dissipated by agitation was accessed by a torque meter (pilot plant). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first order delay. This investigation was reported at (MSc Thesis):
  • asked a question related to Least-Squares Analysis
Question
3 answers
Hi Everyone,
I am trying to use the SURE model in Nlogit, the software uses generalized least square regression for estimation. Is there any way or command to use OLS instead of GLS to be used in the SURE model?
Thanks
Relevant answer
Answer
Maybe “R” code is better?
  • asked a question related to Least-Squares Analysis
Question
5 answers
I am trying to convert a vector into an image using the code below
clear variables
load('Exe4_2022.mat')
n = length(b);
figure,
imagesc(reshape(b,sqrt(n),sqrt(n))),
colormap(gray),
axis off;
But I am getting the error below. Could anybody tell me how to resolve this issue?
Error using reshape
Size arguments must be real integers.
I have attached the "Exe4_2022.mat" file with this post.
Thanks
Relevant answer
Answer
The number of lines and columns of the matrix representing the image you want to obtain must be integers, and their product must equal the length of the vector you want to convert. In your example the length of the vector is n = 55929. You want to obtain a square matrix with sqrt(n) lines and columns, but here m = sqrt(n) = 236.4931..., which is not an integer. If we choose 3 columns and 18643 lines (3 × 18643 = 55929), we obtain the following Matlab code, which works.
clear variables
load('Exe4_2022.mat');      % the attached file containing the vector b
n = length(b);              % n = 55929
m = sqrt(n);                % not an integer, so reshape(b, m, m) would fail
c = reshape(b, 18643, 3);   % 18643 x 3 = 55929, so this reshape is valid
figure,
imagesc(c),
colormap(gray),
axis off;
Please try this code.
  • asked a question related to Least-Squares Analysis
Question
7 answers
I am looking for study material (books, articles, codes or even YouTube videos) on parameter estimation of differential equation models (using any method but preferably least squares). I would like to calibrate some mathematical models in the form of ODEs and PDEs; I have a time series data set for the dependent variables. I could simply Google but I would like material that has demonstrations for easy learning.
Relevant answer
Answer
Use particle swarm optimization (PSO) or another metaheuristic algorithm to find the optimal parameters of a system of ordinary differential equations (SODE) by minimizing the error (e.g., the mean absolute percentage error, MAPE) between the data and the model's solution. In Matlab, you can use the ode45 package to solve the SODE. The steps are:
0. Suppose you want to estimate the parameters in a region (interval) I.
1. Generate m random vectors X in I; these vectors act as the initial guesses of the parameters.
2. Solve the SODE using ode45 for each vector. Then calculate the MAPE between the SODE's solution and the data.
3. Find the minimum of the obtained MAPEs. The vector value X* that gives the minimum MAPE acts as the initial optimal parameter estimate.
4. Next, use the updating formula in PSO to update the previous vectors X (you can find this in many books or articles).
5. Repeat steps 2-4 using the new vectors X.
6. At the desired stopping criterion, for example a maximum number of iterations, you obtain the estimated parameter values of your model.
Feel free to contact me for a more detailed explanation. I can help you.
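A compact sketch of this workflow in Python rather than Matlab, using scipy's differential evolution (another metaheuristic) in place of PSO and a logistic ODE as a stand-in model; everything here is illustrative:
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import differential_evolution

# Synthetic "data": logistic growth with true parameters r = 0.8, K = 10
t_data = np.linspace(0, 10, 25)
y_data = 10.0 / (1.0 + 9.0 * np.exp(-0.8 * t_data))

def model(t, y, r, K):
    return r * y * (1.0 - y / K)

def mape(params):
    r, K = params
    sol = solve_ivp(model, (0, 10), [1.0], args=(r, K), t_eval=t_data)
    return np.mean(np.abs((y_data - sol.y[0]) / y_data))    # error between data and model

bounds = [(0.1, 2.0), (5.0, 20.0)]          # the search region I for (r, K)
result = differential_evolution(mape, bounds, seed=0)
print(result.x)                             # estimated (r, K), close to (0.8, 10)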
  • asked a question related to Least-Squares Analysis
Question
5 answers
Linear least squares problem:
Fit f(t) = sin(t) on the interval [0, π] with polynomials of degree n, for n = 1, 2, 3, ..., 10, using equally spaced nodes.
Solve the normal equations with a backslash in Matlab. Save the norm of the residual and the condition number of AᵀA.
Could anybody please tell me how I can find x, y, and A in that case?
Relevant answer
Answer
The question seems incorrect. We needn't find x and y.
We have {(xi, yi = sin(xi))}, i=1,...,N and seek (a_n, ..., a_0) for a polynomial function
f(x) = a_n * x^n + ... + a_0.
Least Square minimization gives
A =
( Sum(xi^n)       Sum(xi^(n-1))    ...  Sum(xi^0);
  Sum(xi^(n+1))   Sum(xi^n)        ...  Sum(xi^1);
  ...
  Sum(xi^(2*n))   Sum(xi^(2*n-1))  ...  Sum(xi^n) );
b = ( Sum(yi); Sum(xi*yi); ...; Sum(xi^n*yi) );
At last,
(a_n a_(n-1) ... a_0) = A^(-1)*b
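A sketch of the computation in Python (the assignment asks for Matlab's backslash; numpy's lstsq plays the same role here; note that A in this sketch is the rectangular design matrix, so the square matrix of sums written out above corresponds to A.T @ A):
import numpy as np

N = 50                                    # number of equally spaced nodes on [0, pi]
x = np.linspace(0, np.pi, N)
y = np.sin(x)

for n in range(1, 11):                    # polynomial degree n = 1..10
    A = np.vander(x, n + 1)               # columns x^n, ..., x, 1
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid_norm = np.linalg.norm(A @ coeffs - y)
    cond_AtA = np.linalg.cond(A.T @ A)    # condition number of the normal-equations matrix
    print(n, resid_norm, cond_AtA)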
  • asked a question related to Least-Squares Analysis
Question
7 answers
Hello,
My dependent variable has both positive and negative values. It also has a heteroskedasticity problem. By using weighted least squares (WLS) regression, I was able to address the issue of heteroskedasticity. However, the residuals of the model are not normal.
I find that many of the standard solutions for addressing normality do not apply, since I have negative values in the dependent variable. Can anyone let me know what ways there are to address non-normality? Is there any way I can defend the regression output in spite of the normality issue?
My sample size exceeds 5000.
Regards,
Karthik
Relevant answer
Answer
In standard regression, non-normality of the residuals is not a big issue; outliers, that is, really discrepant values, are potentially a much bigger problem.
  • asked a question related to Least-Squares Analysis
Question
2 answers
Hello,
I have a linear model (both with categorical factors and covariates) with a continuous response variable. I know that the observed means can be different from least square means based on the model, so I was wondering whether it is appropriate to create plots (e.g. boxplots that compare means from different factor categories) based on the observed means or if one should build them based on the least square means. Also, when presenting the results in the results section of a paper, should one present the least square means or is it still appropriate to present the observed means?
Thank you.
Relevant answer
Answer
The method of least squares is used to find the linear model relating the two variables.
  • asked a question related to Least-Squares Analysis
Question
6 answers
Hello dear researchers
I want to make a rietveld refinement for a new material (doped material) using fullprof.
The problem is that I need the cif file, but it's a new material that doesn't have a cif file. Can I use the cif file of the undoped material at the beginning of the refinement and then use the cif file generated by fullprof to finish the refinement?
Please, if you have any ideas, help me!
If you also have tips on how to do the rietveld refinement better, feel free to mention them.
Relevant answer
Answer
1. The new Material will have a cif file, if you make this cif file. Try e.g. VESTA.
2. Depending on the doping level you will likely be able to use the structure model of the pure material (it is the structure model, the information of which is contained in the cif file).
  • asked a question related to Least-Squares Analysis
Question
5 answers
From my experiment I get two values. One is a control value, and the other is the value of my interest. I get both values from a non-linear least-squares model built on many data points; therefore, I can get their 95% confidence intervals and p-values. Thus, I got 0.350 (0.336-0.365 – 95% CI) in the control and 0.882 (0.810-0.959) in the sample of interest. Also, I know that this change is highly significant; the p-value is 1.29E-9. Next, I repeat the experiment and get the following numbers: 0.389 (0.370-0.409) in the control and 0.845 (0.775-0.920) in the sample of interest. Again, the change is highly significant; the p-value is 6.29E-9. Due to the biological nature of my samples, it is difficult (if not impossible) to standardize experimental conditions. But my idea is to test the control sample in the same conditions as the sample of interest.
To summarize results of experiments I’d like to express them as a fold-change compared to control. In this case I get 0.882/0.365 = 2.41 and 0.845/0.389 = 2.17. So, I’d like to calculate 95% confidence intervals for these values. I found a method which is supposed to do so (https://journals.sagepub.com/doi/10.3102/1076998620934125). However, it gives me very wide interval including 1: 0.612-10.17. Apparently, it does not take into account my knowledge of high significance of the changes I observe.
My question is how I can calculate reasonable 95% percent confidence intervals in such case? References for R packages would be very valuable.
Thanks in advance!
Relevant answer
Answer
It seems you have two paired samples. The p-values for the individual samples are not at all informative for you, and the individual 95% CIs are not helpful either. All the relevant information is in the two point estimates and their variance.
Assuming the ratios (2.41 and 2.17) are approximately log-normally distributed, a one-sample test on the logarithms gives p = 0.04. Using a Gamma model with log link I get a p-value of 0.03. Although the sample size is tiny, the variance is small enough to conclude an up-regulation.
  • asked a question related to Least-Squares Analysis
Question
5 answers
Machine learning, Support vector machine, least squares support vector regression.
Relevant answer
Answer
Nabeel Hameed Al-Saati, SVM is a supervised machine learning technique that may be used for both classification and regression.
The Support Vector Machine may also be utilized as a regression approach while retaining all of the algorithm's fundamental characteristics (maximal margin). With a few minor exceptions, Support Vector Regression (SVR) uses the same concepts as SVM classification.
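A minimal scikit-learn illustration of SVR (this is standard epsilon-insensitive SVR, not the least-squares variant named in the question, which replaces the epsilon-insensitive loss with a squared loss); the data and hyperparameters are illustrative:
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 5, (80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# Points inside the epsilon tube contribute no loss: the regression counterpart
# of the maximal-margin idea mentioned above.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(svr.predict([[2.5]]))    # prediction near sin(2.5)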
  • asked a question related to Least-Squares Analysis
Question
2 answers
I need to estimate a set of 15 constant parameters, which are not directly measured. My state vector is therefore fixed and is made up of these constants, so the Kalman filter equations rearrange to those of recursive least squares. Only 3 quantities, different from the state, can be measured, and from these I have to estimate the state vector. The results indicate that the state vector is estimated well, but the rank of the observation matrix used in the calculations is much less than 15, because the equations (measurements) are fewer than the constants to be estimated. Is this normal? Must the rank be equal to the number of parameters? Thank you
Relevant answer
Answer
Dear Michel Crimeni,
The rank of the measurement matrix H is less than or equal to the rank of the system (process) matrix F.
For more detail and information about this subject, I suggest you see links on the topic.
Best regards
  • asked a question related to Least-Squares Analysis
Question
16 answers
Good day.
I am doing linear regression between a set of data and predictions made by two models, that I'll call A and B. Both models have the same number of parameters.
If I do a simple regression with excel, I get the following:
- Model A has R2 = 0.97.
- Model B has R2 = 0.29.
- The least-squares fit to model A has a slope m = 2.43.
- The slope for model B is m = 0.29
From this simple analysis, I would conclude that model A is better than model B in capturing the trend of experimental outcomes. I even tested it on a set of unseen data and it performed better at predicting the trends.
Now, I was asked to confirm this by hypothesis testing, and here it gets tricky probably due to my lack of experience. Due to the large slope of model A, the residual sum of squares for model A is huge, almost 5 times larger than that for model B. Since the number of data points and parameters is the same for both models, this suggests that model B is better than model A.
What am I doing wrong? I feel that I'm not formulating my problem correctly, but I'm honestly lost.
Also, I've seen that there are endless flavors of hypothesis testing, but the more I read the less I know where to start.
Is there a simple prescription to formulate and test my hypothesis?
Many thanks in advance!
Relevant answer
The sample size for both A and B is 11 (n = 11), which is quite small. Sometimes that is not enough for regression analysis.
  • asked a question related to Least-Squares Analysis
Question
4 answers
Hi, I conducted a Hausman test and got the results shown in the screenshot. Both ROE and ROA indicate fixed effects, and Tobin's Q comes up with random effects. Which one should I use in this case? Or am I meant to run a fixed effects regression separately and then a random effects regression for Tobin's Q?
Also, what is the difference between using the Least Squares Dummy Variable (LSDV) approach versus fixed effects panel data regression and random effects panel data regression?
Relevant answer
Answer
if you have any steps to use fixed and random effect in the meta-analysis, please help me
  • asked a question related to Least-Squares Analysis
Question
2 answers
Hello, I am currently working on fitting some data with an Anand model. The data are in the form of a strain-stress graph from the hot working of pure titanium.
I am using the Matlab Curve Fitting tool and I have to do some tweaking of the lower and upper parameter bounds for the nonlinear least squares method. One of the parameters I am looking for is "s0", the initial deformation resistance. In many works s0 values are around ~50 MPa. How can I determine whether the value found by Matlab is reasonable? Can I somehow take an educated guess based on my data to check whether this value is correct?
Relevant answer
Answer
For the initial value of the deformation resistance, s0, see the following document:
  • asked a question related to Least-Squares Analysis
Question
4 answers
Dear All,
I docked a ligand using Autodock vina and performed MD simulations for about 100ns.
I need help in analysis of the results. If we see the results can we say the ligand detached from the protein binding site and attached itself to some other site? If the ligand is detach can we say it is due to protein (unstable) on the basis of RMSD results?
1. Periodic boundary conditions were removed.
2. Protein structure was made using I-TASSAR (no PDB structure was available).
3. We we validate these results in wet lab.
4. Also please help in RMSD. What should be the criteria for selecting the best RMSD? Stable RMSD?
5. RMSD is Backbone least square fitting to heavy atoms of ligand.
6. RMSD of protein and ligand separate.
Really need help as I am totally new to MD simulations.
Relevant answer
Answer
Dear Martin Klvana,
I have seen the trajectory. The ligand slowly detached from the protein and after a few steps it reattached itself to some other place (other than the binding site).
  • asked a question related to Least-Squares Analysis
Question
5 answers
Hello everyone,
Does anyone have any recommendations on how to calculate a diffusion coefficient from drug release data? I know that I will acquire the drug release over time, and with this I can perform a least squares fit to determine which kinetic model (Korsmeyer-Peppas, etc.) fits best. This is where I am stuck: how do I get a diffusion coefficient from this kinetic model/data?
This isn't my usual background so I am struggling with this; any help would be greatly appreciated.
Cheers,
Sabrina
  • asked a question related to Least-Squares Analysis
Question
4 answers
Hello everybody,
I am running a confirmatory factor analysis, treating data as ordinal rather than continuous. I am therefore using robust weighted least squares estimator (WLSMV) based on the polychoric correlation matrix of latent continuous response variables. Since I am not able to produce any BIC and AIC, I was wondering what estimator I could use for model comparison. I read that in Mplus one could run a DIFFTEST. Does anyone know how to do it in JASP, Jamovi or SPSS?
Thank you very much!
Relevant answer
Answer
I don't think you can have any model comparison in JASP or jamovi. Both programs use the lavaan package in the background to run CFA/SEM models. If you look up the lavaan package in R, the lavTestLRT function can be used for comparing multiple models.
  • asked a question related to Least-Squares Analysis
Question
9 answers
The assumptions of robust least squares regression and supporting scholars.
Relevant answer
Answer
  • asked a question related to Least-Squares Analysis
Question
3 answers
Please, I am working on state estimation of power systems using weighted least squares and I have zero clue on how to go about it using MATLAB. I am using an IEEE 14-bus system. Please, I need help. If anyone has the complete research and project, please kindly reach out.
Relevant answer
Answer
I agree with Xingyu Zhou.
  • asked a question related to Least-Squares Analysis
Question
8 answers
Hi, this is Dwira
I would like help in fitting the following exponential equation by the method of least squares to determine the values of "Aref" and "ho":
A(x) = Aref (1 − e^(−x/ho))
Thank you.
Relevant answer
Answer
Don't do any transformation!
This is very easy with this software:
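The software linked above is not shown in the thread; as one concrete way to do the untransformed (nonlinear) fit, a scipy sketch with synthetic data standing in for the real measurements:
import numpy as np
from scipy.optimize import curve_fit

def model(x, Aref, ho):
    return Aref * (1.0 - np.exp(-x / ho))

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 40)                         # placeholder x values
y = model(x, 5.0, 2.0) + rng.normal(0, 0.1, 40)    # synthetic data with Aref = 5, ho = 2

popt, pcov = curve_fit(model, x, y, p0=[y.max(), 1.0])   # nonlinear least squares, no transformation
perr = np.sqrt(np.diag(pcov))                            # approximate standard errors
print(popt, perr)                                        # estimates of (Aref, ho)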
  • asked a question related to Least-Squares Analysis
Question
10 answers
I have a big dataset (n > 5,000) on corporate indebtedness and want to test whether SECTOR and FAMILY-OWNED are significant in explaining it. The information is in percentages (total liabilities/total assets) but is NOT bounded: many companies have an indebtedness above 100%. My hypotheses are that the SERVICES sector is more indebted than other sectors, and FAMILY-OWNED companies are less indebted than other companies.
If the data were normally distributed and had equal variances, I'd perform a two-way ANOVA.
If the data were normally distributed but heteroscedastic, I'd perform a two-way robust ANOVA (using the R package "WRS2").
As the data is neither normally distributed nor homoscedastic (according to many tests I performed), and there is no such thing as a "two-way Kruskal-Wallis test", which is the best option?
1) perform a generalized least squares regression (therefore corrected for heteroscedasticity) to check for the effect of two factors in my dependent variable?
2) perform a non-parametric ANCOVA (with the R package "sm"? Or "fANCOVA"?)
What are the pros and cons of each alternative?
Relevant answer
Answer
Often, the log-normal distribution is a sensible assumption for percentages that are not bounded at 100%. Are there any arguments against this in your case?
It makes a huge difference for interactions whether you analyze the data on the original scale or on the log scale. If it is sensible to assume relative effects (a given impulse will change the response depending on the "base value" of the response, so absolute changes will differ when the base values differ), interactions seen on the original scale are highly misleading in terms of the functional interpretation (i.e., when you aim to understand the interplay of the factors).
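If the log-normal assumption is acceptable, the two-factor analysis simplifies to an ordinary linear model on the log scale; a minimal sketch in Python with statsmodels (the column names and effect sizes are synthetic placeholders for the indebtedness data, and it assumes all values are strictly positive):
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
sector = rng.choice(["services", "industry", "retail"], n)
family = rng.choice(["family", "non_family"], n)
base = 0.6 * np.where(sector == "services", 1.4, 1.0) * np.where(family == "family", 0.8, 1.0)
debt = base * rng.lognormal(0, 0.5, n)            # strictly positive, not bounded at 100%

df = pd.DataFrame({"sector": sector, "family": family, "log_debt": np.log(debt)})
model = smf.ols("log_debt ~ C(sector) * C(family)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))            # two-way ANOVA table, effects are relative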
  • asked a question related to Least-Squares Analysis
Question
6 answers
Reading some literature, I understand that there are two methods: (1) the inner product method, and (2) the least-squares fitting method.
I made some simulations to compare these two methods, and found that both methods give almost the same results.
Of course the first one is simpler and faster.
However, the least-squares fitting method seems to be more popular, at least in industry.
Is there any reason to use the least-squares fitting method instead of the inner product method?
Relevant answer
Answer
I don't know the engineering, but it does sound like you would be throwing away information if you did use ordinary least squares (OLS) regression. General least squares (GLS) regression means including nonzero off-diagonal (but symmetric) parts of matrices. The information lost by OLS regression is with regard to heteroscedasticity and autocorrelation. (If you just include heteroscedasticity, which typically should be there, you have weighted least squares (WLS) regression, which I highly endorse for finite population survey statistics.)
So I just wondered if what you were talking about might be missing off-diagonal information if you used OLS regression. But I could be completely off topic.
Anyway, it definitely sounds like you have two methods, where one ignores information that the other does not ignore. Does the 'least squares fitting method' use any information that the 'inner product method' ignores? If not it seems you have a good case for the method that uses all relevant information. That is at least analogous to using GLS regression rather than OLS regression when there is heteroscedasticity and/or autocorrelation. So it seems you have a case.
You said the following:
"I made some simulations to compare these two methods, and found that both methods give almost the same results."
I wonder if that is because the additional information was not very substantial, or was the difference so small that it is only round-off calculation differences, and both methods, somehow, do include all the relevant information?
Cheers - (no Dr) Jim Knaub
  • asked a question related to Least-Squares Analysis
Question
4 answers
In MIMO channel estimation, the linear minimum mean squares estimator (LMMSE) yields better performance than the least-squares (LS) estimator. However, it requires knowing the channel covariance matrix (which constitutes prior information). In practice, the channel covariance has to be estimated based on previous channel estimations, but how are these previous channels estimated? With LS? With MMSE using an identity matrix as covariance? Anything else?
Relevant answer
Answer
You can check out the work by Saeid Haghighatshoar and Giuseppe Caire, such as https://arxiv.org/pdf/2110.03324
  • asked a question related to Least-Squares Analysis
Question
13 answers
I want to examine the simultaneous relationship between the two using microeconomic (household) data. I am using consumption as a proxy variable for income. I am confused about the modelling of it.
Relevant answer
Answer
As your education data do not refer to the education (spending) itself, but to the educational level (which was gained by past education), it is clear that you try to analyse the effect of this level on income. Therefore, you have to find a specification of an income equation with the educational level as one of the explaining variables. You must also include family characteristics, because you use consumption as a proxy for income, and consumption will of course be dependent on the number and age etc. of the household members. These characteristics are likely to influence income, too. This influence will, of course, be weaker, and can hardly be separated. Before an econometric estimate you should look at diagrams of the variables chosen and calculate correlations between variables (also between explaining ones). If you have enough data, I recommend, to choose household types and to analyse the relations for every type separately. If your data include time series, you can, of course, use aggregate (National Accounts) data as explaining variables.
For analysing the influence of income on the education level, you would need very long time series for a fixed set of households, and even then it would likely not be a promising project.
Two questions: The education level is an individual characteristic; how is it transformed into a household-level variable (level of the main earner, or some average over the (earning) members)?
To José-Ignacio: What is the reason to include a link between distribution and growth in this study?
  • asked a question related to Least-Squares Analysis
Question
2 answers
I am currently doing a topology optimization for a given elastic tensor. My cost function is the square of the difference between the target elastic tensor and the actual elastic tensor, and my volume constraint is in the form of an equation, but I don't know how to modify the parameters in MMA to make it applicable to least-squares optimization. I have read the notes of Professor Krister Svanberg and tried some things, but still no success. Has anyone done similar optimizations? Can you give me a little help? Thank you everyone.
Relevant answer
Answer
You may focus on topological manifolds optimization. Very beautiful work on this area is here : https://www.manopt.org/downloads.html
  • asked a question related to Least-Squares Analysis
Question
8 answers
I ran an OLS regression on my data and found issues with autocorrelation due to non-stationarity of the data (time series data). I need to conduct a generalized least squares (GLS) regression, as it is robust against biased estimates.
Relevant answer
Videos of generalized least squares (GLS) regression in SPSS:
Check on YouTube; there you will find step-by-step instructions on how to do GLS in SPSS.
  • asked a question related to Least-Squares Analysis
Question
7 answers
The use of weighted least squares is common in analytical chemistry, but the evaluation of uncertainty is generally poorly documented. Do you know any international standard or guide that addresses the evaluation of uncertainty consistent with GUM (JCGM 100) when interpolated values are used in a curve obtained by weighted least squares.
Relevant answer
Answer
Thanks Andrew Paul McKenzie Pegman. I have studied many aspects of estimating uncertainties from the perspective of the GUM, as well as from Monte Carlo. Please check my latest paper at
where I address most of the JCGM 100 and 101 examples from both approaches. However, I have not found coherent documentation on weighted least squares, either from the GUM or from the MCM; moreover, in my work in physical metrology the use of weights in least squares is not common, so I do not know its use in depth.
  • asked a question related to Least-Squares Analysis
Question
5 answers
Does anyone know where I can get any free 3-D smoothing spline code for irregular data in Fortran?
I've used 1-D and 2-D code from Inoue, H., 1986: A least-squares smooth fitting for irregularly spaced data: Finite-element approach using the cubic B-spline basis. Geophysics, 51, 2051–2066.
cheers, arthur
Relevant answer
Answer
Thanks for raising such an interesting question. Probably some people would be interested in the following special issue.
"Modern Geometric Modeling: Theory and Applications II" (IF=2.258) (Deadline: February 28, 2022)
The scope of the Special Issue includes but is not limited to original research works within the subject of geometric modeling and its applications in engineering, arts, physics, biology, medicine, computer graphics, architecture, etc., as well as theoretical mathematics and geometry which can be applied to problems of geometric modeling. For this Special Issue, we plan to accept the following types of manuscripts:
  • Overviews;
  • Research manuscripts;
  • Short manuscripts which discuss open problems in geometric modeling.
  • asked a question related to Least-Squares Analysis
Question
6 answers
Hello everyone,
I am trying to carry out a multiple regression analysis but I cannot meet the assumptions of normally distributed residuals and homoscedasticity. I transformed the dependent variable by inverting it, which resulted in a normal distribution of the residuals. However, the variance of the residuals is heteroscedastic. I tried to dig around and came across weighted least squares (WLS) regression. My question is:
Do I need to do both WLS and a data transformation? Can I do just WLS, or does WLS require normally distributed residuals?
I apologise if these are silly questions but I am having to teach all of this to myself and I am struggling.
Thank you!
Relevant answer
Answer
@Maya Yes it can be used. For detailed information refer to the following link https://towardsdatascience.com/when-and-how-to-use-weighted-least-squares-wls-models-a68808b1a89d
  • asked a question related to Least-Squares Analysis
Question
5 answers
I have been trying to find a way to fit two functions simultaneously using nonlinear least squares (I have to find the optimum 3 variables, common for both models, that fits best both of them). I typically use Python's scipy.optimize.least_squares module for NLLS work, which uses the Levenberg–Marquardt algorithm.
I tried some specialised multi-objective optimization packages (like pymoo), but they don't seem suitable for my problem as they rely on evolutionary algorithms that output a set of solutions (I only need one optimum solution per variable) and they are made to work for conflicting objectives.
I also tried to take the sum of the norms of the residuals of the two functions (making it into a single-objective problem) and to minimize that with various gradient and non-gradient based algorithms from Python's scipy.minimize package, but I found this norm becomes so huge (even with parameter bounds!) that I get an overflow error (34, results too large), crashing the programme sooner or later. It didn't crash using the Truncated Newton method, but the results produced were rubbish (and from running an optimization on this same data with a simpler model, I know they shouldn't be!)
I have to perform this fit for a few thousand data sets per experiment, so it has to be quite robust.
Surprisingly, I cannot find a way to do multi-objective NLLS (only for linear regression). I have found some papers on this, but I'm not a mathematician, so it's quite out of my depth to understand them and apply them in Python.
Has anyone had a similar problem to solve?
Many thanks!
Relevant answer
Answer
Two functions fitted simultaneously to one dataset? Do I understand correctly? What is the objective function?
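For what it is worth, one standard way to set this up with scipy, consistent with the question's description of shared parameters across two data sets, is to stack the residuals of both models into a single vector so that least_squares minimizes their joint sum of squares; the model functions and data below are placeholders:
import numpy as np
from scipy.optimize import least_squares

def model1(p, t):                       # placeholder for the first model
    a, b, c = p
    return a * np.exp(-b * t) + c

def model2(p, t):                       # placeholder for the second model
    a, b, c = p
    return a / (1.0 + b * t) - c

def residuals(p, t1, y1, t2, y2):
    r1 = model1(p, t1) - y1
    r2 = model2(p, t2) - y2
    return np.concatenate([r1, r2])     # one residual vector -> one joint NLLS problem

rng = np.random.default_rng(6)
t1 = np.linspace(0, 5, 50)
t2 = np.linspace(0, 5, 60)
true = (2.0, 0.7, 0.3)
y1 = model1(true, t1) + rng.normal(0, 0.02, 50)
y2 = model2(true, t2) + rng.normal(0, 0.02, 60)

fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(t1, y1, t2, y2))
print(fit.x)                            # a single optimum for the three shared parameters
If the two data sets have very different noise levels or scales, the residual blocks can be divided by their (estimated) standard deviations before concatenation so that neither function dominates the fit.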
  • asked a question related to Least-Squares Analysis
Question
3 answers
I am a new learner processing an analysis in SmartPLS-SEM. There are many options to assess model fit. Could you please help me by providing a complete report of model fit as well as references?
Goodness of Fit formula:
GoF = square root (R-squared × AVE)
best regards
Relevant answer
Answer
Hi Baraaah
SmartPLS offers the following fit measures:
SRMR
NFI
Chi²
RMS_theta
For the approximate fit indices such as SRMR and NFI, you may directly look at the outcomes of a PLS or PLSc model estimation (i.e., the results report) and these criteria's values with a certain threshold (e.g., SRMR < 0.08 and NFI > 0.90).
For the exact fit measures you may consider the inference statistics for an assessment. Therefore, you need to run the bootstrap procedure and to use the “complete bootstrap” option in SmartPLS.
Please check out the following links
  • asked a question related to Least-Squares Analysis
Question
10 answers
I have a mixed order of integration after doing the unit root tests for all my variables, that is, I(0) and I(1). Because of this I can't run a panel Johansen co-integration or a panel least squares fixed/random effects model; it would violate the conditions or assumptions underlying them. The suitable approach is panel ARDL using Eviews 11. But I can't find any diagnostic test except for the histogram normality test. I don't know how to carry out the serial correlation LM test or heteroscedasticity tests using the panel PMG/ARDL method in Eviews 11. Can I run the diagnostic tests using ordinary regression and still use ARDL? Please help.
Relevant answer
Answer
Dear Kehinde, since you are making use of PMG/panel ARDL, check for cross-sectional dependence. If there is no cross-sectional dependence, proceed to check for cointegration using the Pedroni, Kao, etc. tests, and if there is cointegration, go on to interpret your results.
However, if there is cross-sectional dependence, you will need to use second-generation unit root tests and the Westerlund cointegration test, which are available in Stata.
  • asked a question related to Least-Squares Analysis
Question
5 answers
I want to do a comprehensive study of errors-in-variables methods from both numerical analysis and statistical viewpoints, and compare the results with regression for selected parameter estimation problems in my domain, where the method is expected to perform better in terms of accuracy. These problems are of the linear and nonlinear regression type. I want to check whether the method under study is an improvement over generalized least squares. I am including multiple factors such as accuracy, computational efficiency, robustness, and sensitivity in my study, under different combinations of stochastic models. What kind of statistical analysis / experimental design / metric / hypothesis test is required for a study of this nature to establish the superiority of one method over another (i.e., to recommend one method over another for a particular class of problems)?
Relevant answer
Answer
Maybe you want to consider the recursive least squares algorithm (RLS). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real time. As with LS, there may be several correlation equations with the corresponding set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I have applied the RLS-FF algorithm to estimate the parameters from the KLa correlation, used to predict the O2 gas-liquid mass-transfer, hence giving increased weight to most recent data. Estimates were improved by imposing sinusoidal disturbance to air flow and agitation speed (manipulated variables). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first order delay. This investigation was reported at (MSc Thesis):
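A minimal sketch of the RLS update with forgetting factor described above, for a generic linear-in-parameters model y = phi' * theta (the variable names and data are illustrative, not taken from the cited thesis):

import numpy as np

def rls_ff_update(theta, P, phi, y, lam=0.98):
    # One RLS step with forgetting factor lam (0 < lam <= 1)
    phi = phi.reshape(-1, 1)
    K = P @ phi / (lam + phi.T @ P @ phi)            # gain vector
    err = y - (phi.T @ theta.reshape(-1, 1)).item()  # prediction error on the new point
    theta = theta + K.flatten() * err                # correct the estimate
    P = (P - K @ phi.T @ P) / lam                    # update covariance, discounting old data
    return theta, P

theta, P = np.zeros(2), 1e3 * np.eye(2)              # rough initial guess, large uncertainty
rng = np.random.default_rng(2)
for t in range(200):
    phi = np.array([1.0, rng.uniform(0, 10)])        # regressors [intercept, x]
    y = 2.0 + 0.5 * phi[1] + rng.normal(scale=0.1)   # hypothetical measurement
    theta, P = rls_ff_update(theta, P, phi, y)
print(theta)                                         # approaches [2.0, 0.5]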
  • asked a question related to Least-Squares Analysis
Question
3 answers
I have torque and angular position data (p) and want to fit a second-order linear model T = I·s²·p + B·s·p + k·p, with s = j·2πf. First I converted my data (torque, angular position) from the time domain into the frequency domain. Next, a frequency-domain derivative of the angular positions was taken to obtain velocity and acceleration data. Finally, a least-squares command, lsqminnorm (MATLAB), was used to estimate the coefficients. I expected a linear relationship, but the results show a very low R² (< 30%), and my coefficients are not always positive!
filtering data :
angular displacements: moving average
torques: low-pass Butterworth, cutoff frequency 4 Hz, sampling rate 130 Hz
velocities and accelerations: only frequencies within [−5, 5] Hz are kept, to reduce noise
Could anyone help me out with this?
What can I do to get a better estimate?
Here is part of my code:
%%
angle_Data_p = movmean(angle_Data,5);
%% derivative
N=2^nextpow2(length(angle_Data_p ));
df = 1/(N*dt); %Fs/K
Nyq = 1/(2*dt); %Fs/2
A = fft(angle_Data_p );
A = fftshift(A);
f=-Nyq : df : Nyq-df;
A(f>5)=0+0i;
A(f<-5)=0+0i;
iomega_array = 1i*2*pi*(-Nyq : df : Nyq-df); %-FS/2:Fs/N:FS/2
iomega_exp = 1; % 1 for velocity, 2 for acceleration
for j = 1 : N
if iomega_array(j) ~= 0
A(j) = A(j) * (iomega_array(j) ^ iomega_exp); % *iw or *-w2
else
A(j) = complex(0.0,0.0);
end
end
A = ifftshift(A);
velocity_freq_p = A; % keep both real and imaginary parts for the least squares fit
Velocity_time=real( ifft(A));
%%
[b2,a2] = butter(4,fc/(Fs/2));
torque=filter(b2,a2,S(5).data.torque);
T = fft(torque);
T = fftshift(T);
f=-Nyq : df : Nyq-df;
T(f>7)=0+0i; % band-limit the torque spectrum
T(f<-7)=0+0i;
torque_freq=ifftshift(T);
% same procedure for fft of angular frequency data --> angle_freqData_p
phi_P=[accele_freq_p(1:end) velocity_freq_p(1:end) angle_freqData_p(1:end)];
TorqueP_freqData=(torque_freq(1:end));
Theta = lsqminnorm((phi_P),(TorqueP_freqData))
stimatedT2=phi_P*Theta ;
Rsq2_S = 1 - sum((TorqueP_freqData - stimatedT2).^2)/sum((TorqueP_freqData - mean(TorqueP_freqData)).^2)
  • asked a question related to Least-Squares Analysis
Question
10 answers
  1. What is the difference between Least Square Regression and Robust Regression?
  2. How can we interpret the results of the regression model in both cases?
  3. If the variables in the data set have not shown proper correlation, can we use these techniques?
  4. Any R script references?
Relevant answer
Answer
1. In the normal lingo, very little. In robust regression, the standard errors are calculated differently, to make them "robust" against heteroscedasticity (or clustering).
2. You interpret the regression coefficient in the same manner: the average change in y given a one-unit increase in x.
3. I have written three textbooks on regression, but I have never heard the term "proper correlation" before. You do regression to find out if there is a (linear) association between x and y and, if so, to find out how large this association is.
4. I seldom use R, but I know for a fact that both plain vanilla and robust regression are straightforward to estimate in R.
Good luck :-)
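To complement point 4, a short Python illustration of ordinary versus robust (M-estimation, bisquare) regression with one deliberate outlier; in R the analogous calls would be lm() and MASS::rlm():

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = 1.0 + 0.7 * x + rng.normal(scale=0.5, size=x.size)
y[-1] += 15.0                                                 # one gross outlier

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()   # bisquare weights
print(ols.params)   # pulled towards the outlier
print(rlm.params)   # closer to the generating values (1.0, 0.7)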
  • asked a question related to Least-Squares Analysis
Question
4 answers
Online model updating can improve the predictive ability of a model. The unscented Kalman filter is used to update model parameters. I know it can be used when the parameters are constant. Can I also use it to estimate time-varying parameters? What is the alternative, and what is the difference from online recursive least squares estimation?
Relevant answer
Answer
The recursive least squares algorithm (RLS) allows for (real-time) dynamical application of least squares (LS) regression to a time series of time-stamped, continuously acquired data points. As with LS, there may be several correlation equations and a set of dependent (observed) variables. RLS is the recursive application of the well-known LS regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. For RLS with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data. This is often convenient for adaptive control and/or real-time optimization purposes. A particularly clear introduction to RLS/RLS-FF is found in: Karl J. Åström, Björn Wittenmark, "Computer-Controlled Systems: Theory and Design", Prentice-Hall, 3rd ed., 1997.
Application example ― While investigating adaptive control and energetic optimization of aerobic fermenters, I have applied the RLS algorithm with forgetting factor (RLS-FF) to estimate the parameters from the KLa correlation, used to predict the O2 gas-liquid mass-transfer, while giving increased weight to most recent data. Estimates were improved by imposing sinusoidal disturbance to air flow and agitation speed (manipulated variables). The proposed (adaptive) control algorithm compared favourably with PID. Simulations assessed the effect of numerically generated white Gaussian noise (2-sigma truncated) and of first order delay. This investigation was reported at (MSc Thesis):
  • asked a question related to Least-Squares Analysis
Question
4 answers
I have surface tension data versus log c, and I need to find the slope at every point of the curve in order to plot surface excess versus c. I understand that I have to use the least squares method, but I am not familiar with it when it is not related to a linear regression.
Relevant answer
Answer
I. You may consider directly evaluating the slopes from the actual data. This procedure often requires a fairly dense set of data points and some prior smoothing. It is, however, independent of any correlation model.
II. Alternatively, you may rely on some suitable correlation model, possibly nonlinear. In that case, you may first adopt a linearized correlation, modified from the preferred nonlinear correlation. The linearized correlation and the corresponding linearized plot are often quite convenient for qualitatively showing the scatter around the trendline and for emphasizing the major effects and the physical meaning of the correlation parameters. Estimates of the parameters first derived by least squares from the linearized correlation can later be refined by iterative nonlinear least squares regression, to find unbiased least squares estimates for the preferred nonlinear correlation. The nonlinear correlation can then, in principle, be differentiated as required. It may be advisable to compare both the linearized and the nonlinear correlations.
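A rough sketch of option I in Python (the surface-tension values are invented), with option II indicated as a polynomial fit that is then differentiated analytically:

import numpy as np
from scipy.signal import savgol_filter

logc = np.linspace(-5, -2, 60)                    # log10 of concentration
gamma = 72 - 8 * (logc + 5) + np.random.default_rng(4).normal(scale=0.2, size=60)

# Option I: smooth the data, then take the local slope d(gamma)/d(log c) at every point
gamma_s = savgol_filter(gamma, window_length=11, polyorder=2)
slope_pointwise = np.gradient(gamma_s, logc)

# Option II: fit a low-order correlation (here a cubic polynomial) and differentiate it
coeffs = np.polyfit(logc, gamma, deg=3)
slope_model = np.polyval(np.polyder(coeffs), logc)

The point-wise slopes can then be inserted into the Gibbs adsorption equation to obtain the surface excess at each concentration.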
  • asked a question related to Least-Squares Analysis
Question
5 answers
I have an experimental set of data (xdata, ydata) and I want to fit a five-constant expression to these data and find the unknown constants with the following MATLAB built-in function:
[q,resnorm,residual,EXITFLAG,output,LAMBDA]=lsqnonlin('fun',q0)
It converges, but the norm is too big, and for "q" it returns my initial guess as the final fit. I checked my procedure with another data set and a different function, and it works perfectly. Does this mean that the initial guess is too far off? Any suggestions?
Thanks in advance
Relevant answer
Answer
Often the nonlinear correlation to be fitted to the data can be 'somehow' linearized as a first stage. The linearized correlation and the corresponding linearized plot are often quite convenient for qualitatively showing the scatter around the trendline and for emphasizing the major effects and the physical meaning of the correlation parameters. Estimates of the parameters derived by least squares from the linearized correlation (modified from the former nonlinear correlation) can then be refined by iterative nonlinear least squares regression, to find unbiased least squares estimates for the original correlation. It may be advisable to compare both the linearized and the nonlinear correlations.
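A small sketch of that two-stage idea, using a hypothetical exponential correlation y = a*exp(b*x): estimate a and b first from the linearized form ln y = ln a + b*x, then refine by nonlinear least squares starting from those values (which also addresses the "initial guess too far" suspicion):

import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(5)
x = np.linspace(0, 2, 30)
y = model(x, 2.5, 1.3) * (1 + 0.05 * rng.normal(size=30))   # hypothetical data, y > 0

# Stage 1: linearized fit, ln y = ln a + b*x
b0, ln_a0 = np.polyfit(x, np.log(y), 1)
p0 = [np.exp(ln_a0), b0]

# Stage 2: refine by nonlinear least squares, starting from the linearized estimates
popt, pcov = curve_fit(model, x, y, p0=p0)
print(p0, popt)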
The following RG discussions may be of some interest concerning your query:
  • asked a question related to Least-Squares Analysis
Question
3 answers
I am trying to optimize a branching-process-based model, specifically a least squares estimate. I do not have a functional form of the objective function (the squared error); instead, I simulate the branching process to generate the objective function, which depends on the parameters (the objective function needs to be optimized over these parameters).
Perhaps one could call it simulated least squares estimation. So, how should I choose an algorithm for optimizing this least squares estimate?
Relevant answer
Answer
Maybe you can consider the recursive least squares algorithm (RLS), which allows for (real-time) dynamical application of least squares (LS) regression to a time series of time-stamped, continuously acquired data points. As with LS, there may be several correlation equations and a set of dependent (observed) variables. RLS is the recursive application of the well-known LS regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. For RLS with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data. This is often convenient for adaptive control and/or real-time optimization purposes. A particularly clear introduction to RLS/RLS-FF is found in: Karl J. Åström, Björn Wittenmark, "Computer-Controlled Systems: Theory and Design", Prentice-Hall, 3rd ed., 1997.
Application example for a similar linearized correlation ― the KLa correlation, used to predict the O2 gas-liquid mass transfer ― In the work signalled below (MSc Thesis), while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS-FF algorithm to estimate the parameters of that correlation. Estimates were improved by imposing sinusoidal disturbances on the air flow and agitation speed (manipulated variables). The power dissipated by agitation was assessed with a torque meter. The proposed (adaptive) control algorithm compared favourably with PID. This investigation was reported in (MSc Thesis):
  • asked a question related to Least-Squares Analysis
Question
6 answers
I want to fit a 3D line with a known equation z = F(x,y) to a set of points (x,y,z), in order to find the parameters of the equation. Therefore, I need to solve: min‖F(xdata, ydata) − zdata‖. How can I implement this in MATLAB? Do you know any similar or substitute approach?
Relevant answer
Answer
The recursive least squares algorithm (RLS) allows for (real-time) dynamical application of least squares (LS) regression to a time series of time-stamped, continuously acquired data points. As with LS, there may be several correlation equations and a set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data. This is often convenient for adaptive control and/or real-time optimization.
Application example for a similar linearized correlation ― the KLa correlation, used to predict the O2 gas-liquid mass transfer ― In the work signalled below (MSc Thesis), while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS algorithm with forgetting factor (RLS-FF) to estimate the parameters of that correlation. Estimates were improved by imposing sinusoidal disturbances on the air flow and agitation speed (manipulated variables). The power dissipated by agitation was assessed with a torque meter. Simulations were carried out in Excel 5.0 with Visual Basic for Applications (VBA) macros:
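Although the question asked for MATLAB (where lsqcurvefit or lsqnonlin would be natural choices), the same idea can be sketched in Python: stack the point-wise residuals F(x_i, y_i; params) − z_i and minimise their sum of squares. The planar model below is only a placeholder for the questioner's known equation:

import numpy as np
from scipy.optimize import least_squares

def F(params, x, y):
    a, b, c = params              # placeholder model: z = a*x + b*y + c
    return a * x + b * y + c

def residuals(params, x, y, z):
    return F(params, x, y) - z    # implements min ||F(xdata, ydata) - zdata||

rng = np.random.default_rng(6)
x, y = rng.uniform(-1, 1, 100), rng.uniform(-1, 1, 100)
z = F([1.5, -0.7, 0.2], x, y) + rng.normal(scale=0.02, size=100)

fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, y, z))
print(fit.x)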
  • asked a question related to Least-Squares Analysis
Question
2 answers
Hi everyone,
I'm trying to fit certain data using nonlinear least squares which is of the following form
y(t) = a1·X1(t) + a2·X2(t) + … + an·Xn(t) + b1·y(t−1) + … + bk·y(t−k)
where a1, …, an and b1, …, bk are the parameters I want to find by training on the data. I have around 5000 sample points (X, Y), which vary over time. The curve fits very well during training; my R-squared value is 98%. The major issue is with validation. I used the same data for validation as for training. I know this is not the right way to validate a model, but I did it to make sure I could reproduce the same R-squared with the same data. I observed that my R-squared drops drastically, by at least 20%, on the same data. The reason I found is that during validation the error in the past outputs is fed back: the (already erroneous) prediction y(t−1) is used to compute y(t), whereas during training the fed-back outputs are the measured, error-free values, so the error keeps growing and the R-squared value drops. In other words, training is more like an open loop, but validation turns into a real closed loop. Is there some way to accommodate this effect in the optimisation function during training, so that I get the same R-squared value when I train and validate on the same data set?
Thanks!
Relevant answer
Answer
Maybe you can alternatively consider the recursive least squares algorithm (RLS). RLS is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real time. As with LS, there may be several correlation equations and a set of dependent (observed) variables. For the recursive least squares algorithm with forgetting factor (RLS-FF), acquired data is weighted according to its age, with increased weight given to the most recent data. No prior 'training phase' is required.
Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I have applied the RLS algorithm with forgetting factor (RLS-FF) to estimate the parameters from the KLa correlation, used to predict the O2 gas-liquid mass-transfer, hence giving increased weight to most recent data. Estimates were improved by imposing sinusoidal disturbance to air flow and agitation speed (manipulated variables). This investigation was reported at (MSc Thesis):
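Regarding the closed-loop effect described in the question, one further option (a sketch, with a deliberately simplified model of one exogenous input and one output lag) is to minimise the simulation error directly: during fitting, the model is run in free-run mode, feeding back its own predictions rather than the measured past outputs, so the criterion being optimised matches the way the model is later validated:

import numpy as np
from scipy.optimize import least_squares

def simulate(params, X, y0):
    # Free-run simulation: y_hat(t) = a*X(t) + b*y_hat(t-1)
    a, b = params
    y_hat = np.empty(len(X))
    prev = y0
    for t in range(len(X)):
        prev = a * X[t] + b * prev        # feed back the model's own output
        y_hat[t] = prev
    return y_hat

def sim_error(params, X, y):
    return simulate(params, X, y[0]) - y

rng = np.random.default_rng(7)
X = rng.normal(size=500)
y = np.empty(500); y[0] = 0.0
for t in range(1, 500):
    y[t] = 0.8 * X[t] + 0.6 * y[t - 1] + 0.05 * rng.normal()

fit = least_squares(sim_error, x0=[0.5, 0.5], args=(X, y),
                    bounds=([-2.0, -0.99], [2.0, 0.99]))   # keep the feedback term stable
print(fit.x)    # parameters chosen so the simulated trajectory matches the data

The trade-off is that the residuals are no longer linear in the parameters, so this must be solved as a nonlinear least squares problem even when the one-step-ahead predictor is linear.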
  • asked a question related to Least-Squares Analysis
Question
3 answers
As is well known, iterative methods for solving linear systems, such as Successive Over-Relaxation and the like, are very attractive for many problems, such as those with sparse matrices. These methods are generally formulated in the context of a determined system, in which the number of equations equals the number of unknowns. Now, for the sake of simplicity, let us assume that we have one additional observation and need to update the previous solution. In other words, we now have an overdetermined system owing to this additional observation. The question is how to include this observation in order to update the previously computed parameters. Indeed, the theory of parameter estimation provides many guidelines for handling this task to obtain an optimal solution in the least squares sense. But let us assume that we need to stick to the iterative approach for the parameters. Under that assumption, how could we handle the additional observation for the parameter update, and what kind of errors do we need to minimize?
Relevant answer
Answer
If you are looking for a sparse solution to a linear system of arbitrary size m by n, you can apply the iterative method called the Orthogonal Matching Pursuit (OMP) algorithm.
Also, the Lasso regression model can be used in such cases; it involves a regularization term in the L1 norm. For sparsity in the solution, minimization in the L0 norm, or its convex surrogate the L1 norm, is sought.
The OMP algorithm generates a sparse solution based on (approximate) minimization of the L0 norm (the number of non-zero entries in the vector).
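A brief illustration of both suggestions using scikit-learn (the underdetermined system below is randomly generated with a 3-sparse solution):

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit, Lasso

rng = np.random.default_rng(8)
A = rng.normal(size=(50, 200))                 # 50 equations, 200 unknowns
x_true = np.zeros(200); x_true[[3, 57, 120]] = [1.5, -2.0, 0.8]
b = A @ x_true

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(A, b)
lasso = Lasso(alpha=0.01, max_iter=10000).fit(A, b)
print(np.flatnonzero(omp.coef_))                       # recovered support (OMP)
print(np.flatnonzero(np.abs(lasso.coef_) > 1e-3))      # recovered support (Lasso)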
  • asked a question related to Least-Squares Analysis
Question
10 answers
Right now I am studying GRAVSOFT for geoid modelling, to use it in my thesis. I tried to read the manual, but it does not explain the Python GUI version (it explains the Fortran version), so I am still confused about the software. I would like to understand clearly which data I have to use for geoid model determination, and the steps (step by step) for doing that with the GRAVSOFT programs.
Please provide me with any documents or files that can help me understand all the programs inside the GRAVSOFT interface, specifically for creating a geoid model.
Thanks in advance; your comments are appreciated.
Relevant answer
Answer
Anas Osman GEOCOL and GEOGRID for gross-error detection.
Good luck
  • asked a question related to Least-Squares Analysis
Question
9 answers
As is known, the method of least squares (generalized least squares) is used under the assumption that the independent residuals have a normal distribution. What, then, will the parameter estimates be if we apply the least squares method (generalized least squares) in the case of uniformly distributed residuals?
Relevant answer
Answer
The recursive least squares algorithm (RLS) is the recursive application of the well-known least squares (LS) regression algorithm, so that each new data point is taken into account to modify (correct) a previous estimate of the parameters from some linear (or linearized) correlation thought to model the observed system. The method allows for the dynamical application of LS to time series acquired in real time. As with LS, there may be several correlation equations and a set of dependent (observed) variables. A particularly clear introduction to RLS is found in: Karl J. Åström, Björn Wittenmark, "Computer-Controlled Systems: Theory and Design", Prentice-Hall, 3rd ed., 1997. Years ago, while investigating adaptive control and energetic optimization of aerobic fermenters, I applied the RLS algorithm with forgetting factor (RLS-FF) to estimate the parameters of the KLa correlation, used to predict the O2 gas-liquid mass transfer, giving increased weight to the most recent data. Estimates were improved by applying sinusoidal excitation to the air flow and agitation speed (manipulated variables). This investigation was reported in (MSc Thesis):
  • asked a question related to Least-Squares Analysis
Question
6 answers
I would like to know whether the SUPG method has any advantages over the least-squares finite element method.
Thank you for your reply.
Relevant answer
Answer
Dear Zmour,
It can be better for diffusion-convection-reaction problems. My opinion is a little different: the least-squares method has better control of the streamline derivative than SUPG.
Ashish
  • asked a question related to Least-Squares Analysis
Question
4 answers
I'm in a situation where I need to compare the accuracy of one stress-strain curve with respect to another, with the two curves having different x and y coordinates. If both curves had the same x-coordinates (independent variable) and differing y-coordinates (dependent variable), I could use the R-squared value or the weighted least squares (WLS) method.
I'm trying to avoid interpolation, as there are many values and it would be a very tedious task.
Any help is appreciated :)
Relevant answer
Answer
Thank you for all your answers. It is much appreciated :)
I stumbled upon a software package called 'Origin Lab' that gets the job done.
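For reference, with a scripting language the interpolation step the questioner wanted to avoid reduces to a single call, after which the curves can be compared on a common strain grid (the two curves below are placeholders for real stress-strain data):

import numpy as np

strain_a = np.linspace(0, 0.10, 80); stress_a = 200e3 * np.sqrt(strain_a)
strain_b = np.linspace(0, 0.10, 55); stress_b = 195e3 * np.sqrt(strain_b)

# Resample curve B onto curve A's strain grid, then compare point by point
stress_b_on_a = np.interp(strain_a, strain_b, stress_b)
rmse = np.sqrt(np.mean((stress_a - stress_b_on_a) ** 2))
r2 = 1 - np.sum((stress_a - stress_b_on_a) ** 2) / np.sum((stress_a - stress_a.mean()) ** 2)
print(rmse, r2)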
  • asked a question related to Least-Squares Analysis
Question
1 answer
I've got myself into a bit of a mess with some analysis.
I have conducted an interrupted time series analysis using GLS, but wanted to check whether this assumes a normal distribution.
The AIC values for my GLS models are better than for any GLM equivalent, but I'm not sure how to interpret this.
Any advice would be appreciated
Relevant answer
Answer
No, but it assumes some distribution, so Google what you did to find out which. I suggest you look at the Journal of Data Science 11 (2013), 575-606, which may be of some interest to you as an alternative approach to your problem. Best wishes, David Booth
  • asked a question related to Least-Squares Analysis
Question
3 answers
The question relates to channel estimation in a massive MIMO-OFDM system using the least squares approach, for ZF and MMSE detectors.
Relevant answer
Answer
I think the results are generated using MATLAB code, but to be sure, you can email the first author.
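For what it is worth, the basic least-squares channel estimate and the ZF/MMSE detection matrices are short enough to sketch with NumPy (a narrowband MIMO snapshot with made-up dimensions; an OFDM system would apply this per subcarrier):

import numpy as np

rng = np.random.default_rng(9)
M, K, Np = 8, 2, 16                            # receive antennas, users, pilot length
H = (rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))) / np.sqrt(2)
P = (rng.normal(size=(K, Np)) + 1j * rng.normal(size=(K, Np))) / np.sqrt(2)   # pilot matrix
nv = 0.01                                      # noise variance
Y = H @ P + np.sqrt(nv / 2) * (rng.normal(size=(M, Np)) + 1j * rng.normal(size=(M, Np)))

H_ls = Y @ P.conj().T @ np.linalg.inv(P @ P.conj().T)     # LS estimate: Y P^H (P P^H)^-1

W_zf = np.linalg.pinv(H_ls)                                                    # zero-forcing
W_mmse = np.linalg.inv(H_ls.conj().T @ H_ls + nv * np.eye(K)) @ H_ls.conj().T  # (L)MMSE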
  • asked a question related to Least-Squares Analysis
Question
2 answers
When creating and optimizing mathematical models with multivariate sensor data (i.e. 'X' matrices) to predict properties of interest (i.e. the dependent variable or 'Y'), many strategies are recursively employed to reach "suitably relevant" model performance, which include:
>> preprocessing (e.g. scaling, derivatives...)
>> variable selection (e.g. penalties, optimization, distance metrics) with respect to RMSE or objective criteria
>> calibrant sampling (e.g. confidence intervals, clustering, latent space projection, optimization..)
Typically & contextually, for calibrant sampling, a top-down approach is utilized, i.e., from a set of 'N' calibrants, subsets of calibrants may be added or removed depending on the "requirement" or model performance. The assumption here is that a large number of datapoints or calibrants are available to choose from (collected a priori).
Philosophically and technically, how does the bottom-up pathfinding approach for calibrant sampling, or "searching for ideal calibrants" in a design space, manifest itself? This is particularly relevant in the chemical and biological domains, where experimental sampling is constrained.
E.g., given a smaller set of calibrants, how does one robustly approach the addition of new calibrants in silico to the calibrant space so as to build more "suitable" models? (Simulated data points can then be collected experimentally and added to the calibrant space after modelling, for the next iteration of modelling.)
:: Flow example ::
N calibrants -> build & compare models -> model iteration 1 -> addition of new calibrants (N+1) -> build & compare models -> model iteration 2 -> so on.... ->acceptable performance ~ acceptable experimental datapoints collectable -> acceptable model performance
  • asked a question related to Least-Squares Analysis
Question
2 answers
Hello,
I am using robust correlation, more specifically iteratively reweighted least squares with bisquare weighting (DuMouchel & O'Brien 1989; Street et al. 1988; Holland & Welsch 1977), and am wondering whether this test makes any assumption that the data are normally distributed.
Relevant answer
Answer
Hi Jim,
Thanks a lot for your answer. I'm considering continuous data here. Another way to phrase the question is the following: The Pearson correlation coefficient measures the strength of the linear relationship between normally distributed variables, while the Spearman rank correlation method does not assume that the variables are normally distributed. What I'm wondering now is how the robust method relates to these - does it still assume normality, like Pearson?
That's a good suggestion, I could investigate this in some simulations.
Best,
Nicolai
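Along the lines of that simulation idea, a minimal sketch (contaminated-normal data; statsmodels' IRLS with bisquare weights is used here as an illustrative stand-in for the exact procedure in the cited papers, and the standardisation below is deliberately naive):

import numpy as np
import statsmodels.api as sm
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(10)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.8, size=200)
y[:10] += rng.normal(scale=10, size=10)          # heavy-tailed contamination

print(pearsonr(x, y)[0], spearmanr(x, y)[0])

# Robust (IRLS, bisquare) slope between standardised variables, as a rough
# robust analogue of a correlation coefficient
zx, zy = (x - x.mean()) / x.std(), (y - y.mean()) / y.std()
rlm = sm.RLM(zy, sm.add_constant(zx), M=sm.robust.norms.TukeyBiweight()).fit()
print(rlm.params[1])

Repeating this over many replicates, for normal and non-normal marginals, would show how the three coefficients behave under departures from normality.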
  • asked a question related to Least-Squares Analysis
Question
6 answers
I am using MATLAB code. I need to combine the parameters of the photovoltaic plant with those of the grid, then run a state estimation on the combined system to obtain estimated voltage magnitudes and phase angles.
Relevant answer
Answer