Least-Squares Analysis - Science topic
Least-Squares Analysis is a principle of estimation in which the estimates of a set of parameters in a statistical model are those quantities minimizing the sum of squared differences between the observed values of a dependent variable and the values predicted by the model.
Questions related to Least-Squares Analysis
In Brewer, K.R.W.(2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, Ken Brewer proved not only that heteroscedasticity is the norm for business populations when using regression, but he also showed the range of values possible for the coefficient of heteroscedasticity. I discussed this in "Essential Heteroscedasticity," https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, and further developed an explanation for the upper bound.
Then in an article in the Pakistan Journal of Statistics (PJS), "When Would Heteroscedasticity in Regression Occur?", https://www.researchgate.net/publication/354854317_WHEN_WOULD_HETEROSCEDASTICITY_IN_REGRESSION_OCCUR, I discussed why this might sometimes not seem to be the case, but argued that homoscedastic regression is artificial, as can be seen from the abstract of that article. That article was cited by other authors in another article, an extract of which was sent to me by ResearchGate, and it seemed to me to say, incorrectly, that I supported OLS regression. However, the abstract for that paper is available on ResearchGate, and it makes clear that they are pointing out problems with OLS regression.
Notice, from "Essential Heteroscedasticity" linked above, that a larger predicted value used as a size measure (for a ratio model simply x will do, since bx gives the same relative sizes) means a larger sigma for the residuals, hence the term "essential heteroscedasticity." This is important for finite population sampling.
So, weighted least squares (WLS) regression should generally be the case, not OLS regression. Thus OLS regression really is not "ordinary." The abstract for my PJS article supports this. (Generalized least squares (GLS) regression may even be needed, especially for time series applications.)
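As a rough sketch of the error structure being described (the notation here is mine, not quoted from the papers above), the residual standard deviation grows with the size measure:
sigma_i = sigma_0 * x_i^gamma, with regression weights w_i = 1 / x_i^(2*gamma),
where gamma is the coefficient of heteroscedasticity; Brewer's range puts gamma roughly between 0.5 and 1, and OLS is the special case gamma = 0.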
The threshold least-squares regression model of Hansen (2000) divides the series into two regimes endogenously, one above and one below the threshold, and then regresses each regime individually by OLS. The method also involves bootstrap replications. In my case the regime above the threshold is left with only 17 observations. Does this create a loss-of-degrees-of-freedom issue in the data?
Our dependent variable is stationary at level, while the independent variables are stationary at level and at first difference.
Dear academician friends, I have a question about econometrics. I evaluated the relationship between the number of researchers in the health field and patents with a balanced panel analysis over 11 countries and 10 years. The data are regular; I estimated the model with least squares and then performed causality and co-integration analyses. However, one peer reviewer insists that the data are count data and recommends a count-data panel analysis. I looked into the subject, but there seemed to be no need for such an analysis, so I proceeded according to the suitability of the econometric evaluations and diagnostic tests. How can I run such an analysis (count data) in Eviews? Thanks.
I have performed several meta-analyses using STATA, but never a dose-response analysis. I have been reading articles explaining the drmeta package, which can utilize generalized least-squares regression, but some variables such as SE and LogRR seem unfamiliar. How can you calculate those variables in STATA when the provided data are OR, RR, HR, 95% CI, MD, or SMD? I need help with a step-by-step command for the dose-response analysis.
A comment in a previous thread said the analysis can be done through meta-regression. But dose-response analysis and meta-regression are two completely different analyses, aren't they?
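For what it's worth, the dose-response inputs are usually derived from the reported ratio measures rather than taken directly; a minimal sketch, assuming 95% confidence intervals and a ratio measure (RR, OR or HR):
logRR = ln(RR)
SE(logRR) = ( ln(upper 95% limit) - ln(lower 95% limit) ) / (2 * 1.96)
The same construction applies to OR and HR on the log scale; for MD or SMD the estimate is already on an additive scale, so SE = (upper - lower) / (2 * 1.96).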
Suppose I compute a least squares regression with the growth rate of y against the growth rate of x and a constant. How do I recover the elasticity of the level of y against the level of x from the estimated coefficient?
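One way to see it, assuming the growth rates are used as approximations to log differences: if the regression is Delta ln(y) = a + b * Delta ln(x) + e, then b is (approximately) d ln(y) / d ln(x), which is exactly the elasticity of the level of y with respect to the level of x. So the estimated coefficient on the growth rate is already the level elasticity, with the approximation improving as the growth rates become small.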
For example, there is no doubt that global sea level is rising, and based on global mean sea level (GMSL) data we can calculate the trend of the GMSL. However, we all know that there must be some interannual/decadal variations in the GMSL, and even aliasing errors in our data. We can get the linear trend of the GMSL time series with the least-squares method. However, how can we estimate the uncertainty range of this trend? (1) The GMSL time series has autocorrelation; (2) the variations of the GMSL time series are not white noise, and the standard deviation of the GMSL anomalies is not 1.
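A minimal sketch of one common approach, an AR(1)-style effective-sample-size correction to the ordinary least-squares trend error (the variable names gmsl and t are assumptions):
% gmsl: column vector of GMSL anomalies; t: matching time vector (e.g., in years)
y    = gmsl(:);  t = t(:);  n = numel(y);
X    = [ones(n,1) t];
b    = X \ y;                                         % b(2) is the least-squares trend
res  = y - X*b;                                       % detrended residuals
r1   = sum(res(1:end-1).*res(2:end)) / sum(res.^2);   % lag-1 autocorrelation of residuals
neff = n * (1 - r1) / (1 + r1);                       % effective sample size under AR(1)
se   = sqrt( (sum(res.^2)/(neff - 2)) / sum((t - mean(t)).^2) );
ci   = b(2) + [-1 1] * 1.96 * se;                     % approximate 95% range for the trend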
I am planning to assess the extent to which different income diversification strategies affect rural household welfare. Considering the simultaneous causality between livelihood strategies and welfare indicators, the Two-Stage Least Squares (2SLS) method with instrumental variables will be applied to estimate the impact of the strategies on household welfare.
Please check the attached file also. I just need to know which regression was used in table 4 of this paper and which tool (SPSS, STATA, R, etc.) I need to use to analyse the data.
I am running FGLS using Stata and I want to know how I can get the value of Pseudo R2. Your help would be much appreciated.
Greetings,
I am currently in the process of conducting a Confirmatory Factor Analysis (CFA) on a dataset consisting of 658 observations, using a 4-point Likert scale. As I delve into this analysis, I have encountered an interesting dilemma related to the choice of estimation method.
Upon examining my data, I observed a slight negative kurtosis of approximately -0.0492 and a slight negative skewness of approximately -0.243 (please refer to the attached file for details). Considering these properties, I initially leaned towards utilizing the Diagonally Weighted Least Squares (DWLS) estimation method, as existing literature suggests that it takes into account the non-normal distribution of observed variables and is less sensitive to outliers.
However, to my surprise, when I applied the Unweighted Least Squares (ULS) estimation method, it yielded significantly better fit indices for all three factor solutions I am testing. In fact, it even produced a solution that seemed to align with the feedback provided by the respondents. In contrast, DWLS showed no acceptable fit for this specific solution, leaving me to question whether the assumptions of ULS are being violated.
In my quest for guidance, I came across a paper authored by Forero et al. (2009; DOI: 10.1080/10705510903203573), which suggests that if ULS provides a better fit, it may be a valid choice. However, I remain uncertain about the potential violations of assumptions associated with ULS.
I would greatly appreciate your insights, opinions, and suggestions regarding this predicament, as well as any relevant literature or references that can shed light on the suitability of ULS in this context.
Thank you in advance for your valuable contributions to this discussion.
Best regards, Matyas
I am running an instrumental variable regression.
Eviews provides two different models for instrumental variables, i.e., two-stage least squares and the generalized method of moments.
How do I choose between the two models?
Thanks in advance.
>> size(output_data(1:L-1,:))
ans =
3359 1
>> size(U1)
ans =
1000 1
>>
>> output_data(1:L-1,:) * U1;
Error using *
Inner matrix dimensions must agree.
% Load experimental input-output data
load('data.mat');
input_data = input;
output_data = Output2;
% Define the number of inputs and outputs
num_inputs = 1;
num_outputs = 1;
% Define the order of the model
n = 2; % number of states
m = 1; % number of inputs
p = 1; % number of outputs
% Construct the Hankel matrix
L = 1000; % number of rows in the Hankel matrix
H = hankel(input_data(1:L), input_data(L:end));
% Apply the least squares method to estimate the model parameters
[U,S,V] = svd(H, 'econ');
U1 = U(:, 1:n);
U2 = U(:, n+1:end);
S1 = S(1:n, 1:n);
S2 = S(n+1:end, n+1:end);
V1 = V(:, 1:n);
V2 = V(:, n+1:end);
Ahat = U2 * pinv(S2) * V2' * output_data(1:L-1,:) * U1;
% compute Ahat
Ahat = U2 * pinv(S2) * V2' * output_data(1:L-1,:) * U1';
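% Note on the error shown above: output_data(1:L-1,:) is a single column
% (3359x1 in that session), while U1 has L rows, so the inner dimensions of
% the product do not agree. For A*B to be defined, the number of columns of A
% must equal the number of rows of B, so the output block multiplied here has
% to be windowed/reshaped so that it is conformable with U1 (or U1').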
%Ahat = U2*S2^(-1/2)*V2'*U1'*output_data(1:L-1)';
Bhat = U2*S2^(1/2)*V2(1:num_inputs, :)';
Chat = output_data(1:p)*Bhat*S2^(-1/2)*V2(:, 1:p)';
Dhat = output_data(1:p)*Bhat*S2^(1/2)*V2(1:num_inputs, 1:p);
I have a dataset which has around 20k data points (n = 20k). My research question necessitates using Weighted Least Squares (WLS). But I am facing an autocorrelation issue (the sample has negative autocorrelation; the DW statistic ranges from 2.05 to 2.2). I use R for my analysis, and it seems Cochrane-Orcutt is incompatible with WLS.
I got the following error message:
"Error in lmtest::dwtest(reg) : weighted regressions are not supported."
The same is specified in - https://cran.r-project.org/web/packages/lmtest/NEWS
Changes in Version 0.9-29
o dwtest() now catches weighted regressions and throws an error
because weighted regressions are not supported (yet).
Kindly let me know whether there are any ways I can handle the autocorrelation issue.
Regards,
Karthik N
Hi everyone,
I have a problem with crystallite size in Topas refinement. I'm using the LVol_FWHM_CS_G_L macro and the problem I'm constantly facing is that the Gaussian contribution goes to a large number (infinite crystallite size), while the Lorentzian contribution gives at least a reasonable value.
LVol_FWHM_CS_G_L(1, 5.50322798`, 0.89, 7.68859526`, csgc, 2356.57349`_LIMIT_MIN_0.3, cslc, 8.63459021`)
And this obviously coincides with a wrong strain value in the e0_from_Strain macro, where both the G and L strain values approach the minimum limit.
I'm dealing with highly disordered alumina materials, in case that matters.
Thank you in advance!
Jamal
Hello, I am studying a system and I will do a Rietveld refinement. But my data are in cps and I want to convert them to counts.
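For reference, assuming the counting time per step is known from the measurement settings, the conversion is simply
counts per step = cps * (counting time per step, in seconds),
so a pattern collected at, say, 2 s per step is converted by multiplying every intensity by 2.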
What are the conditions for applying an iterated weighted least squares regression model to panel data?
Suppose I am estimating the effect of GDP on the ROA of a company. What will be the mathematical equation of the model if we apply iterated weighted least squares?
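A minimal sketch of what the model could look like (the notation is an assumption, not from a specific textbook): for firm i in period t,
ROA_{it} = b_0 + b_1 * GDP_{it} + e_{it},
and iterated weighted least squares minimizes the sum over i and t of w_{it} * e_{it}^2, where the weights w_{it} = 1 / sigma_hat_{it}^2 are re-estimated from the residuals of the previous fit (starting from w_{it} = 1, i.e., OLS) and the estimation is repeated until the coefficients stop changing.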
From D. Simon's book, a linear recursive estimator is defined by two equations:
y_{k} = H_{k}*x + v_{k}
x^_{k} = x^_{k-1} + K_{k}( y_{k} - H_{k}*x^_{k-1} )
The examples are for the estimation of a single constant, but what if I need to estimate 15 constants? I mean, let's say I can measure 3 quantities at a time; the y_{k} vector is (3x1) while the vector of unknown constants I want to estimate is (15x1). The H_{k} matrix will obviously be rank-deficient if its dimensions are (3x15). In least squares the H matrix must be full rank. Is it the same with recursive estimation? Am I missing something? Thank you all in advance.
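For what it's worth, a minimal recursive least-squares sketch in MATLAB for this setting (15 constants, 3 measurements per step; the H_k matrices, noise level, and number of steps are placeholders):
nx = 15; ny = 3; nsteps = 500;
x_true = randn(nx,1);                 % placeholder for the true constants
xhat   = zeros(nx,1);                 % initial estimate
P      = 1e3 * eye(nx);               % large initial covariance (weak prior)
R      = 0.01 * eye(ny);              % measurement noise covariance (assumed)
for k = 1:nsteps
    Hk   = randn(ny, nx);                          % use the real 3x15 H_k here
    yk   = Hk*x_true + sqrt(0.01)*randn(ny,1);     % placeholder measurement
    Kk   = P*Hk' / (Hk*P*Hk' + R);                 % gain
    xhat = xhat + Kk*(yk - Hk*xhat);               % estimate update
    P    = (eye(nx) - Kk*Hk) * P;                  % covariance update
end
% Each individual H_k is rank-deficient (3x15); the estimate can still converge
% provided the stacked measurement matrix over many steps has full column rank.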
Hi Everyone,
I am trying to use the SURE model in Nlogit, the software uses generalized least square regression for estimation. Is there any way or command to use OLS instead of GLS to be used in the SURE model?
Thanks
I am trying to convert a vector into an image using the code below.
clear variables
load('Exe4_2022.mat')
n = length(b);
figure,
imagesc(reshape(b,sqrt(n),sqrt(n))),
colormap(gray),
axis off;
But I am getting this error. Could anybody tell me how to resolve this issue?
Error using reshape
Size arguments must be real integers.
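That error usually means sqrt(n) is not an integer, i.e., the vector length is not a perfect square. A quick check (a sketch, assuming the vector really is meant to form a square image):
n = length(b);
s = sqrt(n);
if mod(s,1) == 0
    imagesc(reshape(b, s, s)); colormap(gray); axis off;
else
    fprintf('n = %d is not a perfect square (sqrt(n) = %.4f); check the intended image size.\n', n, s);
end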
I have attached the "Exe4_2022.mat" file with this post.
Thanks
I am looking for study material (books, articles, codes or even YouTube videos) on parameter estimation of differential equation models (using any method but preferably least squares). I would like to calibrate some mathematical models in the form of ODEs and PDEs; I have a time series data set for the dependent variables. I could simply Google but I would like material that has demonstrations for easy learning.
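In the meantime, a minimal MATLAB sketch of the basic workflow may help make the demonstrations concrete (the toy model, the data names tdata, ydata, y0, and the starting guess are all assumptions):
% Fit parameters p of an ODE dy/dt = f(t,y;p) to data (tdata, ydata) by least squares
f      = @(t, y, p) -p(1)*y + p(2);                     % stand-in for the real model
simfun = @(p) deval(ode45(@(t,y) f(t,y,p), [tdata(1) tdata(end)], y0), tdata)';
resid  = @(p) simfun(p) - ydata(:);                     % residuals at the observation times
p_hat  = lsqnonlin(resid, [1; 1]);                      % nonlinear least squares from guess [1;1]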
Linear least squares problem:
Fit f(t) = sin(t) on the interval [0, π] with polynomials of degree n, for n = 1, 2, 3, ..., 10. Here, we use equally spaced nodes.
Solve the normal equations with backslash in Matlab. Save the norm of the residual and the condition number of AᵀA.
Could anybody please tell me how I can find x, y, and A in that case?
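A minimal sketch for one degree n (loop over n = 1:10 in the same way; the number of nodes m is an assumption):
m = 50;                          % number of equally spaced nodes
t = linspace(0, pi, m)';         % nodes in [0, pi]
y = sin(t);                      % data being fitted
n = 3;                           % polynomial degree
A = t .^ (0:n);                  % Vandermonde-type design matrix, m x (n+1)
x = (A'*A) \ (A'*y);             % normal equations A'A x = A'y solved with backslash
res_norm = norm(A*x - y);        % norm of the residual
kappa    = cond(A'*A);           % condition number of A'A
Here A is the design matrix built from the nodes, y holds the samples of sin(t), and x holds the polynomial coefficients.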
Hello,
My dependent variable has both positive and negative values. It also has a heteroskedasticity problem. By using Weighted Least Squares regression, I was able to address the heteroskedasticity issue. However, the residuals of the model are not normal.
I find that many of the standard solutions for addressing non-normality do not apply, since I have negative values in the dependent variable. Can anyone let me know what ways there are to address non-normality? Is there any way I can defend the regression output in spite of the normality issue?
My sample size exceeds 5,000.
Regards,
Karthik
Hello,
I have a linear model (both with categorical factors and covariates) with a continuous response variable. I know that the observed means can be different from least square means based on the model, so I was wondering whether it is appropriate to create plots (e.g. boxplots that compare means from different factor categories) based on the observed means or if one should build them based on the least square means. Also, when presenting the results in the results section of a paper, should one present the least square means or is it still appropriate to present the observed means?
Thank you.
Hello dear researchers
I want to do a Rietveld refinement for a new material (a doped material) using FullProf.
The problem is that I need the CIF file, but it is a new material that does not have one. Can I use the CIF file of the undoped material at the beginning of the refinement and then use the CIF file generated by FullProf to finish the refinement?
Please, if you have any ideas, help me!
If you also have tips on how to do the Rietveld refinement better, feel free to mention them.
From my experiment I get two values. One is a control value, and the other is the value of my interest. I get both values from a non-linear least-squares model built on many data points; therefore, I can get their 95% confidence intervals and p-values. Thus, I got 0.350 (0.336-0.365, 95% CI) in the control and 0.882 (0.810-0.959) in the sample of interest. Also, I know that this change is highly significant; the p-value is 1.29E-9. Next, I repeated the experiment and got the following numbers: 0.389 (0.370-0.409) in the control and 0.845 (0.775-0.920) in the sample of interest. Again, the change is highly significant, with a p-value of 6.29E-9. Due to the biological nature of my samples, it is difficult (if not impossible) to standardize the experimental conditions. But my idea is to test the control sample under the same conditions as the sample of interest.
To summarize the results of the experiments, I'd like to express them as a fold-change compared to the control. In this case I get 0.882/0.365 = 2.41 and 0.845/0.389 = 2.17. I'd like to calculate 95% confidence intervals for these values. I found a method which is supposed to do so (https://journals.sagepub.com/doi/10.3102/1076998620934125). However, it gives me a very wide interval that includes 1: 0.612-10.17. Apparently, it does not take into account my knowledge of the high significance of the changes I observe.
My question is how I can calculate reasonable 95% confidence intervals in such a case. References to R packages would be very valuable.
Thanks in advance!
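One common approach, assuming the two estimates are approximately normal and independent, is the delta method on the log scale (a sketch, not the only valid option):
SE(theta) ≈ (upper CL - lower CL) / (2 * 1.96) for each estimate,
SE(ln theta) ≈ SE(theta) / theta,
ln(ratio) = ln(theta_interest) - ln(theta_control),
SE(ln ratio) = sqrt( SE(ln theta_interest)^2 + SE(ln theta_control)^2 ),
95% CI of the ratio = exp( ln(ratio) ± 1.96 * SE(ln ratio) ).
If an R implementation is preferred, the deltamethod() function in the msm package implements this kind of calculation.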
Machine learning, Support vector machine, least squares support vector regression.
I need to estimate a set of 15 constant parameters, which are not directly measured. My state vector is therefore fixed and made up of these constants, so the Kalman filter equations reduce to those of recursive least squares. Only 3 quantities, different from the state, can be measured, and from these I have to estimate the state vector. The results indicate that the state vector is estimated well, but the rank of the observation matrix used in the calculations is much less than 15, because the equations (measurements) are fewer than the constants to be estimated. Is this normal? Must the rank be equal to the number of parameters? Thank you.
Good day.
I am doing linear regression between a set of data and predictions made by two models, that I'll call A and B. Both models have the same number of parameters.
If I do a simple regression with excel, I get the following:
- Model A has R2 = 0.97.
- Model B has R2 = 0.29.
- The least-squares fit to model A has a slope m = 2.43.
- The slope for model B is m = 0.29
From this simple analysis, I would conclude that model A is better than model B in capturing the trend of experimental outcomes. I even tested it on a set of unseen data and it performed better at predicting the trends.
Now, I was asked to confirm this by hypothesis testing, and here it gets tricky probably due to my lack of experience. Due to the large slope of model A, the residual sum of squares for model A is huge, almost 5 times larger than that for model B. Since the number of data points and parameters is the same for both models, this suggests that model B is better than model A.
What am I doing wrong? I feel that I'm not formulating my problem correctly, but I'm honestly lost.
Also, I've seen that there are endless flavors of hypothesis testing, but the more I read the less I know where to start.
Is there a simple prescription to formulate and test my hypothesis?
Many thanks in advance!
Hi, I conducted a Hausman test and got the results shown in the screenshot. Both ROE and ROA show fixed effects, and Tobin's Q comes up with random effects. Which one should I use in this case? Or am I meant to run a fixed-effects regression separately and then a random-effects regression for Tobin's Q?
Also, what is the difference between using the Least Squares Dummy Variable (LSDV) approach vs. fixed-effects panel data regression and random-effects panel data regression?
Hello, I'm currently working on fitting some data with an Anand model. The data are in the form of a stress-strain curve from hot working of pure titanium.
I'm using the MATLAB Curve Fitting Tool and I have to do some tweaking of my lower and upper parameter bounds calculated with the nonlinear least-squares method. One of the parameters I'm looking for is "s0", the initial deformation resistance. In many works, s0 values are around ~50 MPa. How can I determine whether the value found by MATLAB is reasonable? Can I somehow take an educated guess based on my data to check whether this value is correct?
Dear All,
I docked a ligand using Autodock vina and performed MD simulations for about 100ns.
I need help with the analysis of the results. Looking at the results, can we say the ligand detached from the protein binding site and attached itself to some other site? If the ligand is detached, can we say it is due to the protein (being unstable), on the basis of the RMSD results?
1. Periodic boundary conditions were removed.
2. The protein structure was made using I-TASSER (no PDB structure was available).
3. We will validate these results in the wet lab.
4. Also, please help with the RMSD. What should be the criteria for selecting the best RMSD? A stable RMSD?
5. The RMSD uses a least-squares fit on the backbone and is computed for the heavy atoms of the ligand.
6. RMSD of the protein and the ligand separately.
Really need help as I am totally new to MD simulations.
Hello everyone,
Does anyone have any recommendations on how to calculate a diffusion coefficient from drug release data? I know that I will acquire the drug release over time, and with this I can perform a least-squares fit to determine which kinetic model (Korsmeyer-Peppas, etc.) fits best. This is where I am stuck: how do I get a diffusion coefficient from this kinetic model/data?
This isn't my usual background so I am struggling with this, any help would be greatly appreciated.
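One common route, offered as a sketch rather than a general recipe: if the geometry is a thin film of thickness L with Fickian release, the early-time solution (roughly valid for Mt/M_inf < 0.6) is
Mt/M_inf = 4 * sqrt( D*t / (pi*L^2) ),
so if the Korsmeyer-Peppas fit returns Mt/M_inf = k * t^(1/2) (exponent n ≈ 0.5), the diffusion coefficient follows as D = pi * L^2 * k^2 / 16. Other geometries (spheres, cylinders) have their own early-time expressions, so the conversion depends on knowing the dimensions of the device.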
Cheers,
Sabrina
Hello everybody,
I am running a confirmatory factor analysis, treating data as ordinal rather than continuous. I am therefore using robust weighted least squares estimator (WLSMV) based on the polychoric correlation matrix of latent continuous response variables. Since I am not able to produce any BIC and AIC, I was wondering what estimator I could use for model comparison. I read that in Mplus one could run a DIFFTEST. Does anyone know how to do it in JASP, Jamovi or SPSS?
Thank you very much!
What are the assumptions of robust least-squares regression, and which scholars support them?
Please, I am working on state estimation of power systems using weighted least squares, and I have zero clue on how to go about it using MATLAB. I am using an IEEE 14-bus system. I need help. If anyone has the complete research and project, please kindly reach out.
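For orientation, the standard weighted least-squares formulation solved by Gauss-Newton iteration is, in sketch form:
minimize J(x) = [z - h(x)]' * R^-1 * [z - h(x)],
iterate:  dx = (H' * R^-1 * H)^-1 * H' * R^-1 * [z - h(x)],   x <- x + dx,
where z are the measurements, h(x) the measurement functions of the bus voltage magnitudes and angles, H = dh/dx the measurement Jacobian, and R the (usually diagonal) covariance of the measurement errors; iteration stops when dx becomes small. Implementing h(x) and H for the IEEE 14-bus measurement set is the main part of the MATLAB work.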
Hi, this is Dwira
I would like help in setting up the following exponential equation as a least-squares problem, to determine the values of "Aref" and "h0":
A(x) = Aref (1 - e^(-x/h0))
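If MATLAB (or Octave) is available, a minimal nonlinear least-squares sketch (the data vectors x and A are assumed to hold the measurements):
model = @(p, x) p(1) .* (1 - exp(-x ./ p(2)));   % p(1) = Aref, p(2) = h0
p0    = [max(A), mean(x)];                       % rough starting guesses
p     = lsqcurvefit(model, p0, x, A);            % least-squares estimates of Aref and h0
Alternatively, the problem can be written out by hand as minimizing sum_i [A_i - Aref*(1 - e^(-x_i/h0))]^2 over Aref and h0 and handed to any nonlinear least-squares routine.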
Thank you.
I have a big dataset (n > 5,000) on corporate indebtedness and want to test whether SECTOR and FAMILY-OWNED are significant in explaining it. The information is in percentages (total liabilities/total assets) but is NOT bounded: many companies have an indebtedness above 100%. My hypotheses are that the SERVICES sector is more indebted than other sectors, and that FAMILY-OWNED companies are less indebted than other companies.
If the data were normally distributed and had equal variances, I'd perform a two-way ANOVA.
If the data were normally distributed but were heteroscedastic, I'd perform a two-way robust ANOVA (using the R package "WRS2")
As the data are neither normally distributed nor homoscedastic (according to the many tests I performed), and there is no such thing as a "two-way Kruskal-Wallis test", which is the best option?
1) perform a generalized least squares regression (therefore corrected for heteroscedasticity) to check for the effect of two factors in my dependent variable?
2) perform a non-parametric ANCOVA (with the R package "sm"? Or "fANCOVA"?)
What are the pros and cons of each alternative?
Reading the literature, I understand that there are two methods: (1) the inner product method, and (2) the least-squares fitting method.
I ran some simulations to compare these two methods and found that both give almost the same results.
Of course, the first one is simpler and faster.
However, the least-squares fitting method seems to be more popular, at least in industry.
Is there any reason to use the least square fitting method instead of the inner product method?
In MIMO channel estimation, the linear minimum mean squares estimator (LMMSE) yields better performance than the least-squares (LS) estimator. However, it requires knowing the channel covariance matrix (which constitutes prior information). In practice, the channel covariance has to be estimated based on previous channel estimations, but how are these previous channels estimated? With LS? With MMSE using an identity matrix as covariance? Anything else?
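For reference, in the simplest pilot-based setting (orthogonal, unit-power pilots; notation assumed), the two estimators are related by
h_LMMSE = R_h * (R_h + sigma^2 * I)^-1 * h_LS,
i.e., the LMMSE step is a covariance-weighted shrinkage of the LS estimate toward zero, which is why some channel covariance R_h (or a simple assumed substitute such as a scaled identity) has to be in place before LMMSE can be applied at all.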
I want to examine the simultaneous relationship between the two using microeconomic (household) data. I am using consumption as a proxy variable for income. I am confused about the modelling of it.
I am currently doing a topology optimization of a given elastic tensor. My cost function is the square of the difference between the target elastic tensor and the actual elastic tensor, and my volume constraint is in the form of an equation, but I don't know how to modify the MMA parameters to make it applicable to least-squares optimization. I have read the notes of Professor Krister Svanberg and tried some things, but still no success. Has anyone done similar optimizations? Can you give me a little help? Thank you, everyone.
I ran an OLS regression on my data and found issues with autocorrelation due to non-stationarity of the data (time series data). I need to conduct a Generalized Least Squares (GLS) regression, as it is robust against biased estimators.
The use of weighted least squares is common in analytical chemistry, but the evaluation of uncertainty is generally poorly documented. Do you know of any international standard or guide that addresses the evaluation of uncertainty, consistent with the GUM (JCGM 100), when interpolated values are used on a curve obtained by weighted least squares?
Does anyone know where I can get any free 3-D smoothing spline code for irregular data in Fortran?
I've used 1-D and 2-D code from Inoue, H., 1986: A least-squares smooth fitting for irregularly spaced data: Finite-element approach using the cubic B-spline basis. Geophysics, 51, 2051–2066.
cheers, arthur
Hello everyone,
I am trying to carry out a multiple regression analysis, but I cannot meet the assumptions of normal distribution of the residuals and of homoscedasticity. I transformed the dependent variable by inverting it, which resulted in a normal distribution of the residuals. However, the variance of the residuals is heteroscedastic. I tried to dig around and came across weighted least squares regression (WLS). My question is:
Do I need to do both WLS and data transformation? Can I do just WLS, or does WLS require a normal distribution of the residuals?
I apologise if these are silly questions but I am having to teach all of this myself and I am struggling.
Thank you!
I have been trying to find a way to fit two functions simultaneously using nonlinear least squares (I have to find the optimum 3 variables, common for both models, that fits best both of them). I typically use Python's scipy.optimize.least_squares module for NLLS work, which uses the Levenberg–Marquardt algorithm.
I tried some specialised multi-objective optimization packages (like pymoo), but they don't seem suitable for my problem as they rely on evolutionary algorithms that output a set of solutions (I only need one optimum solution per variable) and they are made to work for conflicting objectives.
I also tried taking the sum of the norms of the residuals of the two functions (making it a single-objective problem) and minimizing that with various gradient-based and gradient-free algorithms from Python's scipy.optimize.minimize, but I found this norm becomes so huge (even with parameter bounds!) that I get an overflow error (34, result too large), crashing the programme sooner or later. It didn't crash using the Truncated Newton method, but the results produced were rubbish (and from running an optimization on the same data with a simpler model, I know they shouldn't be!)
I have to perform this fit for a few thousand data sets per experiment, so it has to be quite robust.
Surprisingly, I cannot find a way to do multi-objective NLLS (only for linear regression). I have found some papers on this, but I'm not a mathematician, so it's quite beyond my depth to understand them and apply them in Python.
Has anyone had a similar problem to solve?
Many thanks!
I am a new learner processing analyses in SmartPLS (PLS-SEM). There are many options to assess model fit. Could you please help me produce a complete report of model fit, as well as references?
Goodness-of-Fit formula:
GoF = sqrt( (average AVE) × (average R²) )
best regards
I have a mixed order of integration after doing the unit root tests for all my variables, that is, I(0) and I(1). I can't run panel Johansen co-integration or panel least squares fixed/random effects because of this; it would violate the conditions or assumptions underlying them. The suitable approach is panel ARDL using Eviews 11. But I can't find any diagnostic test except for the histogram normality test. I don't know how to carry out the serial correlation LM test or heteroscedasticity tests using the panel PMG/ARDL method in Eviews 11. Can I run the diagnostic tests using ordinary regression and still use ARDL? Please help.
I want to do a comprehensive study of errors-in-variables methods from both a numerical analysis and a statistical viewpoint, and compare the results with regression for selected parameter estimation problems in my domain, where the method is expected to perform better in terms of accuracy. These problems are of the linear and nonlinear regression type. I want to check whether the method under study is an improvement over generalized least squares. I am including multiple factors such as accuracy, computational efficiency, robustness, and sensitivity in my study, under different combinations of stochastic models. What kind of statistical analysis, experimental design, metric, or hypothesis test is required for a study of this nature to establish the superiority of one method over another (i.e., to recommend one method over another for a particular class of problems)?
I have torque and angular position data (p) to model a second-order linear model T = I s^2 p + B s p + k p (s = j*2*pi*f). So first I converted my data (torque, angular position) from the time domain into the frequency domain. Next, a frequency-domain derivative was computed from the angular positions to obtain velocity and acceleration data. Finally, a least-squares command, lsqminnorm (MATLAB), was used to estimate the coefficients. I expected a linear relation, but the results showed a very low R² (<30%), and my coefficients are not always positive!
Data filtering:
angular displacements: moving average
torques: low-pass Butterworth, cutoff frequency 4 Hz, sampling 130 Hz
velocities and accelerations: only frequencies between [-5, 5] Hz passed, to decrease noise
Could anyone help me out with this?
What can I do to get a better estimation?
here is part of my codes
%%
angle_Data_p = movmean(angle_Data,5);
%% derivative
N=2^nextpow2(length(angle_Data_p ));
df = 1/(N*dt); %Fs/K
Nyq = 1/(2*dt); %Fs/2
A = fft(angle_Data_p );
A = fftshift(A);
f=-Nyq : df : Nyq-df;
A(f>5)=0+0i;
A(f<-5)=0+0i;
iomega_array = 1i*2*pi*(-Nyq : df : Nyq-df); %-FS/2:Fs/N:FS/2
iomega_exp =1 % 1 for velocity and 2 for acceleration
for j = 1 : N
if iomega_array(j) ~= 0
A(j) = A(j) * (iomega_array(j) ^ iomega_exp); % *iw or *-w2
else
A(j) = complex(0.0,0.0);
end
end
A = ifftshift(A);
velocity_freq_p=A; %% including both part (real + imaginary ) in least square
Velocity_time=real( ifft(A));
%%
[b2,a2] = butter(4,fc/(Fs/2));
torque=filter(b2,a2,S(5).data.torque);
T = fft(torque);
T = fftshift(T);
f=-Nyq : df : Nyq-df;
T(f>7)=0+0i;   % band-limit the torque spectrum to [-7, 7] Hz
T(f<-7)=0+0i;
torque_freq=ifftshift(T);
% same procedure for fft of angular frequency data --> angle_freqData_p
phi_P=[accele_freq_p(1:end) velocity_freq_p(1:end) angle_freqData_p(1:end)];
TorqueP_freqData=(torque_freq(1:end));
Theta = lsqminnorm((phi_P),(TorqueP_freqData))
stimatedT2=phi_P*Theta ;
Rsq2_S = 1 - sum((TorqueP_freqData - stimatedT2).^2)/sum((TorqueP_freqData - mean(TorqueP_freqData)).^2)
- What is the difference between Least Square Regression and Robust Regression?
- How can we interpret the results of the regression model in both cases?
- If the variables in the data set have not shown proper correlation, can we use these techniques?
- Any R script references?
Online model updating can improve the predictive ability of a model. The unscented Kalman filter is used to update model parameters. I know it can be used when the parameters are constant. Can I also use it to estimate time-varying parameters? What is the alternative, and what is the difference from online recursive least-squares estimation?
I have surface tension data vs. log c, and I need to find the slope at every point of the curve to plot surface excess vs. c. I understand that I have to use the least-squares method, but I am not familiar with it when it is not related to a single linear regression.
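One way to do this, sketched in MATLAB (the window half-width and the variable names are assumptions): fit a low-order polynomial by least squares in a sliding window and differentiate the local fit at each point, which is essentially a Savitzky-Golay-style derivative estimate.
x = logc(:);  y = gamma_data(:);   % log(c) and surface tension values
w = 5;                             % window half-width, in points
slope = nan(size(x));
for i = 1:numel(x)
    idx = max(1, i-w) : min(numel(x), i+w);
    p   = polyfit(x(idx), y(idx), 2);      % local quadratic least-squares fit
    slope(i) = 2*p(1)*x(i) + p(2);         % d(gamma)/d(log c) at x(i)
end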
I have an experimental set of data (xdata, ydata), and I want to fit a five-constant expression to these data and find the unknown constants with the following MATLAB built-in function:
[q,resnorm,residual,EXITFLAG,output,LAMBDA]=lsqnonlin('fun',q0)
It converges, but the norm is too big, and as for "q", it returns my initial guess as the final fit.
I checked my procedure with another set of data and a different function, and it works perfectly. Does this mean that the initial guess is too far off? Any suggestions?
Thanks in advance
I am trying to optimize a branching-process-based model, specifically a least-squares estimate. I do not have a functional form of the objective function (the sum of squared errors); instead, I simulate the branching process to generate the objective function, which depends on the parameters (the objective function needs to be optimized over these parameters).
Perhaps one could call it simulated least-squares estimation. So, how do I choose an algorithm for optimizing this least-squares estimate?
I want to fit a 3D line with a known equation (F(x,y)) to a set of points (x, y, z), to find the parameters of the equation. Therefore, I need to solve min ||F(ydata, xdata) − zdata||. How can I implement this in MATLAB? Do you know any similar or substitute approach?
Hi everyone,
I'm trying to fit certain data using nonlinear least squares which is of the following form
y(t) = a1*X1(t) + a2*X2(t) + … + an*Xn(t) + b1*Y(t-1) + … + bk*Y(t-k)
where a1,…,an and b1,…,bk are the parameters I want to find by training on the data. I have around 5000 sample points (X, Y), which vary over time. The curve fits very well during training; my R-squared value is 98%, but the major issue is with validation. I used the same data that I used for training to validate the model. I know this is not the right way to validate a model, but I did it to make sure I could reproduce the same R-squared with the same data. I observed that my R-squared drops drastically, by at least 20%, for the same data. The reason I found is that, during validation, the error from past outputs y(t-1) is carried into y(t), whereas during training the feedback output is free of error. The error keeps growing and results in a drop in the R-squared value. In other words, training is more like an open loop, but during validation it turns into a real closed loop. Is there some way to accommodate this effect in the optimisation function during training, so that I can have the same R-squared value when I train and validate with the same data set?
Thanks!
As is well known, iterative methods for solving linear systems, such as Successive Over-Relaxation and the like, are very attractive for many problems, such as those with sparse matrices. These methods are generally formulated in the context of a determined system, in which the number of equations equals the number of unknowns. Now, for the sake of simplicity, let us assume that we have one additional observation and we need to update the previous solution. In other words, we now have an overdetermined system with the provision of this additional observation. The question is how to include this observation to update the previously computed parameters. Indeed, the theory of parameter estimation provides a lot of guidelines for handling this task to get an optimal solution in the least-squares sense. But let us assume that we need to stick to the iterative approach for the parameters. Then, with that assumption, how could we handle the additional observation for the parameter update, and what kind of errors do we need to minimize?
Right now I am studying GRAVSOFT for geoid modelling, to use it in my thesis. I tried to read the manual, but it does not explain the Python GUI version (it explains the Fortran version), so I am still confused about the software. I would like to understand clearly which data I have to use for the determination of a geoid model and the steps (step by step) for doing that using the GRAVSOFT programs.
Please provide me with any documents or files that can help me understand all the programs inside the GRAVSOFT interface, specifically for creating a geoid model.
Thanks in advance and your comments are appreciated
It's one of the Modal analysis identification techniques and also a time domain method.
As is known, the method of least squares (generalized least squares) is used under the assumption that the independent residuals have a normal distribution. What, then, will the parameter estimates be if we apply the least-squares method (generalized least squares) in the case of a uniform distribution of the residuals?
I would like to know if the SUPG method has any advantages over the least squares finite element method?
Thank you for your reply.
I'm in a situation where I need to compare the accuracy of one stress-strain curve with respect to the other, with both curves having different x and y coordinates. If both curves have the same x-coordinates (independent variable) and varying y-coordinates (dependent variable), I could use the R squared value or the Weighted Least Squares (WLS) method.
I'm trying to avoid interpolation as there are many values and would be a very tedious task.
Any help is appreciated :)
I've got myself into a bit of a mess with some analysis.
I have conducted an Interrupted Time Series analysis using GLS, but wanted to check whether or not this assumes a normal distribution.
The AIC values for my GLS models are better than those of any GLM equivalent, but I'm not sure how to interpret this.
Any advice would be appreciated
The question relates to channel estimation by the least-squares approach for ZF and MMSE detectors in a massive MIMO-OFDM system.
When creating and optimizing mathematical models with multivariate sensor data (i.e., 'X' matrices) to predict properties of interest (i.e., the dependent variable, 'Y'), many strategies are recursively employed to reach "suitably relevant" model performance, including:
>> preprocessing (e.g. scaling, derivatives...)
>> variable selection (e.g. penalties, optimization, distance metrics) with respect to RMSE or objective criteria
>> calibrant sampling (e.g. confidence intervals, clustering, latent space projection, optimization..)
Typically & contextually, for calibrant sampling, a top-down approach is utilized, i.e., from a set of 'N' calibrants, subsets of calibrants may be added or removed depending on the "requirement" or model performance. The assumption here is that a large number of datapoints or calibrants are available to choose from (collected a priori).
Philosophically & technically, how does the bottom-up pathfinding approach for calibrant sampling or "searching for ideal calibrants" in a design space, manifest itself? This is particularly relevant in chemical & biological domains, where experimental sampling is constrained.
E.g., Given smaller set of calibrants, how does one robustly approach the addition of new calibrants in silico to the calibrant-space to make more "suitable" models? (simulated datapoints can then be collected experimentally for addition to calibrant-space post modelling for next iteration of modelling).
:: Flow example ::
N calibrants -> build & compare models -> model iteration 1 -> addition of new calibrants (N+1) -> build & compare models -> model iteration 2 -> so on.... ->acceptable performance ~ acceptable experimental datapoints collectable -> acceptable model performance
Hello,
I am using robust correlation, more specifically iteratively reweighted least squares with bisquare weighting (DuMouchel & O'Brien, 1989; Street et al., 1988; Holland & Welsch, 1977), and wondering whether this test makes any assumption that the data are normally distributed.
I am using MATLAB code. I need to combine the parameters of the photovoltaic plant with those of the grid, then run a state estimation for the combined system to obtain estimated voltage magnitudes and phase angles.