
Question

Asked 28th Sep, 2019

I have a model that needs calibration, but I am afraid that if I calibrate using too many model parameters, I will overfit to my data, or the calibration will not be well-done.

Can anyone suggest a method to determine the maximum number of parameters I should use?


I think it is better to fit a physically motivated function rather than a simple polynomial. Of course, in all cases we need to keep outliers in mind. The advantage of using such a function is that its parameters have physical significance within your model. You can also compare the goodness of different fits via their chi-square per degree of freedom (chi2/ndf).
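
To illustrate the last point, here is a small numpy sketch (synthetic quadratic data, all values invented) comparing chi2/ndf for an underfitting linear model, the correct quadratic, and an over-flexible sixth-order polynomial:

```python
import numpy as np

# Synthetic, illustrative data: a quadratic "law" plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 40)
sigma = 0.05
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(0.0, sigma, x.size)

def reduced_chi2(y_obs, y_fit, sigma, n_params):
    """chi^2 per degree of freedom; ndf = n_points - n_params."""
    ndf = y_obs.size - n_params
    return float(np.sum(((y_obs - y_fit) / sigma) ** 2) / ndf)

results = {}
for deg in (1, 2, 6):
    coeffs = np.polyfit(x, y, deg)
    results[deg] = reduced_chi2(y, np.polyval(coeffs, x), sigma, deg + 1)

for deg, r in results.items():
    print(f"degree {deg}: chi2/ndf = {r:.2f}")
```

The underfitting line gives chi2/ndf far above 1, while the quadratic lands near 1; the sixth-order fit gains little despite its extra parameters.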

For calibration purposes, I suggest you first fit a linear/planar model and perform the appropriate statistical analyses, including model diagnostic tests. If you find that the linear/planar model fits your data poorly, you can move sequentially to higher-order models, choosing the one that best fits your data via a stepwise regressor-selection algorithm.
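
One possible sketch of this low-order-first, sequential strategy (using AIC as a stand-in for the diagnostic tests, and invented quadratic data):

```python
import numpy as np

# Synthetic data from a genuinely quadratic process (illustrative values).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 0.3 + 1.2 * x - 3.0 * x**2 + rng.normal(0.0, 0.05, x.size)

def aic(y_obs, y_fit, n_params):
    """Akaike information criterion assuming Gaussian residuals."""
    n = y_obs.size
    rss = np.sum((y_obs - y_fit) ** 2)
    return n * np.log(rss / n) + 2 * n_params

# Start linear (deg 1) and consider higher orders, keeping the best score.
best_deg, best_aic = None, np.inf
for deg in range(1, 8):
    y_fit = np.polyval(np.polyfit(x, y, deg), x)
    score = aic(y, y_fit, deg + 1)
    if score < best_aic:
        best_deg, best_aic = deg, score

print("selected polynomial order:", best_deg)
```

The penalty term in the AIC plays the role of the diagnostic: extra parameters must buy a real reduction in residuals, not just flexibility.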

An alternative approach is to use a neural network: design a network with a pre-chosen number of parameters (connection weights, bias values, number of neurons in the hidden layers, etc.), use experimental data on the process input and output as the training set, and train with an appropriate algorithm, e.g. backpropagation or a genetic algorithm as the optimiser.
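
A minimal numpy sketch of this route, assuming a one-hidden-layer network trained by full-batch backpropagation on invented process data (the layer size, learning rate and sine-shaped process are illustrative choices, not a prescription):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 1))                     # process input
y = np.sin(2.0 * X) + 0.05 * rng.normal(size=X.shape)    # process output

n_hidden = 16                                  # pre-chosen network size
W1 = rng.normal(0.0, 0.5, (1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(3000):
    h = np.tanh(X @ W1 + b1)                   # forward pass
    y_hat = h @ W2 + b2
    err = y_hat - y                            # gradient of 0.5 * squared error
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)   # backpropagate
    dh = (err @ W2.T) * (1.0 - h**2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2             # gradient-descent update
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print("training MSE:", round(mse, 4))
```

The same caution about parameter count applies here: the number of weights and biases should stay well below the number of training samples, or the network will memorise noise.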

Initially, the fitting problem rests on how you choose the shape of the fitting equation before searching for the fitting parameters. If the shape of the data is hard to discern, you can assume a second-order nonlinear equation and raise the order gradually to the limit of the accepted accuracy. For some problems the model may be taken as a rational function, with polynomial parameters in the numerator and others in the denominator. You can consult several of the papers on my list of publications that cover this topic.
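
As a sketch of the rational-function idea: a model y = (a0 + a1*x)/(1 + b1*x) can be rewritten as y = a0 + a1*x - b1*x*y, which is linear in the parameters and solvable by ordinary least squares (all values invented; note that this linearisation slightly biases the estimates when the noise is large):

```python
import numpy as np

# Synthetic data from a rational model with a0=1, a1=2, b1=0.5.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 4.0, 60)
y = (1.0 + 2.0 * x) / (1.0 + 0.5 * x) + rng.normal(0.0, 0.01, x.size)

# Linearised form: y = a0 + a1*x - b1*x*y  ->  design matrix [1, x, -x*y].
A = np.column_stack([np.ones_like(x), x, -x * y])
(a0, a1, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

print(round(a0, 2), round(a1, 2), round(b1, 2))
```

For larger noise levels, the linearised estimates are usually refined with a nonlinear solver afterwards.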

Good Luck

Prof. S. El-Rabaie

1st Oct, 2019

Agree with Qin that the number of parameters should not usually exceed the number of data points minus one. Deviation from this can be justified only if you have information additional to your data, e.g. a theoretical reason why a certain function is relevant. More parameters, up to one fewer than the number of data points, will give a better fit (i.e. smaller residuals) but not necessarily better understanding - see comments from others on how to assess this. The more parameters you have, the more poorly behaved the model is likely to be outside the range of the data, or even between data points.

Typically, if the number of parameters of a model is fewer than the number of data points, then an optimisation process (such as the least-squares method) can be used to determine the optimal model parameters. However, if the number of parameters is greater than the number of data points, then a unique set of parameters cannot be determined; in certain cases, the Lagrange-multiplier (pseudo-inverse) approach has been proposed (see for example: http://people.csail.mit.edu/bkph/articles/Pseudo_Inverse.pdf ).
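
The two regimes can be demonstrated with numpy: an overdetermined line fit solved by least squares, and a one-equation, two-unknown system where the pseudo-inverse returns the minimum-norm member of the infinite solution family (all numbers invented for illustration):

```python
import numpy as np

# Overdetermined: 5 data points, 2 parameters -> least squares picks
# the unique minimiser of the residual norm.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])        # roughly y = 2x + 1
A = np.column_stack([x, np.ones_like(x)])
m, c = np.linalg.lstsq(A, y, rcond=None)[0]

# Underdetermined: 1 data point, 2 parameters -> infinitely many exact
# fits; the pseudo-inverse selects the minimum-norm one among them.
A1 = np.array([[2.0, 1.0]])                    # single equation 2*p0 + p1 = 5
y1 = np.array([5.0])
p = np.linalg.pinv(A1) @ y1

print("least squares:", m, c)
print("minimum-norm solution:", p)
```

Any vector of the form p + t*(1, -2) also satisfies the single equation; the pseudo-inverse merely chooses the one closest to the origin.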

To illustrate: consider a simple linear model; it has two parameters, the gradient m and the offset c. Two or more data points are needed to estimate numerical values for m and c. If we had only one data point, then an infinite number of lines could be fitted, all equally viable. However, if prior information about one of the model parameters were known, the infinite number of solutions would collapse to a single solution!
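
In code, the one-data-point case with an assumed prior slope (values invented): fixing m collapses the family of lines through the point to a single offset.

```python
# One data point (x0, y0) underdetermines y = m*x + c,
# but a prior value for the slope pins down the offset.
x0, y0 = 3.0, 7.5
m_prior = 2.0                 # assumed known from prior information
c = y0 - m_prior * x0
print("offset c =", c)        # prints: offset c = 1.5
```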

I hope this helps!

