Question
Asked 25th May, 2016

How to handle negative values in log transformations in a regression analysis?

I would like to use the log-linear form of the Cobb-Douglas production function in my project. However, one of my independent variables contains some negative values. As far as I know, the following transformations are available:
log(1 + y - min(y))
or
-log(-y + 1) if y is negative
 log(y + 1) if y is non-negative
Which is better? What are the consequences of these transformations in a regression analysis? Is there a better solution (e.g., a symmetric transformation that pulls in extreme values and preserves sign)?
Any comment would be appreciated.
Thanks in advance
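
For readers comparing the two candidates in the question, here is a minimal sketch of both. The function names and the sample values are made up for illustration; only the two formulas come from the question itself.

```python
import math

def shifted_log(values):
    """log(1 + y - min(y)): the sample minimum is mapped to log(1) = 0."""
    lowest = min(values)
    return [math.log1p(v - lowest) for v in values]

def signed_log(values):
    """-log(-y + 1) for negative y, log(y + 1) otherwise (keeps the sign)."""
    return [math.copysign(math.log1p(abs(v)), v) for v in values]

y = [-9.0, -2.0, 0.0, 3.0, 50.0]  # hypothetical data
print(shifted_log(y))
print(signed_log(y))
```

Note that the shifted version depends on the sample minimum, so adding or dropping an observation can change every transformed value, while the signed version is defined point by point.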

All Answers (13)

Vladislav Shchekoldin
Novosibirsk State Technical University
Unfortunately, most economic data are not invariant to shift-type transformations (where a constant is added to or subtracted from the data), so your variant of the transformation could in many cases lead directly to wrong (skewed) results. If you have some negative values of the response variable, the log transformation can be applied in two different ways: 1) you can drop these values from the sample and estimate your model on the resulting "truncated" sample; here, of course, you lose some degrees of freedom and get results of lower statistical sufficiency; 2) you can use the so-called censored regression approach (for instance, tobit models). This is the more correct option, but it requires some effort to understand the idea behind tobit-type models. There are loads of books on the problem; I hope you will be successful.
2 Recommendations
Timothy A Ebert
University of Florida
Sometimes negative values can be removed by reformulating the problem or correcting errors. Do the negative values make sense in the context of the problem? What is the independent variable that is causing problems? Is it income, percent growth, coffee consumption?
@Vladislav: could you provide a citation for the observation that economic data are not invariant to shift transformations? I am interested to see what it is about economic data that produces this outcome.
2 Recommendations
Sergiy Prykhodko
Admiral Makarov National University of Shipbuilding
Dear Morteza,
In this case, instead of the log transformation, it is better to use other transformations, for example the Johnson translation system or a two-parameter Box-Cox transformation.
2 Recommendations
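
A minimal sketch of the two-parameter Box-Cox transformation mentioned above, assuming its usual textbook form: ((y + lam2)^lam1 - 1) / lam1 for lam1 != 0, and log(y + lam2) for lam1 = 0. The function name and the sample values are hypothetical, and in practice both parameters would be estimated from the data rather than fixed by hand.

```python
import math

def boxcox_two_param(values, lam1, lam2):
    """Two-parameter Box-Cox: shift by lam2, then apply the power lam1."""
    shifted = [v + lam2 for v in values]
    if min(shifted) <= 0:
        raise ValueError("lam2 must make y + lam2 > 0 for every observation")
    if lam1 == 0:
        # Limiting case of the power family as lam1 -> 0
        return [math.log(s) for s in shifted]
    return [(s ** lam1 - 1.0) / lam1 for s in shifted]

y = [-9.0, -6.0, -2.0, 5.0]  # hypothetical data with negative values
print(boxcox_two_param(y, lam1=0.0, lam2=10.0))  # plain shifted log
print(boxcox_two_param(y, lam1=0.5, lam2=10.0))  # square-root-like member
```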
Abdulrazzak Charbaji
Lebanese American University - LAU
Adding or subtracting a constant affects the mean but does not affect the variance. Therefore, it is recommended to add a constant. In the case of a Cobb-Douglas production function, the best approach is to add the same constant to all values of a given variable, chosen so that all values of that variable become positive. Suppose you have three negative values such as -6, -9, and -2. Adding the constant 10 to all values will then make all of them positive and greater than zero. A transformation such as log becomes possible without affecting R-squared, elasticity, etc.
10 Recommendations
Amir Hossein Montazer Hojat
Shahid Chamran University of Ahvaz
You can shift the origin so that all observations are positive. Then you can transform the variable into log form.
3 Recommendations
Timothy A Ebert
University of Florida
Here is another option if you can assume that the reason you have zeros is because your sample size is insufficient to get a non-zero value. In biology this might be something like measuring insect mortality due to the application of an insecticide at 1, 10, 100, and 1000 grams/hectare. The 1 gram/hectare rate results in a mortality rate of 0.002%. Given that your sample size was 10, it is unlikely that any of the ten insects will die. So your estimate is zero mortality, but that is only because your sample size was inappropriate for measuring such a low mortality.
If a similar situation applies in your case, what happens if you just delete the zero values?
1 Recommendation
Regret Tanyara Sunge
Afromontane Research Unit (ARU) Department of Economics and Finance University of the Free State
Hi. Surely negative values are common in regression. Adding a constant to make the minimum value positive does no harm to the analysis. If the variable concerned is the dependent variable, adding a constant will only alter the intercept, but the other parameter estimates will be the same. For independent variables, adding a constant does not change the parameters either. Try it!
2 Recommendations
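
Taking up the "Try it!" suggestion, here is a small simulation sketch. Everything in it is made up for illustration: the data are simulated from a Cobb-Douglas-style relation with a true elasticity of 0.5, the shift of 10 is arbitrary, and `ols_slope` is a hand-written helper. It compares the fitted slope before and after adding a constant to x, first in levels and then after taking logs.

```python
import math
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
x = [rng.uniform(1.0, 20.0) for _ in range(500)]
# Simulated output: y = 2 * x^0.5 * multiplicative noise
y = [2.0 * xi ** 0.5 * math.exp(rng.gauss(0.0, 0.1)) for xi in x]
x_shift = [xi + 10.0 for xi in x]  # arbitrary constant added to x
log_y = [math.log(yi) for yi in y]

# In levels, shifting x moves only the intercept, not the slope.
slope_levels = ols_slope(x, y)
slope_levels_shift = ols_slope(x_shift, y)

# In logs, the slope is the elasticity, and the shift changes it.
elasticity = ols_slope([math.log(xi) for xi in x], log_y)
elasticity_shift = ols_slope([math.log(xi) for xi in x_shift], log_y)

print(slope_levels, slope_levels_shift)
print(elasticity, elasticity_shift)
```

In this simulated setting the slope in levels is unaffected by the shift, while the log-log slope (the estimated elasticity) is not, because log(x + c) is not a linear function of log(x).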
Justas Birgiolas
Ronin Institute
One could use the "Bi-Symmetric Log transformation", which performs a log-like transformation on numbers that are negative and doesn't exaggerate the 0-1 region.
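
A minimal sketch of such a transformation, assuming the commonly quoted form sign(x) * log10(1 + |x| / C). The scaling constant C and its default of 1/ln(10), which gives a slope of one at the origin, are assumptions not stated in this thread, as are the function names.

```python
import math

DEFAULT_C = 1.0 / math.log(10.0)  # assumed default; makes the slope at 0 equal to 1

def bisymlog(x, c=DEFAULT_C):
    """Sign-preserving, log-like transform that is finite and smooth at 0."""
    return math.copysign(math.log10(1.0 + abs(x) / c), x)

def inverse_bisymlog(t, c=DEFAULT_C):
    """Inverse transform, recovering x from bisymlog(x)."""
    return math.copysign(c * (10.0 ** abs(t) - 1.0), t)

for value in [-1000.0, -1.0, -0.01, 0.0, 0.01, 1.0, 1000.0]:
    print(value, bisymlog(value))
```

Because the transform is defined point by point, it does not depend on the sample minimum, and values near zero stay close to themselves instead of being pushed toward minus infinity.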
Ali Madina Dankumo
Federal University Kashere
Please, is there any reference to back up the formula "log(Y+a)" for the log transformation of negative numbers?
1 Recommendation
Saizal Pinjaman
Universiti Malaysia Sabah (UMS)
=log(sqrt((X^2)+1))
1 Recommendation
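
A short sketch of the formula above, with a hypothetical function name. It is defined for every real X, but because X is squared it returns the same value for X and -X, so the sign of the original observation is not preserved. `math.asinh`, i.e. log(x + sqrt(x^2 + 1)), is shown only for comparison as a related sign-preserving function; it is not proposed in this thread.

```python
import math

def log_root(x):
    """log(sqrt(x^2 + 1)): defined for all real x, but even in x."""
    return math.log(math.sqrt(x * x + 1.0))

for value in [-100.0, -1.0, 0.0, 1.0, 100.0]:
    # log_root discards the sign; asinh keeps it
    print(value, log_root(value), math.asinh(value))
```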
Markos Farag
University of Cologne
A common approach to handle negative values is to add a constant value to the data prior to applying the log transform. The transformation is therefore log(Y+a) where a is the constant. Some people like to choose a so that min(Y+a) is a very small positive number (like 0.001). Others choose a so that min(Y+a) = 1. For the latter choice, you can show that a = b – min(Y), where b is either a small number or is 1.
A criticism of this method is that some practicing econometricians don't like to add an arbitrary constant to the data. The argument is that a better way to handle negative values is to use missing values for the logarithm of a non-positive number.
1 Recommendation
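
The two options described in this answer can be sketched as follows. The function names and the sample values are hypothetical; the rule a = b - min(Y) is the one stated above.

```python
import math

def log_with_shift(values, b=1.0):
    """log(Y + a) with a = b - min(Y), so that min(Y + a) equals b."""
    a = b - min(values)
    return [math.log(v + a) for v in values], a

def log_or_missing(values):
    """Plain log, with NaN (missing) wherever the value is non-positive."""
    return [math.log(v) if v > 0 else math.nan for v in values]

y = [-6.0, -9.0, -2.0, 4.0]  # hypothetical data
logged, a = log_with_shift(y, b=1.0)
print(a, logged)
print(log_or_missing(y))
```

With b = 1 the smallest observation maps to log(1) = 0. With a small b such as 0.001 it maps to a large negative number, which can turn that one observation into an outlier on the log scale.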
Carlos Araújo Queiroz
Universidade NOVA de Lisboa
i. Adding a fixed, arbitrary constant to a variable in order to make the argument of the logarithmic function positive may or may not be reasonable, depending on the scientific support (possibly the physical meaning) that justifies this option. Note that such a constant does not necessarily have to be added to the argument as a whole; it may be added to just one of several variables considered. Alternatively, it may be preferable to add a parameter that is fitted to the data rather than a constant of fixed, arbitrary value.
ii. A quantity (q ≥ 0 u) can be logarithmized after dividing by its unit (u), except at the origin of the scale, where a discontinuity is expected because ln(0) is undefined. This discontinuity does not occur if the domain of the quantity is restricted to (q ≥ u), which also ensures that the argument of the logarithmic function satisfies q/u ≥ 1. This may be convenient if negative logarithmic values are undesirable. We sometimes encounter the case where the quantity q is a monotonically increasing function of time, q(t), so that some 'hidden' time (θ) can be added to displace the origin of the scale, possibly from q/u = 0 to q/u = 1, or as can otherwise be justified. This figure (θ) is sometimes called the induction time in kinetic studies, and it can be adopted as an additional correlation parameter.
As an example of this kind of approach, you may check my post here:
1 Recommendation
Samaila Adamu
Federal University, Kashere, Gombe State
That's a good contribution and a way out without affecting the variance.
1 Recommendation
