How to handle negative values in log transformations in a regression analysis?

Question

I would like to use a linear form of the Cobb-Douglas production function in my project. However, there are some negative values in one of my independent variables. As far as I know, there are some transformations as follows:
log(1 + Y - min(Y))
or
-log(-Y + 1) if Y is negative
 log(Y + 1) if Y is non-negative
Which is better? What are the consequences of these transformations in a regression analysis? Is there any better solution (e.g., a symmetric transformation that pulls in extreme values and preserves sign)?
Any comment would be appreciated.
Thanks in advance
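
For readers comparing the two candidates in the question, here is a minimal Python sketch of both. The function names `shifted_log` and `signed_log` and the sample values are made up for illustration and are not part of the original question.

```python
import math

def shifted_log(values):
    """log(1 + y - min(y)) for each y; note the result depends on the sample minimum."""
    lowest = min(values)
    return [math.log(1 + y - lowest) for y in values]

def signed_log(y):
    """Sign-preserving version: -log(1 - y) if y < 0, log(1 + y) otherwise."""
    return math.copysign(math.log1p(abs(y)), y)

sample = [-9.0, -2.0, 0.0, 5.0]        # illustrative data only
shifted = shifted_log(sample)           # smallest observation maps to log(1) = 0
signed = [signed_log(y) for y in sample]
```

The first version changes whenever the smallest observation changes, while the second is applied value by value and is symmetric around zero.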

Abdulrazzak Charbaji · Accepted Answer

Adding or subtracting a constant affects the mean but does not affect the variance. Therefore, it is recommended to add a constant. For a Cobb-Douglas production function, the best choice is to add the same constant to all values of the variable so that every value becomes positive. Suppose you have three negative values such as -6, -9, and -2; adding the constant 10 to all values makes them all positive and greater than zero. A transformation such as log then becomes possible without affecting R-squared, elasticity, etc.
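
The numeric example in this answer can be checked with a short Python sketch. The variable names are illustrative, and the check covers the raw (unlogged) scale only; it does not by itself establish anything about the logged values.

```python
import math
import statistics

values = [-6, -9, -2]                  # the three negative values from the answer
shift = 10                             # the constant suggested in the answer
shifted = [v + shift for v in values]  # every value is now positive

# The mean moves by exactly the constant; the sample variance does not move.
mean_change = statistics.mean(shifted) - statistics.mean(values)
var_before = statistics.variance(values)
var_after = statistics.variance(shifted)

# The log is now defined for every shifted value.
logs = [math.log(v) for v in shifted]
```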

Vladislav Shchekoldin · Answer

Unfortunately, most economic data are not invariant to shift transformations (when someone adds or subtracts some constant to the data). So your variant of the transformation could in many cases lead directly to wrong (skewed) results. If you have some negative values of the response variable, the log transformation can be applied in two different ways: 1) you can drop these values from the sample and estimate your model on the resulting "truncated" sample; here, of course, you lose some degrees of freedom and get results of lower statistical sufficiency; 2) you can use the so-called censored regression approach (for instance, tobit models). This is the more correct option, but it requires some special effort to understand the idea of tobit-type models. There are loads of books on the problem; I hope you will be successful.

Timothy A Ebert · Answer

Sometimes negative values can be removed by reformulating the problem or correcting errors. Do the negative values make sense in the context of the problem? What is the independent variable that is causing problems? Is it income, percent growth, coffee consumption?
@Vladislav: could you provide a citation for the observation that economic data are non-invariant to shift transformations? I am interested to see what it is about economic data that produces this outcome.

Sergiy Prykhodko · Answer

Dear Morteza,
In this case, instead of the log transformation, it is better to use other transformations, for example the Johnson translation system or a two-parameter Box-Cox transformation.
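
As a rough illustration of the second suggestion, the two-parameter Box-Cox transformation is usually written as ((y + λ2)^λ1 - 1) / λ1, with log(y + λ2) as the λ1 = 0 case. The sketch below assumes that form; the function name `box_cox_two_param` is hypothetical, and in practice both parameters would be estimated from the data rather than fixed by hand.

```python
import math

def box_cox_two_param(y, lam1, lam2):
    """Two-parameter Box-Cox: lam2 shifts y to be positive, lam1 sets the power."""
    shifted = y + lam2
    if shifted <= 0:
        raise ValueError("y + lam2 must be strictly positive")
    if lam1 == 0:
        return math.log(shifted)       # limiting case as lam1 -> 0
    return (shifted ** lam1 - 1) / lam1

# Illustrative call: a negative observation with a hand-picked shift of 6.
example = box_cox_two_param(-2.0, 0.5, 6.0)
```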

Amir Hossein Montazer Hojat · Answer

You can change the origin so that all observations are positive. Then you can transform to log form.

Timothy A Ebert · Answer

Here is another option if you can assume that the reason you have zeros is because your sample size is insufficient to get a non-zero value. In biology this might be something like measuring insect mortality due to the application of an insecticide at 1, 10, 100, and 1000 grams/hectare. The 1 gram/hectare rate results in a mortality rate of 0.002%. Given that your sample size was 10, it is unlikely that any of the ten insects will die. So your estimate is zero mortality, but that is only because your sample size was inappropriate for measuring such a low mortality.
If a similar situation applies in your case, what happens if you just delete the zero values?

Regret Tanyara Sunge · Answer

Hi. Negative values are certainly common in regression. Adding a constant to make the minimum value positive does no harm to the analysis. If the variable concerned is the dependent variable, adding a constant will only alter the constant term, but the other parameter estimates will be the same. For independent variables, adding a constant does not change the parameters either. Try it!
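
One way to "try it" is the small Python sketch below, which fits a one-variable least-squares slope by hand. The data and the helper `ols_slope` are made up for illustration. The first comparison is on the untransformed variable; the last two lines fit the logged variable under two different constants so that the two slopes can be compared as well.

```python
import math

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (single regressor, with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

x = [-6.0, -9.0, -2.0, 3.0, 7.0]       # illustrative regressor with negatives
y = [1.0, 0.5, 2.0, 4.5, 6.0]          # illustrative response

slope_raw = ols_slope(x, y)
slope_shifted = ols_slope([v + 10 for v in x], y)            # same slope as slope_raw
slope_log_10 = ols_slope([math.log(v + 10) for v in x], y)   # log after adding 10
slope_log_20 = ols_slope([math.log(v + 20) for v in x], y)   # log after adding 20
```

In the untransformed fit only the intercept absorbs the shift. After taking logs, the fitted slope generally depends on which constant was added, so the two logged fits need not agree.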

Justas Birgiolas · Answer

One could use the "Bi-Symmetric Log transformation", which performs a log-like transformation on numbers that are negative and doesn't exaggerate the 0-1 region.

A bi-symmetric log transformation for wide-range data
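
A minimal Python sketch of such a transformation follows. It assumes the commonly quoted form sign(x) * log10(1 + |x / C|) with a default scale of C = 1 / ln(10); both the formula and the default are assumptions here and should be checked against the linked paper. The function name `bisymmetric_log` is illustrative.

```python
import math

def bisymmetric_log(x, c=1 / math.log(10)):
    """Assumed form: sign(x) * log10(1 + |x / c|); c sets the width of the near-linear zone."""
    return math.copysign(math.log10(1 + abs(x / c)), x)

# Illustrative values spanning several orders of magnitude on both sides of zero.
transformed = [bisymmetric_log(v) for v in (-1000.0, -1.0, 0.0, 1.0, 1000.0)]
```

The function is odd, so negative inputs map to the mirror image of the positive ones, and zero maps to zero.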

Ali Madina Dankumo · Answer

Please, is there any reference to back up the formula "log(Y+a)" for the log transformation of negative numbers?

Saizal Pinjaman · Answer

=log(sqrt((X^2)+1))
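
Written out in Python, the spreadsheet-style formula above looks like the sketch below (the name `log_root` is illustrative). The argument of the log is always at least 1, so the result is defined for any real X, but the function is even in X: a value and its negative give the same result, so the sign of X is not preserved.

```python
import math

def log_root(x):
    """log(sqrt(x^2 + 1)); defined for every real x, equal to 0.5 * log(x^2 + 1)."""
    return math.log(math.sqrt(x * x + 1))

same_for_both_signs = log_root(-3.0) == log_root(3.0)
```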

Markos Farag · Answer

A common approach to handling negative values is to add a constant to the data before applying the log transform. The transformation is therefore log(Y+a), where a is the constant. Some people like to choose a so that min(Y+a) is a very small positive number (like 0.001). Others choose a so that min(Y+a) = 1. In both cases, a = b - min(Y), where b is either the small number or 1.
A criticism of this method is that some practicing econometricians don't like to add an arbitrary constant to the data. The argument is that a better way to handle negative values is to use missing values for the logarithm of a non-positive number.
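
Both options described in this answer can be sketched in a few lines of Python. The function names `log_with_offset` and `log_or_missing` are hypothetical, and `None` stands in for a missing value.

```python
import math

def log_with_offset(values, b=1.0):
    """log(Y + a) with a = b - min(Y), so that min(Y + a) equals b."""
    a = b - min(values)
    return [math.log(y + a) for y in values], a

def log_or_missing(values):
    """Alternative: leave the log of any non-positive value missing (None)."""
    return [math.log(y) if y > 0 else None for y in values]

data = [-6.0, -9.0, -2.0, 4.0]          # illustrative data only
logged, offset = log_with_offset(data)  # b = 1, so the smallest value maps to log(1) = 0
with_gaps = log_or_missing(data)
```

Passing a small `b` such as 0.001 gives the first variant mentioned in the answer; the default `b = 1` gives the second.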

Carlos Araújo Queiroz · Answer

i. Adding a constant of fixed arbitrary value to a variable, in order to render the argument of the logarithmic function positive, may or may not be reasonable, depending on the scientific support (possibly, the physical meaning) that justifies this option. Note that such a constant does not necessarily add to the argument as a whole. It may add to just one of several variables considered. Alternatively, it may be preferable to add a parameter, to be fitted to the data, rather than a constant of fixed arbitrary value.
ii. A quantity (q ≥ 0 u) can be logarithmized after dividing by its unit (u), except at the origin of the scale, where a discontinuity is expected because ln(0) is undefined. This discontinuity does not occur if the quantity is restricted in its domain to (q ≥ u), which also ensures that the argument of the logarithmic function is q/u ≥ 1. This may be convenient if negative logarithmic values are undesirable. We sometimes encounter the case where the quantity q is a monotonically increasing function of time, q(t), so that some 'hidden' time (θ) can be added to displace the origin of the scale, possibly from q/u = 0 to q/u = 1, or as can otherwise be justified. This figure (θ) is sometimes called induction time in some kinetic studies, and it can possibly be adopted as an additional correlation parameter.
As an example of this kind of approach, you may check my post here:
https://www.researchgate.net/post/How_can_I_find_an_specific_equation_that_can_describe_the_relationship_between_X_and_Y_in_the_attached_file
You may also find it useful to check: https://www.researchgate.net/post/How_to_calculate_the_logarithm_of_a_negative_number2

Samaila Adamu · Answer

That's a good contribution, and a way out without affecting the variance.
