The Gini Index and Measures of Inequality
Frank A. Farris
Abstract. The Gini index is a summary statistic that measures how equitably a resource is distributed in a population; income is a primary example. In addition to a self-contained presentation of the Gini index, we give two equivalent ways to interpret this summary statistic: first in terms of the percentile level of the person who earns the average dollar, and second in terms of how the lower of two randomly chosen incomes compares, on average, to mean income.
1. INTRODUCTION. You hear anecdotes all the time: The poorest 20% of the people on Earth earn only 1% of the income. A mere 20% of the people on Earth consume 86% of the consumer goods. Only 3% of the U.S. population owns 95% of the privately held land.
The Gini index offers a consistent way to talk about statistics like these. A single
number that measures how equitably a resource is distributed in a population, the Gini
index gives a simple, if blunt, tool for summarizing economic data. It allows us to
illustrate how equity has changed in a given situation over time, such as how U.S.
family income changed over the 20th century. (The poor got poorer over the second
half.) We can also compare income or wealth across societies, and even analyze salary
structures of organizations.
Being only a single summary statistic, the Gini index has been critiqued by social
scientists [2]. It is true that no summary statistic can reveal all we need to know about
the distribution of income, wealth, or land. Even so, the Gini index deserves to be
better known in the mathematical community, as it continues to find application in
new situations, from genetics [7] to astronomy [1].
In addition to a self-contained presentation of the Gini index, we give two equivalent
ways to interpret this summary statistic: first in terms of the percentile level of the
person who earns the average dollar, and second in terms of how the lower of two
randomly chosen incomes compares, on average, to mean income. The first of these
appears to be new; the second has appeared in the literature [11], but does not seem to
be well known. Beyond the inherent mathematical interest, our story draws attention
to the concept of inequity and offers readers tools to help them go beyond the factoids
of the first paragraph.
2. DEFINING THE GINI INDEX. Though it is named for Italian statistician Corrado Gini (1884–1965), the Gini index can almost be glimpsed in the diagrams from a 1905 paper by M. O. Lorenz [12]. Gini's original work on the subject appeared in 1912 in Italian [8]; it is not easy to access. Fortunately, the paper by Lorenz is quite charming to read and gives an excellent historical snapshot of the seeds of this train of thought. The first sentence is memorable:
There may be wide difference of opinion as to the significance of a very unequal
distribution of wealth, but there can be no doubt as to the importance of knowing
whether the present distribution is becoming more or less unequal.
doi:10.4169/000298910X523344
December 2010] THE GINI INDEX AND MEASURES OF INEQUALITY 851
Let us define a Lorenz curve, the instrument Lorenz proposed for visualizing the distribution of a quantity in a population. Suppose that some quantity $Q$, which could stand for wealth, income, family income, land, food, and so on, is distributed in a population. If we imagine the population to be lined up in increasing order of their shares of $Q$ (with ties being broken arbitrarily), then for any $p$ between 0 and 1 the people in the first fraction $p$ of the line represent the $Q$-poorest $100p\%$ of the population. We then call $L(p)$ the fraction of the totality of $Q$ owned (or earned or controlled or eaten) by that fraction of the population. In summary:
The Lorenz curve for a resource $Q$ is the curve $y = L(p)$, where the $Q$-poorest fraction $p$ of the population has a fraction $L(p)$ of the whole.
Using this vocabulary, the first sentence of the paper would be expressed as $L(.20) = .01$, where $L$ is the Lorenz curve for world income. The variable $p$ is called the percentile variable.
If everyone had exactly the same amount of $Q$, the order of our imaginary line-up would be completely arbitrary and we would say that $L(p) = p$, the curve of perfect equitability. In other situations where some fraction of the population all share equally in an amount of $Q$, our rule for an arbitrary order of that portion of the line results in a linear segment of the Lorenz curve. For instance, if everyone in the bottom half of the population owned an equal share of 1/4 of the wealth, we would say that $L(p) = p/2$ for $0 \le p \le 1/2$, so that $L(1/2) = 1/4$. A purist might say that, in a population of $N$ individuals, it only makes sense for $p$ to take on values of the form $k/N$. In practice, we model Lorenz curves as being defined for all $p$, using linear interpolation whenever necessary. This requires us to say, for instance, that the poorest 10% of an individual earns 10% of that person's income, which is not too much of a stretch.
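A minimal sketch of this construction in Python, for the case where we know each individual's amount of $Q$ (the function names `lorenz_points` and `lorenz` are our own, hypothetical choices): sort the amounts, accumulate them to get the vertices $(k/N, L(k/N))$, and fill in the rest of the curve by linear interpolation, as described above.

```python
from bisect import bisect_right

def lorenz_points(amounts):
    """Vertices (p_k, L(p_k)) of the Lorenz curve for individual amounts of Q.

    Individuals are lined up in increasing order of their amounts, so vertex k
    is (k/N, fraction of the total held by the k poorest individuals).
    """
    q = sorted(amounts)
    n, total = len(q), float(sum(q))
    ps, ls, running = [0.0], [0.0], 0.0
    for k, amount in enumerate(q, start=1):
        running += amount
        ps.append(k / n)
        ls.append(running / total)
    return ps, ls

def lorenz(p, amounts):
    """Evaluate L(p) for 0 <= p <= 1 by linear interpolation between vertices."""
    ps, ls = lorenz_points(amounts)
    if p >= 1.0:
        return ls[-1]
    k = bisect_right(ps, p) - 1          # ps[k] <= p < ps[k + 1]
    t = (p - ps[k]) / (ps[k + 1] - ps[k])
    return ls[k] + t * (ls[k + 1] - ls[k])

print(lorenz(0.5, [1, 1, 2]))      # about .375, between vertices (1/3, .25) and (2/3, .5)
print(lorenz(0.3, [5, 5, 5, 5]))   # equal shares: about .3, on the line L(p) = p
```

For equal amounts the vertices all lie on $L(p) = p$, so interpolation reproduces the curve of perfect equitability exactly.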
The Gini index is a quantity calculated from a particular Lorenz curve. It is defined
as an integral that summarizes how much the Lorenz curve in question deviates from
perfect equitability:
$$G := 2\int_0^1 \bigl[p - L(p)\bigr]\,dp. \tag{1}$$
The formula reveals why the Gini index sometimes appears in calculus books in the
section on the area between two curves. The reason for the factor 2 is to scale the area
in such a way that the Gini index varies between 0, perfect equitability where everyone
has the same share of the good, and 1, where one person has everything.
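Definition (1) can be checked numerically for any Lorenz curve given as a formula. The sketch below uses a plain midpoint rule (our choice; any quadrature would do) and the illustrative curve $L(p) = p^2$, for which (1) gives $G = 2(1/2 - 1/3) = 1/3$. The name `gini_from_lorenz` is hypothetical.

```python
def gini_from_lorenz(L, n=100_000):
    """Approximate G = 2 * integral_0^1 [p - L(p)] dp by the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        p = (k + 0.5) * h
        total += p - L(p)
    return 2.0 * h * total

print(gini_from_lorenz(lambda p: p))       # G = 0 for perfect equitability
print(gini_from_lorenz(lambda p: p ** 2))  # about 1/3
print(gini_from_lorenz(lambda p: 0.0))     # limiting case, one person has everything: about 1
```

The last curve is the limit of a population in which one person holds all of $Q$; it shows why the factor 2 makes the maximum value 1.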
I downloaded data for 2006 family income from the U.S. Census Bureau [14]. Modulo some details, described in another section, it is easy to use a spreadsheet program to create a Lorenz curve from the data and estimate the Gini index. This is shown in Figure 1.
The Gini index for this situation works out to be about .47, which agrees with the
figure reported by the U.S. Census Bureau [13]. For perspective, the similar indices
for Brazil and Denmark are about .58 and .24 respectively [4]. Does this fit with what
you know about these societies? It is also instructive to compare Gini indices over
time: The index for U.S. family income hit its 20th century low of around .36 in 1968.
Does this match your understanding of social change in America over the last several
decades?
The CIA World Factbook [4] reports that the Gini index of U.S. family income (for 2007) is .45, and other sources claim figures even below .40. These lower figures tend to be adjusted indices, computed after taking taxes and transfer payments into account. For instance, in the U.S., the tax structure, which asks higher income citizens to pay a higher percentage of income so that various benefits (food stamps, Medicare, etc.) may be given to lower income residents, does in fact reduce inequity, but this feature of U.S. society would be missed if only raw income figures were used to compute a Gini index. When comparing Gini indices, we should take care to specify exactly what is being measured. We will side-step any deep analysis of this point in favor of mathematics, simply warning readers to compare Gini indices only when they are computed in the same way.

852 © THE MATHEMATICAL ASSOCIATION OF AMERICA [Monthly 117

[Figure 1: Lorenz curve for U.S. family income, 2006, plotted on the unit square, with both axes running from 0.00 to 1.00.]
Figure 1. The Gini index is twice the area between the Lorenz curve and the curve of perfect equitability. For U.S. family income in 2006, the data lead to an estimate of $G \approx .47$.
The Gini index need not be a grim topic. When I was learning how to calculate
indices, Alex Rodriguez had just been hired by the N.Y. Yankees for an annual salary
of $22 million. I downloaded the salary schedules for the Yankees and the Red Sox
from the ESPN website and determined that the salary Gini for the Yankees during
that season was .57, noticeably higher than for the more equitable Red Sox, at .52.
I found a contrasting example in the exam scores of my calculus students. Even
with more than half the students below a mean of 84%, the Gini index for distribution
of exam points was only .06. Does this mean that my grading is especially fair?
3. CONNECTING TO PROBABILITY. As we move toward calculating Gini indices, we must accept that economic data are almost always reported in aggregated form. Except in fanciful applications like salaries of baseball players, where we have an income figure for each individual, we get tables of data where one column lists the (very large) number of people in a given range of incomes and another gives the mean income for this group. Income ranges are listed in convenient order from lowest to highest. A truncated version of the data appears in Table 1.
At the ends of the spectrum of 2006 U.S. family income, there are 2,533 thousand
households with mean income $295 and 2,240 thousand households with mean income
$448,687.
In general, let us name the entries in any such table as $h_j$ units (households) that, on the average, have an amount $x_j$ of our good $Q$ (income). If our table has $n$ rows, then $j$ ranges from 1 to $n$ and the order of the table means that $x_j < x_k$ when $j < k$.
Table 1. U.S. family income, 2006, aggregated.
Number of households (thousands), h_j    Average income (dollars), x_j
                                2,533                              295
                                1,030                            3,737
                                2,124                            6,431
                                3,002                            8,713
                                3,677                           11,206
                                3,203                           13,668
                                3,677                           16,088
                                3,169                           18,646
                                3,886                           21,056
                                3,005                           23,690
                                    ⋮                                ⋮
                               13,385                          119,461
                                4,751                          169,454
                                1,776                          219,377
                                2,240                          448,687
As a first step in calculating a Gini index from such a table, let us consider how to express the Lorenz curve, $L(p)$. First, we define numbers
$$N = \sum_{i=1}^{n} h_i \quad\text{and}\quad T = \sum_{i=1}^{n} x_i h_i.$$
In words, $N$ is the size of the population and $T$ is the total amount of the good $Q$. With this notation, the mean amount owned is $T/N$, which we call $\bar{X}$.
In our example of 2006 U.S. family income, $N$ is about 116 million households, $T$ is almost 8 trillion dollars, and the mean income is $\bar{X} \approx \$66{,}570$. The first entry in the table corresponds to a percentile value of $h_1/N \approx .02183$, the poorest 2% of the population. In general, the numbers
$$p_j = \frac{1}{N}\sum_{i=1}^{j} h_i$$
give us $n$ (not necessarily equally spaced) points along the $p$-axis between 0 and 1. For convenience, we define $p_0 = 0$.
The Lorenz curve is easily calculated for these particular values of $p$: $L(0) = 0$ and
$$L(p_j) = \frac{1}{T}\sum_{i=1}^{j} x_i h_i, \qquad 1 \le j \le n, \tag{2}$$
because this is the fraction of the total earned by the poorest fraction $p_j$.
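In code, equation (2) amounts to two running sums. The sketch below uses a small made-up table (four rows; not Census data) so the numbers can be checked by hand; `lorenz_from_table` is a hypothetical helper name.

```python
from itertools import accumulate

def lorenz_from_table(h, x):
    """Return N, T, the mean, and the points p_j and L(p_j) of equation (2).

    h[j] is the number of units in row j and x[j] their mean amount of Q;
    rows are assumed to be sorted so that x is increasing.
    """
    N = sum(h)
    amounts = [hj * xj for hj, xj in zip(h, x)]       # x_j h_j, total Q in row j
    T = sum(amounts)
    p = [0.0] + [c / N for c in accumulate(h)]        # p_0 = 0, p_1, ..., p_n = 1
    L = [0.0] + [c / T for c in accumulate(amounts)]  # L(p_0) = 0, ..., L(p_n) = 1
    return N, T, T / N, p, L

# Hypothetical table: 10 units in four rows.
N, T, mean, p, L = lorenz_from_table([2, 3, 4, 1], [10, 20, 40, 100])
```

Here $N = 10$, $T = 340$, $\bar{X} = 34$, the $p_j$ are 0, .2, .5, .9, 1, and $L(p_2) = 80/340 \approx .235$.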
The simplest way to fill in the rest of the Lorenz curve is by linear interpolation. When we do this, we are assuming that every one of the $h_j$ units with mean amount $x_j$ has an equal share in that amount. As we examine in more detail later, this always results in an underestimate of the Gini index, though a small one if our aggregation uses a fine partition.
It is a little cumbersome to calculate the Gini index (1) by direct integration from
(2). Instead, we will reinterpret that equation, uncovering a surprising connection to
probability density functions.
With a little algebra, we rewrite (2) in terms of the percentile variable and recognize
the result as a Riemann sum:
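$$L(p_j) = \sum_{i=1}^{j} \frac{x_i}{T/N}\,(p_i - p_{i-1}) = \int_0^{p_j} s(p)\,dp, \tag{3}$$
where
$$s(p) = \frac{x_j}{\bar{X}} \quad \text{for } p_{j-1} < p \le p_j. \tag{4}$$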
We call the function $s(p)$ the share density, because it tells us what share of the whole is owned by the portion of the population that falls in a given percentile range. In our example, the income of the poorest 2% is about 0.0044 of the mean income, while the group at the top, above roughly the 98th percentile, has an income about 6.74 times the mean.
It seems reasonable to use (3) to define the Lorenz curve for every value of $p$, not just the numbers $p_j$, which accomplishes the linear interpolation we spoke of before. In fact, we prefer to think of the share density as the primary object here, from which details of the Lorenz curve can be derived. One result of this approach is that the nondecreasing nature of $s(p)$ establishes $L(p)$ as a convex function. For instance, the following inequalities demonstrate midpoint convexity for any $p_1 < p_2$:
$$\int_{p_1}^{\frac{p_1+p_2}{2}} s(\tilde{p})\,d\tilde{p} \;\le\; \int_{\frac{p_1+p_2}{2}}^{p_2} s(\tilde{p})\,d\tilde{p}, \quad\text{so}\quad L\!\left(\frac{p_1+p_2}{2}\right) \le \frac{L(p_1) + L(p_2)}{2}.$$
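The same hypothetical four-row table can illustrate (3) and (4). The sketch builds the step function $s(p)$, integrates it to get $L(p)$ at any $p$ in $[0,1]$, and spot-checks the midpoint-convexity inequality on a grid; `share_density_and_lorenz` is our own illustrative name.

```python
from bisect import bisect_left
from itertools import accumulate

def share_density_and_lorenz(h, x):
    """Return functions s(p) from (4) and L(p) from (3), for 0 <= p <= 1."""
    N = sum(h)
    mean = sum(hj * xj for hj, xj in zip(h, x)) / N
    p = [0.0] + [c / N for c in accumulate(h)]
    heights = [xj / mean for xj in x]              # s = x_j / mean on row j

    def row(q):
        # 1-based row j with p_{j-1} < q <= p_j; q = 0 is assigned to row 1
        return max(bisect_left(p, q), 1)

    def s(q):
        return heights[row(q) - 1]

    def L(q):
        # integral of the step function s from 0 to q
        j = row(q)
        full_rows = sum(heights[i] * (p[i + 1] - p[i]) for i in range(j - 1))
        return full_rows + heights[j - 1] * (q - p[j - 1])

    return s, L

s, L = share_density_and_lorenz([2, 3, 4, 1], [10, 20, 40, 100])
grid = [i / 50 for i in range(51)]
convex = all(L((a + b) / 2) <= (L(a) + L(b)) / 2 + 1e-12 for a in grid for b in grid)
```

Because the heights $x_j/\bar{X}$ increase from row to row, `convex` comes out true, as the argument above predicts; the small tolerance only absorbs floating-point rounding.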
We acknowledge that aggregation of data always leads to some error in using (3) to define the Lorenz curve, and hence in the Gini index computed from it. We address this briefly in a later section. It has also been treated widely in the economics literature. Indeed, the concept of share density is implicit in the work of Gastwirth from 1971 [5].
It may happen that a Lorenz curve for a given situation is known, or proposed theoretically as a function of some particular type [6]. In such a case, we could define the share density, perhaps only almost everywhere, as
$$s(p) = \frac{d}{dp} L(p). \tag{5}$$
Since $s(p) \ge 0$ and $\int_0^1 s(p)\,dp = L(1) = 1$, the function $s(p)$ fits the requirements of a probability density function (pdf). What experiment would lead to a random variable that has this pdf? We propose the following:
Pick a dollar earned by a U.S. household at random, assuming that every dollar is equally likely to be chosen. Record the value of the percentile variable, $p$, of the person who earned that dollar.
For this experiment, $p$ is a random variable with density $s(p)$. To see this, look at (4). For instance, the probability that a dollar chosen at random was earned in the percentile range from $p_{j-1}$ to $p_j$ is exactly the fraction of all dollars that were earned in that range, which is
$$\frac{x_j h_j}{T} = \frac{x_j}{T/N}\,(p_j - p_{j-1}) = \int_{p_{j-1}}^{p_j} s(p)\,dp.$$
Also, we note that a share density of $s(p) \equiv 1$ would indicate a perfectly equitable distribution, in which case a dollar chosen at random is equally likely to have been earned at any percentile.
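The experiment can also be simulated directly. In the sketch below (a Monte Carlo illustration of our own, again on the hypothetical four-row table; `random_dollar_percentiles` is an illustrative name), a dollar falls in row $j$ with probability $x_j h_j/T$, and, because linear interpolation treats everyone in a row as equal, its earner's percentile is then uniform on $(p_{j-1}, p_j]$. The sample mean of the recorded percentiles should approach the expected value of $p$, which direct integration of $p\,s(p)$ puts at $23/34 \approx .676$ for this table.

```python
import random
from itertools import accumulate

def random_dollar_percentiles(h, x, n_draws, seed=0):
    """Simulate picking dollars uniformly at random; return the earners' percentiles."""
    rng = random.Random(seed)
    N = sum(h)
    p = [0.0] + [c / N for c in accumulate(h)]
    dollars = [hj * xj for hj, xj in zip(h, x)]    # dollars held by each row
    rows = rng.choices(range(len(h)), weights=dollars, k=n_draws)
    # within a row everyone is treated as equal, so the percentile is uniform there
    return [rng.uniform(p[j], p[j + 1]) for j in rows]

draws = random_dollar_percentiles([2, 3, 4, 1], [10, 20, 40, 100], 200_000)
p_bar_estimate = sum(draws) / len(draws)   # close to 23/34, about .676
```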
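The share density earns its keep from the following computation, in which we substitute the integral form of the Lorenz curve into the definition of the Gini index (1) and then switch the order of integration:
$$\begin{aligned}
G &= 2\int_0^1 \bigl[p - L(p)\bigr]\,dp \\
  &= 1 - 2\int_0^1 \int_0^p s(\tilde{p})\,d\tilde{p}\,dp \\
  &= 1 - 2\int_0^1 (1 - \tilde{p})\,s(\tilde{p})\,d\tilde{p} \\
  &= 2\int_0^1 p\,s(p)\,dp - 1.
\end{aligned} \tag{6}$$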
This last integral is simply the expected value of our random variable with density $s(p)$. We use $\bar{p}$ for this expected value and call it the percentile of the average dollar earned. This proves a theorem that gives our first interpretation of the Gini index:
Theorem 1. Suppose $G$ is the Gini index associated with the Lorenz curve $L(p)$ and the share density is defined by $s(p) = L'(p)$ almost everywhere. Let $\bar{p}$ be the expected value of the random variable on $[0,1]$ whose density function is $s(p)$. Then $G$ and $\bar{p}$ are related by
$$G = 2\bar{p} - 1 \quad\text{and}\quad \bar{p} = \frac{G+1}{2}. \tag{7}$$
Let us apply this to our examples: The average dollar earned in the U.S. in 2006
was earned at a percentile level of (.47 +1)/2, or above the 73rd percentile. For the
Yankees, with salary Gini .57, the average dollar comes in above the 78th percentile.
In my opinion, this gives a more visceral fact to share with the general public than just
the value of an index that ranges between 0 and 1.
Calculating Ginis. Interpreting the Gini index in terms of the average dollar earned is also key to calculations. To compute $\bar{p}$, we break up the integral into pieces where $s(p)$ is constant:
$$\begin{aligned}
\bar{p} &= \int_0^1 p\,s(p)\,dp \\
&= \sum_{j=1}^{n} \frac{x_j}{\bar{X}} \int_{p_{j-1}}^{p_j} p\,dp \\
&= \sum_{j=1}^{n} \frac{x_j}{\bar{X}}\,\frac{p_j + p_{j-1}}{2}\,\Delta p_j,
\end{aligned} \tag{8}$$
where, as usual, $\Delta p_j = p_j - p_{j-1}$. This form is ideal for entering into a spreadsheet, and it was this version that produced the values reported for the examples.
We can rearrange (8) as
$$G = \frac{1}{T}\sum_{j=1}^{n} x_j\,(p_j + p_{j-1})\,h_j - 1 \tag{9}$$
for the Gini index of the Lorenz curve obtained by linear interpolation of the data. With some work, this last expression for $G$ can also be derived directly from the definition of the Gini index by applying the trapezoid rule to find the area under the piecewise linear Lorenz curve.
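Formulas (8) and (9) translate directly into code. The sketch below, on the same hypothetical four-row table, computes $\bar{p}$ from (8), converts it to $G$ with (7), evaluates (9) separately, and cross-checks both against the trapezoid rule applied to the Lorenz polygon, as the last paragraph suggests. Function names are illustrative, not from the article.

```python
from itertools import accumulate

def _table_points(h, x):
    """N, T, and the percentile points p_0, ..., p_n for aggregated data."""
    N = sum(h)
    T = sum(hj * xj for hj, xj in zip(h, x))
    return N, T, [0.0] + [c / N for c in accumulate(h)]

# Python rows are 0-based: row j spans p[j] to p[j + 1], the article's p_{j-1} to p_j.

def p_bar_eq8(h, x):
    """Percentile of the average dollar, equation (8)."""
    N, T, p = _table_points(h, x)
    mean = T / N
    return sum((xj / mean) * (p[j] + p[j + 1]) / 2 * (p[j + 1] - p[j])
               for j, xj in enumerate(x))

def gini_eq9(h, x):
    """Gini index of the interpolated Lorenz curve, equation (9)."""
    N, T, p = _table_points(h, x)
    return sum(xj * (p[j + 1] + p[j]) * hj
               for j, (hj, xj) in enumerate(zip(h, x))) / T - 1

def gini_trapezoid(h, x):
    """Cross-check: G = 1 - 2 * (trapezoid area under the Lorenz polygon)."""
    N, T, p = _table_points(h, x)
    L = [0.0] + [c / T for c in accumulate(hj * xj for hj, xj in zip(h, x))]
    area = sum((L[j] + L[j + 1]) / 2 * (p[j + 1] - p[j]) for j in range(len(h)))
    return 1 - 2 * area

h, x = [2, 3, 4, 1], [10, 20, 40, 100]   # hypothetical table
p_bar = p_bar_eq8(h, x)
G = 2 * p_bar - 1                        # Theorem 1, equation (7)
```

For this table all three routes give $G = 12/34 \approx .353$ and $\bar{p} = 23/34$. In the same way, (7) turns the Census figure $G \approx .47$ into $\bar{p} \approx .735$.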
4. THE LOWER OF TWO INCOMES. Consider this experiment: Pick two households in the U.S. and record the lower of their two incomes; call the result $Y$, a random variable that takes values in $[0, \infty)$. An amusing computation shows that the expected value of $Y$, in ratio to the mean income, is the complement of the Gini index relative to 1. In symbols,
$$\bar{Y}/\bar{X} = 1 - G. \tag{10}$$
To prove this, we need to know the pdf for $Y$. This in turn requires the pdf for $X$, the random variable recording the income of a single household. This pdf is not so directly available from the data, as presented in government statistics. That data could be interpreted as requiring point probability masses placed at each aggregated mean income. In other words, we could place a point mass of $2{,}533/N$ at income $X = \$295$ and a point mass of $2{,}240/N$ at income $X = \$448{,}687$. This would be both inaccurate and mathematically cumbersome.
Instead, let us work theoretically, assuming a known piecewise smooth density function $f(x)$ ($0 \le x < \infty$) for the random variable $X$. Knowing $f$ allows us to compute the cumulative distribution function (cdf) for $X$:
$$F(x) = P(X < x) = \int_0^x f(\tilde{x})\,d\tilde{x}.$$
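A standard computation in order statistics gives a cdf for $Y$, $H(x)$, as follows:
$$\begin{aligned}
H(x) = P(Y < x) &= 1 - P(Y > x) \\
&= 1 - P(\text{first income} > x)\cdot P(\text{second income} > x) \\
&= 1 - \bigl(1 - F(x)\bigr)\bigl(1 - F(x)\bigr).
\end{aligned}$$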
This gives the pdf for $Y$ as $H'(x) = 2f(x)\bigl(1 - F(x)\bigr)$, since $F'(x) = f(x)$ (almost everywhere). Therefore, the expected value of $Y$ is
$$\bar{Y} = \int_0^\infty 2x\,f(x)\bigl(1 - F(x)\bigr)\,dx. \tag{11}$$
Let us connect this to the Gini index. The percentile variable $p$ is easily related to the cdf for $X$. For a specific value of $x$, the probability that an income chosen at random is less than $x$ is exactly the size of the fraction of the population earning less than $x$. In symbols, this means that
$$F(x) = \int_0^x f(\tilde{x})\,d\tilde{x} = p,$$
which means that we can express $x$ in terms of $p$ and $p$ in terms of $x$. The share density $s(p)$ is simply $x/\bar{X}$, recording the proportion of the mean income at percentile level $p$. We interpret (11) in terms of the variable $p$, using the substitution $p = F(x)$, $dp = f(x)\,dx$:
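$$\begin{aligned}
\bar{Y}/\bar{X} &= \int_0^\infty 2\,(x/\bar{X})\,f(x)\bigl(1 - F(x)\bigr)\,dx \\
&= \int_0^1 2\,s(p)\,(1 - p)\,dp \\
&= 2 - 2\bar{p} = 2 - 2\,\frac{G+1}{2} = 1 - G.
\end{aligned} \tag{12}$$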
This manipulation allows us to interpret any known Gini index in an approachable, conversational way: Assuming that the Gini index for U.S. family income is .47, we conclude that the lower of two U.S. family incomes, chosen at random, is on average about 53% of the mean; in other words, the poorer of two families earns only about half the national mean.
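For a finite population, the same identity can be verified exactly rather than approximately. In the sketch below (hypothetical incomes; helper names are our own), each household is its own row of the table, so $h_j = 1$ and $p_j = j/n$ in (9), and the two households are drawn independently with replacement, so every ordered pair is equally likely. Enumerating all pairs gives $\bar{Y}$ exactly, and $\bar{Y}/\bar{X}$ agrees with $1 - G$ up to rounding.

```python
from itertools import product

def gini_individual(incomes):
    """Equation (9) with one row per household (h_j = 1, p_j = j/n)."""
    x = sorted(incomes)
    n, T = len(x), float(sum(x))
    # p_j + p_{j-1} = (2j - 1)/n for the j-th poorest household
    return sum(xj * (2 * j - 1) / n for j, xj in enumerate(x, start=1)) / T - 1

def mean_lower_of_two(incomes):
    """Exact E[min] over all ordered pairs drawn with replacement."""
    pairs = list(product(incomes, repeat=2))
    return sum(min(a, b) for a, b in pairs) / len(pairs)

incomes = [10, 20, 40, 100]            # hypothetical household incomes
mean_X = sum(incomes) / len(incomes)
G = gini_individual(incomes)
ratio = mean_lower_of_two(incomes) / mean_X
```

Here $G = 72.5/170 \approx .426$ and $\bar{Y}/\bar{X} = 24.375/42.5 \approx .574$, which sum to 1.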
5. GINI ESTIMATES. Gini indices, in practice, must be computed from incomplete data. It helps to have some idea of the errors introduced when we infer a Gini index from partial data. This topic has received extensive treatment in the economics literature [6]; our self-contained treatment is meant to be accessible to mathematicians.
One important way in which we depart from practical concerns is that we consider aggregate data, as for example in Table 1, as representing the exact averages for each group reported. In other words, we do not consider the reality that this data contains reporting errors. In this section, we make statements about the conclusions that can be drawn from the given data, assuming that it is accurate.
We begin with the case where our knowledge is limited to a single nontrivial point
on a Lorenz curve. This is the situation of the first sentences of the paper. For instance,
knowing that 20% of the people on Earth consume 86% of the consumer goods, what
can we say about the Gini index?
Proposition 1. If $G$ is the Gini index associated with a Lorenz curve $L(p)$ and we know that $L(a) = b$, where $0 < b < a < 1$, then
$$a - b \le G < 1 - 2b(1-a). \tag{13}$$
Smaller upper bounds are possible, but less easy to state. For instance, if $b < \tfrac12 < a$, then $G \le 1 - 4b(1-a)$.
Proof. The least possible area between $y = p$ and $y = L(p)$ occurs when $L(p)$ is the piecewise linear function shown in Figure 2, which simply connects $(0,0)$ to $(a,b)$ and then to $(1,1)$. In that case, the Gini index is easily calculated to be $1 - \bigl(ab + (1-a)(1+b)\bigr) = a - b$.
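The rather blunt upper bound in (13) arises from geometric considerations. In Figure 2, the area of the $b$-by-$(1-a)$ rectangle in the lower right corner is clearly off limits: since $L$ is nondecreasing, $L(p) \ge b$ for $p \ge a$, so this rectangle lies entirely below the Lorenz curve. The estimate follows by removing twice this area from the highest possible Gini of 1.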
[Figure 2: a Lorenz curve $L(p)$ plotted against $p$, passing through $(0,0)$, $(a,b)$, and $(1,1)$.]
Figure 2. Estimates for the Gini index obtained from a single data point are possible, but not especially accurate.
The better estimate comes from recognizing that Lorenz curves must be convex. Extreme behavior arises from linear functions, at the edge of convexity; some readers may recognize that we are talking about support lines for the Lorenz curve.
To get the largest possible Gini index, we should cut the smallest possible area from below $y = x$. It turns out that among all lines through $(a,b)$, the one that forms the smallest triangle (together with the $x$-axis and the line $x = 1$) is parallel to the diagonal of the $b$-by-$(1-a)$ rectangle mentioned earlier. The only issue is whether this line crosses $y = x$. As long as $b < 1/2 < a$, this line cuts a triangle of area $2b(1-a)$ from the large triangle, resulting in a Gini index of $1 - 4b(1-a)$. (Alert readers may recognize that we are talking about a discontinuous Lorenz curve with $\lim_{p \to 1^-} L(p) < 1$, which requires the share density to have a point mass at $p = 1$. This models a situation where a very small portion of the population has a very large share of $Q$.)
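The claim about the smallest triangle can be checked with a short calculus argument, sketched here in our own notation (writing $p$ for the horizontal coordinate and $m > 0$ for the slope of a line through $(a,b)$):

$$
\begin{aligned}
&y = b + m(p - a) \text{ meets } y = 0 \text{ at } p = a - b/m \text{ and reaches } y = b + m(1-a) \text{ at } p = 1,\\
&A(m) = \tfrac12\Bigl(1 - a + \tfrac{b}{m}\Bigr)\bigl(b + m(1-a)\bigr) = \tfrac12\Bigl(2b(1-a) + m(1-a)^2 + \tfrac{b^2}{m}\Bigr),\\
&A'(m) = \tfrac12\Bigl((1-a)^2 - \tfrac{b^2}{m^2}\Bigr) = 0 \iff m = \frac{b}{1-a}, \qquad A\Bigl(\frac{b}{1-a}\Bigr) = 2b(1-a).
\end{aligned}
$$

At this slope the line meets $y = 0$ at $p = 2a - 1$ and reaches height $2b$ at $p = 1$, so the resulting curve starts at 0 and stays below $y = p$ exactly when $a \ge 1/2$ and $b \le 1/2$. Removing this triangle from the triangle of area 1/2 under $y = p$ leaves at most $1/2 - 2b(1-a)$, and doubling gives $G \le 1 - 4b(1-a)$.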
An economic interpretation of this proposition holds some interest. The lowest possible Gini estimate comes from assuming that a fraction $b$ of the good $Q$ is distributed absolutely equally through the poorest fraction $a$ of the population, with the remaining portion shared equally among the remainder.
For instance, when we conclude from the estimate about consumer goods (where $L(.80) = .14$) that $.66 \le G$, the value .66 would arise from the poorest 80% sharing equally in 14% of the goods, an unlikely situation.
The highest possible Gini index consistent with our single data point is $1 - 4(.14)(.2) = .888$. This corresponds to a distribution under which 60% have no goods at all, 40% have an equal share in 28%, and one person has all the remaining 72%; again, this is unlikely. (As mentioned earlier, this requires the share density to have a point mass at $p = 1$.) The range of the estimate is wide, but we can still say that consumer goods are less equitably distributed than U.S. family income.
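The bounds of Proposition 1 are easy to tabulate. The sketch below (the function name is ours) returns the lower bound $a - b$, the blunt upper bound from (13), and the refined bound $1 - 4b(1-a)$ when $b < 1/2 < a$; for the consumer-goods figure $L(.80) = .14$ it gives .66, .944, and .888.

```python
def gini_bounds_one_point(a, b):
    """Bounds on G from a single Lorenz point L(a) = b, with 0 < b < a < 1.

    Returns (lower, blunt_upper, refined_upper); the refined bound applies
    only when b < 1/2 < a and is None otherwise.
    """
    if not 0 < b < a < 1:
        raise ValueError("need 0 < b < a < 1")
    lower = a - b                          # L piecewise linear through (a, b)
    blunt_upper = 1 - 2 * b * (1 - a)      # remove the b-by-(1-a) rectangle
    refined = 1 - 4 * b * (1 - a) if b < 0.5 < a else None
    return lower, blunt_upper, refined

consumer_goods = gini_bounds_one_point(0.80, 0.14)
world_income = gini_bounds_one_point(0.20, 0.01)   # refined bound does not apply
```

For world income, $L(.20) = .01$ gives only $.19 \le G < .984$, which shows how little a single point can pin down.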
Golden [9] offers a related estimate, based on knowledge of the Lorenz curve at a
point that is known to be farthest from the line of perfect equity, in which case the
relevant support line has slope 1. This is done in the special case of quintile data.
Readers may wish to derive this estimate as an exercise.
Our simple estimate becomes more useful when we apply the same geometric ideas to individual summands in our trapezoid rule approximation for the Gini index, (9). Let us use $G_T$ to denote the Gini index for the Lorenz curve obtained by linear interpolation of aggregated data and find estimates for the Gini index for the situation where we have one data point for every individual in the population. In each term of the sum, the simplest estimate we can make is that the Gini index could increase by twice the area of a triangle with base $p_j - p_{j-1}$ and height $L(p_j) - L(p_{j-1})$. Working out formulas for these gives a potential positive contribution to the error of $x_j h_j^2/(TN)$. We have proved a proposition:
Proposition 2. If $G_T$ is the approximation for a Gini index from (9), the actual index, $G$, satisfies
$$G_T \le G < G_T + \frac{1}{TN}\sum_{j=1}^{n} x_j h_j^2. \tag{14}$$
In our computations for U.S. family income, the error is bounded by 0.042 and we conclude that $.467 \le G < .509$.
For a more precise upper bound on the Gini index, we now focus on one particular interval in which mean data has been given. In terms of the notation established earlier, we are talking about the population between $p_{j-1}$ and $p_j$, where at first we assume that $1 < j < n$, leaving discussion of the first and last intervals for later.
Over this interval, the Lorenz curve has slope $s_j$, which is intermediate between the slopes on the left, $s_{j-1}$, and right, $s_{j+1}$. This is shown in Figure 3, where segments are labeled with their slopes. The most extreme Lorenz curve that still connects the points $(p_{j-1}, L(p_{j-1}))$ and $(p_j, L(p_j))$ consists of the two line segments shown, continuing the line of slope $s_{j-1}$ as far as the point where it meets the line of slope $s_{j+1}$. Call the $p$-coordinate of this point of intersection $p^*$.
To understand this in economic terms, recall that in this portion of the population, $h_j$ households have an average income of $x_j$, which gives them a share density of $s_j = x_j/\bar{X} = x_j/(T/N)$. (Although our notation favors the interpretation of family income, the discussion applies to any situation.) To achieve the extreme Lorenz curve mentioned, we could redistribute the $h_j x_j$ dollars earned, taking away income from one group to push a fraction $p^* - p_{j-1}$ of the population down one bracket to have share density $s_{j-1}$, pushing the remainder up to a share density of $s_{j+1}$. Note that this results in one fewer income bracket than the original data suggested.
The effect on the Gini index would be to increase it by twice the area of the triangle
in Figure 3.
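[Figure 3: one interval of the Lorenz curve between $p_{j-1}$ and $p_j$, with segments labeled by their slopes $s_{j-1}$, $s_j$, $s_{j+1}$, the kink at $p^*$, and a base labeled $b$.]
Figure 3. Better estimates for the Gini index involve an interval-by-interval analysis.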
A simple computation, writing the area as the difference of two triangles whose base is labeled $b$ in the diagram, shows the maximal increase in the Gini index to be
$$\frac{(s_j - s_{j-1})(s_{j+1} - s_j)}{s_{j+1} - s_{j-1}}\,(p_j - p_{j-1})^2 = \frac{1}{NT}\,\frac{(x_j - x_{j-1})(x_{j+1} - x_j)}{x_{j+1} - x_{j-1}}\,h_j^2. \tag{15}$$
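For readers who want to verify (15), here is an alternative route of our own, using a cross product rather than the difference of two triangles. Write $\Delta = p_j - p_{j-1}$ and $t = p^* - p_{j-1}$:

$$
\begin{aligned}
s_{j-1}\,t + s_{j+1}(\Delta - t) &= s_j\,\Delta
  \quad\Longrightarrow\quad t = \Delta\,\frac{s_{j+1} - s_j}{s_{j+1} - s_{j-1}},\\
\text{area} &= \tfrac12\bigl|\Delta\cdot s_{j-1}t - s_j\Delta\cdot t\bigr| = \tfrac12\,\Delta\,t\,(s_j - s_{j-1}),\\
2\cdot\text{area} &= \frac{(s_j - s_{j-1})(s_{j+1} - s_j)}{s_{j+1} - s_{j-1}}\,\Delta^2 .
\end{aligned}
$$

The first line says that both routes from $(p_{j-1}, L(p_{j-1}))$ to $(p_j, L(p_j))$ rise by the same amount; the area uses the edge vectors $(\Delta, s_j\Delta)$ and $(t, s_{j-1}t)$ from the left endpoint. Substituting $s_i = x_i N/T$ and $\Delta = h_j/N$ turns the last line into the right-hand side of (15).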
At the lowest endpoint, there is no lower income bracket to push population into; in the highest bin, there is no possibility to redistribute upward, so the estimates work out a bit differently. Still, if we set $x_0 = 0$ and $x_{n+1} = x_n$, the formulas are correct.
Of course, pushing population from one bracket into neighboring brackets affects
the analysis of those regions; we cannot capture this extra area in every interval while
still maintaining a convex curve. Even so, we know that the Gini index cannot be
increased by more than twice the sum of areas of all these triangles. We have sketched
the proof of a theorem:
Theorem 2. If a Lorenz curve is generated from aggregate data, where the $j$th bin consists of $h_j$ individual units holding, on average, $x_j$ units of the resource $Q$, then the actual Gini index $G$, the one that would result from analyzing every individual's portion, must satisfy
$$G_T \le G < G_T + \frac{1}{NT}\sum_{j=1}^{n} \frac{(x_j - x_{j-1})(x_{j+1} - x_j)}{x_{j+1} - x_{j-1}}\,h_j^2, \tag{16}$$
where we set $x_0 = 0$ and $x_{n+1} = x_n$.
Applying this analysis to U.S. family income from 2006 gives $.467 \le G < .472$, showing that .47 is the correct Gini index to two-digit accuracy. (Remember that this is a raw Gini index, ignoring the effect of taxes, Medicare, and Social Security.)
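Both error bounds are one more column in the spreadsheet. The sketch below (the name `gini_with_error_bounds` is ours, and the four-row table is made up, so the numbers are illustrative only) computes $G_T$ from (9) together with the increments in (14) and (16), padding the table with $x_0 = 0$ and $x_{n+1} = x_n$ as in Theorem 2.

```python
from itertools import accumulate

def gini_with_error_bounds(h, x):
    """Return (G_T, crude, refined) for aggregated data.

    G_T is equation (9); crude and refined are the increments in (14) and (16),
    so that, by Theorem 2, G_T <= G < G_T + refined <= G_T + crude.
    h[j]: units in row j; x[j]: their mean amount, positive and strictly increasing.
    """
    n, N = len(h), sum(h)
    T = sum(hj * xj for hj, xj in zip(h, x))
    p = [0.0] + [c / N for c in accumulate(h)]
    G_T = sum(xj * (p[j] + p[j + 1]) * hj
              for j, (hj, xj) in enumerate(zip(h, x))) / T - 1

    crude = sum(xj * hj ** 2 for hj, xj in zip(h, x)) / (T * N)      # (14)

    xe = [0.0] + list(x) + [x[-1]]        # pad: x_0 = 0 and x_{n+1} = x_n
    refined = sum((xe[j] - xe[j - 1]) * (xe[j + 1] - xe[j])
                  / (xe[j + 1] - xe[j - 1]) * h[j - 1] ** 2
                  for j in range(1, n + 1)) / (N * T)                 # (16)
    return G_T, crude, refined

G_T, crude, refined = gini_with_error_bounds([2, 3, 4, 1], [10, 20, 40, 100])
```

For this table, $G_T = 12/34 \approx .353$, the crude increment from (14) is $960/3400 \approx .282$, and the refined increment from (16) is $320/3400 \approx .094$. With only four rows, both intervals are far wider than for the Census table.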
Efforts to find the best upper bound have shown this to be a complicated question. Experiments attempting to minimize total area using a variable support line at each data point suggest that the minimum is realized only at endpoints of the intervals of possible slopes, except in simple cases like the example with a single point. I conjecture that the largest possible Gini index consistent with data from $2n$ bins arises by applying the redistribution method outlined above in alternate bins, pushing population into neighboring bins to create $n$ new bins and a maximally inequitable distribution.
6. HIGHER ORDER GINIS. Atkinson [2] rightly pointed out in 1970 that the Gini
index is no universal measure of society. Sometimes it helps little in judging whether
one distribution of income is preferable to another. It is easy to give a mathematical
reason for this: Many Lorenz curves give rise to the same Gini index.
Atkinson uses utility functions to weight income disparities, asking those who
would judge inequity: Is extreme poverty more socially harmful than extreme wealth?
Of course, no single utility function will serve all purposes.
My response on reading Atkinson was to place the Gini index as first in a family of indices, each weighting the percentile range differently. Introducing a weighting factor of $(1-p)^{k-1}$ for $k \ge 1$ yields an index where extreme poverty is weighted more for higher values of $k$. In fact, this idea originally appeared in a paper by Kakwani in 1980 [10], and many other economists and social scientists have taken the ball and run with it.
We define the $k$th Gini index (or perhaps $k$th Gini poverty index) by the formula
$$G_k := k(k+1)\int_0^1 \bigl(p - L(p)\bigr)(1-p)^{k-1}\,dp.$$
The factor $k(k+1)$, analogous to the 2 in Gini's original definition, forces each index in the sequence to lie between 0 and 1. Note that $G_1$ is simply $G$ from (1).
It is a simple matter to mimic earlier computations to produce a spreadsheet-friendly formula to approximate $G_k$ from aggregated data. Probabilists may recognize that we are really talking about moments of the share density. An analog of (9) is
$$G_k = 1 - \frac{1}{\bar{X}}\sum_{j=1}^{n} x_j\Bigl[(1 - p_{j-1})^{k+1} - (1 - p_j)^{k+1}\Bigr], \tag{17}$$
though the resemblance may not be immediately evident.
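One way to see the resemblance, sketched here under the assumption that $L$ is continuous on $[0,1]$ with $L(0) = 0$ and $L' = s$: since $k(k+1)\int_0^1 p(1-p)^{k-1}\,dp = 1$, integrating by parts gives

$$
G_k = 1 - k(k+1)\int_0^1 L(p)(1-p)^{k-1}\,dp = 1 - (k+1)\int_0^1 s(p)\,(1-p)^{k}\,dp ,
$$

the boundary terms vanishing because $L(0) = 0$ and $(1-p)^k = 0$ at $p = 1$. When $s(p) = x_j/\bar{X}$ on $(p_{j-1}, p_j]$, each piece contributes $(k+1)\int_{p_{j-1}}^{p_j}(1-p)^k\,dp = (1-p_{j-1})^{k+1} - (1-p_j)^{k+1}$, which is (17). For $k = 1$, expanding $(1-p_{j-1})^2 - (1-p_j)^2 = \Delta p_j\,(2 - p_j - p_{j-1})$ recovers (9).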
Using (17) with data from U.S. family income suggests that $G_2$ is about .61. What does this mean? To answer this, we return to order statistics.
As before, the random variable $X$ denotes the income of a single household chosen at random; the pdf for $X$ is $f(x)$ and the cdf is $F(x)$. Now we independently choose $k+1$ households and record the lowest of their incomes as $Y_k^{\min}$. A computation just like the one that led to (11) shows that the expected value of $Y_k^{\min}$ is
$$\bar{Y}_k^{\min} = (k+1)\int_0^\infty x\bigl(1 - F(x)\bigr)^k f(x)\,dx.$$
The reasoning that led us to (12) in this case gives
$$\frac{\bar{Y}_k^{\min}}{\bar{X}} = 1 - G_k. \tag{18}$$
If the second-order Gini index of U.S. family income is $G_2 = .61$, this means that, on average, the lowest of three incomes randomly selected is only 39% of the mean income.
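Equation (17) and identity (18) can be checked the same way as (10). In this sketch (helper names are ours, incomes hypothetical), each household is its own row, and the $k+1$ households are drawn independently with replacement; enumerating all ordered triples gives the exact expected minimum for $k = 2$.

```python
from itertools import accumulate, product

def gini_k(h, x, k):
    """k-th Gini index of the interpolated Lorenz curve, equation (17).

    h[j]: units in row j; x[j]: their mean amount, sorted increasing.
    Python row j spans p[j] to p[j + 1], the article's p_{j-1} to p_j.
    """
    N = sum(h)
    mean = sum(hj * xj for hj, xj in zip(h, x)) / N
    p = [0.0] + [c / N for c in accumulate(h)]
    total = sum(xj * ((1 - p[j]) ** (k + 1) - (1 - p[j + 1]) ** (k + 1))
                for j, xj in enumerate(x))
    return 1 - total / mean

def mean_lowest_of(incomes, draws):
    """Exact E[min] over all ordered draws, with replacement, of `draws` incomes."""
    tuples = list(product(incomes, repeat=draws))
    return sum(min(t) for t in tuples) / len(tuples)

incomes = sorted([10, 20, 40, 100])     # hypothetical; one row per household
ones = [1] * len(incomes)
mean_X = sum(incomes) / len(incomes)
G2 = gini_k(ones, incomes, 2)
ratio = mean_lowest_of(incomes, 3) / mean_X   # lowest of k + 1 = 3 incomes
```

Here $G_2 \approx .585$ and the lowest of three incomes averages about 41.5% of the mean, matching $1 - G_2$; with $k = 1$, `gini_k` reproduces the ordinary Gini index from (9).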
We should mention that (18), though new to the author, appears in a paper by
Kleiber and Kotz [11]. We hope that MONTHLY readers find this presentation easier to
follow than anything in the extensive economics literature on the subject.
7. CLOSING THOUGHTS. Higher-order Gini indices can be useful in calibrating models. Social scientists (and authors of calculus texts) often model Lorenz curves with a variety of Pareto functions, which are convex combinations of power functions, such as
$$L(p) = ap + (1-a)p^b.$$
Since this model has two free parameters, $a$ and $b$, it is natural to calibrate it to match values of $G_1$ and $G_2$ derived from data. The degree to which this model fits the data can be judged by the difference between the value of $G_3$ calculated from the model and the one calculated from the data using (17).
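A sketch of that calibration, under assumptions of our own: for $L(p) = ap + (1-a)p^b$ with $0 \le a \le 1$ and $b > 1$, the definition of $G_k$ and the Beta integral give the closed form $G_k = (1-a)\bigl[1 - (k+1)!/\prod_{i=1}^{k}(b+i)\bigr]$ (our derivation, not the article's). In particular $G_1 = (1-a)(b-1)/(b+1)$, and the ratio $G_2/G_1 = (b+4)/(b+2)$ depends on $b$ alone, so the two calibration equations can be solved in closed form. The names `model_gini_k` and `calibrate` are hypothetical.

```python
from math import factorial, prod

def model_gini_k(a, b, k):
    """G_k for the model L(p) = a*p + (1 - a)*p**b, via the Beta-integral closed form."""
    return (1 - a) * (1 - factorial(k + 1) / prod(b + i for i in range(1, k + 1)))

def calibrate(G1, G2):
    """Solve model_gini_k(a, b, 1) = G1 and model_gini_k(a, b, 2) = G2 for (a, b)."""
    r = G2 / G1                  # equals (b + 4)/(b + 2), which lies in (1, 5/3) for b > 1
    if not 1 < r < 5 / 3:
        raise ValueError("G2/G1 must lie strictly between 1 and 5/3")
    b = (4 - 2 * r) / (r - 1)
    a = 1 - G1 * (b + 1) / (b - 1)
    if not 0 <= a <= 1:
        raise ValueError("no convex combination of this form fits these values")
    return a, b

# Round trip on model-generated values: a = 0.3, b = 3 gives G1 = .35, G2 = .49.
a, b = calibrate(model_gini_k(0.3, 3, 1), model_gini_k(0.3, 3, 2))
G3_model = model_gini_k(a, b, 3)     # to be compared with G_3 from the data via (17)
```

Given values of $G_1$ and $G_2$ computed from data, `calibrate` returns the model's parameters, and the gap between `model_gini_k(a, b, 3)` and the data's $G_3$ measures the fit, as the paragraph above suggests.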
Though it arose in the study of poverty, the Gini index is a flexible idea that deserves
to be better known. As mentioned in the introduction, scientists in many fields [1, 7]
have found occasion to apply the Gini index. These are certainly not the only examples,
and the reader may enjoy finding others.
A search of the literature turns up many efforts to explain away the Gini index as inaccurate or incomplete. It seems to me that none of these objections should prevent us from using the Gini index to analyze data for ourselves and share the results with those we know. It matters to me to know a few summary statistics, though they be mere summary statistics, and to know how to relate them to average outcomes of thought experiments about who earns the average dollar and about the poorer of two households.
Our country is rich with diverse opinions about what one ought to do about economic facts, but perhaps we are not sufficiently armed with facts that we have checked for ourselves. I hope that this article will inspire you to dig into the mountains of data that are available and refine some summary statistics for yourself.
ACKNOWLEDGMENTS. The author is grateful for the support that the MAA provides to editors, allowing
them to supervise a generous and thorough review process, which greatly improved this paper.
REFERENCES
1. R. Abraham, S. van den Bergh, and P. Nair, A new approach to galaxy morphology, I: Analysis of the
Sloan Digital Sky Survey early data release, Astrophysical Journal 588 (2003) 218–229. doi:10.1086/373919
2. A. B. Atkinson, On the measurement of inequality, J. Econom. Theory 2 (1970) 244–263. doi:10.1016/0022-0531(70)90039-6
3. A. B. Atkinson, On the measurement of poverty, Econometrica 55 (1987) 749. doi:10.2307/1911028
4. Central Intelligence Agency World Factbook, available at https://www.cia.gov/library/publications/the-world-factbook/geos/us.html, accessed April 20, 2008.
5. J. L. Gastwirth, A general definition of the Lorenz curve, Econometrica 39 (1971) 1037–1039. doi:10.2307/1909675
6. J. L. Gastwirth, The estimation of the Lorenz curve and Gini index, Rev. Econom. Statist. 54 (1972) 306–316. doi:10.2307/1937992
7. D. Gianola, M. Perez-Enciso, and M. A. Toro, On marker-assisted prediction of genetic value: Beyond
the ridge, Genetics 163 (2003) 347–365.
8. C. Gini, Variabilità e mutabilità; reprinted in Memorie di Metodologica Statistica, E. Pizetti and T.
Salvemini, eds., Libreria Eredi Virgilio Veschi, Rome, 1955.
9. J. Golden, A simple geometric approach to approximating the Gini coefficient, Journal of Economic
Education 39 (2008) 68–77. doi:10.3200/JECE.39.1.68-77
10. N. Kakwani, On a class of poverty measures, Econometrica 48 (1980) 437–446. doi:10.2307/1911106
11. C. Kleiber and S. Kotz, A characterization of income distributions in terms of generalized Gini coefficients, Social Choice and Welfare 19 (2001) 789–794. doi:10.1007/s003550200154
12. M. O. Lorenz, Methods of measuring the concentration of wealth, J. Amer. Statist. Assoc. 9 (1905) 209–219.
December 2010] THE GINI INDEX AND MEASURES OF INEQUALITY 863
13. C. DeNavas-Walt, B. D. Proctor, and J. Smith, Income, Poverty, and Health Insurance Coverage in the
United States: 2006, Current Population Report, U.S. Census Bureau, August, 2007.
14. U.S. Census Bureau, Current Population Survey (CPS), available at http://pubdb3.census.gov/
macro/032007/hhinc/new06_000.htm, accessed April 20, 2008.
15. B. H. Webster, Jr. and A. Bishaw, Income, Earnings, and Poverty Data From the 2006 American Community Survey, American Community Survey Report, U.S. Census Bureau, August, 2007.
FRANK A. FARRIS received his B.A. from Pomona College in 1977 and his Ph.D. from M.I.T. in 1981. He
has taught at Santa Clara University since 1984 and recently finished a second stint as editor of Mathematics
Magazine. His article “The Edge of the Universe” in Math Horizons was honored with the Trevor Evans Award.
Department of Mathematics and Computer Science, Santa Clara University, Santa Clara, CA 95053
ffarris@scu.edu
A Fourth Proof of an Inequality
Motivated by the paper “Three Proofs of the Inequality $e < \left(1+\frac{1}{n}\right)^{n+0.5}$” [2],
we provide a fourth proof and many other refinements. By truncating the well-known continued fraction [3, p. 342]
$$\ln(1+x) = \cfrac{x}{1 + \cfrac{1^2 x}{2 + \cfrac{1^2 x}{3 + \cfrac{2^2 x}{4 + \cfrac{2^2 x}{5 + \cfrac{3^2 x}{6 + \cdots}}}}}},$$
we see that for $x > 0$,
$$x > \frac{6x+x^2}{6+4x} > \cdots > \ln(1+x) > \cdots > \frac{6x+3x^2}{6+6x+x^2} > \frac{2x}{2+x}.$$
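The alternating pattern can be checked with exact rational arithmetic. This Python sketch (function names hypothetical) evaluates the $m$-th convergent of the continued fraction from the bottom up. It reads the first partial numerator as $x$, the $j$-th partial numerator for $j \ge 2$ as $\lfloor j/2 \rfloor^2 x$, and the $j$-th partial denominator as $j$. For $m = 1, 2, 3, 4$ it reproduces $x$, $2x/(2+x)$, $(6x+x^2)/(6+4x)$, and $(6x+3x^2)/(6+6x+x^2)$.

```python
import math
from fractions import Fraction


def ln1p_convergent(x: Fraction, m: int) -> Fraction:
    """m-th convergent (m >= 1) of ln(1+x) = x/(1 + 1^2 x/(2 + 1^2 x/(3 + 2^2 x/(4 + ...)))).

    Evaluated bottom-up: the j-th partial denominator is j, the first partial
    numerator is x, and the j-th partial numerator (j >= 2) is (j // 2)**2 * x.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    tail = Fraction(m)
    for j in range(m, 1, -1):
        tail = (j - 1) + (j // 2) ** 2 * x / tail
    return x / tail


def brackets_ln1p(x: Fraction, max_m: int) -> bool:
    """True if odd convergents exceed ln(1+x) and even ones fall below it, m = 1..max_m."""
    target = math.log1p(float(x))
    for m in range(1, max_m + 1):
        value = float(ln1p_convergent(x, m))
        if m % 2 == 1 and not value > target:
            return False
        if m % 2 == 0 and not value < target:
            return False
    return True
```

For $x = 1$ the first four convergents are $1$, $2/3$, $7/10$, and $9/13$, and they bracket $\ln 2 \approx 0.6931$ as the display indicates. The floating-point comparison in `brackets_ln1p` is only reliable while the gap between a convergent and $\ln(1+x)$ stays well above rounding error, so keep `max_m` modest.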
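Dividing by $\ln(1+x)$, inverting, substituting $1/n$ for $x$, and applying the exponential function, we obtain
$$\left(1+\frac{1}{n}\right)^{n} < \left(1+\frac{1}{n}\right)^{n+\frac{3n}{6n+1}} < \cdots < e < \cdots < \left(1+\frac{1}{n}\right)^{n+\frac{3n+1}{6n+3}} < \left(1+\frac{1}{n}\right)^{n+\frac{1}{2}}.$$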
A similar argument can be found in [1].
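Each bound $f(x)$ on $\ln(1+x)$ becomes an exponent $1/f(1/n)$. For instance, $1/f(1/n) = n + 3n/(6n+1)$ when $f(x) = (6x+x^2)/(6+4x)$. The Python sketch below is a quick floating-point sanity check of the resulting chain, offered as an illustration rather than a proof. The function names are hypothetical. The check is only trustworthy for moderate $n$, because the gaps between the bounds and $e$ shrink rapidly as $n$ grows.

```python
import math
from fractions import Fraction


def exponents(n: int) -> list[Fraction]:
    """Exponents s = 1/f(1/n), in the order of the displayed chain, for n >= 1."""
    return [
        Fraction(n),                         # f(x) = x
        n + Fraction(3 * n, 6 * n + 1),      # f(x) = (6x + x^2)/(6 + 4x)
        n + Fraction(3 * n + 1, 6 * n + 3),  # f(x) = (6x + 3x^2)/(6 + 6x + x^2)
        n + Fraction(1, 2),                  # f(x) = 2x/(2 + x)
    ]


def bound_chain(n: int) -> list[float]:
    """The four powers (1 + 1/n)**s, as floats."""
    base = 1.0 + 1.0 / n
    return [base ** float(s) for s in exponents(n)]


def chain_holds(n: int) -> bool:
    """Floating-point check: the first two powers lie below e, the last two above."""
    b0, b1, b2, b3 = bound_chain(n)
    return b0 < b1 < math.e < b2 < b3
```

For $n = 1$ the exponents are $1$, $10/7$, $13/9$, and $3/2$. These give $2 < 2^{10/7} \approx 2.692 < e < 2^{13/9} \approx 2.722 < 2^{3/2} \approx 2.828$.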
REFERENCES
1. M. D. Hirschhorn, Response to note 88.39, Math. Gaz. 89 (2005) 304.
2. S. K. Khattri, Three proofs of the inequality $e < \left(1+\frac{1}{n}\right)^{n+0.5}$, Amer. Math. Monthly 117 (2010) 273–277. doi:10.4169/000298910X480126
3. H. S. Wall, Analytic Theory of Continued Fractions, AMS Chelsea Publishing, Providence, RI,
2000.
—Submitted by Li Zhou, Polk State College, Winter Haven, FL 33881