All content in this area was uploaded by Koen Van de Moortel on Feb 20, 2022

Content may be subject to copyright.

The Body Mass Index recalculated

Koen Van de moortel, MSc experimental physics, independent math tutor

Jules de Saint-Genoisstraat 98, 9050 Gentbrugge, Belgium, info@lerenisplezant.be

15 June 2021 - small addition 20 February 2022

Abstract: So-called ‘least squares regression’ is a widely used technique for mathematical modeling. It is so common that one might think the algorithm could not be improved any further. But it can, by minimizing the squares of the differences between measured and predicted values not only in the vertical but also in the horizontal direction. I call this ‘multidirectional regression’. The difference is very significant, especially for power function models, which are often used in the biomedical sciences. It also makes the regression invariant when the dependent and independent variables are switched, a problem that the traditional method neglects. The Body Mass Index and the Corpulence Index and their correlation with body fat percentage are studied here as an example showing that this regression technique produces better results.

Keywords: nonlinear regression, mathematical modeling, multidirectional regression, curve

fitting, software, algorithm, body mass index, BMI, corpulence index, ponderal index, body fat

percentage, anthropometry, scaling law, power function.

Introduction: the BMI mystery

In the process of writing a book about measuring methodology and regression analysis, I thought the so-called “Body Mass Index” (BMI) might be a good example of quantification: how to put a number on ‘overweight’. As you probably know, it is calculated by taking a person’s mass m (in kg) and dividing it by the square of the height h (in meters). Now, this is quite awkward, since the masses of objects with the same shape and similar density distributions are proportional to the third power of the height (or any longitudinal dimension).
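The scaling argument above can be made concrete with a minimal sketch. The two people below are hypothetical: they have exactly the same shape and density, and one is 1.25 times as tall as the other. The function names `bmi` and `corpulence_index` are chosen here for illustration only.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Classic Body Mass Index: mass divided by height squared (kg/m^2)."""
    return mass_kg / height_m ** 2


def corpulence_index(mass_kg: float, height_m: float) -> float:
    """Corpulence (ponderal) index: mass divided by height cubed (kg/m^3)."""
    return mass_kg / height_m ** 3


# Hypothetical pair with identical shape and density: scaling every length
# by 1.25 multiplies the mass by 1.25**3.
short = (60.0, 1.60)
tall = (60.0 * 1.25 ** 3, 1.60 * 1.25)

for mass, height in (short, tall):
    print(f"h = {height:.2f} m, m = {mass:.1f} kg, "
          f"BMI = {bmi(mass, height):.2f}, CI = {corpulence_index(mass, height):.2f}")
```

For such a pair the index with h³ is identical, while the classic BMI of the taller person is larger by exactly the length ratio, even though both have the same build.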

So I started digging... Why did the inventor, Adolphe Quêtelet, who happens to have lived in the

same city as me (Ghent, Belgium), define this index with h² in 1832? I wanted to find the

original data that he analyzed, the ‘reference people’ to calibrate it. Strangely, there seems to

be no trace of them on the internet, and also no other dataset could be found! Thousands of

sites offer ‘ideal mass’ tables or calculators, and many use obscure disclosed formulas,

clearly not using h², some even using a linear relationship!

For me as a physicist, it’s hard to believe, but apparently it took almost a century before someone (Fritz Rohrer, CH) came up with the idea of calculating the index with h³ instead. This number, m/h³, is called the ‘Corpulence Index’ (CI) or ‘Ponderal Index’ (PI) [Rohrer 1921]. It took another century until someone (Sultan Babar, SA) found what was to be expected: “It has the advantage that it doesn’t need to be adjusted for age after adolescence.” [Babar 2015] In spite of that, the general public still only knows the BMI. Today, in June 2021, a search for BMI on Academia.edu produces more than 795,000 results, while only 1,735 articles mention the CI, and 6,428 the PI. It took until 2013 before someone like Nick Trefethen (numerical analyst at the University of Oxford, GB) raised his eyebrows and dared to make this remark in The

Economist: “The body-mass index that you (and the National Health Service) count on to

assess obesity is a bizarre measure. We live in a three-dimensional world, yet the BMI is

defined as weight divided by height squared. It was invented in the 1840s, before calculators,

when a formula had to be very simple to be usable. As a consequence of this ill-founded

definition, millions of short people think they are thinner than they are, and millions of tall

people think they are fatter.” And then he said: “You might think that the exponent should simply be 3, but that doesn’t match the data at all. It has been known for a long time that people don’t scale in a perfectly linear fashion as they grow. I propose that a better approximation to the actual sizes and shapes of healthy bodies might be given by an exponent of 2.5. So here is the formula I think is worth considering as an alternative to the standard BMI: ‘new BMI’ $= 1.3\,m/h^{2.5}$.” [Trefethen, 2013; my emphasis on “that doesn’t match the data at all”]
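To see what the proposal quoted above does in practice, here is a small sketch comparing it with the classic BMI for three hypothetical heights. The mass at each height is chosen so that the classic BMI is exactly 22; the function names are illustrative. Since √1.69 = 1.3, the two indices coincide at h = 1.69 m.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Classic BMI: m / h^2."""
    return mass_kg / height_m ** 2


def new_bmi(mass_kg: float, height_m: float) -> float:
    """Trefethen's proposal as quoted in the text: 1.3 * m / h^2.5."""
    return 1.3 * mass_kg / height_m ** 2.5


for height in (1.50, 1.69, 1.90):
    mass = 22.0 * height ** 2  # hypothetical mass giving a classic BMI of 22
    print(f"h = {height:.2f} m: BMI = {bmi(mass, height):.2f}, "
          f"new BMI = {new_bmi(mass, height):.2f}")
```

Below 1.69 m the proposed index is higher than the classic BMI, above it lower, which is the direction of the correction Trefethen describes for short and tall people.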

Now, how could it “not match the data”? I was curious to inspect some data myself. After a long search, I got in touch with Nir Krakauer (The City College of New York), who was also doing BMI-related modeling, and he was so kind as to refer me to his data:

rdrr.io/github/dtkaplan/NIMBIOS/man/nhanesOriginal.html.

From this large collection, I extracted the masses and heights of 90 adult men who had a more

or less ‘ideal’ body fat percentage: between 11.6 and 13.8%. I’m not a medical doctor, but

according to different sources these percentages seem to be considered good for young

adults. The most important point for this selection was to have a more or less homogeneous

group with a range of sizes, but with similar densities. Of course I know other factors like

bone density and body type play a role as well, but this is the best I could do.

First, I put the data in the popular math program ‘GeoGebra’ (version 5 Classic):

This calculated the following relationship as ‘best fitting’: $m = 18.8541\,h^{2.1419}$.

Aha, that must have been the reason why Quêtelet decided to round the exponent of h to 2,

because the empirical value seems to be 2.1419! Then I realized that this program takes the

logarithms of the variables, in order to reduce the regression problem to a linear one. This causes errors, as I illustrated elsewhere [Van de moortel, February 2021].
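The difference between the two approaches can be sketched as follows. This is not the code of GeoGebra or FittingKVdm. It is a minimal illustration on synthetic data (an assumption, not the NHANES sample), with the direct fit done by a simple golden-section search over the exponent, and with equal measuring errors assumed for all points. All function names are hypothetical.

```python
import math
import random


def fit_loglog(xs, ys):
    """Straight-line least squares on (ln x, ln y); returns (a, b) for y = a * x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    return math.exp(my - b * mx), b


def sse(a, b, xs, ys):
    """Sum of squared vertical residuals in the original units."""
    return sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))


def best_a(b, xs, ys):
    """For a fixed exponent b, the a that minimizes sse has a closed form."""
    return (sum(y * x ** b for x, y in zip(xs, ys))
            / sum(x ** (2 * b) for x in xs))


def fit_direct(xs, ys, lo=0.0, hi=6.0, tol=1e-10):
    """Minimize sse over b by golden-section search, with a eliminated via best_a."""
    ratio = (math.sqrt(5) - 1) / 2
    cost = lambda b: sse(best_a(b, xs, ys), b, xs, ys)
    c, d = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    while hi - lo > tol:
        if cost(c) < cost(d):
            hi, d = d, c
            c = hi - ratio * (hi - lo)
        else:
            lo, c = c, d
            d = lo + ratio * (hi - lo)
    b = (lo + hi) / 2
    return best_a(b, xs, ys), b


# Synthetic data (an assumption): m = 12 h^3 plus additive Gaussian noise.
random.seed(1)
heights = [1.50 + 0.02 * i for i in range(26)]
masses = [12.0 * h ** 3 + random.gauss(0.0, 4.0) for h in heights]

a_log, b_log = fit_loglog(heights, masses)
a_dir, b_dir = fit_direct(heights, masses)
print(f"log-log fit: a = {a_log:.4f}, b = {b_log:.4f}, "
      f"SSE = {sse(a_log, b_log, heights, masses):.3f}")
print(f"direct fit:  a = {a_dir:.4f}, b = {b_dir:.4f}, "
      f"SSE = {sse(a_dir, b_dir, heights, masses):.3f}")
```

The log-log fit minimizes squared differences of logarithms, so its parameters generally do not minimize the squared differences in the original units. The direct fit can therefore never have a larger sum of squares in those units.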

So I decided to put the data in my own software program, called ‘FittingKVdm’, which uses an

iterative algorithm to estimate the parameters. This produced: $m = 19.331\,h^{2.1084}$. Now the

exponent was even closer to 2! Strange!

GraphPad Prism 9.0.2, a program that seems well designed to me, and also uses iteration,

produced an identical result. Its authors also condemn the logarithm habit, by the way.

Still not being happy, I wanted to see the difference between a fit with a fixed exponent of 2

and one with exponent 3. The results: $m = 20.5918\,h^{2}$ and $m = 11.4640\,h^{3}$.

The value of 20.5918 is indeed a good BMI, and 11.4640 is close to the ‘good’ value of 12 for the

CI according to Sultan Babar.
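When the exponent is fixed, only the coefficient remains to be fitted. As a minimal sketch, assuming ordinary (vertical) least squares with equal measuring errors for all points, the coefficient then has a closed form, a = Σ mᵢhᵢᵇ / Σ hᵢ²ᵇ. The data below are hypothetical and the function name is illustrative. This is not necessarily how FittingKVdm or GraphPad compute it.

```python
def fit_fixed_exponent(heights, masses, b):
    """Least-squares coefficient a of m = a * h**b for a fixed exponent b.

    Setting the derivative of sum((m_i - a * h_i**b)**2) with respect to a
    to zero gives a = sum(m_i * h_i**b) / sum(h_i**(2*b)).
    """
    numerator = sum(m * h ** b for h, m in zip(heights, masses))
    denominator = sum(h ** (2 * b) for h in heights)
    return numerator / denominator


heights = [1.60, 1.68, 1.75, 1.82, 1.90]   # hypothetical heights in m
masses = [52.0, 61.0, 66.0, 77.0, 84.0]    # hypothetical masses in kg

a2 = fit_fixed_exponent(heights, masses, 2)  # plays the role of a 'BMI'
a3 = fit_fixed_exponent(heights, masses, 3)  # plays the role of a 'CI'
print(f"b = 2: a = {a2:.3f} kg/m^2")
print(f"b = 3: a = {a3:.3f} kg/m^3")
```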

Now, a picture is worth a thousand words, so I would like you to take a look at the mass versus height graphs of both fitted curves (don’t mind if you can’t read the small letters, just look at the dots and the lines):

Which line visually fits best through the cloud of points? Everyone I asked this question answered: the one on the right, obviously!

So now the question came up: is there something wrong with the regression method itself?

Well, there is definitely an asymmetry: the classical algorithm that everybody uses minimizes the sum of the (weighted) squared vertical distances between the measured values $y_i$ and the predicted values $f(x_i)$. The weights are inversely proportional to the squares of the measuring errors $\sigma_{y,i}$. The parameters in the model function are adjusted in order to minimize this sum:

$$S = \sum_{i=1}^{n} \frac{\left(y_i - f(x_i)\right)^2}{\sigma_{y,i}^2}$$

Would it make any difference if we used the sum of the horizontal distances instead? Why is it not done? Well, in the case of non-invertible functions, especially periodic functions, there are many such distances for every y value, but for a bijective function like the one above, it’s perfectly possible. The simplest way to try it is by switching the so-called ‘independent’ and ‘dependent’ variables.

If the ‘best fit’ for our data, with free moving exponent, $m = 19.331\,h^{2.1084}$, was indeed the best fit, it shouldn’t make any difference if we switched the h and m columns and fitted again, should it? The expected outcome of this procedure would be:

$$h = \left(\frac{m}{19.331}\right)^{1/2.1084} = 0.24534\,m^{0.47429}$$

Now, what was the actual outcome? $h = 0.72877\,m^{0.21394}$. Or, inverted: $m = 4.38814\,h^{4.67421}$.

I double-checked it using GraphPad... same result. GeoGebra gave almost the same: $h = 0.7225\,m^{0.2159}$.

This is not just a small difference, like a ‘rounding error’ or so. This is obviously shocking and

dramatic!
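The effect is easy to reproduce. The sketch below uses the log-log straight-line fit (the linearization that, according to the text, GeoGebra applies) because it makes the cause transparent: for straight-line least squares, the product of the slope of y on x and the slope of x on y equals r², so the two fitted exponents are reciprocal only for perfectly correlated data. The data are synthetic (an assumption, not the NHANES sample) and the names are illustrative.

```python
import math
import random


def slope(us, vs):
    """Least-squares slope of v regressed on u (straight line with intercept)."""
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return (sum((u - mu) * (v - mv) for u, v in zip(us, vs))
            / sum((u - mu) ** 2 for u in us))


# Synthetic sample (an assumption): m = 11.5 h^3 with multiplicative scatter.
random.seed(2)
heights = [random.uniform(1.55, 1.95) for _ in range(90)]
masses = [11.5 * h ** 3 * math.exp(random.gauss(0.0, 0.12)) for h in heights]

ln_h = [math.log(h) for h in heights]
ln_m = [math.log(m) for m in masses]

b_m_on_h = slope(ln_h, ln_m)      # exponent b in m = a * h**b
b_h_on_m = slope(ln_m, ln_h)      # exponent b' in h = a' * m**b'
r_squared = b_m_on_h * b_h_on_m   # product of the two slopes equals r^2

print(f"m fitted against h: exponent {b_m_on_h:.3f}")
print(f"h fitted against m, then inverted: exponent {1 / b_h_on_m:.3f}")
print(f"r^2 = {r_squared:.3f}")
```

If both GeoGebra results quoted above are plain log-log straight-line fits, their exponents 2.1419 and 0.2159 would correspond to r² ≈ 2.1419 × 0.2159 ≈ 0.46, which is in line with the wide cloud of points.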

Is it possible that nobody ever noticed this? Or that nobody cared? Well, after a long search, I

found some people who made the same observation, but I couldn’t find any textbooks or

software manuals mentioning this problem.

I experimented with other data and other invertible functions. The same happens every time.

Solution: multidirectional regression

The classical regression, minimizing the squares of the vertical distances (or residuals $r_{y,i}$, see the graph below), seems to pull the line through a cloud of points too much toward the horizontal. Minimizing the horizontal distances $r_{x,i}$ (i.e. by switching x and y) pulls the line too much toward the vertical.

By symmetry, there is no reason to favor either direction if f is a bijection. Therefore it seems only logical to give the two ‘pulling forces’ equal rights, and minimize this sum:

$$S = \sum_{i=1}^{n} \left[ \frac{\left(y_i - f(x_i)\right)^2}{\sigma_{y,i}^2} + \frac{\left(x_i - f^{-1}(y_i)\right)^2}{\sigma_{x,i}^2} \right]$$

I implemented this in a Windows program called ‘FittingKVdm, version 1.0’, and I would call it

‘multidirectional regression’, or shorter ‘xy-fitting’, abbreviated ‘MDLS’ (as opposed to ‘OLS’ =

‘Ordinary Least Squares’). It can be expanded in multiple directions of course, if there are

more variables.
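The following is a minimal sketch of such a fit for the power function y = a·xᵇ. It is not the FittingKVdm implementation, whose internals are not described here. It assumes one common measuring error per variable (`sx`, `sy`), positively correlated positive data, and uses a nested golden-section search started from a guess based on the logarithms. All names are hypothetical and the data are synthetic.

```python
import math
import random

RATIO = (math.sqrt(5) - 1) / 2


def golden_min(f, lo, hi, tol):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    c, d = hi - RATIO * (hi - lo), lo + RATIO * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - RATIO * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + RATIO * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


def mdls_sum(a, b, xs, ys, sx, sy):
    """Sum of squared vertical and horizontal residuals for y = a * x**b."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += ((y - a * x ** b) / sy) ** 2           # vertical: y - f(x)
        total += ((x - (y / a) ** (1 / b)) / sx) ** 2   # horizontal: x - f^-1(y)
    return total


def fit_mdls(xs, ys, sx, sy):
    """Fit y = a * x**b by minimizing mdls_sum; returns (a, b)."""
    # Starting guess from the logarithms (assumes a positive correlation).
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b0 = math.sqrt(sum((v - my) ** 2 for v in ly) / sum((u - mx) ** 2 for u in lx))
    t0 = my - b0 * mx                                   # guess for ln(a)

    def best_ln_a(b):
        cost = lambda t: mdls_sum(math.exp(t), b, xs, ys, sx, sy)
        return golden_min(cost, t0 - 3.0, t0 + 3.0, 1e-9)

    def profile(b):
        return mdls_sum(math.exp(best_ln_a(b)), b, xs, ys, sx, sy)

    b = golden_min(profile, b0 / 2, 2 * b0, 1e-7)
    return math.exp(best_ln_a(b)), b


# Synthetic data (an assumption): m = 12 h^3 plus noise; errors 1 cm and 1 kg.
random.seed(3)
heights = [1.50 + 0.02 * i for i in range(26)]
masses = [12.0 * h ** 3 + random.gauss(0.0, 4.0) for h in heights]

a, b = fit_mdls(heights, masses, sx=0.01, sy=1.0)
a_swapped, b_swapped = fit_mdls(masses, heights, sx=1.0, sy=0.01)
print(f"m = {a:.4f} h^{b:.4f}")
print(f"h = {a_swapped:.4f} m^{b_swapped:.4f}, inverse exponent {1 / b_swapped:.4f}")
```

Because the sum treats both directions alike, swapping the columns (together with their errors) describes the same minimization problem, so the second fit should return the inverse function of the first, up to the tolerance of the search.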

Fitting the same data now yielded:

$m = a\,h^{b}$, with a = 10.482 ± 0.058*, b = 3.1581 ± 0.0092*.

That exponent b is a lot closer to 3, as we physicists always expected! And 3.1581 is approximately equal to the geometric mean of 2.1084 and 4.67421, which makes sense.

Now again switching the variables, we obtain an exponent of 1/3.1581, as expected from the symmetry!
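As a quick check of the geometric mean mentioned above, using only the exponents already quoted in the text:

```latex
\sqrt{2.1084 \times 4.67421} = \sqrt{9.8551} \approx 3.139
\qquad \text{compared with} \qquad b = 3.1581
```

The two values differ by about 0.6%, so the agreement is close but not exact.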

(*The confidence intervals are estimated by doing 100 fittings with the data plus random noise of the same magnitude as the probable error on the measurements, i.e. the values $x_i$ and $y_i$ are replaced by $x_i + g(\sigma_{x,i})$ and $y_i + g(\sigma_{y,i})$, g being a Gaussian-distributed random number function.)
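The procedure in this footnote can be sketched as follows. To keep the example short, a simple fixed-exponent least-squares fit stands in for the multidirectional fit (an assumption), one common error per variable is used, and the spread of the refitted parameter is reported as a standard deviation. All names and data are hypothetical.

```python
import random
import statistics


def fit_fixed_exponent(xs, ys, b):
    """Stand-in fit (an assumption): least-squares a of y = a * x**b, b fixed."""
    return (sum(y * x ** b for x, y in zip(xs, ys))
            / sum(x ** (2 * b) for x in xs))


def monte_carlo_error(fit, xs, ys, sx, sy, repeats=100, seed=0):
    """Refit `repeats` noisy copies of the data; return mean and spread of the result."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        noisy_x = [x + rng.gauss(0.0, sx) for x in xs]   # x_i + g(sigma_x)
        noisy_y = [y + rng.gauss(0.0, sy) for y in ys]   # y_i + g(sigma_y)
        results.append(fit(noisy_x, noisy_y))
    return statistics.mean(results), statistics.stdev(results)


heights = [1.55, 1.62, 1.70, 1.76, 1.83, 1.91]   # hypothetical heights in m
masses = [12.0 * h ** 3 for h in heights]        # hypothetical masses in kg

mean_a, err_a = monte_carlo_error(
    lambda xs, ys: fit_fixed_exponent(xs, ys, 3),
    heights, masses, sx=0.005, sy=0.5)
print(f"a = {mean_a:.3f} ± {err_a:.3f}")
```

Any fitting function that takes the two data columns and returns a parameter can be passed in place of the stand-in.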

Confirmation: relationship with body fat percentage

The same analysis was done for men and women (aged 16 and more) with different fat

percentages, first with parameters a and b floating, then with a fixed value of b=2, so in that

case ‘a’ represents the classic ‘BMI’, and then with b fixed to 3, so in this case ‘a’ represents

the ‘BMI with h³’ aka ‘CI’.

Men                   a and b floating      b=2 fixed              b=3 fixed
fat % (f)     n       a         b           a = ‘BMI’ (kg/m²)      a = ‘BMI3’ = ‘CI’ (kg/m³)
[10, 12[      9       9.800     3.192       19.318 ± 0.017         10.921 ± 0.012
[12, 14[      169     10.84     3.093       20.5621 ± 0.0051       11.4527 ± 0.0036
[14, 16[      344     7.695     3.761       21.2382 ± 0.0043       11.9307 ± 0.0029
[16, 18[      423     10.82     3.186       21.4028 ± 0.0033       12.0430 ± 0.0022
[18, 20[      467     10.28     3.337       22.3881 ± 0.0038       12.5053 ± 0.0022
[20, 22[      528     9.587     3.540       23.2707 ± 0.0035       13.0892 ± 0.0022
[22, 24[      562     10.24     3.543       24.5169 ± 0.0030       13.9356 ± 0.0022
[24, 26[      806     11.81     3.366       25.8724 ± 0.0029       14.5751 ± 0.0022
[26, 28[      960     12.47     3.326       26.6912 ± 0.0022       15.0336 ± 0.0019
[28, 30[      943     12.10     3.480       28.0654 ± 0.0027       15.8937 ± 0.0019
[30, 32[      837     11.80     3.600       29.3692 ± 0.0035       16.6075 ± 0.0021
[32, 34[      721     12.15     3.628       30.8169 ± 0.0031       17.4054 ± 0.0024
[34, 36[      501     10.31     4.016       32.9015 ± 0.0041       18.4909 ± 0.0030
[36, 38[      351     10.55     4.070       34.6159 ± 0.0046       19.5014 ± 0.0035
[38, 40[      224     9.109     4.485       38.3387 ± 0.0063       21.5662 ± 0.0041
[40, 42[      118     8.993     4.567       39.8371 ± 0.0084       22.4331 ± 0.0067
[42, 44[      51      10.89     4.419       44.184 ± 0.014         24.8267 ± 0.0099
[44, 46[      15      35.67     2.355       43.407 ± 0.024         25.308 ± 0.021
[46, 48[      9       10.52     4.735       46.445 ± 0.032         27.236 ± 0.024
[48, 50[      1       -         -           43.711 ± 0.054         24.836 ± 0.049

When a and b are left free to be adjusted, the fitting is not very stable. We can presume that

this is because many other factors besides the body fat percentage play a role, like the body

type, muscle weight etc. The measurement points form clouds rather than precise lines.

Anyhow, the exponents are almost always more than 3. The average is even 3.668, and the

standard deviation 0.579. With fixed exponents, we see a very nice correlation of a versus the

fat percentage. The ‘goodness of fit’ (indicated by the χ² per degree of freedom, but also by the

estimated errors on the fitted parameters) was always better with b=3 than with b=2.
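The summary statistics of the floating exponents for men can be recomputed from the 19 values of b listed in the table. In this sketch the quoted 0.579 is reproduced by the population standard deviation (`statistics.pstdev`). That this is the definition used in the text is an inference from the numbers, not something the text states.

```python
import statistics

# The 19 floating exponents b for men, in table order.
b_men = [3.192, 3.093, 3.761, 3.186, 3.337, 3.540, 3.543, 3.366, 3.326,
         3.480, 3.600, 3.628, 4.016, 4.070, 4.485, 4.567, 4.419, 2.355, 4.735]

mean_b = statistics.mean(b_men)
spread_b = statistics.pstdev(b_men)        # population standard deviation
above_three = sum(b > 3 for b in b_men)    # how many exponents exceed 3

print(f"mean b = {mean_b:.3f}, standard deviation = {spread_b:.3f}")
print(f"{above_three} of {len(b_men)} exponents are larger than 3")
```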

Women                 a and b floating      b=2 fixed              b=3 fixed
fat % (f)     n       a         b           a = ‘BMI’ (kg/m²)      a = ‘BMI3’ = ‘CI’ (kg/m³)
[16, 18[      2       -         -           17.209 ± 0.031         10.927 ± 0.023
[18, 20[      13      6.472     4.187       18.000 ± 0.021         11.302 ± 0.015
[20, 22[      32      7.763     3.807       18.834 ± 0.011         11.5395 ± 0.0076
[22, 24[      51      9.110     3.469       18.869 ± 0.092         11.5032 ± 0.0072
[24, 26[      124     9.542     3.479       19.8013 ± 0.0077       12.0853 ± 0.0054
[26, 28[      201     9.814     3.461       20.4095 ± 0.0055       12.3724 ± 0.0037
[28, 30[      275     11.08     3.314       21.1685 ± 0.0054       12.9417 ± 0.0040
[30, 32[      383     13.82     2.914       21.6030 ± 0.0039       13.2474 ± 0.0027
[32, 34[      459     11.48     3.389       23.1021 ± 0.0046       13.9777 ± 0.0030
[34, 36[      562     9.557     3.878       24.5112 ± 0.0042       14.8582 ± 0.0028
[36, 38[      720     11.03     3.707       25.5553 ± 0.0031       15.6460 ± 0.0028
[38, 40[      738     12.49     3.570       26.9707 ± 0.0030       16.5106 ± 0.0025
[40, 42[      892     15.12     3.316       28.7631 ± 0.0030       17.6573 ± 0.0022
[42, 44[      896     11.49     4.014       30.6793 ± 0.0030       18.8671 ± 0.0023
[44, 46[      756     14.76     3.606       32.4786 ± 0.0032       19.8652 ± 0.0029
[46, 48[      583     12.14     4.156       34.6952 ± 0.0039       21.3284 ± 0.0034
[48, 50[      404     12.94     4.199       38.0436 ± 0.0049       23.2991 ± 0.0038
[50, 52[      255     11.62     4.590       41.1029 ± 0.0074       25.2133 ± 0.0057
[52, 54[      99      10.22     5.061       46.046 ± 0.011         28.013 ± 0.011
[54, 56[      26      10.84     5.096       47.842 ± 0.027         29.441 ± 0.020
[56, 58[      4       20.06     3.977       52.369 ± 0.048         32.142 ± 0.036

We can make the same remarks for the women. Here the average b is even higher: 3.859, with a standard deviation of 0.556.

If the body mass is better correlated with h³ than with h², as it was found, we would also

expect that the body fat percentage (f) is better correlated with the CI (=BMI3) than with the

classic BMI.

The graphs below show CI vs f for men (from the table above). The relationship is clearly not

linear, but it seems to follow an exponential pattern. The dotted lines are ‘worst case scenarios’, with parameters at the limits of their confidence intervals.

The curve through the data points is the best fitting exponential function $B = b\,a^{f} + c$, with a, b and c fitted parameters and B = ‘classic BMI’ or ‘CI’ (CI in the above graph). The graphs are similar for men and women, for BMI and CI, but with different parameters of course. They are listed here:

[Figure: Corpulence Index vs Body Fat % (men 16 and older)]

[Figure: Corpulence Index vs Body Fat % (women 16 and older)]

                              men aged 16 or more (n=8039)          women aged 16 or more (n=7475)
                              B = classic BMI    B = BMI3 = CI      B = classic BMI    B = BMI3 = CI
a                             1.0424 ± 0.0011    1.0433 ± 0.0011    1.0671 ± 0.0019    1.0687 ± 0.0019
b                             5.08 ± 0.26        2.77 ± 0.14        0.97 ± 0.081       0.564 ± 0.050
c                             11.42 ± 0.37       6.48 ± 0.21        14.7 ± 0.31        9.15 ± 0.16
χ²x per degree of freedom     1.79432            1.5959             0.581528           0.444813
χ²y per degree of freedom     11953.5            6344.76            3654.17            2086.1

By comparing the χ² values (in both x and y directions), we see that the body fat percentage (f)

is better correlated with the CI than with the BMI.

We see that the Corpulence Index can be estimated from the body fat % (f), using:

‘CI’ = $2.77 \times 1.0433^{f} + 6.48$ (men)

‘CI’ = $0.564 \times 1.0687^{f} + 9.15$ (women)
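These two relations are easy to evaluate, and since the function is monotonic it can also be inverted to estimate the body fat percentage from a measured CI. The sketch below takes a, b and c from the parameter table above. The inversion is an addition for illustration, valid only for CI > c, and the function names are hypothetical.

```python
import math

# Fitted parameters (a, b, c) of CI = b * a**f + c, from the parameter table.
PARAMS = {
    "men": (1.0433, 2.77, 6.48),
    "women": (1.0687, 0.564, 9.15),
}


def ci_from_fat(fat_percent: float, sex: str) -> float:
    """Estimated Corpulence Index (kg/m^3) for a given body fat percentage."""
    a, b, c = PARAMS[sex]
    return b * a ** fat_percent + c


def fat_from_ci(ci: float, sex: str) -> float:
    """Inverse relation (an illustrative addition): body fat % for a given CI > c."""
    a, b, c = PARAMS[sex]
    if ci <= c:
        raise ValueError("CI must be larger than the fitted offset c")
    return math.log((ci - c) / b) / math.log(a)


for sex in ("men", "women"):
    for fat in (15.0, 25.0, 35.0):
        print(f"{sex}, f = {fat:.0f}%: CI = {ci_from_fat(fat, sex):.2f} kg/m^3")
```

Such an estimate describes the average trend of the fitted classes, not an individual, since the points around each class form a wide cloud.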

Conclusion

I hope I have awakened your interest and that you will be curious to test multidirectional regression with your own data. Use it whenever the dependent and independent variables can be switched. The necessary software ‘FittingKVdm’ can be downloaded from www.lerenisplezant.be/fitting.htm, and a 25-day free trial is possible. More examples of the method and some practical thoughts can be found in another text [Van de moortel, April 2021].

I also hope it has become clear that there is no longer any reason to use the classic BMI. The ‘BMI’ with h³, aka the Corpulence Index, should logically be a better estimator for all kinds of health issues. Maybe the theoretical exponent of h should even be bigger than 3, as suggested by the empirical evidence. The reason why the relationship between body fat and the CI seems to be exponential is also a puzzle to be solved by biologists. Maybe they already did, but my main point here is that the correlation is better with m/h³ than with m/h².

All your remarks and suggestions are most welcome, of course.

References:

Babar, Sultan (March 2015): “Evaluating the Performance of 4 Indices in Determining Adiposity”. Clinical Journal of Sport Medicine, Lippincott Williams & Wilkins, 25 (2): 183.

Rohrer, Fritz (1921): “Der Index der Körperfülle als Maß des Ernährungszustandes”. Münchner Med. WSCHR. 68: 580–582.

Trefethen, Nick (2013) on his own website: people.maths.ox.ac.uk/trefethen/bmi.html

Van de moortel, Koen (February 2021): “Non-linear regression - Why you shouldn’t take the logarithms of your variables”, DOI: 10.13140/RG.2.2.18442.80324
www.researchgate.net/publication/349324179_Non-linear_regression_-Why_you_shouldn't_take_the_logarithms_of_your_variables

Van de moortel, Koen (April 2021): “Multidirectional regression analysis”, DOI: 10.13140/RG.2.2.16703.64162
www.researchgate.net/publication/350838636_Multidirectional_regression_analysis
Also published here: http://www.physicsjournal.net/article/view/30/3-3-11

Conflict of interest:

The author declares no conflict of interest.
