RefCurv: A Software for the Construction of Pediatric Reference Curves

Christian Winkler

University of Bonn

Katharina Linden

University of Bonn

Andreas Mayr

University of Bonn

Thomas Schultz

University of Bonn

Thomas Welchowski

University of Bonn

Johannes Breuer

University of Bonn

Ulrike Herberg

University of Bonn

Abstract

In medicine, reference curves serve as an important tool for everyday clinical practice.

Pediatricians assess the growth process of children with the help of percentile curves

serving as norm references. The mathematical methods for the construction of these

reference curves are sophisticated and often require technical knowledge beyond the scope

of physicians. An easy-to-use software for life scientists and physicians is missing. As

a consequence, most medical publications do not document the construction properly.

This project aims to develop a software that enables non-technical users to apply modern

statistical methods to create and analyze reference curves.

In this paper, we present RefCurv, a software that facilitates the construction of

reference curves. The software comprises functionalities to select and visualize data. Users

can ﬁt models to the data and graphically present them as percentile curves. Furthermore,

the software provides features to highlight possible outliers, perform model selection, and

analyze the sensitivity.

RefCurv is an open-source software with a graphical user interface (GUI) written in

Python. It uses R and the gamlss add-on package (Rigby and Stasinopoulos (2005)) as

the underlying statistical engine.

RefCurv simpliﬁes the process to create percentile curves with a broad set of data pro-

cessing features. The tool can help to standardize the procedure and plan the acquisition

of data. An exemplary analysis of the robustness for the underlying statistical methods is

shown in case scenarios. Also, a method to design studies concerning the required sample

size and the model setting is demonstrated.

In summary, RefCurv is the ﬁrst software based on the gamlss package, which enables

practitioners to construct and analyze reference curves in a user-friendly GUI. In broader

terms, the software brings together the ﬁelds of statistical learning and medical applica-

tion. Consequently, RefCurv can help to establish the construction of reference curves in

other medical ﬁelds.

Keywords: reference curves, percentile curves, centile estimation, z-scores, LMS method,

pediatrics, echocardiography, open source, R, gamlss, Python.

arXiv:1901.09775v1 [stat.AP] 28 Jan 2019

1. Introduction

Reference curves and charts are standard tools to describe the normal range of a parameter.

In clinical practice, physicians use percentile curves (or z-score curves) to evaluate measured

values of patients. Comparing the measurement to a reference helps to quantify the severity

of the disease and diagnose the condition of a patient. In this context, percentile curves

have been established for most common physiological and anthropometric parameters. For

children, the curves can be used to assess the growth process. In literature, several reference

curves and charts for pediatric parameters are available. One prominent parameter is the

Body Mass Index (BMI) (Cole, Freeman, and Preece (1995); Fredriks, van Buuren, Wit, and

Verloove-Vanhorick (2000b)).

Figure 1 shows an example of pediatric reference curves for the BMI, which we fitted to

a dataset from a previous study (Fredriks, Van Buuren, Burgmeijer, Meulmeester, Beuker,

Brugman, Roede, Verloove-Vanhorick, and Wit (2000a)). Furthermore, there have been stud-

ies on weight, height and head circumference (Cole, Freeman, and Preece (1998); Group and

de Onis (2006); Cacciari, Milani, Balsamo, Spada, Bona, Cavallo, Cerutti, Gargantini, Greg-

gio, Tonini et al. (2006); Neuhauser, Schienkiewitz, Rosario, Dortschy, and Kurth (2013)).

Figure 1: Reference curves for BMI based on a dataset of healthy Dutch boys

(Fredriks et al. (2000a)). RefCurv was used to ﬁt a model to the data points and depict

it in the form of percentile curves. The labels indicate the percentiles, e.g. "P3" stands for

the third percentile.

Another broad field of application for reference curves is echocardiographical parameters

(Kobayashi, Fuse, Sakamoto, Mikami, Ogawa, Hamaoka, Arakaki, Nakamura, Nagasawa,

Kato et al. (2016); Dallaire and Dahdah (2011); Cantinotti, Kutty, Franchi, Paterni, Scalese,

Iervasi, and Koestenberger (2017)). Echocardiography has become an essential support for

cardiological examination in children. Due to its noninvasiveness and fast application, it has

been established as a standard technology in everyday clinical practice. Cardiologists use ref-

erence curves to detect cardiac pathologies and plan surgical treatments. A recent literature

review on this growing ﬁeld is given by Mawad, Drolet, Dahdah, and Dallaire (2013). In the

present paper, the focus is on the cardiological application of our software in children and

examples are based on echocardiographical measurements.

The mathematical methods for the construction of pediatric reference curves have been shaped

by the publications of Cole and Green (Cole (1990); Cole and Green (1992); Cole et al.

(1998)). In Cole (1990), the author proposes an algorithm for ﬁtting smooth curves to

data by using a penalized likelihood. Furthermore, Cole and Green describe the Box-Cox

Cole Green (BCCG) distribution for pediatric growth curves and show the application on a

dataset for BMI (Cole et al. (1995)). This approach has since been called the LMS method (or

Lambda-Mu-Sigma method) and has been applied in many studies (Mul, Fredriks, Van Bu-

uren, Oostdijk, Verloove-Vanhorick, and Wit (2001); Fredriks et al. (2000a); Katzmarzyk

(2004); Nysom, Mølgaard, Hutchings, and Michaelsen (2001); Ataei, Hosseini, Fayaz, Navidi,

Taghiloo, Kalantari, and Ataei (2016); Hirschler, Molinari, Maccallini, Hidalgo, Gonzalez,

and de los Cobres Study Group (2016); Khadilkar, Ekbote, Chiplonkar, Khadilkar, Kajale,

Kulkarni, Parthasarathy, Arya, Bhattacharya, and Agarwal (2014)).

Subsequently, a program called LMSchartmaker was implemented by the group of Cole and

Green enabling practitioners to apply the LMS method. This tool was used by multiple stud-

ies but we found issues regarding the scientiﬁc practice. On the one hand, LMSchartmaker is

not open-source and there is not any description of the implementation. On the other hand,

a scientiﬁc publication and references are missing.

At the same time, Rigby and Stasinopoulos developed and implemented the R add-on package

for "Generalized Additive Models for Location Scale and Shape" (gamlss, Rigby and

Stasinopoulos (2005); Stasinopoulos, Rigby et al. (2007)). The gamlss package contains the

LMS method and algorithms by Cole and Green. In addition, it extends the method by pro-

viding other model classes and diagnostic tools to assess the ﬁtted reference curves. Unlike

LMSchartmaker, the gamlss package is open-source, scientiﬁcally well-documented and free

of charge. However, the usage of gamlss requires an intense study of related mathematical

methods and programming skills in R.

Despite the availability of multiple statistical methods, most medical publications do not doc-

ument the construction of reference curves properly or miss important information such as

details about the model selection. One reason might be the complex application of statistical

methods, which is a challenge for physicians and data analysts alike.

Researchers often cannot reproduce study results in the form of reference curves because

datasets are not published by the authors.

This project aims to develop RefCurv, an easy-to-use software for the construction of ref-

erence curves. With this tool, we want to enable non-technical users to create and analyze

percentile curves for clinical usage. Moreover, it is intended to help experts with the advanced

analysis of reference curves. Likewise, we propose features to plan the study design,

such as estimating the required sample size.

RefCurv uses R and the gamlss add-on package as the underlying statistical engine. Users

can apply the LMS method on their data or use a customized GAMLSS model for their cal-

culations. The graphical user interface (GUI) is written in Python using its features in data

visualization and processing. Users can deﬁne model settings in the GUI. This information

is passed on to an R-script using functions from the gamlss package. After computation,

the results are delivered back to the GUI. Functionalities for data selection, model selection,

and model validation are provided. The software is designed to simplify data processing and

model ﬁtting. With RefCurv, users are guided through a simple workﬂow from acquired data

to reference curves. The software is intended mainly for users without any speciﬁc program-

ming or mathematical skills.

Furthermore, an echocardiographical dataset was acquired in previous studies by our research

group. We use these data to demonstrate and explain the application of RefCurv. The given

examples can be considered as recommended steps for the construction of reference curves.

2. Methods

The main focus of RefCurv lies on the LMS method by Cole, using the gamlss package in

R for the statistical computations. The model used for the LMS method is a special case

of a GAMLSS model. The model class is defined by the Box-Cox Cole Green (BCCG)

distribution and penalized splines as smoothers for the distribution parameters L, M and S.

Penalized splines are implemented in R as the pb() function. Each penalized spline has degrees

of freedom (df) to be predefined by the user. These three values (L_df, M_df and S_df)

are passed as arguments to the pb() function and are therefore treated as hyperparameters. We

chose a setting of L_df = 0, M_df = 1 and S_df = 0 as the default model.

Consequently, the model fitting according to the LMS method is implemented as follows:

    library(gamlss)

    # LMS model: BCCG family with penalized splines for M (mu), S (sigma) and L (nu)
    LMS_model <- gamlss(y ~ pb(x, df = M_df),
                        sigma.formula = ~ pb(x, df = S_df),
                        nu.formula = ~ pb(x, df = L_df),
                        family = "BCCG",
                        method = RS(),
                        data = dataset_training)

The Rigby and Stasinopoulos algorithm, RS(), is used for the ﬁtting (Stasinopoulos et al.

2007).

The LMS method has been established as a standard procedure for pediatric reference curves.

Therefore, it is set as default for the model fitting (Appendix A). Apart from that, the advanced

settings in RefCurv allow the user to ﬁt a broader set of univariate GAMLSS models to the

data.
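To illustrate how fitted L, M and S values translate into percentile curves, the following minimal Python sketch evaluates the Box-Cox relation of the LMS method, C = M (1 + L S z)^(1/L), at a single covariate value. It is not part of RefCurv; the function name lms_percentile and the parameter values are assumptions chosen for illustration only.

```python
import math
from statistics import NormalDist


def lms_percentile(p, L, M, S):
    """Return the measurement at percentile p (0 < p < 100) for given L, M, S."""
    z = NormalDist().inv_cdf(p / 100.0)  # z-score belonging to percentile p
    if abs(L) < 1e-12:
        # Limit of the Box-Cox transformation for L -> 0
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical parameter values at one age, for illustration only
L, M, S = -0.5, 20.0, 0.1
curve_points = {
    f"P{p}": lms_percentile(p, L, M, S) for p in (3, 10, 25, 50, 75, 90, 97)
}
```

Repeating this evaluation for every covariate value, each with its own L, M and S, yields one point per percentile curve.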

Details about the installation and software architecture of RefCurv are given in the appendix

(Appendix B). The statistical engine is the gamlss package and RefCurv consequently inherits

its limitations.

RefCurv’s model ﬁtting is based on a model class with a BCCG distribution. One limitation

is that this model class is developed and tested for positive data values only. Furthermore,

methods might be sensitive to outliers and model ﬁtting might fail if data is distributed

unevenly. These limitations and how to address them will be discussed in the application

section.

In this section, we will describe RefCurv in four parts. After giving information about the

repository and documentation (2.1), we will present its graphical user interface (2.2). Next,

RefCurv’s features and functions are presented (2.3). Finally, we will recommend steps for

the construction of reference curves (2.4).

2.1. GitHub repository and documentation

RefCurv is open source and currently available as version 0.4.2. The source code and binaries

are provided on GitHub: https://github.com/xi2pi/RefCurv

The related GitHub Wiki contains a quick guide and instructions for the application. The

example datasets, which were used in this paper, can be accessed through the software directly.

The repository will be kept up to date and news will be announced on GitHub. Developers

can exchange information in the related forum for issues.

In addition to the source code, we created a website (https://refcurv.com) and video tutorials

(https://vimeo.com/user93523411).

RefCurv has been developed and tested on Windows and Linux. More information about

package versions is given in Appendix B.

2.2. Graphical user interface

Figure 2 shows RefCurv’s graphical user interface (GUI) consisting of a table viewer (left)

and a plot viewer (right). Users can select data columns in the table viewer and visualize

them in the plot viewer as a scatter plot. In this example, we demonstrate the application on

an echocardiographical dataset. The end-systolic volume of the left ventricle (ESV) is plotted

against the age. The default model was ﬁtted and graphically depicted as curves in the plot

viewer. Each curve represents a percentile of the underlying distribution (3rd, 10th, 25th,

50th, 75th, 90th, 97th) and is labeled accordingly (e.g. "P3" stands for the third percentile).

Percentile curves can be easily converted into z-score curves.

Users can navigate in RefCurv through the toolbar at the top of the window. The toolbar

consists of a set of buttons for categories such as "Model" for the model processing. RefCurv

is a Multiple Document Interface application, meaning that users can adjust settings in sub-

windows for most functions.

Figure 2: RefCurv’s graphical user interface with a table viewer (left) and a plot

viewer (right). Users can navigate through the features by using the upper toolbar.

2.3. Features

Import of data

RefCurv allows the import of data tables in the form of CSV files ("File" → "Load Data").

The following structure of the data table is required: columns contain measured variables,

while rows represent the cases. The first row of the table should be a header indicating the

names of the measured variables.

Subjects     Variable 1   Variable 2
Subject 1    ...          ...
Subject 2    ...          ...
...          ...          ...

Table 1: Structure of the input table for RefCurv

After import, users can inspect the data in the table viewer as highlighted in Figure 2.
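As an illustration of the required table structure, the following Python sketch reads a small CSV table with a header row and one row per subject. It is not RefCurv's own import routine; the column names (subject, age, esv) and the values are invented for this example.

```python
import csv
import io

# Hypothetical CSV content; column names and values are invented for this example.
csv_text = """subject,age,esv
1,12.0,18.5
2,30.5,25.1
3,48.0,31.7
"""


def load_table(handle):
    """Read a header row followed by one row per subject."""
    reader = csv.DictReader(handle)
    return reader.fieldnames, list(reader)


header, rows = load_table(io.StringIO(csv_text))
x = [float(row["age"]) for row in rows]  # covariate column
y = [float(row["esv"]) for row in rows]  # response column
```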

Data selection

After the data are loaded, RefCurv provides functions to select data points and exclude

them if they are considered anomalies. To do so, users choose two variables in

the lower right drop-down menu, one for the x-axis and one for the y-axis. Chosen columns

are highlighted in the table viewer. By clicking the "Plot" button, a scatter plot is created.

Data points are highlighted in the scatter plot when chosen in the table viewer. By checking

or unchecking the box in the table, subjects can be included or excluded, respectively. The chosen

data serve as the training dataset and can be used for the model fitting.

Figure 3: RefCurv’s model ﬁtting. The model ﬁtting window shows the model parame-

ters and gives a summary in the text output ﬁeld.

Model ﬁtting

For model fitting, the selected data are passed as the training dataset to the gamlss() function. Users

can specify the hyperparameters in the model fitting window ("Model" → "Model Fitting").

The hyperparameters for the LMS method are the degrees of freedom (df) for the penalized

splines of L, M, and S. After the fitting, a text output field provides a summary of the fitting

results. We recommend a value between 0 and 5 for each df. The effect of different

settings for the hyperparameters L_df, M_df and S_df on the resulting percentile curves

is shown in Figure 4. The higher the df of a penalized spline, the more

flexible the corresponding curve will be.

(a) L_df = 0, M_df = 1, S_df = 0 (b) L_df = 1, M_df = 2, S_df = 1 (c) L_df = 4, M_df = 4, S_df = 4

Figure 4: Model ﬁtting with diﬀerent settings for the hyperparameters, L_df,

M_df and S_df.

The plot viewer depicts resulting percentile curves after the computation. The text output

shows the output of the gamlss() function. The output gives information about the ﬁtting

results and diagnostic values such as the global deviance.

Advanced model ﬁtting

In the advanced model fitting ("Model" → "Model Fitting (advanced)"), GAMLSS model

settings can be customized. Table 2 shows a list of distributions and smoothing functions for

GAMLSS models.

Distribution                  R function
Box-Cox Cole and Green        BCCG()
Box-Cox power exponential     BCPE()
Box-Cox-t                     BCT()

Smoothing function            R function
Cubic splines                 cs()
Polynomials                   poly()
Penalized splines             pb()

Table 2: Distributions and smoothing functions for GAMLSS models.

A full list of distributions and smoothing functions is presented in Stasinopoulos et al.

(2007).

An example GAMLSS model with a BCCG distribution and cubic splines as smoothing

function is:

    # BCCG model with cubic splines for mu, sigma and nu
    GAMLSS_model <- gamlss(y ~ cs(x, df = 1),
                           sigma.formula = ~ cs(x, df = 0),
                           nu.formula = ~ cs(x, df = 0),
                           family = "BCCG",
                           method = RS(),
                           data = dataset_training)

In RefCurv, the model fitting with this setting can be realized by typing the command into the

input text field of the advanced model fitting window (Figure 5).

Figure 5: RefCurv’s advanced model ﬁtting.

The model selection and sensitivity analysis features are currently available for LMS models only.

GAMLSS models with other settings are not yet supported.

Outlier detection

Outliers in the training dataset might have an adverse effect on the model fitting. Datasets

might contain outliers because of transcription errors, for instance. RefCurv offers a fast

way to detect and analyze potential outliers. Users can then decide to exclude individual

outliers. The outlier detection is based on a model fitting result. After a predefined

model is fitted, the residuals are calculated. RefCurv’s outlier detection feature allows

highlighting data points according to their residuals. Limits for highlighting data points can be

chosen individually. In the example in Figure 6, we set the limits to the 90th and 10th percentiles

("Setting" → "Outliers Setting"). As a result, data points above the 90th percentile curve and

below the 10th percentile curve are highlighted in yellow. Afterwards, users can deselect data

points that they consider to be outliers. Residuals are added as a column in the table so that a

quantitative assessment is possible.

Figure 6: RefCurv’s outlier detection.
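The highlighting rule can be sketched in a few lines of Python. This is an illustration rather than RefCurv's implementation. It assumes that the residuals are normalized quantile residuals, i.e. z-scores that follow a standard normal distribution if the model is adequate, so that a percentile limit corresponds to a z-score limit. The function name and the residual values are hypothetical.

```python
from statistics import NormalDist


def flag_by_residual(residuals, lower_pct=10.0, upper_pct=90.0):
    """Return True for every residual outside the chosen percentile limits."""
    z_low = NormalDist().inv_cdf(lower_pct / 100.0)
    z_high = NormalDist().inv_cdf(upper_pct / 100.0)
    return [r < z_low or r > z_high for r in residuals]


# Hypothetical residuals (z-scores) of five data points
residuals = [-2.1, -0.3, 0.0, 1.1, 1.6]
flags = flag_by_residual(residuals)
```

Flagged points are only candidates; whether a point is excluded remains the analyst's decision.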

Model selection

As shown before, the LMS method can have different outcomes depending on the degrees of

freedom (df) for the penalized splines of the three parameters L, M and S. The task of the

model selection is to find an appropriate setting for the df that balances the trade-off

between goodness of fit and complexity. This step helps to avoid overfitting. RefCurv

provides two diﬀerent ways of model selection.

The ﬁrst model selection method uses the Bayesian Information Criterion (BIC) as decision

support for selection (Appendix C). A grid search is performed to find the best model

concerning the BIC. The model selection window in RefCurv allows users to set the limits for the df of

L, M and S. Default step size is set to 1. The output of the model selection is a list of models

ordered by BIC. The df setting of the model with the lowest BIC is considered as best for

the chosen dataset.
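The grid search can be sketched as follows. The sketch uses the common definition BIC = global deviance + log(n) · df and ranks all df combinations by this value. The function toy_fit is a made-up stand-in for an actual model fit (in RefCurv, the fit is performed by gamlss in R); its numbers, like the sample size of 300, are purely illustrative.

```python
import itertools
import math


def bic(global_deviance, total_df, n_obs):
    """BIC = global deviance + log(n) * effective degrees of freedom."""
    return global_deviance + math.log(n_obs) * total_df


def grid_search(fit, n_obs, l_range, m_range, s_range):
    """Evaluate fit(l_df, m_df, s_df) on a grid and sort the results by BIC.

    fit must return a tuple (global deviance, total effective df).
    """
    results = []
    for l_df, m_df, s_df in itertools.product(l_range, m_range, s_range):
        deviance, total_df = fit(l_df, m_df, s_df)
        results.append((bic(deviance, total_df, n_obs), l_df, m_df, s_df))
    return sorted(results)


def toy_fit(l_df, m_df, s_df):
    """Hypothetical stand-in for a model fit; not a real GAMLSS fit."""
    deviance = 2000.0 - 30.0 * min(m_df, 4) - 1.0 * (l_df + s_df)
    total_df = 3 + l_df + m_df + s_df
    return deviance, total_df


ranked = grid_search(toy_fit, n_obs=300,
                     l_range=range(0, 6), m_range=range(0, 6), s_range=range(0, 6))
best_bic, best_l, best_m, best_s = ranked[0]
```

With limits of 0 to 5 and a step size of 1 for each df, the grid contains 6 × 6 × 6 = 216 candidate models.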

Figure 7: RefCurv’s model selection. The ranges for df are set to L_df = 0,...,5; M_df

= 0,...,5; S_df = 0,...,5.

The second method for model selection is based on cross-validation. RefCurv uses the gamlss

function gamlssCV() for this task. Since datasets are often small in the field of pediatrics, we

decided to implement a 10-fold cross-validation. The validation is performed on the training

dataset. For that, the dataset is split into ten folds. As a next step, the model is trained on

nine folds of the dataset, while the remaining fold serves as a validation dataset. Afterwards,

the global deviance for the validation dataset is computed, which gives information about the

generalization error of the model. This is repeated until each of the ten folds has served as the validation

dataset. Finally, the overall generalization error is computed as the mean of the global de-

viances.
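The procedure can be sketched in Python as follows. To keep the example self-contained, a simple Gaussian model (mean and standard deviation) stands in for the LMS model, and the global deviance is computed as minus twice the log-likelihood on the held-out fold. The function names and the simulated data are assumptions for illustration; RefCurv itself delegates this task to gamlssCV() in R.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def gaussian_deviance(train, valid):
    """Toy stand-in model: fit mean and variance on train,
    return -2 * log-likelihood (global deviance) on valid."""
    mu = sum(train) / len(train)
    var = sum((v - mu) ** 2 for v in train) / len(train)
    return sum(math.log(2 * math.pi * var) + (v - mu) ** 2 / var for v in valid)


def cross_validate(y, k=10, seed=0):
    """Mean validation deviance over k folds."""
    deviances = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [y[j] for j in range(len(y)) if j not in held_out]
        valid = [y[j] for j in fold]
        deviances.append(gaussian_deviance(train, valid))
    return sum(deviances) / k


# Simulated response values, for illustration only
rng = random.Random(1)
y = [rng.gauss(20.0, 3.0) for _ in range(100)]
cv_deviance = cross_validate(y, k=10)
```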

A cross-validation can be time-consuming due to its computational eﬀort. The model selec-

tion based on the BIC is faster and therefore computationally more eﬃcient. Furthermore,

RefCurv’s BIC method is automatized in the form of a grid search. For the practical appli-

cation, we currently recommend the BIC method as the model selection for users with little

statistical background knowledge.

Sensitivity analysis

In order to analyze the sensitivity of the ﬁtting method, RefCurv oﬀers a feature to add noise

to data points. This kind of uncertainty could be caused by measurement errors. Figure 8

shows the concept of the sensitivity analysis.

Users can choose single or multiple data points, which are depicted in black. The variations

Δy_up and Δy_down can be applied to the chosen response variable y. As a result,

there are three different datasets (black, green, red), which will be used as training data. The

method then fits a model and shows the percentile curves for each of the three cases. In Figure

8, the 50th percentile curve is depicted (black, green, red).

Figure 9 shows an example of the sensitivity analysis in RefCurv. Chosen data points with

variation are highlighted in yellow. The values for the variation can be set by the user in the

text ﬁelds below.

This feature also allows users to examine the influence of data points on the percentile curves. By

varying one data point, for example, we can analyze the eﬀect on the 50th percentile curve.

Figure 8: Concept of the sensitivity analysis. A model is ﬁtted to each of the training

datasets (red, black, green), which symbolically consists of four data points in this ﬁgure.

The 50th percentile curve is shown for each of the three cases in the corresponding color.

Figure 9: RefCurv’s sensitivity analysis. Chosen data points with variation are high-

lighted in yellow. The curves represent the 50th percentile curve for the three cases: variation

up, variation down and no variation.
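The construction of the three training datasets can be sketched as follows. The function name vary_points and the example values are hypothetical; the subsequent model fitting for each dataset is omitted here.

```python
def vary_points(y, chosen, delta_up, delta_down):
    """Return three response vectors: unchanged, chosen points shifted up,
    and chosen points shifted down."""
    up = list(y)
    down = list(y)
    for i in chosen:
        up[i] = y[i] + delta_up
        down[i] = y[i] - delta_down
    return {"none": list(y), "up": up, "down": down}


# Hypothetical response values; points 1 and 2 are varied
y = [18.0, 21.5, 25.0, 30.2]
datasets = vary_points(y, chosen=[1, 2], delta_up=2.0, delta_down=1.5)
```

Fitting the same model to each of the three datasets and comparing the resulting 50th percentile curves shows how strongly the chosen points influence the fit.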

Model comparison

In the model comparison window, users can compare the percentile curves of two models. As

an example, models with diﬀerent settings for the df can be ﬁtted and compared afterwards

to analyze the eﬀect.

Reverse computation

For decision support, clinicians often use reference curves or charts from the literature. One

problem is that the distribution parameters (L, M, and S for the BCCG distribution) are

often missing. RefCurv’s reverse computation feature enables users to approximate L, M,

and S values for given reference curves. With this method, it is possible to express any

reference curve as an LMS model. To achieve that, the method fits a BCCG distribution to

the reference curves. The results are the distribution parameters L, M, and S for each value

of the covariate. More mathematical details about this approach are given in Appendix D.
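The idea can be illustrated for a single covariate value with the following Python sketch. It is a deliberately simple approximation and not the method of Appendix D: M is taken from the 50th percentile, and L and S are found by a coarse grid search that minimizes the squared difference between the given percentile values and those implied by the Box-Cox relation. All names, grid limits and numbers are assumptions.

```python
import math
from statistics import NormalDist


def lms_value(z, L, M, S):
    """Measurement for z-score z under L, M, S; None outside the support."""
    if L == 0:
        return M * math.exp(S * z)
    base = 1.0 + L * S * z
    if base <= 0:
        return None
    return M * base ** (1.0 / L)


def approximate_lms(percentiles):
    """percentiles: dict {percent: value} that contains the key 50."""
    M = percentiles[50]
    targets = [(NormalDist().inv_cdf(p / 100.0), v) for p, v in percentiles.items()]
    best = None
    for i in range(-40, 41):          # L from -2.0 to 2.0 in steps of 0.05
        L = i / 20.0
        for j in range(2, 101):       # S from 0.01 to 0.5 in steps of 0.005
            S = j / 200.0
            values = [lms_value(z, L, M, S) for z, _ in targets]
            if any(v is None for v in values):
                continue
            sse = sum((v - t) ** 2 for v, (_, t) in zip(values, targets))
            if best is None or sse < best[0]:
                best = (sse, L, M, S)
    return best[1:]


# Synthetic "published" percentiles generated from known parameters
true_L, true_M, true_S = 0.5, 30.0, 0.2
published = {
    p: lms_value(NormalDist().inv_cdf(p / 100.0), true_L, true_M, true_S)
    for p in (3, 10, 25, 50, 75, 90, 97)
}
L_hat, M_hat, S_hat = approximate_lms(published)
```

A numerical optimizer would replace the grid search in practice; the grid is used here only to keep the sketch dependency-free.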

Export

Resulting reference curves can be exported ("File" → "Save Reference Curves") as a graph

(all common graphical formats) or as a table (CSV ﬁle). The values for L, M, and S are

automatically exported so that percentiles or z-scores can be computed manually. For clinical

use, the values for L, M, and S are essential to compute the z-score of a new case using Cole’s

formula (Appendix A).

Z-Score/Percentile converter

Percentiles can be converted into z-scores and vice versa. Which of the two is used by the

clinician depends on the examination. RefCurv offers a converter to deal with both

definitions ("Calculator" → "Z-score/Percentile Converter"). In Figure 10, we converted the

percentile value of 75 to a z-score, which gives a z-score of 0.67449.

Figure 10: RefCurv’s Z-score/Percentile converter
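The conversion rests on the standard normal distribution: the z-score is the standard normal quantile of the percentile, and the percentile is the standard normal distribution function of the z-score. A minimal Python sketch (the function names are our own, not RefCurv's):

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def percentile_to_z(percentile):
    """Convert a percentile (0 < percentile < 100) into a z-score."""
    return _STD_NORMAL.inv_cdf(percentile / 100.0)


def z_to_percentile(z):
    """Convert a z-score into a percentile."""
    return 100.0 * _STD_NORMAL.cdf(z)


z75 = percentile_to_z(75)  # approximately 0.67449, as in the example above
```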

Z-Score calculator

Clinicians obtain percentile and z-score values of patients as a diagnostic parameter. These

values have to be computed from measured data and a given reference curve. Currently,

there is a large number of web and smartphone applications to compute the z-score. With

RefCurv’s z-score calculator ("Calculator" → "Z-score Calculator"), it is possible to compute

z-score values of patients directly after the construction of reference curves.

Figure 11 shows an example where the z-score for the entered data point is 1.473.

Figure 11: RefCurv’s z-score calculator. The z-score of the entered data point (x =

100, y = 40) is 1.473.
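For a measurement y and the values of L, M and S at the patient's covariate value, the z-score of the LMS method is z = ((y/M)^L − 1)/(L S), with z = ln(y/M)/S in the limit L = 0. The following Python sketch applies this formula. The parameter values are hypothetical and are not those underlying Figure 11, so the result differs from the value shown there.

```python
import math


def lms_zscore(y, L, M, S):
    """z-score of a positive measurement y for given L, M and S."""
    if y <= 0:
        raise ValueError("the LMS method requires positive measurements")
    if abs(L) < 1e-12:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


# Hypothetical L, M, S at the patient's covariate value
z = lms_zscore(y=40.0, L=1.0, M=30.0, S=0.25)
```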

Monte Carlo Simulation

RefCurv’s Monte Carlo Simulation is a feature that could help researchers to design a study.

The goal is to plan the required sample size for the construction of reference curves. The

simulation is based on a GAMLSS model to create a random sample. Users can enter the

simulated sample size in this step.

Next, this simulated random sample can be used for model fitting and analysis with

different settings. Based on this approach, users gather information about the behavior of

the models fitted to a sample of the chosen size. As a result, users might estimate an appropriate

sample size for the construction of reference curves.
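A simulated sample of this kind can be sketched in Python as follows. The sketch assumes a BCCG-type model with constant L and S and a linear median curve; these choices, the covariate range and the function names are illustrative assumptions and do not reproduce RefCurv's simulation. Each observation is obtained by drawing a standard normal z-score and transforming it with the Box-Cox relation of the LMS method.

```python
import math
import random


def simulate_sample(n, L, S, median_curve, x_min, x_max, seed=0):
    """Draw n pairs (x, y) with x uniform on [x_min, x_max] and
    y = M(x) * (1 + L * S * z)^(1/L), where z is standard normal."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        x = rng.uniform(x_min, x_max)
        z = rng.gauss(0.0, 1.0)
        base = 1.0 + L * S * z
        if L != 0 and base <= 0:
            continue  # outside the support of the distribution; draw again
        M = median_curve(x)
        y = M * math.exp(S * z) if L == 0 else M * base ** (1.0 / L)
        sample.append((x, y))
    return sample


def median_curve(age_months):
    """Hypothetical median curve, for illustration only."""
    return 10.0 + 0.2 * age_months


sample = simulate_sample(200, L=0.5, S=0.2, median_curve=median_curve,
                         x_min=0.0, x_max=216.0)
```

Repeating the simulation for several sample sizes and refitting the model each time gives an impression of how stable the percentile curves are at each size.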

2.4. Recommended steps for the modeling of pediatric reference curves with

the LMS method

The steps for constructing reference curves depend on the analyst’s choice. The data analyst

could choose the order: model selection, model fitting, outlier analysis. Alternatively,

the outlier analysis could be performed first.

Consequently, different approaches can lead to different results and none of these is objective or

ideal. However, a unified workflow can improve reliability, comparability and reproducibility.

1. Data Preparation: data visualization, outlier detection
2. Model Selection: model class, hyperparameter tuning
3. Model Fitting: fitting of model parameters
4. Model Testing / Evaluation: validation on test dataset

Intermediate results shown in the figure: training dataset, model setting, GAMLSS model.

Figure 12: Recommended steps for the modeling of pediatric reference curves

The gamlss package allows the application steps of the LMS method to be arranged in different ways.

Steps include model selection and cross-validation. The documentation is very comprehensive

(Stasinopoulos, Rigby, Heller, Voudouris, and De Bastiani (2017)). However, we found that

gamlss methods are applied in arbitrary order. A guideline for practitioners seems to be

missing. We suggest here steps for the modeling of reference curves with the LMS method,

which can be achieved with RefCurv. Figure 12 highlights our four recommended steps.

1. Data preparation is the first step of reference curve modeling. The data visualization

is crucial to get an overview of the data distribution. We recommend depicting the data in

a scatter plot and using descriptive statistics to analyze their behavior. The dataset could

contain outliers that might have a negative effect on the construction. By highlighting

possible outliers, users can reassess, filter and correct the data. We have to make sure

that the data serve as a good training set for the model fitting.

The output of this modeling step is the training dataset that can be used for the model

fitting.

2. As a next step, we recommend performing model selection. The task of this step is to

define the model class (distribution family, smoothing functions and hyperparameters),

which will be ﬁtted to the data. The decision for the model class should be based on

the data distribution, sample size and other data characteristics. Therefore, this step

requires experience with modeling.

The LMS method uses penalized splines and therefore belongs to the group of non-

parametric models. These models can be used if the amount of data is high and a-priori

knowledge about the data distribution is missing. In RefCurv, this class is set as default,

so that users do not have to deal with complicated model selection tasks.

Furthermore, the LMS method contains the hyperparameters L_df, M_df and S_df.

These hyperparameters have to be tuned during the model selection.

3. The model ﬁtting follows after the hyperparameters have been found. In this step,

the model parameters are fitted. For the LMS method, the model parameters are the

vectors L, M and S. The ﬁnal result of this modeling step is a generalized LMS model

that describes the behavior of the data.

4. The last step is the model testing / evaluation. In this step, the model has to be

validated on an independent test dataset from the population. As a result, users can

compute the prediction error for this test dataset, which indicates the quality of the

model.

3. Application

In this section, we will show how to apply RefCurv on an example dataset, which was acquired

in a previous study of our group (Krell, Laser, Dalla-Pozza, Winkler, Hildebrandt, Kececioglu,

Breuer, and Herberg (2018)). The dataset is accessible for users through the software. First,

we will highlight an example where we apply the recommended steps for modeling, which

were listed in the previous section. Second, we will go through a case scenario to emphasize

the advantages of RefCurv. Last, we will demonstrate how a study design in terms of sample

size can be planned.

3.1. Example

After loading the file ("Examples" → "Echo example"), users can observe the data of 351

healthy children in the table viewer. Measured variables are age, weight, height, end-systolic

volume (ESV), end-diastolic volume (EDV) and stroke volume (SV) of the left ventricle. The

left ventricle is one of the large chambers of the heart and cardiologists measure its volume

and shape with echocardiograms. Data from both genders were combined for this example.

1. Data preparation

First, we examined the data in the table viewer and plotted them as a scatter plot to analyze

the data distribution. Age and ESV were selected as variables in the main window. Selecting

data points in the table highlights them in the scatter plot (Figure 13 (a)).

As a next step, we highlighted possible outliers by ﬁtting a standard model (L_df = 0, M_df

= 1, S_df = 0) to the data. The limit for highlighting possible outliers was set to the 3rd and

the 97th percentile curve. In the interest of simpliﬁcation, all data below the 3rd percentile

curve and above the 97th percentile curve were deselected for this example (Figure 13 (b)).

Please note that only some of the highlighted data points - the ones that the analyst assesses

as abnormal - should be considered as outliers.

(a) Data visualization. (b) Outlier detection.

Figure 13: Data preparation. Data are visualized (a) and possible outliers are

highlighted (b).

2. Model selection

In order to optimize the hyperparameters L_df, M_df, and S_df, RefCurv’s BIC model selection

was performed. The range for each df was set to 0 to 5.

The result of the model selection is shown in Figure 14. The model with setting M_df=4,

S_df=0 and L_df=0 had the lowest BIC (1940.633). This model was chosen as the best

model, and its settings were used in the model ﬁtting window to create the new prediction.
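The search over the df settings can be pictured as follows. This sketch assumes an exhaustive search over all combinations, which with a range of 0 to 5 for each df amounts to 6 x 6 x 6 = 216 candidate models; whether RefCurv searches in exactly this way is an assumption. The callable bic_of and the function toy_bic are hypothetical stand-ins for a real model fit.

```python
from itertools import product


def select_by_bic(bic_of, df_range=range(0, 6)):
    """Return (best_setting, best_bic) over all (L_df, M_df, S_df) settings.

    bic_of is a user-supplied callable (hypothetical) that fits one model
    and returns its BIC.
    """
    best_setting, best_bic = None, float("inf")
    for L_df, M_df, S_df in product(df_range, repeat=3):
        value = bic_of(L_df, M_df, S_df)
        if value < best_bic:
            best_setting, best_bic = (L_df, M_df, S_df), value
    return best_setting, best_bic


# Toy stand-in for a model fit, used only to exercise the search.
def toy_bic(L_df, M_df, S_df):
    return (L_df - 1) ** 2 + (M_df - 2) ** 2 + S_df


best_setting, best_value = select_by_bic(toy_bic)
```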

Figure 14: Model selection. The ranges for the df were set to L_df = 0,...,5; M_df = 0,...,5; S_df = 0,...,5.

3. Model ﬁtting

The model was ﬁtted with the tuned hyperparameters (M_df=4, S_df=0, L_df=0).

Figure 15 shows the results of the model ﬁtting process.

Figure 15: Model ﬁtting. The df are set to M_df=4, S_df=0, and L_df=0

4. Model testing

As the last step, the model was validated using the implemented 10-fold cross-validation function. In the model validation window, the df values found through the model selection process were entered (Figure 16). The cross-validated global deviance was 1902.871 for this case.
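The idea behind the cross-validated global deviance can be sketched as follows. The sketch assumes that the deviances of the held-out folds are summed, which is a common convention; the exact procedure inside RefCurv is not reproduced here. The functions fit_normal and normal_deviance are a toy model (a plain normal distribution) and only serve to make the loop runnable.

```python
import random
from math import log, pi
from statistics import fmean, pstdev


def k_fold_indices(n, k=10, seed=0):
    """Split the indices 0..n-1 into k shuffled folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_deviance(data, fit, deviance, k=10, seed=0):
    """Sum the held-out deviance (-2 * log likelihood) over all k folds."""
    total = 0.0
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        total += deviance(fit(train), valid)
    return total


# Toy model used only to exercise the loop: a normal distribution.
def fit_normal(train):
    return fmean(train), pstdev(train)


def normal_deviance(params, valid):
    mu, sigma = params
    return sum(log(2 * pi * sigma**2) + ((y - mu) / sigma) ** 2 for y in valid)
```

Because every observation is held out exactly once, the summed deviance refers to the full dataset, whereas the training deviance of a single fold refers to only about nine tenths of the data.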


Figure 16: Model testing. We used a 10-fold cross-validation to determine the cross-validated global deviance of 1902.871 for the model. The global deviance for the training of the last model during the cross-validation (10th iteration step) was 1645.632.

3.2. Case scenario

In order to display the other features of RefCurv, we present a case scenario for the construc-

tion of reference curves. First, we studied the impact of reducing data points and creating a

gap in the age range. We also investigated the eﬀect of the data points on the sides (edges)

of the measuring range.

Figure 17: Case scenario. We reduced the number of data points (from left to right)

creating a gap in the data cloud.

In this scenario, data points from the training dataset were gradually excluded in the middle of the data cloud. This created a gap that could cause computational problems. With this procedure, the feasibility and robustness of the LMS method were tested. Figure 17 shows the procedure. When the number of data points fell below 274, the LMS method produced unsatisfactory reference curves with low smoothness. Changing the df hyperparameters did not improve the smoothness.

To solve this issue, we used RefCurv's "Advanced model fitting". We defined GAMLSS_model with the following setting:

GAMLSS_model <- gamlss(y ~ poly(x, 2),                  # mu (M): quadratic in x
                       sigma.formula = ~ poly(x, 1),    # sigma (S): linear in x
                       nu.formula = ~ poly(x, 1),       # nu (L): linear in x
                       family = "BCCG",
                       method = RS(),
                       data = dataset_training)

where poly(x) is the function for evaluating orthogonal polynomials. Figure 18 shows the

result of the ﬁtting.

Figure 18: Advanced model ﬁtting. We deﬁned a new gamlss model with poly(x)

functions for the curve ﬁtting.


3.3. Design of study

RefCurv's Monte Carlo simulation can be used to visualize the impact of the sample size. Consider the reference curves from the previous example (Figure 15). The curves can be loaded into the simulation window, where samples of different sizes can be generated. We chose a sample size of 500 and reduced it to 100 (Figure 19).

(a) n = 500 (b) n = 100

Figure 19: Monte Carlo simulation. Diﬀerent sample sizes were created from a previ-

ously chosen model.
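Drawing a synthetic sample from a fitted LMS model can be sketched as follows. This is an illustration under stated assumptions, not RefCurv's simulation code: the covariate is drawn uniformly over its range, the z-scores are drawn from a standard normal distribution (redrawn if they fall outside the admissible range), and the curves passed as L_of, M_of, and S_of in the example call are made up.

```python
import random
from math import exp


def simulate_sample(n, x_min, x_max, L_of, M_of, S_of, seed=0):
    """Draw n (x, y) pairs; x is uniform, y follows the LMS model at x."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        x = rng.uniform(x_min, x_max)
        L, M, S = L_of(x), M_of(x), S_of(x)
        z = rng.gauss(0.0, 1.0)
        if L == 0:
            y = M * exp(S * z)
        else:
            base = 1.0 + L * S * z
            if base <= 0:  # z outside the truncated range; redraw
                continue
            y = M * base ** (1.0 / L)
        sample.append((x, y))
    return sample


# Example call with made-up curves (not the fitted model of Figure 15).
sample = simulate_sample(100, 0.0, 18.0,
                         L_of=lambda x: 1.0,
                         M_of=lambda x: 20.0 + 2.0 * x,
                         S_of=lambda x: 0.1)
```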

We continued with the smaller sample (n = 100), fitted an LMS model with the standard setting, and compared it to the original model (Figure 20). Comparing the 50th percentile curves of both models shows that the absolute difference never exceeds 1 milliliter. From these results, users could conclude that a sample size of 100 might be sufficient to create percentile curves. We recommend similar analyses, such as computing the differences of the other percentile curves, to corroborate this assumption.
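Such a comparison reduces to the largest absolute gap between two curves over the covariate range. The following sketch assumes that both curves are available as Python callables and evaluates them on an evenly spaced grid; the function name max_abs_difference and the two example curves are hypothetical.

```python
def max_abs_difference(curve_a, curve_b, x_min, x_max, steps=200):
    """Largest absolute gap between two curves on an evenly spaced grid."""
    grid = [x_min + (x_max - x_min) * i / steps for i in range(steps + 1)]
    return max(abs(curve_a(x) - curve_b(x)) for x in grid)


# Made-up median curves, used only to show the call.
gap = max_abs_difference(lambda x: 20.0 + 2.0 * x,
                         lambda x: 20.5 + 1.95 * x,
                         0.0, 18.0)
```

The same function can be applied to any other pair of percentile curves, for example the 3rd or the 97th.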


[Figure 20 labels: Monte Carlo simulation (n = 100); original model (Example 3.1); difference between original model and simulation; 50th percentile]

Figure 20: Model comparison. We used the model from Example 3.1 and created a sample (n = 100) by Monte Carlo simulation. This sample served as the training dataset to fit a model. The 50th percentiles of both models are compared.

We illustrated that RefCurv’s features might help to design studies before data is acquired.

This can be achieved by going through a case scenario like the presented one. Questions about

the number of data points, data distribution, robustness of the LMS method and impact of

outliers can be analyzed in advance. Issues like missing data could be considered during the

planning.

4. Discussion

In the field of medical research, physicians and life scientists lack easy-to-use software for the construction of reference curves. Most modern statistical methods, such as those in the gamlss package, are implemented in R and require programming skills. Furthermore, the statistical analysis involves many steps, which hampers a quick analysis. Thus, there is a gap between the statistical methods and end-users. To address this issue, we presented RefCurv, a software that enables the construction and analysis of reference curves for children.

In this article, we focused on medical and particularly echocardiographical data where ref-

erence curves are broadly discussed. Dallaire and Dahdah (2011) presented a very detailed

analysis of modeling approaches. In their study, they focused on parametric regression models

and analyzed features such as goodness of fit for the model and data distribution. A similar approach was presented by Kobayashi et al. (2016), who also added nonparametric regression models to their analysis. These studies point to the need for a tool that can simplify and automate the computation. Our project focused on the LMS

method and GAMLSS models because of their good quality and reliability. In this context,

the gamlss package provides a broad set of distributions and smoothing functions.

With this project, we lay the foundation for further analysis of reference curves. RefCurv

helps to solve issues that were discussed in multiple articles before. Cantinotti, Scalese,

Franchi, Corana, Viacava, Assanta, Santoro, and Koestenberger (2018) propose, for example,

to develop a uniform approach to data normalization. The same authors developed an appli-

cation for smartphones, BabyNorm, which enables and simpliﬁes the use of medical reference

values in clinics (Cantinotti et al. (2017)). The advantage of BabyNorm is the possibility

to choose between diﬀerent published reference curves. Clinicians can compare patient data

to normal values, which are given in the journal articles. We found that a quality index for the published reference curves is missing. Users mostly have to choose references from studies arbitrarily, without knowing details about the reference values provided by the study, such as the number of data points, the statistical method, or the goodness of fit. RefCurv can help to solve this problem by offering methods such as cross-validation to rank different models.

The development of RefCurv is ongoing in order to improve its functionality. An automated computation of the sample size is planned. So far, the software has been tested with multiple datasets and found to be stable. However, stability and convergence issues might occur, as stated in the documentation of the gamlss package (Rigby and Stasinopoulos (2005)). The handling of negative data points has not yet been considered but is planned for future versions. In the future, RefCurv will also be tested on large and highly distributed data to identify its limitations.

RefCurv can help to standardize procedures and plan the acquisition of data. The design

of the study can be planned in advance through exemplary case scenarios. For example,

researchers could simulate the impact of the sample size on their reference curves to ﬁnd out:

(i) the minimum number of data points required, (ii) the eﬀect of an increasing number of

data points, (iii) the correct choice of predictor for the curves, (iv) the necessity to stratify by

gender or other variables. The construction of percentile curves with RefCurv can determine

the impact of these parameters on their study results.

A fundamental problem in pediatrics is the low number of measurements because data acquisition is time-consuming, expensive, and difficult. Consequently, sample sizes are often small, which is discussed in multiple articles (Tanaka (1987); Cantinotti et al. (2017); Williams, Thomson, Seto, Contopoulos-Ioannidis, Ioannidis, Curtis, Constantin, Batmanabane, Hartling, and Klassen (2012)). One solution is to acquire data in multicenter studies, as was done for our example dataset (Krell et al. (2018)). Merging of data can be easily managed with RefCurv, and the effect of different training datasets on the resulting reference curves can be quickly tested.


4.1. Reuse potential

We demonstrated the software on echocardiographical data of children. Apart from pediatric

applications, reference curves play an important role in many other disciplines of medicine.

As an example, they could be used to describe the growth process of organs or the eﬀect of

a drug. Beyond that, every natural science or technical environment requires references in

order to analyze and improve processes. Thus, RefCurv’s application ﬁeld is ﬂexible.

The LMS method has proven to be valuable for pediatric reference curves. However, other model classes and distributions, as listed in Stasinopoulos et al. (2007), are accessible through the advanced model fitting in RefCurv. Therefore, this software is flexible and can be adjusted according to the research hypothesis, theory, or purpose.

Due to its easy-to-use GUI, the application does not need any extended training but can be

applied quickly. After the installation, data visualization, model ﬁtting and reference curve

analysis are intuitive.

RefCurv uses GAMLSS models in a Python environment, which opens the door for combinations with other Python software packages. The SimVascular project (Updegrove, Wilson, Merkow, Lan, Marsden, and Shadden (2017)), for example, offers methods to model the cardiovascular system and provides a Python interface. Computation results in SimVascular are, however, complex and difficult for physicians to understand. RefCurv could help to translate SimVascular's output into reference curves that are easy to understand, for example for pediatric cardiologists. We are currently working on a connection to the SimVascular framework.

Altogether, we recommend this software to students and researchers in any field who plan to construct reference curves. Likewise, the software can be used for educational purposes

at all levels. For clinicians, this tool can help to understand the underlying methods of the

construction of percentile curves and its challenges, such as the tuning of hyperparameters.

In science and especially in medical science, the usage of proprietary software with restricted

access to the code is unfortunately a standard practice. This issue makes it hard for re-

searchers to understand and reproduce the results of other publications. Also, it restricts

the scientist from sharing and contributing to other works. This project is entirely open-

source and the source code was released under GPLv3. We encourage other working groups

to develop RefCurv and share their knowledge about reference curves so that the scientiﬁc

community can proﬁt from its value.


4.2. Conclusion

In this paper, we presented RefCurv, a software package for constructing reference curves. The software uses the statistical methods of the gamlss package in R and provides a user-friendly GUI for data visualization written in Python. Combining both, RefCurv provides a clearly structured workflow from data to reference curves. The main features of this software are model fitting, model selection, sensitivity analysis, and model validation.

In the present article, we showed by example how RefCurv can simplify the application of GAMLSS models. As a result, GAMLSS models can now also be used by physicians and other non-technical users.

Due to these advantages, RefCurv could help to improve clinical studies and to reduce time and costs. We showed how to systematically design studies with respect to sample size, subject group, and medical parameters. In conclusion, a well-designed plan can help to create high-quality reference curves.

Acknowledgments

This study is funded by Fördergemeinschaft Deutsche Kinderherzzentren e.V.

We thank Rupert Hammen and Jochen Kunkel for their help in testing RefCurv.


References

Ataei N, Hosseini M, Fayaz M, Navidi I, Taghiloo A, Kalantari K, Ataei F (2016). “Blood pressure percentiles by age and height for children and adolescents in Tehran, Iran.” Journal of human hypertension, 30(4), 268.

Cacciari E, Milani S, Balsamo A, Spada E, Bona G, Cavallo L, Cerutti F, Gargantini L, Greggio N, Tonini G, et al. (2006). “Italian cross-sectional growth charts for height, weight and BMI (2 to 20 yr).” Journal of endocrinological investigation, 29(7), 581–593.

Cantinotti M, Kutty S, Franchi E, Paterni M, Scalese M, Iervasi G, Koestenberger M (2017). “Pediatric echocardiographic nomograms: what has been done and what still needs to be done.” Trends in cardiovascular medicine, 27(5), 336–349.

Cantinotti M, Scalese M, Franchi E, Corana G, Viacava C, Assanta N, Santoro G, Koestenberger M (2018). “Why Use Percentiles and Not Z Scores to Calculate Pediatric Echocardiographic Nomograms? The Need for a Uniform Approach to Data Normalization.” Journal of the American Society of Echocardiography.

Cole TJ (1990). “The LMS method for constructing normalized growth standards.” European journal of clinical nutrition, 44(1), 45–60.

Cole TJ, Freeman JV, Preece MA (1995). “Body mass index reference curves for the UK, 1990.” Archives of disease in childhood, 73(1), 25–29.

Cole TJ, Freeman JV, Preece MA (1998). “British 1990 growth reference centiles for weight, height, body mass index and head circumference fitted by maximum penalized likelihood.” Statistics in medicine, 17(4), 407–429.

Cole TJ, Green PJ (1992). “Smoothing reference centile curves: the LMS method and penalized likelihood.” Statistics in medicine, 11(10), 1305–1319.

Dallaire F, Dahdah N (2011). “New equations and a critical appraisal of coronary artery Z scores in healthy children.” Journal of the American Society of Echocardiography, 24(1), 60–74.

Fenton T, Sauve R (2007). “Using the LMS method to calculate z-scores for the Fenton preterm infant growth chart.” European journal of clinical nutrition, 61(12), 1380.

Fredriks AM, Van Buuren S, Burgmeijer RJ, Meulmeester JF, Beuker RJ, Brugman E, Roede MJ, Verloove-Vanhorick SP, Wit JM (2000a). “Continuing positive secular growth change in The Netherlands 1955–1997.” Pediatric research, 47(3), 316.

Fredriks AM, van Buuren S, Wit JM, Verloove-Vanhorick S (2000b). “Body index measurements in 1996–7 compared with 1980.” Archives of disease in childhood, 82(2), 107–112.

Group WMGRS, de Onis M (2006). “WHO Child Growth Standards based on length/height, weight and age.” Acta paediatrica, 95, 76–85.

Hirschler V, Molinari C, Maccallini G, Hidalgo M, Gonzalez C, de los Cobres Study Group SA (2016). “Waist circumference percentiles in indigenous Argentinean school children living at high altitudes.” Childhood Obesity, 12(1), 77–85.


Katzmarzyk P (2004). “Waist circumference percentiles for Canadian youth 11–18 y of age.” European journal of clinical nutrition, 58(7), 1011.

Khadilkar A, Ekbote V, Chiplonkar S, Khadilkar V, Kajale N, Kulkarni S, Parthasarathy L, Arya A, Bhattacharya A, Agarwal S (2014). “Waist circumference percentiles in 2-18 year old Indian children.” The Journal of pediatrics, 164(6), 1358–1362.

Kobayashi T, Fuse S, Sakamoto N, Mikami M, Ogawa S, Hamaoka K, Arakaki Y, Nakamura T, Nagasawa H, Kato T, et al. (2016). “A new Z score curve of the coronary arterial internal diameter using the lambda-mu-sigma method in a pediatric population.” Journal of the American Society of Echocardiography, 29(8), 794–801.

Krell K, Laser KT, Dalla-Pozza R, Winkler C, Hildebrandt U, Kececioglu D, Breuer J, Herberg U (2018). “Real-Time Three-Dimensional Echocardiography of the Left Ventricle—Pediatric Percentiles and Head-to-Head Comparison of Different Contour-Finding Algorithms: A Multicenter Study.” Journal of the American Society of Echocardiography, 31(6), 702–711.

Mawad W, Drolet C, Dahdah N, Dallaire F (2013). “A review and critique of the statistical methods used to generate reference values in pediatric echocardiography.” Journal of the American Society of Echocardiography, 26(1), 29–37.

Mul D, Fredriks AM, Van Buuren S, Oostdijk W, Verloove-Vanhorick SP, Wit JM (2001). “Pubertal development in the Netherlands 1965–1997.” Pediatric research, 50(4), 479.

Neuhauser H, Schienkiewitz A, Rosario AS, Dortschy R, Kurth BM (2013). “Referenzperzentile für anthropometrische Maßzahlen und Blutdruck aus der Studie zur Gesundheit von Kindern und Jugendlichen in Deutschland (KiGGS).”

Nysom K, Mølgaard C, Hutchings B, Michaelsen KF (2001). “Body mass index of 0 to 45-y-old Danes: reference values and comparison with published European reference values.” International journal of obesity, 25(2), 177.

Rigby RA, Stasinopoulos DM (2005). “Generalized additive models for location, scale and shape.” Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3), 507–554.

Stasinopoulos DM, Rigby RA, et al. (2007). “Generalized additive models for location scale and shape (GAMLSS) in R.” Journal of Statistical Software, 23(7), 1–46.

Stasinopoulos MD, Rigby RA, Heller GZ, Voudouris V, De Bastiani F (2017). Flexible regression and smoothing: using GAMLSS in R. Chapman and Hall/CRC.

Tanaka JS (1987). “"How big is big enough?": Sample size and goodness of fit in structural equation models with latent variables.” Child development, pp. 134–146.

Updegrove A, Wilson NM, Merkow J, Lan H, Marsden AL, Shadden SC (2017). “SimVascular: An open source pipeline for cardiovascular simulation.” Annals of biomedical engineering, 45(3), 525–541.

Williams K, Thomson D, Seto I, Contopoulos-Ioannidis DG, Ioannidis JP, Curtis S, Constantin E, Batmanabane G, Hartling L, Klassen T (2012). “Standard 6: age groups for pediatric trials.” Pediatrics, 129(Supplement 3), S153–S160.


A. The LMS method by Cole

The LMS method is a special case of a generalized additive model (GAM) and was originally proposed by Cole (1990). In summary, the approach can be described as a univariate nonparametric GAM.

Let $Y = (y_1, y_2, \ldots, y_n)^T$, with $y_i > 0$ for all $i$, be a positive random variable with $n$ observations. The explanatory variable is defined by $X = (x_1, x_2, \ldots, x_n)^T$. The model is defined by the parameters $L$, $M$, and $S$: $L$ is the skewness parameter, $M$ the location parameter, and $S$ the scale parameter.

$Y$ is assumed to follow a Box-Cox Cole and Green (BCCG) distribution, denoted by $\mathrm{BCCG}(M, S, L)$. A transformed random variable $Z$ is given by

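$$
Z =
\begin{cases}
\dfrac{1}{LS}\left[\left(\dfrac{Y}{M}\right)^{L} - 1\right], & \text{if } L \neq 0 \\[1ex]
\dfrac{1}{S}\log\left(\dfrac{Y}{M}\right), & \text{if } L = 0
\end{cases}
\tag{1}
$$

for $0 < Y < \infty$, where $M > 0$, $S > 0$, and $-\infty < L < \infty$, and where the random variable $Z$ is assumed to follow a truncated standard normal distribution.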
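As a small numerical illustration of transformation (1), the following Python sketch converts a measurement into a z-score and then into a percentile. The function names are hypothetical, and the percentile conversion ignores the truncation of Z, which is an approximation.

```python
from math import log
from statistics import NormalDist


def lms_z_score(y, L, M, S):
    """Transform a positive measurement y into a z-score, following (1)."""
    if L == 0:
        return log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def z_to_percentile(z):
    """Percentile (0 to 100) of a z-score, ignoring the truncation of Z."""
    return 100.0 * NormalDist().cdf(z)
```

For L = 1 the transformation reduces to the familiar standardization (y - M) / (M S), so M S plays the role of a standard deviation.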

The probability density function for one observation $y$ and its transform $z$ is given by

$$
f_Y(y) = \frac{y^{L-1} \exp\left(-\frac{1}{2} z^2\right)}{M^{L} S \sqrt{2\pi}\, \Phi\left(\frac{1}{S|L|}\right)}
\tag{2}
$$

where $\Phi(\cdot)$ is the cumulative distribution function (cdf) of the standard normal distribution. Figure 21 shows the probability density function for different values of L, M, and S.

Figure 21: The probability density function $f_Y(y)$ for the BCCG distribution with different values for L, M, and S. Parameter values: (a) L = 1, M = (40, 45, 50), S = 0.1; (b) L = (1, 10, 15), M = 45, S = 0.1; (c) L = 1, M = 45, S = (0.08, 0.1, 0.14).
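The density (2) can be evaluated directly, as in the following sketch. The function name bccg_pdf is hypothetical. For L = 0 the normalizing term is set to 1, which is the limit of the cdf term in (2) as L approaches 0; this treatment of the limiting case is our assumption.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def bccg_pdf(y, L, M, S):
    """Density of the BCCG(M, S, L) distribution at y > 0, following (2)."""
    if L == 0:
        z = log(y / M) / S
        norm = 1.0  # limit of the cdf term for L -> 0
    else:
        z = ((y / M) ** L - 1.0) / (L * S)
        norm = NormalDist().cdf(1.0 / (S * abs(L)))
    return y ** (L - 1) * exp(-0.5 * z * z) / (M**L * S * sqrt(2 * pi) * norm)
```

A quick plausibility check is that the density integrates to approximately 1 over the positive axis, for example for L = 1, M = 45, and S = 0.1 as in panel (a) of Figure 21.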

Choosing BCCG, the additive model has the form

$$
\begin{aligned}
M &= h_1(x) \\
\log(S) &= h_2(x) \\
L &= h_3(x)
\end{aligned}
\tag{3}
$$

where $h_i(\cdot)$ (for $i = 1, 2, 3$) are nonparametric smoothing functions. Originally, cubic splines cs() were used as smoothing functions. As an alternative to the classic approach, penalized splines were introduced by Eilers and Marx (1996). Penalized splines (or P-splines) are piecewise polynomials defined by B-spline basis functions in the explanatory variable, where the coefficients of the basis functions are penalized to guarantee sufficient smoothness (Stasinopoulos et al. (2007)). The gamlss package offers the function pb() for fitting penalized splines, where df is the desired equivalent number of degrees of freedom.

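The model with the nonparametric functions $h_k$ ($k = 1, 2, 3$) is fitted by maximizing the penalized log likelihood function $l_p$, which is defined as

$$
\begin{aligned}
l_p &= l_d - \frac{1}{2} \sum_{k=1}^{3} \lambda_k \int_{-\infty}^{\infty} \left[h_k''(x)\right]^2 dx \\
&= l_d - \frac{1}{2}\lambda_1 \int_{-\infty}^{\infty} \left[h_1''(x)\right]^2 dx - \frac{1}{2}\lambda_2 \int_{-\infty}^{\infty} \left[h_2''(x)\right]^2 dx - \frac{1}{2}\lambda_3 \int_{-\infty}^{\infty} \left[h_3''(x)\right]^2 dx
\end{aligned}
\tag{4}
$$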

where $h_k''(x)$ is the second derivative of $h_k(x)$ with respect to $x$. $\lambda_1$, $\lambda_2$, and $\lambda_3$ are smoothing parameters, which have to be predefined.
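To make the role of the penalty concrete, the following sketch approximates a roughness penalty in the sense of Cole and Green (1992), the integral of the squared second derivative, by second differences on an evenly spaced grid. The function names and the discretization are our own illustrative choices and are not taken from the gamlss package.

```python
def roughness_penalty(h_values, dx):
    """Approximate the integral of the squared second derivative of h."""
    total = 0.0
    for i in range(1, len(h_values) - 1):
        second = (h_values[i - 1] - 2 * h_values[i] + h_values[i + 1]) / dx**2
        total += second**2 * dx
    return total


def penalized_log_likelihood(l_d, penalties, lambdas):
    """l_p = l_d - 1/2 * sum over k of lambda_k * penalty_k."""
    return l_d - 0.5 * sum(lam * pen for lam, pen in zip(lambdas, penalties))
```

A straight line has a penalty of zero, so large values of the smoothing parameters push the fitted curves towards linear functions, while small values allow more flexible curves.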

The log likelihood function of the data is

$$
l_d = \sum_{i=1}^{n} l_i
\tag{5}
$$

and $l_i$ is the log likelihood function of observation $y_i$, which can be computed with (2). The penalized log likelihood function (4) is maximized iteratively using either the RS() algorithm (Rigby and Stasinopoulos (2005)) or the CG() algorithm (Cole and Green (1992)), which in turn use a backfitting algorithm to perform each step of the Fisher scoring procedure.

In summary, the LMS method can be applied to a training dataset dataset_training by using the following piece of code:

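library(gamlss)

# M_df, S_df, and L_df are the degrees of freedom chosen by the user.
LMS_model <- gamlss(y ~ pb(x, df = M_df),                 # M (mu)
                    sigma.formula = ~ pb(x, df = S_df),   # S (sigma)
                    nu.formula = ~ pb(x, df = L_df),      # L (nu)
                    family = "BCCG",
                    method = RS(),
                    data = dataset_training)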


B. RefCurv - Installation and Software Architecture

RefCurv is currently available as version 0.4.2 for Windows (32-bit) and Linux. Installation instructions for all systems are available at https://refcurv.com. The source code for each version can be found in the GitHub repository of RefCurv.

For Windows, RefCurv 0.4.2 comes as a complete package and does not require any other dependencies to be installed. We tested the software with the versions mentioned below. The main program is written in Python (3.4.0 32-bit) and relies on the following packages (with version):

• numpy (1.14.2)

• scipy (1.1.0)

• matplotlib (2.2.2)

• pandas (0.22.0)

• PyQt4 (4.11.4)

Furthermore, RefCurv is based on R (3.5.2, 32-bit) and the gamlss (5.1-2) add-on package as the statistical engine.

C. Bayesian Information Criterion (BIC)

The Bayesian information criterion (BIC) or Schwarz information criterion (also SIC, SBC, SBIC) is a criterion for model selection. It is typically used to choose among models with different hyperparameter settings. The model with the lowest BIC is preferred.

The BIC is deﬁned as

$$
\mathrm{BIC} = \ln(n)\,k - 2\,\hat{l}_d
\tag{6}
$$

where $\hat{l}_d$ is the maximized value of the log likelihood function $l_d$ (5), $n$ is the number of observations, and $k$ is the number of parameters estimated by the model.

The BIC can help to find a compromise between model complexity and goodness of fit. On the one hand, it penalizes high complexity with the term $\ln(n)\,k$. On the other hand, the goodness of fit enters through the term $-2\,\hat{l}_d$. A high goodness of fit will result in a low BIC.
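The trade-off can be checked with a few lines of Python. The sketch takes the maximized log likelihood as input; the function name bic and the numbers in the usage note are purely illustrative.

```python
from math import log


def bic(log_likelihood, n, k):
    """BIC = ln(n) * k - 2 * (maximized log likelihood)."""
    return log(n) * k - 2.0 * log_likelihood
```

For example, with n = 100 observations, a model with k = 5 parameters is only preferred over a model with k = 3 parameters if it raises the maximized log likelihood by more than ln(100), which is about 4.6.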

D. LMS parameter estimation from percentile curves

Fenton and Sauve (2007) proposed using Cole's method to estimate the LMS parameters from percentile curves. They used the Fenton growth chart for preterm infants and generated new percentile curves from the estimated and smoothed LMS parameters. They found the new curves to be similar to the original curves.

This approach makes it possible to use existing charts for z-score prediction of new subjects. Therefore, we implemented an automated feature to estimate the LMS parameter values for a given chart.

Figure 22 shows percentile curves and the BCCG probability density functions at three different positions of the covariate, x = (44.2, 110.6, 177.0). LMS parameter values were estimated by fitting the probability density function to the percentile curves. The result of the estimation from percentile curves is the set of L, M, and S values over the range of the covariate, as shown in Figure 23.

Figure 22: LMS parameter estimation from percentile curves. The BCCG distribution was fitted to the percentile values. The density functions at three different positions of the covariate, x = (44.2, 110.6, 177.0), are highlighted.
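One possible way to recover L, M, and S at a single covariate position from chart values is sketched below. It is not necessarily the procedure used by Fenton and Sauve (2007) or by RefCurv: the sketch assumes that the 50th percentile is available and taken as M, searches L on a fixed grid, and computes S in closed form by least squares for each candidate L. The function name estimate_lms and the grid are hypothetical, and the truncation of the z-scores is ignored.

```python
from statistics import NormalDist


def estimate_lms(percentiles, L_grid=None):
    """Estimate (L, M, S) at one covariate position.

    percentiles maps probabilities (e.g. 0.03, 0.5, 0.97) to chart values
    and must contain 0.5, whose value is taken as M.
    """
    if L_grid is None:
        L_grid = [i / 100 for i in range(-300, 301) if i != 0]
    M = percentiles[0.5]
    points = [(NormalDist().inv_cdf(p), v)
              for p, v in percentiles.items() if p != 0.5]
    z_sq = sum(z * z for z, _ in points)
    best = None
    for L in L_grid:
        # For fixed L, ((v/M)^L - 1) / L = S * z is linear in S.
        S = sum(z * ((v / M) ** L - 1.0) / L for z, v in points) / z_sq
        if S <= 0 or any(1.0 + L * S * z <= 0 for z, _ in points):
            continue
        error = sum((M * (1.0 + L * S * z) ** (1.0 / L) - v) ** 2
                    for z, v in points)
        if best is None or error < best[0]:
            best = (error, L, M, S)
    if best is None:
        raise ValueError("no admissible (L, S) found on the grid")
    return best[1], best[2], best[3]
```

Repeating the estimation at every covariate position and smoothing the results yields L, M, and S as functions of the covariate.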


Figure 23: LMS parameter values against the covariate.