Graduation methods to derive age-specific fertility rates from abridged data:
a comparison of 10 methods using HFD data
Ye Liu¹, Patrick Gerland², Thomas Spoorenberg³, Kantorova Vladimira⁴, Kirill Andreev⁵
Extended Abstract for the First Human Fertility Database Symposium
(Rostock, Germany, 3-4 November 2011).
Summary:
This paper focuses on the transformation of age-specific fertility rates from five-year age groups
into single years of age. We review and apply different statistical approaches and mathematical
models to graduate fertility rates from grouped data into single-age rates.
We focus on approaches with (a) the greatest potential to graduate a wide range of fertility patterns
(from pre- to post-transition patterns) and (b) the most minimal data requirements (i.e., only one
year of data available). We compare the performance of 10 methods using a sample of HFD
countries for which we can compare empirical age-specific fertility rates against graduated ones
derived from abridged fertility rates that we computed from annual births and exposure by five-year
age groups of the mother.
Background:
Detailed fertility and population data by single calendar year and single age, as used by the HFD,
are available at best only for countries with good vital registration systems and regular
population censuses providing the detailed demographic data needed to compute age-specific
fertility rates accurately. Such detailed data, especially by single year of age, are often lacking
for older time periods, and in some instances are less accurate for older birth cohorts due to age
heaping or even age exaggeration. More importantly, since the 1950s only about half of the
countries in the world have been able to provide good and reliable vital registration
data on a regular basis. For the remaining countries, only partial and often deficient information
exists for some years, and fertility data often depend on sample surveys rather than vital
registration and census data.
Since the most commonly available fertility data are often published only in abridged form, and
sometimes only for broader age groups, it is often useful to graduate fertility data from five-year
age groups into single years of age for a range of analytical and modeling purposes where annual
data by single age are more convenient to work with.
The demographic literature offers various graduation methods to transform grouped data into
single ages, especially for fertility data, but so far no comprehensive evaluation has
benchmarked the most popular methods against a wide range of populations and
time periods. This paper aims to fill this gap by using a sample of HFD countries to evaluate the
performance of 10 transformation methods for age-specific fertility rates.
1. Columbia University, Dept. of Statistics, 1255 Amsterdam Avenue, New York, NY 10027, USA. E-mail:
liuyebest@gmail.com
2. United Nations, DESA, Population Division. Estimates and Projection Section. Room DC2-1914 – 2 UN
Plaza. New York, NY 10017, USA. E-mail: gerland@un.org
3. United Nations, DESA, Population Division. Estimates and Projection Section. Room DC2-1908 – 2 UN
Plaza. New York, NY 10017, USA. E-mail: spoorenberg@un.org
4. United Nations, DESA, Population Division. Fertility Section. Room DC2-1904 – 2 UN Plaza. New York,
NY 10017, USA. E-mail: kantorova@un.org
5. United Nations, DESA, Population Division. Estimates and Projection Section. Room DC2-1912 – 2 UN
Plaza. New York, NY 10017, USA. E-mail: andreev@un.org
Data:
This paper draws extensively on the Human Fertility Database and focuses on seven OECD
countries providing a wide range of fertility patterns from pre- to post-transition patterns.
The sample of countries and time periods includes: Sweden (1891-2001), Canada (1921-2007),
Germany (1956-2009), Austria (1951-2008), United States of America (1933-2006), Czech
Republic (1950-2009) and France (1946-2009).
Annual data by single-year of age and calendar year for "all birth orders combined" were
downloaded from the HFD web site (http://www.humanfertility.org/) in July 2011. The series used
as inputs were birth counts, female population exposure and age-specific fertility rates.
We aggregated births and female exposure by five-year age groups (from 15-19 to 50-54) and
computed the corresponding abridged fertility rates for each year and country. We assumed that the
five-year fertility rates are centered on the midpoint of the corresponding age groups.
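The aggregation step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the input layout (one value per single age starting at age 15) and the sample numbers are assumptions. The group centre is taken as the lower age plus 2 (age 17 for the group 15-19), which matches the later remark that data are not available below age 17 or above age 52.

```python
def abridged_rates(births, exposure, start_age=15, width=5):
    """Aggregate single-age births and exposure into fixed-width age groups.

    Returns a list of (lower_age, centre_age, rate) tuples, one per group.
    Hypothetical helper: the name and the return layout are illustrative.
    """
    if len(births) != len(exposure) or len(births) % width != 0:
        raise ValueError("inputs must have equal length, a multiple of the group width")
    groups = []
    for i in range(0, len(births), width):
        group_births = sum(births[i:i + width])
        group_exposure = sum(exposure[i:i + width])
        lower_age = start_age + i
        centre_age = lower_age + width // 2  # 17 for ages 15-19, 22 for 20-24, ...
        rate = group_births / group_exposure if group_exposure > 0 else 0.0
        groups.append((lower_age, centre_age, rate))
    return groups


# Made-up counts for ages 15-24 (illustrative only, not HFD data).
example_births = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
example_exposure = [10] * 10
example_groups = abridged_rates(example_births, example_exposure)
```

Note that the abridged rate is the ratio of summed births to summed exposure, not the average of the five single-age rates; the two differ whenever exposure varies by age within the group.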
Analytical strategy:
The methods reviewed here have the most basic or minimal data requirement and can be applied
with cross-sectional data available only for one period or year. The choice of methods evaluated
was essentially driven by the type of data typically available for most countries (i.e., only abridged
rates).
We excluded two types of methods from this analysis because of their data requirements:
(a) Age-specific methods (all smoothing methods relying on detailed single-age data);
(b) Time series approaches (all methods relying on the availability of time series, especially
Age-Period-Cohort methods), e.g., Lee-Carter approaches and functional analysis (Hyndman
and Booth; Hyndman and Shahid Ullah 2007), state-space logistic models (Rueda-Sabater
and Alvarez-Esteban 2008; Rueda and Rodríguez), and Support Vector Machines
(Kostaki et al. 2009).
We focused on three general types of methods (non-parametric, osculatory interpolation and
parametric) to convert ASFRs by five-year age groups into single ages, and used HFD data as a
benchmark, with the goal of reproducing the single-age rates using only the abridged ones. Other
approaches are less promising as general-purpose methods able to handle the wide range of patterns
found in both developed and developing countries, from pre- to post-transition. To summarize
the performance of each model, we computed the sum of squared errors (SSE) by age and by
year for each country and model, and used the mean SSE as the measure of goodness of fit.
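One possible reading of this goodness-of-fit measure is sketched below. The exact aggregation is an assumption: here the SSE "by age" sums squared errors over years at each age and is then averaged over ages, while the SSE "by year" sums over ages within each year and is then averaged over years. Function names, the dictionary layout and the sample values are hypothetical.

```python
def mean_sse_by_age(observed, graduated):
    """Mean over ages of the SSE accumulated across years at each age.

    Both arguments map a year to a list of single-age rates of equal length.
    """
    years = sorted(observed)
    n_ages = len(observed[years[0]])
    sse_by_age = [
        sum((observed[y][a] - graduated[y][a]) ** 2 for y in years)
        for a in range(n_ages)
    ]
    return sum(sse_by_age) / n_ages


def mean_sse_by_year(observed, graduated):
    """Mean over years of the SSE accumulated across ages within each year."""
    years = sorted(observed)
    sse_by_year = [
        sum((o - g) ** 2 for o, g in zip(observed[y], graduated[y]))
        for y in years
    ]
    return sum(sse_by_year) / len(years)


# Made-up rates for three ages and two years (illustrative only).
observed_rates = {2000: [0.1, 0.2, 0.3], 2001: [0.1, 0.2, 0.3]}
graduated_rates = {2000: [0.1, 0.1, 0.3], 2001: [0.0, 0.2, 0.3]}
by_age = mean_sse_by_age(observed_rates, graduated_rates)
by_year = mean_sse_by_year(observed_rates, graduated_rates)
```

Under this reading the two summaries share the same total squared error and differ only in the divisor (number of ages versus number of years).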
1. Standard non-parametric statistical model: Monotone Piecewise Cubic Interpolation
A wide variety of non-parametric statistical approaches (e.g., kernels and splines) exist (de Beer
2011; McNeil, Trussell and Turner 1977; Schmertmann 2003), but one of the most convenient
approaches is the Monotone Piecewise Cubic Interpolation (Fritsch and Carlson 1980; Smith,
Hyndman and Wood 2004). We tested this approach on the original data and after some
transformations:
The first application uses the original data regrouped into five-year age groups. Due to the
five-year aggregation, this approach requires some assumptions at the tails when data are
not available below age 17 or above age 52. In these cases, we extrapolated using the
decreasing ratio of the last two elements, e.g. for the points smaller than the interpolation
range, we let x(i)=x(i+1)*x(i+1)/x(i+2) ; for the points larger than the interpolation range, we let
x(i)=x(i-1)*x(i-1)/x(i-2).
The second application uses a Logit transformation of the original five-year fertility rates (to
constrain the interpolated rates to lie within [0,1]), followed by an anti-logit
transformation after the interpolation.
The third application relies on the cumulative transformation of the data to do the
interpolation. The use of cumulated distributions for model fitting, smoothing and interpolation
is popular in demography because it helps to smooth noisy data and makes it easier to
interpolate data with uneven spacing or open age groups.
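Two of the ingredients described above lend themselves to a short sketch: the tail extrapolation rule x(i)=x(i+1)*x(i+1)/x(i+2) (and its mirror image at the upper end) and the Logit round trip. The interpolation itself is not reproduced here; the function names and the sample rates are assumptions made for illustration.

```python
import math


def extend_tails(values, n_below, n_above):
    """Extend a series at both ends by continuing the ratio of the last two elements.

    Below the range: x(i) = x(i+1) * x(i+1) / x(i+2).
    Above the range: x(i) = x(i-1) * x(i-1) / x(i-2).
    Assumes the two values nearest each end are strictly positive.
    """
    out = list(values)
    for _ in range(n_below):
        out.insert(0, out[0] * out[0] / out[1])
    for _ in range(n_above):
        out.append(out[-1] * out[-1] / out[-2])
    return out


def logit(p):
    """Map a rate in (0, 1) to the real line before interpolating."""
    return math.log(p / (1.0 - p))


def anti_logit(z):
    """Map an interpolated value back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up single-age rates standing in for an interpolated series (illustrative only).
interpolated = [0.02, 0.04, 0.07, 0.10, 0.05, 0.025]
extended = extend_tails(interpolated, n_below=2, n_above=2)
```

Because each new point repeats the ratio between its two neighbours, the tails decay geometrically and stay positive as long as the two end values are positive.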
2. Osculatory Interpolations
A second approach we considered for its robustness and computational performance relies on
osculatory interpolations. In this section, we tested four formulas (Beer, Beer Modified,
Karup and Sprague) to subdivide age groups into fifths (Swanson and Siegel 2004). One of the
drawbacks of high-degree polynomial interpolations is that negative values may occur among the
points of the first five-year group and of the last three five-year groups. Consequently, we made
the following adjustments for the negative points for each formula:
Assuming j is the largest index among the negative points in the first group, we replaced each
point from index j down to the first point using x(j)=x(j+1)*x(j+1)/x(j+2), and summed all the
points in the group. Then, for j from 1 to 5, we rescaled x(j) by multiplying it by the given value
of this group and dividing by the sum obtained in the previous step. The same adjustments were made
for the last groups, except that the smallest negative index needs to be detected, and the
adjustments are made from the group containing that index to the final group.
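The adjustment for the first group can be sketched as below. This is an interpretation of the description, not the authors' implementation: the function name is hypothetical, indices are 0-based, "the given value of this group" is passed as `group_total`, and the values just above the negative points are assumed to be strictly positive.

```python
def fix_first_group(points, group_total, width=5):
    """Replace negative values in the first group, then rescale the group.

    points: interpolated single-age values; the first `width` entries form the first group.
    group_total: the value the first group must sum to after the adjustment.
    """
    out = list(points)
    negatives = [i for i in range(width) if out[i] < 0]
    if not negatives:
        return out
    j = max(negatives)  # largest index of a negative point in the first group
    for i in range(j, -1, -1):
        # x(i) = x(i+1) * x(i+1) / x(i+2), working down to the first point
        out[i] = out[i + 1] * out[i + 1] / out[i + 2]
    group_sum = sum(out[:width])
    for i in range(width):
        out[i] = out[i] * group_total / group_sum
    return out


# Made-up interpolated values with two negative points (illustrative only).
raw_points = [-0.002, -0.001, 0.004, 0.008, 0.016, 0.030, 0.050]
adjusted = fix_first_group(raw_points, group_total=0.025)
```

The rescaling step restores the group total after the replacement, so the adjusted single-age values remain consistent with the abridged input for that group.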
3. Parametric Models
Several parametric functions have commonly been used in demography, especially to model
fertility age patterns (e.g., Beta, Gamma, Pearson curves, Hadwiger (Chandola, Coleman and
Hiorns 1999; Hoem et al. 1981), etc.), as have the Coale-Trussell (1974) and Xie (1992) models.
Typically, these parametric models fit well only the specific sets of fertility data, for some
countries and time periods, to which their functional shapes apply. The aim of these models is to
reduce the empirical distributions to a few interpretable parameters, either to answer analytical
questions or to project trends in the parameters in order to predict future fertility age patterns
(Rogers 1986; Thompson et al. 1989).
The parsimony of these models and the functional shape they impose on the data are useful to
smooth or even adjust poor-quality data by imposing a structure considered more appropriate
than the one empirically observed. These features are not only strengths but also limitations,
preventing the general use of any of these parametric functions for all countries and time periods.
One of the major problems with this approach is the inability of most of these models to deal with
the change of shape that occurs during the first (or even the second) fertility transition
experienced by most countries. The dependence on any fixed functional form imposes too many
constraints and ties the fit too “rigidly” to a given shape.
In this paper we focus on three general parametric models that have been identified as more
flexible for both historical and contemporary age fertility patterns. We fit them using least squares
estimates:
The first model we use here is the Gompertz model (Goldstein 2010), with the form of:
f(x)=K*a*exp[-a/b*exp(-b*x)-b*x]
The second model is a Modified Gompertz model, which uses the same function as the
Gompertz model when the age x is less than 45. When the age is greater than or equal to 45, the
modified model is the product of the functions f(x) and g(x), where g(x) is a linear function
between ages 45 and 54 that takes the value 1 at 45 and the value 0 at 54.
The third model used is the Two Peak model (Kostaki and Peristera 2007), simply because it
can represent the pattern of two bulges and a flat peak. Its formula is:
f(x)=c_1*exp{-[(x-u_1)/σ_1]^2}+c_2*exp{-[(x-u_2)/σ_2]^2}
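The three functional forms above can be written down directly, together with the least-squares objective used to fit them. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the taper g(x) is set to 0 beyond age 54 (the text only defines it between 45 and 54), and the minimiser itself is left out because the source does not say which optimiser was used.

```python
import math


def gompertz(x, K, a, b):
    """f(x) = K * a * exp[-(a/b) * exp(-b*x) - b*x]."""
    return K * a * math.exp(-(a / b) * math.exp(-b * x) - b * x)


def taper(x, start=45.0, end=54.0):
    """Linear g(x): 1 at age 45, 0 at age 54 (assumed 0 beyond 54)."""
    if x < start:
        return 1.0
    if x >= end:
        return 0.0
    return (end - x) / (end - start)


def modified_gompertz(x, K, a, b):
    """Gompertz below age 45, Gompertz times the linear taper from age 45."""
    return gompertz(x, K, a, b) * taper(x)


def two_peak(x, c1, u1, s1, c2, u2, s2):
    """Sum of two bell-shaped terms with heights c, locations u and spreads s."""
    return (c1 * math.exp(-((x - u1) / s1) ** 2)
            + c2 * math.exp(-((x - u2) / s2) ** 2))


def sse(model, params, ages, rates):
    """Least-squares objective: sum of squared differences at the given ages."""
    return sum((model(x, *params) - r) ** 2 for x, r in zip(ages, rates))


# Illustrative parameters only: b = 0.2 and a chosen so that the curve peaks at age 27.
b_demo = 0.2
a_demo = b_demo * math.exp(b_demo * 27)
peak_value = gompertz(27, 2.0, a_demo, b_demo)
```

Setting the derivative of the exponent to zero shows that the Gompertz curve peaks at x = ln(a/b)/b with height K*b/e, which is how the demonstration parameters were chosen. In practice `sse` would be passed to a numerical minimiser with a starting guess for the parameters.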
Preliminary results:
With respect to the modeling by age, the methods providing the best goodness of fit across the
most ages (i.e., the smallest errors relative to the empirical single-age ASFRs) are the Beer and
Sprague interpolation methods. The worst approaches are the Monotone Piecewise Cubic
Interpolation with cumulative transformation, where larger errors occur at the middle ages of the
first two age groups, and the Gompertz model (Table 1 and Figure 1 for Sweden).
For the Monotone Piecewise Cubic Interpolation, the original data without transformation provide
the smallest errors in most instances. In absolute terms, errors become very small after age 45.
Applying this approach to transformed rates (either through the Logit transformation or
using a cumulated distribution) leads to larger errors.
For all osculatory interpolations, there is no great difference in errors at each age, and errors
are smaller after age 45.
Among the parametric models, the Modified Gompertz model performs better than the Gompertz model
after age 45, but both fit older ages less well than the other methods.
In most cases, the Two Peak model shows larger errors around ages 20 and 25.
Table 1. Mean of the Sum of Square Errors by Age (from 15 to 54):
Method                                            Sweden   Canada   Germany  Austria  USA      Czech    France   Overall*
Original data with Monotone Cubic interpolation   0.0026   0.0034   0.0011   0.0013   0.0015   0.0028   0.0042   0.0024
Logit transformed data with Monotone Cubic        0.0024   0.0034   0.0014   0.0020   0.0031   0.0046   0.0029   0.0028
Cumulated data with Monotone Cubic                0.0048   0.0063   0.0026   0.0039   0.0071   0.0121   0.0039   0.0058
Beer interpolation method                         0.0010   0.0014   0.0004   0.0006   0.0007   0.0020   0.0014   0.0011
Beer Modified interpolation method                0.0012   0.0016   0.0005   0.0007   0.0011   0.0031   0.0016   0.0014
Karup interpolation method                        0.0021   0.0027   0.0011   0.0016   0.0027   0.0056   0.0022   0.0026
Sprague interpolation method                      0.0009   0.0015   0.0004   0.0005   0.0007   0.0028   0.0015   0.0012
Gompertz function                                 0.0140   0.0086   0.0022   0.0026   0.0049   0.0040   0.0032   0.0056
Modified Gompertz function                        0.0093   0.0072   0.0021   0.0024   0.0046   0.0039   0.0030   0.0047
Two Peak function                                 0.0026   0.0024   0.0013   0.0023   0.0033   0.0102   0.0039   0.0037
(*) unweighted average of 7 countries
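As a quick arithmetic check, the Overall column of Table 1 can be recomputed as the unweighted mean of the seven country values and used to rank the methods. The numbers below are copied from Table 1; the shortened method labels and variable names are illustrative. Because the published country values are already rounded to four decimals, the recomputed mean can differ from the reported one by up to about 0.0001.

```python
# Country values (Sweden, Canada, Germany, Austria, USA, Czech, France) and reported Overall.
table1 = {
    "Monotone Cubic, original data": ([0.0026, 0.0034, 0.0011, 0.0013, 0.0015, 0.0028, 0.0042], 0.0024),
    "Monotone Cubic, logit": ([0.0024, 0.0034, 0.0014, 0.0020, 0.0031, 0.0046, 0.0029], 0.0028),
    "Monotone Cubic, cumulated": ([0.0048, 0.0063, 0.0026, 0.0039, 0.0071, 0.0121, 0.0039], 0.0058),
    "Beer": ([0.0010, 0.0014, 0.0004, 0.0006, 0.0007, 0.0020, 0.0014], 0.0011),
    "Beer Modified": ([0.0012, 0.0016, 0.0005, 0.0007, 0.0011, 0.0031, 0.0016], 0.0014),
    "Karup": ([0.0021, 0.0027, 0.0011, 0.0016, 0.0027, 0.0056, 0.0022], 0.0026),
    "Sprague": ([0.0009, 0.0015, 0.0004, 0.0005, 0.0007, 0.0028, 0.0015], 0.0012),
    "Gompertz": ([0.0140, 0.0086, 0.0022, 0.0026, 0.0049, 0.0040, 0.0032], 0.0056),
    "Modified Gompertz": ([0.0093, 0.0072, 0.0021, 0.0024, 0.0046, 0.0039, 0.0030], 0.0047),
    "Two Peak": ([0.0026, 0.0024, 0.0013, 0.0023, 0.0033, 0.0102, 0.0039], 0.0037),
}

# Unweighted mean across the seven countries for each method.
recomputed = {name: sum(values) / len(values) for name, (values, _) in table1.items()}

# Largest gap between the recomputed and the reported Overall value.
max_gap = max(abs(recomputed[name] - reported) for name, (_, reported) in table1.items())

# Methods ordered from smallest to largest mean error.
ranking = sorted(recomputed, key=recomputed.get)
```

The ordering puts Beer and Sprague first and the cumulated Monotone Cubic variant last, consistent with the discussion of the results by age.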
Figure 1. Sweden: Errors by Age for selected graduation models
With respect to the modeling by time, we reach the same conclusion. The best methods are the
Beer and Sprague methods, which provide the best goodness of fit over the most years.
The worst approaches are the Monotone Piecewise Cubic Interpolation with cumulative
transformation and the Gompertz model (Table 2 and Figure 2 for Sweden).
For most methods, relatively large errors occur around the 1950s and 1960s. In particular, the
errors of the Two Peak model are unstable for some countries. Among the parametric models, the Two
Peak model fits very well for the most recent twenty years, whereas the Gompertz and
Modified Gompertz models are much less convincing over the same period.
Table 2. Mean of Sum Square Error by Year:
Method                                            Sweden   Canada   Germany  Austria  USA      Czech    France   Overall*
Original data with Monotone Cubic interpolation   0.0009   0.0016   0.0008   0.0009   0.0008   0.0019   0.0026   0.0013
Logit transformed data with Monotone Cubic        0.0008   0.0016   0.0010   0.0013   0.0017   0.0030   0.0018   0.0016
Cumulated data with Monotone Cubic                0.0016   0.0029   0.0019   0.0027   0.0038   0.0081   0.0024   0.0034
Beer interpolation method                         0.0003   0.0007   0.0003   0.0004   0.0004   0.0013   0.0009   0.0006
Beer Modified interpolation method                0.0004   0.0007   0.0004   0.0005   0.0006   0.0021   0.0010   0.0008
Karup interpolation method                        0.0007   0.0013   0.0008   0.0011   0.0015   0.0037   0.0014   0.0015
Sprague interpolation method                      0.0003   0.0007   0.0003   0.0004   0.0004   0.0018   0.0009   0.0007
Gompertz function                                 0.0047   0.0040   0.0017   0.0018   0.0027   0.0027   0.0020   0.0028
Modified Gompertz function                        0.0032   0.0033   0.0016   0.0017   0.0025   0.0026   0.0019   0.0024
Two Peak function                                 0.0009   0.0011   0.0010   0.0016   0.0018   0.0068   0.0024   0.0022
Figure 2. Sweden: Errors by Year for selected graduation models
Conclusion
Using the HFD data for selected developed countries covering a range of historical and
contemporary fertility patterns by age, we conclude that the Beer and Sprague formulas are
the best methods to transform fertility rates from standard five-year age groups into single-age
rates.
Estimation of non-parametric models with limited age groups can at times be problematic because of
the large number of parameters to estimate from a limited number of empirical observations. In
this context, parametric models are easier to estimate since they are more parsimonious, but they
often depend on an initial guess, and a poor guess can make it difficult to find the optimal
parameters. The more parameters involved, the more errors may occur during the estimation;
therefore, more parsimonious approaches should be favored with limited data. Finally, nearly all the
methods examined in this paper capture poorly the peak of the fertility age pattern
(which often occurs in the middle of a five-year age group; see the Annex plots showing the
graduation of the Swedish fertility age pattern for selected years).
References:
Chandola, T., D.A. Coleman, and R.W. Hiorns. 1999. "Recent European Fertility Patterns: Fitting Curves to 'Distorted' Distributions." Population Studies 53(3):317-329.
Coale, A.J. and T.J. Trussell. 1974. "Model Fertility Schedules: Variations in The Age Structure of Childbearing in Human Populations." Population Index 40(2):185-258.
de Beer, J. 2011. "A new relational method for smoothing and projecting age-specific fertility rates: TOPALS." Demographic Research 24(18):409-454.
Fritsch, F.N. and R.E. Carlson. 1980. "Monotone Piecewise Cubic Interpolation." SIAM Journal on Numerical Analysis 17(2):238-246.
Goldstein, J.R. 2010. "A behavioral Gompertz model for cohort fertility schedules in low and moderate fertility populations." in MPIDR Working Paper. Rostock: Max Planck Institute for Demographic Research.
Hoem, J.M., D. Madsen, J.L. Nielsen, E.-M. Ohlsen, H.O. Hansen, and B. Rennermalm. 1981. "Experiments in Modelling Recent Danish Fertility Curves." Demography 18(2):231-244.
Hyndman, R.J. and H. Booth. "Stochastic population forecasts using functional data models for mortality, fertility and migration." International Journal of Forecasting 24(3):323-342.
Hyndman, R.J. and M. Shahid Ullah. 2007. "Robust forecasting of mortality and fertility rates: A functional data approach." Computational Statistics & Data Analysis 51(10):4942-4956.
Kostaki, A., J. Moguerza, A. Olivares, and S. Psarakis. 2009. "Graduating the age-specific fertility pattern using Support Vector Machines." Demographic Research 20(25):599-622.
Kostaki, A. and P. Peristera. 2007. "Modeling fertility in modern populations." Demographic Research 16(6):141-194.
McNeil, D.R., T.J. Trussell, and J.C. Turner. 1977. "Spline Interpolation of Demographic Data." Demography 14(2):245-252.
Rogers, A. 1986. "Parameterized Multistate Population Dynamics and Projections." Journal of the American Statistical Association 81(393):48-61.
Rueda-Sabater, C. and P.C. Alvarez-Esteban. 2008. "The analysis of age-specific fertility patterns via logistic models." Journal of Applied Statistics 35(9):1053-1070.
Rueda, C. and P. Rodríguez. "State space models for estimating and forecasting fertility." International Journal of Forecasting 26(4):712-724.
Schmertmann, C. 2003. "A system of model fertility schedules with graphically intuitive parameters." Demographic Research 9(5):81-110.
Smith, L., R. Hyndman, and S. Wood. 2004. "Spline interpolation for demographic variables: The monotonicity problem." Journal of Population Research 21(1):95-98.
Swanson, D. and J.S. Siegel. 2004. "The methods and material of demography." Pp. 640. San Diego, CA: Academic Press.
Thompson, P.A., W.R. Bell, J.F. Long, and R.B. Miller. 1989. "Multivariate Time Series Projections of Parameterized Age-Specific Fertility Rates." Journal of the American Statistical Association 84(407):689-699.
Xie, Y. and P. Ellen Efron. 1992. "Age Patterns of Marital Fertility: Revising the Coale-Trussell Method." Journal of the American Statistical Association 87(420):977-984.
Annex: Swedish fertility age pattern for selected years
[Figures: Sweden fertility age pattern graduation for 1900, 1920, 1940, 1960, 1980 and 2000]
... The problem of expanding a table of abridged rates into a complete schedule is in a sense the complement of smoothing: While smoothing seeks to remove noise, expansion attempts to construct a signal. For both fertility and mortality there exists an established set of tools for expanding abridged schedules at the national level, where the population is large enough that sample variance is neglected or a modelling of its age-specific structure is not considered necessary (Kostaki & Panousis 2001;Liu, Gerland, Spoorenberg, Vladimira, & Andreev 2011). ...
... Parametric methods provide insight into the underlying drivers of birth and death and are useful for describing a distribution using a small number of meaningful indices (Coale & Trussell 1974;Kostaki 1991;Schmertmann 2003). Spline methods are useful when it is important to expand a schedule in a manner that keeps the original abridged rates unchanged (Elandt-Johnson & Johnson 1980;Grigoriev & Jdanov 2015;Hsieh 1991;Liu et al. 2011). ...
Preprint
Full-text available
I show how a combination of de Beer's TOPALS method and P-Splines can be used to expand abridged fertility and mortality schedules for small populations. The method provides a useful framework for continuously deforming a fit between exact interpolation and indirect standardisation using a single parameter, the B-Spline penalty. I use the approach to estimate complete fertility and mortality profiles for indigenous Australians and compare the results with parametric and spline methods.
... To address this issue, several disaggregation methods have been developed (McNeil et. 1977;Smith, Hyndman, Wood, 2004;Liu, et al. 2011;Schmertmann, 2012;Jasilioniene et al. 2012). Using a sample of HFD countries, Liu et al. (2011) tested 10 different methods that derive age-specific fertility rates from abridged data, and concluded that the modified Beers method (de Beer, 2011) provided the best fit. ...
... 1977;Smith, Hyndman, Wood, 2004;Liu, et al. 2011;Schmertmann, 2012;Jasilioniene et al. 2012). Using a sample of HFD countries, Liu et al. (2011) tested 10 different methods that derive age-specific fertility rates from abridged data, and concluded that the modified Beers method (de Beer, 2011) provided the best fit. Using the HFD and the US Census International Database, Schmertmann (2012) compared the performance of the calibrated spline (CS) with that of the Beers and HFD methods. ...
Technical Report
Full-text available
Occasionally, there is a need to split aggregated fertility data into a fine grid of ages. For this purpose, several disaggregation methods have been developed. Yet these methods have some limitations. We seek to identify a method that satisfies the following criteria: 1) shape-the estimated fertility curves should be plausible and smooth; 2) fit-the predicted values should closely trace the observed values; 3) non-negativity-only positive values should be returned; 4) balance-the estimated five-year age group totals should match the input data; and in case of birth order data 5) parity-the balance by parity has to be maintained. To our knowledge, none of the existing methods fully meets the first four criteria. Moreover, no attempt has been made to extend the restrictions to criterion (5). To address the disadvantages of the existing methods, we introduce two alternative approaches for splitting abridged fertility data: namely, the quadratic optimization (QO) method and the neural network (NN) method. We mainly rely on high-quality fertility data from the Human Fertility Database (HFD), Additionally, we use a large and heterogeneous dataset from the Human Fertility Collection (HFC). The performance of the proposed methods is evaluated both visually (by examining of the obtained fertility schedules), and statistically using several metrics of fit. The QO and NN methods are tested against the current HFD splitting protocol (HFD method) and the calibrated spline (CS) method. The results of thorough testing suggest that both methods perform well. The main advantage-and a distinguishing feature-of the QO approach is that it meets all of the requirements listed above. However, it does not provide a fit as good as that of the NN and CS methods. In addition, when it is applied to birth order data, it can sometimes produce implausible shapes for parity 1. 
To account for such cases, we have developed individual solutions, which can easily be adapted to account for other cases that might occur. While the NN method does not satisfy the balance and parity criteria, it returns better results in terms of fit than the other methods. The QO method satisfies the needs of large databases such as the HFD and the HFC. While this method has very strict requirements, it returns plausible fertility estimates regardless of the nature of the input data. The NN method appears to be a suitable alternative for use in individual cases in which the priority is given to the fit criterion.
... With aggregated data at hand there is often the need to estimate agespecific distributions on a more detailed grid of ages, e.g. by single year of age, to compare, for example, nonagenarians and centenarians over time. This problem has been addressed in the literature and several solutions have been suggested for ungrouping age-at-death distributions [3] and fertility patterns [4]. ...
... Other studies in the literature compare various methods for ungrouping. They focus though on parametric and non-parametric models together that estimate specific distributions, such as overall age-specific mortality [3] or age-specific fertility [4]. They also do not tackle the problem of open-ended intervals. ...
Article
Full-text available
Background Histograms are a common tool to estimate densities non-parametrically. They are extensively encountered in health sciences to summarize data in a compact format. Examples are age-specific distributions of death or onset of diseases grouped in 5-years age classes with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost and comparison between histograms with different boundaries is arduous. In these cases it is useful to estimate detailed distributions from grouped data. Methods From an extensive literature search we identify five methods for ungrouping count data. We compare the performance of two spline interpolation methods, two kernel density estimators and a penalized composite link model first via a simulation study and then with empirical data obtained from the NORDCAN Database. All methods analyzed can be used to estimate differently shaped distributions; can handle unequal interval length; and allow stretches of 0 counts. ResultsThe methods show similar performance when the grouping scheme is relatively narrow, i.e. 5-years age classes. With coarser age intervals, i.e. in the presence of open-ended age groups, the penalized composite link model performs the best. Conclusion We give an overview and test different methods to estimate detailed distributions from grouped count data. Health researchers can benefit from these versatile methods, which are ready for use in the statistical software R. We recommend using the penalized composite link model when data are grouped in wide age classes.
... Division (Liu et al. 2011) recently used HFD data to compare the accuracy of several interpolation methods for fertility schedules. They concluded that the best overall method for recovering single-year age-specific rates 8 99% of fitted single-year rates with the CS model are within .01 of the equivalent HFD data. ...
... Interpolated CS schedules are smoother and fit known data better. CS calculation is also much simpler than the HFD splines or the Beers variant used by Liu et al. (2011), because it does not require complex adjustments for edge effects and negative values. ...
Article
Full-text available
OBJECTIVE I develop and explain a new method for interpolating detailed fertility schedules from age-group data. The method allows estimation of fertility rates over a fine grid of ages, from either standard or non-standard age groups. Users can calculate detailed schedules directly from the input data, using only elementary arithmetic. METHODS The new method, the calibrated spline (CS) estimator, expands an abridged fertility schedule by finding the smooth curve that minimizes a squared error penalty. The penalty is based both on fit to the available age-group data, and on similarity to patterns of 1fx schedules observed in the Human Fertility Database (HFD) and in the US Census International Database (IDB). RESULTS I compare the CS estimator to two very good alternative methods that require more computation: Beers interpolation and the HFD's splitting protocol. CS replicates known 1fx schedules from 5fx data better than the other two methods, and its interpolated schedules are also smoother. CONCLUSIONS The CS method is an easily computed, flexible, and accurate method for interpolating detailed fertility schedules from age-group data. COMMENTS Data and R programs for replicating this paper’s results are available online at http://calibrated-spline.schmert.net
... Beer (2011) used the non-linear least-squares method to estimate model parameters. Liu et al. (2011) proposed using single-age agespecific fertility data. In contrast to the 5-year age group, Asili et al. (2014) tested model suitability using the norm of the residual obtained from MATLAB. ...
Article
Full-text available
Fertility pattern analysis and modeling to smooth age-specific fertility rates (ASFRs) form a well-established research field that holds particular importance for Asian countries. In developed nations, ASFRs typically display a bimodal skewed fertility curve, whereas, in developing countries, they usually exhibit a unimodal skewed fertility curve that diverges from the normal one. For decades, demographic experts worldwide have been interested in creating models using deterministic and stochastic approaches to represent these fertility curves. In this regard, parametric and non-parametric models have been created, with the latter providing a better fit for ASFR data. This research investigates the evolution of fertility models aimed at smoothing ASFRs. It explores suitable alternative models for countries with fast-declining, unimodal, and skewed fertility curves of ASFRs, such as Nepal and Malaysia. Nepal's fertility rate is transitioning from a high level toward the replacement rate (2.1) at the year 2021; meanwhile, Malaysia's fertility rate (1.7) in the year 2021 has dropped below the replacement rate. Given the lack of a universally applicable model for ASFR pattern variation, this study proposes the Kumaraswamy log-logistic distribution as a promising model to represent the ASFRs of Nepal and Malaysia accurately. Various approaches, including the Akaike information criterion, and Bayesian information criterion, are employed to validate the fitting of the proposed model.
... Beer (2011) used the non-linear least-squares method to estimate model parameters. Liu et al. (2011) proposed using single-age agespecific fertility data. In contrast to the 5-year age group, Asili et al. (2014) tested model suitability using the norm of the residual obtained from MATLAB. ...
Article
Fertility pattern analysis and modeling to smooth age-specific fertility rates (ASFRs) form a well-established research field that holds particular importance for Asian countries. In developed nations, ASFRs typically display a bimodal skewed fertility curve, whereas, in developing countries, they usually exhibit a unimodal skewed fertility curve that diverges from the normal one. For decades, demographic experts worldwide have been interested in creating models using deterministic and stochastic approaches to represent these fertility curves. In this regard, parametric and non-parametric models have been created, with the latter providing a better fit for ASFR data. This research investigates the evolution of fertility models aimed at smoothing ASFRs. It explores suitable alternative models for countries with fast-declining, unimodal, and skewed fertility curves of ASFRs, such as Nepal and Malaysia. Nepal’s fertility rate is transitioning from a high level toward the replacement rate (2.1) at the year 2021; meanwhile, Malaysia’s fertility rate (1.7) in the year 2021 has dropped below the replacement rate. Given the lack of a universally applicable model for ASFR pattern variation, this study proposes the Kumaraswamy log-logistic distribution as a promising model to represent the ASFRs of Nepal and Malaysia accurately. Various approaches, including the Akaike information criterion, and Bayesian information criterion, are employed to validate the fitting of the proposed model.
... Let y (j) groups into single-year. It is desirable to work with yearly data by single age in many statistical models (see also Liu et al., 2011). There are two most distinct patterns of the Australian age-specific fertility rates shown by the rainbow plots. ...
Preprint
Fertility differentials by urban-rural residence and nativity of women in Australia significantly impact population composition at sub-national levels. We aim to provide consistent fertility forecasts for Australian women characterized by age, region, and birthplace. Age-specific fertility rates at the national and sub-national levels obtained from census data between 1981-2011 are jointly modeled and forecast by the grouped functional time series method. Forecasts for women of each region and birthplace are reconciled following the chosen hierarchies to ensure that results at various disaggregation levels consistently sum up to the respective national total. Coupling the region of residence disaggregation structure with the trace minimization reconciliation method produces the most accurate point and interval forecasts. In addition, age-specific fertility rates disaggregated by the birthplace of women show significant heterogeneity that supports the application of the grouped forecasting method.
... Fertility rate does not vary as much as for HFD, but curvature is discontinuous across nodes. BEERS (S): Fertility rates are aggregated into five-year intervals and then split using the Beers method (Liu, Gerland, Spoorenberg, Vladimira & Andreev 2011). CALIBRATED SPLINE (NP): Fit fertility rates with B-splines to maximise projection onto a calibrated set of fertility shapes (Schmertmann 2014). ...
Technical Report
Osier is a software library for demographic analysis. It provides facilities for creating flexible data structures, building standard demographic curves, and calculating key rates. It is available as an Excel add-in, with plans to export it to R and Octave. It is currently at the beta stage and available from the author.
... Examples include the model of Coale and Trussell (1974), the model of Xie and Pimentel (1992), and the Gompertz model adapted for fertility. For more details on parametric methods for fitting fertility rates, the reader may consult Liu et al. (2011). ...
Article
Age-specific fertility rates can be smoothed using parametric models or splines. Alternatively, a relational model can be used, which relates the age profile to be fitted or projected to a standard age schedule. This paper introduces TOPALS (tool for projecting age patterns using linear splines), a new relational method that is less dependent on the choice of the standard age schedule than previous methods. TOPALS models the relationship between the age-specific fertility rates to be fitted and the standard age schedule by a linear spline. This paper uses TOPALS for smoothing fertility age profiles for 30 European countries. The use of TOPALS to create scenarios of the future level and age pattern of fertility is illustrated by applying the method to project future fertility rates for six European countries.
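The relational idea in this abstract can be illustrated with a minimal sketch. It assumes one simple reading of the description: the fitted rate at each age is the standard rate multiplied by a ratio, and that ratio is a piecewise-linear function of age defined at a few knots. The function names, knot ages, ratios, and flat standard schedule below are all hypothetical, and the published method may differ in detail.

```python
from bisect import bisect_right


def linear_spline(knots, values, x):
    """Evaluate the piecewise-linear function through (knots[i], values[i]) at x.

    Outside the knot range the nearest end value is held constant.
    """
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    i = bisect_right(knots, x) - 1  # knots[i] <= x < knots[i + 1]
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    return values[i] + t * (values[i + 1] - values[i])


def relational_schedule(ages, standard, knots, knot_ratios):
    """Scale a standard age schedule by a linear spline of ratios (assumed form)."""
    return [s * linear_spline(knots, knot_ratios, a) for a, s in zip(ages, standard)]


# Hypothetical inputs, for illustration only (not taken from any data set).
ages = list(range(15, 50))
standard = [0.1] * len(ages)   # flat placeholder standard schedule
knots = [15, 30, 49]           # hypothetical knot ages
knot_ratios = [0.5, 1.5, 1.0]  # hypothetical ratios at the knots

fitted = relational_schedule(ages, standard, knots, knot_ratios)
```

With all ratios set to 1 the sketch returns the standard schedule unchanged, which is the sense in which the spline only describes departures from the standard.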
Article
A topic of interest in the demographic literature is the graduation of the age-specific fertility pattern. A standard graduation technique extensively used by demographers is to fit parametric models that accurately reproduce it. Non-parametric statistical methodology might alternatively be used for this purpose. Support Vector Machines (SVM) is one such non-parametric methodology. This paper evaluates SVM techniques as tools for graduating fertility rates. We apply these techniques to empirical age-specific fertility rates from a variety of populations, time periods, and cohorts. Additionally, for comparison, we also fit known parametric models to the same empirical data sets.
Article
In demography, it is often necessary to obtain a monotonic interpolation of data. A solution to this problem is available using the Hyman filter for cubic splines. However, this does not seem to be well known amongst demographers, and no implementation of the procedure is readily available. We remedy these problems by outlining the relevant ideas here, and providing a function for the R language.
Article
In this paper, we introduce logistic models to analyse fertility curves. The models are formulated as linear models of the log odds of fertility and are defined in terms of parameters that are interpreted as measures of level, location and shape of the fertility schedule. This parameterization is useful for evaluating and interpreting fertility trends and projections of future period fertility. For a series of years, the proposed models admit a state-space formulation that allows a coherent joint estimation of parameters and forecasting. The main features of the models compared with other alternatives are the functional simplicity, the flexibility, and the interpretability of the parameters. These and other features are analysed in this paper using examples and theoretical results. Data from different countries are analysed, and to validate the logistic approach, we compare the goodness of fit of the new model against well-known alternatives; the analysis gives superior results in most developed countries.
Article
Fritsch and Carlson developed an algorithm which produces a monotone C1 piecewise cubic interpolant to a monotone function. We show that the algorithm yields a third-order approximation, while a modification is fourth-order accurate.
Article
Necessary and sufficient conditions are derived for a cubic to be monotone on an interval. These conditions are used to develop an algorithm that constructs a visually pleasing monotone piecewise cubic interpolant to monotone data. Several examples are given that compare this algorithm with other interpolation methods.
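To show how a monotone piecewise cubic can be used for graduation, the sketch below interpolates cumulative fertility at five-year age boundaries and differences the result into single-age rates. It follows a common textbook formulation of Fritsch-Carlson-style slope limiting (average of adjacent secants as the starting slopes, then rescaling onto the circle of radius 3); it is not claimed to reproduce the cited paper's algorithm exactly. The five-year rates and all function names are hypothetical.

```python
from bisect import bisect_right


def monotone_slopes(x, y):
    """Knot slopes limited so each Hermite cubic piece stays monotone."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    d = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]  # secant slopes
    m = [0.0] * n
    m[0], m[-1] = d[0], d[-1]
    for i in range(1, n - 1):
        # Zero slope at local extrema, otherwise the mean of adjacent secants.
        m[i] = 0.0 if d[i - 1] * d[i] <= 0 else 0.5 * (d[i - 1] + d[i])
    for i in range(n - 1):
        if d[i] == 0.0:
            m[i] = m[i + 1] = 0.0  # flat data: force a flat piece
            continue
        a, b = m[i] / d[i], m[i + 1] / d[i]
        s = a * a + b * b
        if s > 9.0:  # outside the sufficient monotonicity region: rescale
            tau = 3.0 / s ** 0.5
            m[i], m[i + 1] = tau * a * d[i], tau * b * d[i]
    return m


def hermite_eval(x, y, m, xq):
    """Evaluate the piecewise cubic Hermite interpolant at xq."""
    i = min(max(bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[i + 1] - x[i]
    t = (xq - x[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y[i] + h10 * h * m[i] + h01 * y[i + 1] + h11 * h * m[i + 1]


# Hypothetical five-year rates for ages 15-19, ..., 45-49 (illustration only).
five_year = [0.02, 0.10, 0.12, 0.08, 0.04, 0.01, 0.002]
bounds = [15 + 5 * k for k in range(len(five_year) + 1)]  # 15, 20, ..., 50

# Cumulative fertility up to each age boundary (non-decreasing by construction).
cumulative = [0.0]
for rate in five_year:
    cumulative.append(cumulative[-1] + 5 * rate)

slopes = monotone_slopes(bounds, cumulative)
curve = [hermite_eval(bounds, cumulative, slopes, a) for a in range(15, 51)]

# Single-age rates are first differences of the interpolated cumulative curve.
single = [curve[k + 1] - curve[k] for k in range(35)]
```

Because the interpolant passes through the cumulative values at every boundary, the single-age rates within each five-year group sum to five times the grouped rate, and monotonicity keeps them non-negative.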
Article
We introduce multivariate state space models for estimating and forecasting fertility rates that are dynamic alternatives to logistic representations for fixed time points. Strategies are provided for the Kalman filter and for quasi-Newton algorithm initialization, that assure the convergence of the iterative fitting process. The broad impact of the new methodology in practice is shown using data series from Spain, Sweden and Australia, and by comparing the results with a recent approach based on functional data analysis and also with official forecasts. Very satisfactory short- and medium-term forecasts are obtained. Besides this, the new modeling proposal provides practitioners with several suitable interpretative tools, and the application here is an interesting example of the usefulness of the state space representation in modelling real multivariate processes.
Article
A new method is proposed for forecasting age-specific mortality and fertility rates observed over time. This approach allows for smooth functions of age, is robust for outlying years due to wars and epidemics, and provides a modelling framework that is easily adapted to allow for constraints and other information. Ideas from functional data analysis, nonparametric smoothing and robust statistics are combined to form a methodology that is widely applicable to any functional time series data observed discretely and possibly with error. The model is a generalization of the Lee–Carter (LC) model commonly used in mortality and fertility forecasting. The methodology is applied to French mortality data and Australian fertility data, and the forecasts obtained are shown to be superior to those from the LC method and several of its variants.
Article
Age–sex-specific population forecasts are derived through stochastic population renewal using forecasts of mortality, fertility and net migration. Functional data models with time series coefficients are used to model age-specific mortality and fertility rates. As detailed migration data are lacking, net migration by age and sex is estimated as the difference between historic annual population data and successive populations one year ahead derived from a projection using fertility and mortality data. This estimate, which includes error, is also modeled using a functional data model. The three models involve different strengths of the general Box–Cox transformation chosen to minimise out-of-sample forecast error. Uncertainty is estimated from the model, with an adjustment to ensure that the one-step-forecast variances are equal to those obtained with historical data. The three models are then used in a Monte Carlo simulation of future fertility, mortality and net migration, which are combined using the cohort-component method to obtain age-specific forecasts of the population by sex. The distribution of the forecasts provides probabilistic prediction intervals. The method is demonstrated by making 20-year forecasts using Australian data for the period 1921–2004. The advantages of our method are: (1) it is a coherent stochastic model of the three demographic components; (2) it is estimated entirely from historical data with no subjective inputs required; and (3) it provides probabilistic prediction intervals for any demographic variable that is derived from population numbers and vital events, including life expectancies, total fertility rates and dependency ratios.