OBJECTIVE
I develop and explain a new method for interpolating detailed fertility schedules from age-group data. The method allows estimation of fertility rates over a fine grid of ages, from either standard or non-standard age groups. Users can calculate detailed schedules directly from the input data, using only elementary arithmetic.
METHODS
The new method, the calibrated spline (CS) estimator, expands an abridged fertility schedule by finding the smooth curve that minimizes a squared error penalty. The penalty is based both on fit to the available age-group data, and on similarity to patterns of 1fx schedules observed in the Human Fertility Database (HFD) and in the US Census International Database (IDB).
RESULTS
I compare the CS estimator to two very good alternative methods that require more computation: Beers interpolation and the HFD's splitting protocol. CS replicates known 1fx schedules from 5fx data better than the other two methods, and its interpolated schedules are also smoother.
CONCLUSIONS
The CS method is an easily computed, flexible, and accurate method for interpolating detailed fertility schedules from age-group data.
COMMENTS
Data and R programs for replicating this paper’s results are available online at http://calibrated-spline.schmert.net
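As a rough illustration of why a penalized least-squares expansion reduces to elementary arithmetic, the sketch below expands seven 5-year rates into 35 single-year rates. It is not the CS estimator: the calibrated penalty, which the abstract says is built from HFD and IDB patterns, is replaced here by a generic second-difference roughness penalty, and the schedule is represented directly by its single-year values rather than by spline weights. The names `expansion_matrix` and `smooth_weight` and the example rates are hypothetical.

```python
import numpy as np

def expansion_matrix(n_ages=35, group_width=5, smooth_weight=1.0):
    """Constant matrix K such that single-year rates are K @ grouped_rates."""
    n_groups = n_ages // group_width
    # G averages the single-year rates inside each age group (n_groups x n_ages)
    G = np.kron(np.eye(n_groups), np.full((1, group_width), 1.0 / group_width))
    # D takes second differences of the single-year schedule (roughness measure)
    D = np.diff(np.eye(n_ages), n=2, axis=0)
    # Minimizing ||G f - y||^2 + w * ||D f||^2 over f gives the linear rule f = K y
    A = G.T @ G + smooth_weight * (D.T @ D)
    return np.linalg.solve(A, G.T)

K = expansion_matrix()

# Hypothetical 5-year rates for ages 15-19, ..., 45-49
f5 = np.array([0.040, 0.110, 0.120, 0.080, 0.035, 0.008, 0.001])
f1 = K @ f5  # 35 single-year rates for ages 15..49
```

Because `K` does not depend on the data, it can be computed once and reused, so each new schedule costs one matrix multiplication. With a generic roughness penalty the result can dip below zero near the tails, which is one reason a penalty calibrated to observed schedule shapes is attractive.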
... While an abundance of methods for estimating complete schedules from abridged data is available (for example, Baili et al., 2005), demographers continue to work to augment existing techniques or develop new ones. The main aim is, of course, to improve accuracy; however, important secondary conditions are also at play, including ensuring flexibility in the application of the method and developing tools with relatively modest computational requirements (Schmertmann, 2014). The science of smoothing and expanding data is, therefore, an ongoing process of continuous improvement for demographers. ...
... In addition to P-TOPALS we use the calibrated splines estimator (CS) to estimate a standard for fertility and mortality data. CS is a non-parametric method for smoothing and expansion that produces demographically plausible profiles by penalizing fits that deviate significantly from a reference set of shapes derived from historical fertility and mortality datasets (Dyrting & Taylor, 2023; Schmertmann, 2014). ...
... Single-year age-specific fertility rates for ages 15 to 49 were calculated by dividing births by population. The single-year rates were smoothed with fertility P-TOPALS (Dyrting, 2018), using a standard obtained by fitting the rates with fertility calibrated splines (Grigorieva et al., 2020; Schmertmann, 2014). Figure 1 shows 2021 raw and smoothed fertility rates for the NT. ...
Sparsely populated areas (SPAs) of developed countries are regions of great demographic diversity and dynamism. While they remain strategically and economically important, trends in urbanization and technology have increased their relative sparsity and isolation, making centralized government, service delivery and planning a challenge. Populations of their sub-jurisdictions are small and often exhibit significant heterogeneity in key demographic characteristics, not least between their Indigenous first residents and non-Indigenous citizens. Development of projection models for these areas is hampered by a paucity of input data, by biases, and by structural issues in the data collection and estimation architectures used to gather input data across diverse and small populations. Even so, the demand for and importance of projections are no less for SPAs than elsewhere. Variants of the cohort component model, with their grounding in the demographic accounting equation and modest input requirements, are important tools for population projections for SPAs. Nevertheless, attaining fit-for-purpose input data requires demographers to choose from a growing number of methods for smoothing the input data used in projections for these regions. In this article we analyze the contributions of recent advances in methods for estimating fertility, mortality, and migration rates for small and diverse populations such as those in SPAs, focusing on the very sparsely populated jurisdiction of the Northern Territory of Australia. In addition to the contributions of our method itself, results at the detailed level demonstrate how abnormal and challenging ‘doing’ projections for sparsely populated areas can be.
... For expanding abridged schedules of fertility rates, the calibrated spline (CS) estimator is a method that integrates the strengths of polynomial, parametric, and relational approaches (Schmertmann 2014). CS combines over-parameterization using B-splines with a structured factoring of shapes using singular value decomposition. ...
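The "structured factoring of shapes using singular value decomposition" can be sketched in a simplified form: take a matrix of reference schedules, keep the leading singular vectors as typical shapes, and penalize whatever part of a candidate schedule lies outside their span. This is an illustrative reading, not the published procedure; the function name `shape_penalty_matrix`, the choice of three components, and the synthetic rank-three reference schedules are all assumptions.

```python
import numpy as np

def shape_penalty_matrix(schedules, n_components=3):
    """schedules: (n_ages x n_schedules) array of reference schedules.

    Returns P such that f @ P @ f is the squared norm of the part of f
    that the leading SVD shapes cannot reproduce.
    """
    U, s, Vt = np.linalg.svd(schedules, full_matrices=False)
    X = U[:, :n_components]               # leading orthonormal shapes
    M = np.eye(schedules.shape[0]) - X @ X.T  # projects onto the residual space
    return M.T @ M

# Synthetic reference set: mixtures of three bell-shaped curves over ages 15..49
ages = np.arange(15, 50)
basis = np.stack([np.exp(-0.5 * ((ages - m) / 5.0) ** 2) for m in (22, 28, 34)], axis=1)
rng = np.random.default_rng(0)
reference = basis @ rng.uniform(0.5, 1.5, size=(3, 20))

P = shape_penalty_matrix(reference)
typical = reference[:, 0]                     # a shape seen in the reference set
jagged = (-1.0) ** np.arange(ages.size)       # an implausible saw-tooth shape
penalty_typical = typical @ P @ typical
penalty_jagged = jagged @ P @ jagged
```

A schedule built from the reference shapes incurs essentially no penalty, while the saw-tooth incurs a large one, which is the behaviour a shape-based penalty is meant to deliver.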
... This paper has shown how the core elements of the CS method for fertility can be worked into improving mortality schedules, despite the significant differences in the way these two processes are modelled. In Schmertmann (2014), the fertility rate is a linear function of the B-spline weights and the variance of the fitting errors is independent of the rate, which leads to a linear equation for the optimal spline weights. In contrast, in the CS method, the force of mortality is a non-linear function (an exponential) of the B-spline weights, death rates are a non-linear function of the force of mortality, the variance of the fitting errors is proportional to the death rate, and the optimal spline weights satisfy a system of non-linear equations. ...
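For the linear fertility case described above, the step from a squared-error penalty to a linear equation can be written out as follows. The symbols are assumptions introduced here for illustration and are not taken from either paper: $y$ is the vector of observed age-group rates, $B$ the B-spline basis matrix, $\theta$ the spline weights, $G$ the matrix that averages single-year rates into age groups, and $P$ a symmetric positive semidefinite penalty matrix.

```latex
\begin{aligned}
Q(\theta) &= (y - GB\theta)^{\top}(y - GB\theta) + \theta^{\top} P\,\theta, \\
\frac{\partial Q}{\partial \theta} &= -2\,B^{\top}G^{\top}(y - GB\theta) + 2\,P\,\theta = 0, \\
\left(B^{\top}G^{\top}GB + P\right)\theta^{*} &= B^{\top}G^{\top}y .
\end{aligned}
```

The first-order condition is linear in $\theta$ because the fitted rates $GB\theta$ are linear in $\theta$ and the error weights do not depend on the rates. When the rates depend on $\theta$ through an exponential and the error variance depends on the rate, the analogous condition is non-linear and would normally be solved iteratively.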
Demographers have developed a number of methods for expanding abridged mortality data into a complete schedule; however, these can be usefully applied only under certain conditions, and the presence or absence of one or more additional sources of incompleteness can degrade their relative accuracy, lead to implausible profiles, or even cause the methods to fail. We develop a new method for expanding an abridged schedule based on calibrated splines; this method is accurate and robust in the presence of errors in mortality rates, missing values, and truncation. We compare its performance with that of existing methods for expanding abridged data and find that it is superior to current methods at producing accurate and plausible complete schedules over a broad range of data-quality conditions. The method is a valuable addition to existing tools for estimating mortality, especially for small nations, countries with incomplete vital statistics, and subnational populations.
... We made use of this in the Results section to derive reference schedules. Additional possibilities for expanding this work include extending the calibrated splines (CS) methodology [75,76] to migration by performing shape-calibration using the HIMD 1-year implied out-migration probabilities, and investigating how implied outmigration changes with interval length for the countries that record changes of address over more than one interval (Botswana, Canada, Greece, Mozambique, Philippines, Senegal, Spain, Trinidad and Tobago) to gain further understanding of the 1-year/5-year problem [36,37,77]. ...
The majority of migration moves globally are internal, within national borders. This makes internal migration intensities an important component for understanding the dynamics of population change according to size, composition and across geographies. While incorporating migration into demography’s quantitative framework allows a description of population change across both time and space, and mathematical and conceptual frameworks for migration have been developed, researchers lack a public repository of historical age-origin-destination-specific migration probabilities that is in a common format and spans a range of countries. Addressing this requires a robust method for inferring migration probabilities from census and survey data when there are significant levels of uncertainty from small-sample noise and age aggregation. In this paper we extend the P-TOPALS and P-spline methods for smoothing migration probabilities so that they apply to data grouped by age, and use them to develop a methods protocol for a harmonised, homogeneous-format, multi-nation Human Internal Migration Database. We find our method out-performs a hybrid spline-parametric method in terms of both accuracy and plausibility. We illustrate the method by estimating complete age-origin-destination migration probabilities for more than 50 countries using microdata samples from IPUMS International. This work advances the stock of migration data on which demographers and others can draw in the analysis and projection of population change.
... Modelling the age distribution of fertility rates is an essential step in a number of demographic applications. When only a tight fit to an observed schedule is required, as in the generation of single-age rates from grouped data, nonparametric models, typically based on splines, tend to produce the best results [1]. No particular price is then placed on whether each model parameter may be interpreted in any meaningful way. ...
Fitting statistical models to aggregate data is still the dominant approach in many demographic and biodemographic applications. Although these macro-level models have proven useful for a variety of tasks, they often have no demographic interpretation. Individual-level models, on the other hand, offer a deeper understanding of the mechanisms underlying observed patterns. Their parameters represent quantities in the real world, instead of pure mathematical abstractions. However, estimating these parameters using real-world data has remained a challenge. The approach we introduce in this article attempts to overcome this limitation. Using a likelihood-free inference technique, we show that it is possible to estimate the parameters of a simple but demographically interpretable individual-level model of the reproductive process by exclusively relying on the information contained in a set of age-specific fertility rates. By estimating individual-level models from widely available aggregate data, this approach can contribute to a better understanding of reproductive behaviour and its driving mechanisms, bridging the gap between individual-level and population-level processes. We illustrate our approach using data from three natural fertility populations.
... To generate regional and global aggregates from the country-level estimates, we weighted the proportions of births occurring within marriage with the estimates of the number of births to girls below age 18 for the year 2015 from the 2022 revision of the World Population Prospects (United Nations 2022b). World Population Prospects estimates fertility rates by five-year age groups using a Bayesian hierarchical time series model fit to empirical estimates of fertility rates and then graduated into single age rates using the Calibrated Spline method (Schmertmann 2014). The estimated rates were then used in the cohort-component projection to determine births by single age of the mother, together with the reconstructed female population by single age. ...
Eliminating child marriage is seen by policy makers and advocates as a path toward reducing births to girls below age 18, as most early births have been previously found to occur within marriage. There has been little recent evidence, however, of the marital context in which early childbearing occurs or how this relationship varies across space and levels of development. Using survey and vital registration data covering approximately 95 percent of the world's births to mothers younger than 18 years, we estimated the share of first births that occur within marriage at the global, regional and national levels. We found that more than half of births to mothers below age 18 worldwide take place in sub-Saharan Africa, and this share will continue to grow. Globally, 76 percent of first births to mothers below age 18 occur within marriage and there are large regional differences. Over the past two decades, the share of first births to mothers below age 18 occurring within marriage declined in most countries with data available, but there are important exceptions. Although most first births to women below age 18 occur following seven months of marriage, the sequencing of child marriage and early childbearing varies widely across countries.
... When available, data by single age were used. Estimates for five-year age groups were graduated into single age using a spline model which was recalibrated using over 4,500 age-specific patterns from high quality vital registration, fertility surveys and health and demographic surveillance systems from low-and middle-income countries (Schmertmann 2014). ...
Background
Considering the soaring health-related costs directed toward a growing, aging, and comorbid population, the health sector needs effective data-driven interventions while managing rising care costs. While health interventions using data mining have become more robust and adopted, they often demand high-quality big data. However, growing privacy concerns have hindered large-scale data sharing. In parallel, recently introduced legal instruments require complex implementations, especially when it comes to biomedical data. New privacy-preserving technologies, such as decentralized learning, make it possible to create health models without mobilizing data sets by using distributed computation principles. Several multinational partnerships, including a recent agreement between the United States and the European Union, are adopting these techniques for next-generation data science. While these approaches are promising, there is no clear and robust evidence synthesis of health care applications.
Objective
The main aim is to compare the performance among health data models (eg, automated diagnosis and mortality prediction) developed using decentralized learning approaches (eg, federated and blockchain) to those using centralized or local methods. Secondary aims are comparing the privacy compromise and resource use among model architectures.
Methods
We will conduct a systematic review using the first-ever registered research protocol for this topic following a robust search methodology, including several biomedical and computational databases. This work will compare health data models differing in development architecture, grouping them according to their clinical applications. For reporting purposes, a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram will be presented. CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies)–based forms will be used for data extraction and to assess the risk of bias, alongside PROBAST (Prediction Model Risk of Bias Assessment Tool). All effect measures in the original studies will be reported.
Results
The queries and data extractions are expected to start on February 28, 2023, and end by July 31, 2023. The research protocol was registered with PROSPERO, under the number 393126, on February 3, 2023. With this protocol, we detail how we will conduct the systematic review. With that study, we aim to summarize the progress and findings from state-of-the-art decentralized learning models in health care in comparison to their local and centralized counterparts. Results are expected to clarify the consensuses and heterogeneities reported and help guide the research and development of new robust and sustainable applications to address the health data privacy problem, with applicability in real-world settings.
Conclusions
We expect to clearly present the status quo of these privacy-preserving technologies in health care. With this robust synthesis of the currently available scientific evidence, the review will inform health technology assessment and evidence-based decisions, from health professionals, data scientists, and policy makers alike. Importantly, it should also guide the development and application of new tools in service of patients’ privacy and future research.
Trial Registration
PROSPERO 393126; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=393126
International Registered Report Identifier (IRRID)
PRR1-10.2196/45823
Available online:
https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/wpp2022_summary_of_results.pdf
The 2022 Revision of World Population Prospects is the twenty-seventh edition of official United Nations population estimates and projections that have been prepared by the Population Division of the Department of Economic and Social Affairs of the United Nations Secretariat. It presents population estimates from 1950 to the present for 237 countries or areas, underpinned by analyses of historical demographic trends. This latest assessment considers the results of 1,758 national population censuses conducted between 1950 and 2022, as well as information from vital registration systems and from 2,890 nationally representative sample surveys. The 2022 revision also presents population projections to the year 2100 that reflect a range of plausible outcomes at the global, regional and national levels.
... We obtained 5-year age-specific fertility rates for each of the 50 states for 2010 (Martin et al., 2012). We transformed the 5-year age-specific fertility rates into 1-year age-specific fertility rates using the method designed by Schmertmann (2012). This method uses historical consistencies in fertility schedules to estimate the most likely 1-year age-specific fertility rates. ...
Levels of fertility and the shape of the age-specific fertility schedule vary substantially across U.S. regions with some states having peak fertility relatively early and others relatively late. Structural institutions or economic factors partly explain these heterogeneous patterns, but regional differences in personality might also contribute to regional differences in fertility. Here, we evaluated whether variation in extraversion, agreeableness, conscientiousness, neuroticism, and openness to experience measured at the U.S. state-level was associated with the level, timing, and context of fertility across states above and beyond sociodemographics, voting behavior, and religiosity. Generally, states with higher levels of agreeableness and conscientiousness had more traditional fertility patterns, and states with higher levels of neuroticism and openness had more nontraditional fertility patterns, even after controlling for established correlates of fertility (r ~ |.50|). Personality is an overlooked correlate that can be leveraged to understand the existence and persistence of fertility differentials.
... In this sense the D-spline penalties developed here for mortality are very similar to the cohort shape penalties developed from the Human Fertility Database at the Max Planck Institute for Demographic Research and Vienna Institute of Demography by Schmertmann et al. (2014) for forecasting fertility schedules. The penalties investigated here are especially similar to the spline-based procedure for fertility rate interpolation proposed in Schmertmann (2014). ...
Background: High-dimensional parametric models with penalized likelihood functions strike a good balance between bias and variance for estimating continuous age schedules from large samples. The penalized spline (P-spline) approach is particularly useful for these purposes, but in small samples it can often produce implausible age schedule estimates. Objective: I propose and evaluate a new type of P-spline model for estimating demographic rate schedules. These estimators, which I call D-splines, regularize and smooth high-dimensional splines by using demographic patterns rather than generic mathematical rules. Methods: I compare P-spline estimates of age-specific mortality rates to three alternative D-spline estimators, over a large number of simulated small populations with known rates. The penalties for the D-spline estimators are derived from patterns in the Human Mortality Database. Results: For mortality estimates in small populations, D-spline estimators generally have lower errors than standard P-splines. Conclusions: Using penalties based on demographic information about patterns and variability in rate schedules improves P-spline estimators for small populations. Contribution: This paper expands demographers' toolkit by developing a new category of P-spline estimators that are more reliable for estimating mortality in small populations.
Modelling is a well-established approach to understanding the typical shape and pattern of age-specific fertility. The distribution of India’s age-specific fertility rate (ASFR) is unimodal and positively skewed, and is distinct from the ASFR of developed countries. The existing models considered here (the P-K, Gompertz, Skew-normal and G-P models), which were developed from the experience of developed countries, failed to fit the single-year age-specific fertility pattern for India as a whole and for the six selected states. Our study proposes four flexible models to capture the diverse age patterns of fertility observed in the Indian states. The proposed models were compared in three ways: among themselves, with the original models, and with the popular Hadwiger model. The parameters of the proposed models were estimated by non-linear least squares. To find the best-fitting model, we used the corrected version of Akaike’s Information Criterion (AICc). Optimization of the four original models was successfully done. When the models were fitted to empirical data from the 4th round of the National Family Health Survey, conducted in 2015–2016, all four proposed models outperformed their corresponding original models and the Hadwiger model. Among the proposed models, the Modified Gompertz model provided the best fit for India, Uttar Pradesh and Gujarat, whereas the Modified P-K model gave the best fit for West Bengal, Tripura and Karnataka. The Modified G-P model is the most suitable model for Punjab. Although our proposed models were fitted to the ASFR of India as a whole and of six selected states only, they provide an important tool for policymakers and government authorities to project fertility rates and to understand the fertility transitions in India and its various states.
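The AICc comparison mentioned above can be sketched for least-squares fits as follows. This assumes Gaussian errors and counts only the curve parameters in `n_params`; whether the error variance is also counted is a convention the abstract does not state. The function name and the residual sums of squares are hypothetical and are not results from the study.

```python
import math

def aicc_least_squares(rss, n_obs, n_params):
    """Small-sample corrected AIC for a least-squares fit (Gaussian errors assumed)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc needs n_obs > n_params + 1")
    aic = n_obs * math.log(rss / n_obs) + 2 * n_params
    # The correction term grows as the parameter count approaches the sample size
    return aic + 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)

# Hypothetical fits of two models to 35 single-year rates (ages 15..49)
aicc_original = aicc_least_squares(rss=4.0e-4, n_obs=35, n_params=4)
aicc_modified = aicc_least_squares(rss=2.5e-4, n_obs=35, n_params=5)
best = "modified" if aicc_modified < aicc_original else "original"
```

The model with the lower AICc is preferred; the extra parameter of the hypothetical modified model is only worthwhile if it reduces the residual sum of squares by enough to offset the larger penalty.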
This paper focuses on the transformation of age-specific fertility rates from five-year age groups into single age. We review and apply different statistical approaches and mathematical models to graduate fertility rates from grouped data into age-specific rates.
We focus on approaches with (a) the most potential to graduate a wide range of fertility patterns (from pre- to post-transition patterns) and (b) the most minimalist data requirements (i.e., only one year of data available). We compare the performances of 10 methods using a sample of HFD countries for which we can compare empirical age-specific fertility rates against graduated ones derived from abridged fertility rates we computed based on annual births and exposure by 5-year age groups of the mother.
This book is based on the author's experience with calculations involving polynomial splines. It presents those parts of the theory which are especially useful in calculations and stresses the representation of splines as linear combinations of B-splines. After two chapters summarizing polynomial approximation, a rigorous discussion of elementary spline theory is given involving linear, cubic and parabolic splines. The computational handling of piecewise polynomial functions (of one variable) of arbitrary order is the subject of chapters VII and VIII, while chapters IX, X, and XI are devoted to B-splines. The distance from splines with fixed and with variable knots is discussed in chapter XII. The remaining five chapters concern specific approximation methods, interpolation, smoothing and least-squares approximation, the solution of an ordinary differential equation by collocation, curve fitting, and surface fitting. The present text version differs from the original in several respects. The book is now typeset (in plain TeX), and the Fortran programs now make use of Fortran 77 features. The figures have been redrawn with the aid of Matlab, various errors have been corrected, and many more formal statements have been provided with proofs. Further, all formal statements and equations have been numbered by the same numbering system, to make it easier to find any particular item. A major change has occurred in Chapters IX-XI, where the B-spline theory is now developed directly from the recurrence relations without recourse to divided differences. This has brought in knot insertion as a powerful tool for providing simple proofs concerning the shape-preserving properties of the B-spline series.
Time series methods are used to make long-run forecasts, with confidence intervals, of age-specific mortality in the United States from 1990 to 2065. First, the logs of the age-specific death rates are modeled as a linear function of an unobserved period-specific intensity index, with parameters depending on age. This model is fit to the matrix of U.S. death rates, 1933 to 1987, using the singular value decomposition (SVD) method; it accounts for almost all the variance over time in age-specific death rates as a group. Whereas e0 has risen at a decreasing rate over the century and has decreasing variability, k(t) declines at a roughly constant rate and has roughly constant variability, facilitating forecasting. k(t), which indexes the intensity of mortality, is next modeled as a time series (specifically, a random walk with drift) and forecast. The method performs very well on within-sample forecasts, and the forecasts are insensitive to reductions in the length of the base period from 90 to 30 years; some instability appears for base periods of 10 or 20 years, however. Forecasts of age-specific rates are derived from the forecasts of k, and other life table variables are derived and presented. These imply an increase of 10.5 years in life expectancy to 86.05 in 2065 (sexes combined), with a confidence band of plus 3.9 or minus 5.6 years, including uncertainty concerning the estimated trend. Whereas 46% now survive to age 80, by 2065 46% will survive to age 90. Of the gains forecast for person-years lived over the life cycle from now until 2065, 74% will occur at age 65 and over. These life expectancy forecasts are substantially lower than direct time series forecasts of e0, and have far narrower confidence bands; however, they are substantially higher than the forecasts of the Social Security Administration's Office of the Actuary.
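The two steps described above, an SVD fit of log death rates followed by a random walk with drift for k(t), can be sketched as below. This is a minimal illustration on synthetic data, not the published procedure: it omits any further adjustment of k(t) and the confidence intervals, and the normalization (b sums to one, k sums to zero) is a common convention assumed here. Function names and the synthetic inputs are hypothetical.

```python
import numpy as np

def fit_lee_carter(log_rates):
    """log_rates: (ages x years). Returns a, b, k with log m[x, t] ~ a[x] + b[x] * k[t]."""
    a = log_rates.mean(axis=1)                 # average age profile
    centered = log_rates - a[:, None]
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    b, k = U[:, 0], s[0] * Vt[0]               # leading rank-one component
    scale = b.sum()                            # normalize so that b sums to one
    return a, b / scale, k * scale

def forecast_k(k, horizon):
    """Point forecast of a random walk with drift fitted to k."""
    drift = (k[-1] - k[0]) / (len(k) - 1)      # mean of the first differences
    return k[-1] + drift * np.arange(1, horizon + 1)

# Synthetic log rates that follow the model exactly
n_ages, n_years = 10, 30
a_true = np.linspace(-6.0, -1.0, n_ages)
b_true = np.linspace(1.0, 2.0, n_ages)
b_true = b_true / b_true.sum()
k_true = np.linspace(14.5, -14.5, n_years)     # declines by one unit per year
log_rates = a_true[:, None] + np.outer(b_true, k_true)

a_hat, b_hat, k_hat = fit_lee_carter(log_rates)
k_future = forecast_k(k_hat, horizon=5)
log_rates_future = a_hat[:, None] + np.outer(b_hat, k_future)
```

Forecast age-specific rates follow from exponentiating `log_rates_future`; life table quantities would then be derived from those rates.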
I propose and examine a new family of models for age-specific fertility schedules, in which three index ages determine the schedule's shape. The new system is based on constrained quadratic splines. It has easily interpretable parameters, is flexible enough to fit a variety of "noiseless" schedules well, and is inflexible enough to avoid implausible estimates from noisy data. Across a set of over two hundred contemporary ASFR schedules, the new model fits a majority better, and in some cases much better, than the Coale-Trussell model. When fit to a recent Swedish time series, model parameters exhibit simple, regular changes over time, suggesting utility in forecasting applications. In simulated small-sample data the new model produces plausible ASFR estimates, with errors similar to Coale-Trussell.
B-splines are attractive for nonparametric modelling, but choosing the optimal number and positions of knots is a complex task. Equidistant knots can be used, but their small and discrete number allows only limited control over smoothness and fit. We propose to use a relatively large number of knots and a difference penalty on coefficients of adjacent B-splines. We show connections to the familiar spline penalty on the integral of the squared second derivative. A short overview of B-splines, their construction, and penalized likelihood is presented. We discuss properties of penalized B-splines and propose various criteria for the choice of an optimal penalty parameter. Nonparametric logistic regression, density estimation and scatterplot smoothing are used as examples. Some details of the computations are presented. Keywords: Generalized linear models, smoothing, nonparametric models, splines, density estimation. Address for correspondence: DCMR Milieudienst Rijnmond, 's-Gravelandse...
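The idea of combining many equally spaced knots with a difference penalty on adjacent coefficients can be sketched for the scatterplot-smoothing case as follows. This is a minimal least-squares version under assumed defaults (cubic B-splines, second-order differences, a fixed penalty weight `lam`); the function names are hypothetical, and the criteria for choosing the penalty parameter are not implemented.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion on strictly increasing knots.

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    x = np.asarray(x, dtype=float)[:, None]
    B = ((knots[:-1] <= x) & (x < knots[1:])).astype(float)  # degree-0 indicators
    for d in range(1, degree + 1):
        left = (x - knots[:-d - 1]) / (knots[d:-1] - knots[:-d - 1])
        right = (knots[d + 1:] - x) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def pspline_fit(x, y, n_segments=20, degree=3, diff_order=2, lam=1.0):
    """Penalized least squares: ||y - B theta||^2 + lam * ||D theta||^2."""
    lo, hi = float(np.min(x)), float(np.max(x))
    h = (hi - lo) / n_segments
    # Equally spaced knots, extended beyond the data range on both sides
    knots = lo + h * np.arange(-degree, n_segments + degree + 1)
    B = bspline_basis(x, knots, degree)
    # D takes differences of adjacent B-spline coefficients
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    theta = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
    return B @ theta, theta

# Synthetic noisy data
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
fitted_smooth, theta_smooth = pspline_fit(x, y, lam=100.0)
fitted_wiggly, theta_wiggly = pspline_fit(x, y, lam=0.01)
```

Raising `lam` pulls the coefficients toward a straight line, since a second-order difference penalty is zero exactly when the coefficients are linear in their index, while lowering it lets the curve follow the data more closely.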
A Jasilioniene, DA Jdanov, T Sobotka, EM Andreev, K Zeman, and VM Shkolnikov, 2011. "Methods Protocol for the Human Fertility Database".
R Development Core Team, 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna. http://www.R-project.org
CP Schmertmann, 2003. "A system of model fertility schedules with graphically intuitive parameters". Demographic Research 9/5:81-110.
HS Shryock and JS Siegel, 1975. The Methods and Materials of Demography, Vol 2. Third Printing (rev.). US Bureau of the Census, US Government Printing Office. Washington DC.