Available via license: CC BY 4.0
Association between time of assessment within a school year and physical fitness of primary school children
Paula Teich ( paula.teich@uni-potsdam.de )
University of Potsdam
Kathleen Golle
University of Potsdam
Reinhold Kliegl
University of Potsdam
Article
Keywords:
Posted Date: December 29th, 2023
DOI: https://doi.org/10.21203/rs.3.rs-3793043/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Additional Declarations: No competing interests reported.
Abstract
The dissociation of effects of age, time of assessment and cohort is a well-known challenge in developmental science. We examined effects of time of
assessment in the school year on children’s physical fitness using data from 75,362 German third-graders from seven cohorts. Children were tested once either
in the first or second school term of third grade. Tests examined cardiorespiratory endurance (6-min run), coordination (star-run), speed (20-m sprint), lower
(standing long jump) and upper (ball-push test) limbs muscle power, and flexibility (stand-and-reach test). We estimated the effect of time of assessment
using a regression discontinuity design specified in a linear mixed model with random factors child and school and adjusted for age and cohort effects.
Coordination, speed, and upper limbs muscle power were better in the second compared to the first school term, with boys exhibiting a larger increase of upper limbs
muscle power than girls. There was no evidence for changes in cardiorespiratory endurance, lower limbs muscle power, and flexibility between assessments.
Previously reported age and sex effects as well as secular fitness trends were replicated. Thus, there is evidence for improvement of some physical fitness
components beyond age and cohort effects that presumably reflects the benefit of physical education. Effects of assessment time should be taken into
consideration in performance-based grading or norm-based selection of children.
Introduction
Physical fitness is an important health marker of children and adolescents 1–3. Cardiorespiratory endurance in particular is associated with better cardiovascular
health and a lower risk for obesity 2,4. Further, cardiorespiratory and muscular fitness are positively related to children’s health-related quality of life 5 and
cognitive function 6,7.
There are several well-known gender- and age-related effects on children’s physical fitness 8–10. The direction of gender effects depends on the physical fitness
component. For example, boys outperform girls in cardiorespiratory endurance and muscle power but the reverse is true for static balance and flexibility 8,11,12.
The size of gender differences also varies strongly by physical fitness component; for instance, they are larger for muscle power than for coordination 8,13.
There is also age-related development of physical fitness 8–10. Within the ninth year of life (i.e., in third grade of primary school), age-related development is
linear for six tests assessing cardiorespiratory endurance, coordination, speed, lower and upper limbs muscle power, and static balance, and developmental
rates do not differ significantly between the sexes. However, physical fitness components differ strongly in how much they develop within one year. For
example, age effects in third grade are comparatively small for cardiorespiratory endurance and large for upper limbs muscle power 8,13.
Given that children’s performance in most physical fitness tests improves with increasing age, the question is whether this presumably simple age-related
developmental effect can be dissociated from a presumably specific developmental effect due to the amount of physical education received in the school
year. The latter should be reflected in an effect of time of assessment within the school year on physical fitness. That is, better physical fitness later in the
school year may not only be due to increased age, but also partially due to more exposure to physical education relative to the beginning of the school year. In the
present study, a very large sample of third-graders from the Federal State of Brandenburg, Germany, who were enrolled in school according to the legal key date
(i.e., keyage children, varying in age between 8.0 and 9.0 years on September 30 in third grade) were tested in either the first or the second school term of the school year. As
keyage third-graders tested in the second school term were on average half a year older than keyage third-graders tested in the first school term, there is a
confound of age and time of assessment (i.e., school term) at the group level. However, because there is a sufficient overlap of children’s age ranges between
school terms, the effects of age-related development and time of assessment (i.e., first vs. second school term) can be dissociated in principle by estimating
effects of age within school terms.
Time of assessment in the year has been the topic of some earlier research. Children’s cardiorespiratory fitness was reported to decline during the summer
break 14,15, illustrating the importance of physical education classes and school sports for fitness development, especially in children for whom school sports
are the main source of moderate-to-vigorous activity. In this case, a lack-of-exposure effect was counter to or stronger than the expected age-related
improvement. In contrast, Drenowatz et al. 16 reported more monthly development in several fitness components during the summer break compared to the
school year. In their study, 214 primary school children were tested at the beginning and end of each school year for a period of four years. Age- and sex-
standardized scores in the 6-min run, push-ups, sit-ups, and standing long jump (i.e., cardiorespiratory endurance, muscular endurance, lower limbs muscle
power) were higher at the beginning than at the end of the school year. They suggest that their results may be due to a summer-related increase of physical
activity in their sample. However, performance in the 20-m sprint, balancing and side-ways jumping was better at the end of the school year, while performance
in the stand-and-reach test was not affected by time of assessment.
Finally, Hjorth et al. 17 tested children’s physical fitness three times within a school year and reported higher cardiorespiratory endurance of third- and fourth-graders
in spring, relative to the previous fall or winter 17. However, this study did not adjust for age when estimating the effect of assessment time on physical
fitness. In summary, previous research on the effects of time of assessment on children’s physical fitness reported a varied profile of results.
A second confound of assessment time within the school year, one that is specific to the data used in the present study, is cohort. Data with the same assessment protocol are available for cohorts from 2009 to 2015. In cohorts 2011 to 2015, time of assessment was always in the first school term, while for cohorts 2009 and 2010, time of assessment was in the second school term. Thus, an effect of time of assessment must also be dissociated from potential cohort-related changes in performance that vary in direction and size between physical fitness components 8,18. For example, cardiorespiratory endurance was shown to decline whereas speed increased across cohorts 2011–2015 in a previous report of these data 8. In the previous report, Fühner et al. 8 did not include data from cohorts 2009 and 2010 to avoid the confound with time of assessment.
In the present study, the confound between cohort and time of assessment was addressed with a regression discontinuity design (RDD) 19,20, that is we
assume that the change in time of assessment between cohorts 2010 and 2011 becomes visible (1) in a step-up or step-down change in performance when
extrapolating secular trends forward from 2009–2010 and backwards from 2015–2011 to the date of this design discontinuity or (2) in a change in the
secular trend before and after the two segments of cohorts.
Due to the positive effects structured exercise may have on physical fitness 14,15, and as children tested in the second school term have on average been
exposed to half a year more of physical education classes than children tested earlier in the school year, we expected better fitness of third-graders in
the second compared to the first school term after statistically adjusting for age-related and cohort-related correlates. We also tested whether the profile of sex
and age effects reported previously 8 is moderated by time of assessment (i.e., we tested age x school term and sex x school term interactions). We expected to
replicate cohort-related trends in physical fitness, specifically a decline of cardiorespiratory endurance and an increase of speed over the years.
Methods
Experimental approach
Starting in school year 2009/10, the EMOTIKON research project (uni-potsdam.de/en/emotikon/) has annually tested the physical fitness of all third-graders in
the Federal State of Brandenburg, Germany. EMOTIKON was mandated and approved by the Ministry of Education, Youth and Sport of Brandenburg. Based on
the Brandenburg School Law, participation is obligatory for all public primary schools 21.
The present study used data from cohorts 2009 until 2015. In the German school system, each school year has two school terms. The first school term usually
begins in late summer or early fall (i.e., August or September) and the second school term begins in winter, typically around mid-February, and lasts until the
summer break (i.e., between end of June and mid-July). School-summer holidays last six weeks in Germany. In the school years 2009/10 and 2010/11,
fitness tests were conducted during the second school term, between April and June 2010 (i.e., cohort 2009) or mid-February to April 2011 (i.e., cohort 2010).
Note that, because third-graders belonging to cohorts 2009 and 2010 were tested in their second school term, their data were collected in 2010 and 2011,
respectively. Starting in school year 2011/12 (i.e., cohorts 2011 until 2015), the fitness tests were conducted in the first school term, between September and
November.
Data from cohorts 2011 to 2015 were published in Fühner et al. 8. Adding data from the cohorts 2009 and 2010 (i.e., school years 2009/10 and 2010/11)
allowed us (1) to test effects of timing of assessment on physical fitness, and also (2) to include data from a sixth fitness test assessing flexibility, which was
part of the EMOTIKON test battery from 2009 until 2015. Despite some differences between the data sets, we expected to replicate age and sex effects on the
five physical fitness tests analyzed previously 8. We also expected to replicate cohort effects reported previously for five of the six fitness components 8,18.
Prior to the EMOTIKON tests, schools and parents received written information about the EMOTIKON research project, information on data processing and
data protection. Schools received instructions on test administration. Research was conducted in accordance with the latest Declaration of Helsinki 22 and the
Brandenburg School Law 21. The authors received the data completely anonymized from the Ministry of Education, Youth and Sport of the Federal State of
Brandenburg. None of the researchers had access to personally identifiable information of the children.
Participants
96,956 children participated in the EMOTIKON research project between the school years 2009/10 and 2015/16. Based on previous research that showed a
delayed physical fitness development for children with a delayed school enrollment 23,24, we limited analyses to children with school enrollment according to
the legal key date. In their year of school enrollment, these children had turned six before the legal key date, which is September 30 in the Federal State of
Brandenburg, Germany. This left us with 75,398 children. We excluded children with a disability or autism (N = 35). For each physical fitness test and separately for boys and girls, test scores outside of a ± 3 SD range were excluded. This left us with 440,139 test scores from 75,362 children in seven cohorts.
Table 1 provides a sample description including number of children and schools as well as children’s ages. For a more detailed sample description including
children’s mean test scores in the first and second school term, see Table S1 in the Supplements.
Table 1
Sample description
| Time of assessment | N children | N schools | N observations | Age in years, mean (SD) |
| --- | --- | --- | --- | --- |
| 1st school term (cohorts 2011 to 2015) | 54,190 (50.6% girls) | 462 | 317,565 | 8.51 (0.29) |
| 2nd school term (cohorts 2009 + 2010) | 21,172 (50.9% girls) | 417 | 122,574 | 9.07 (0.31) |
| Total | 75,362 (50.7% girls) | 469 | 440,139 | 8.66 (0.38) |
Physical fitness tests
The EMOTIKON tests assess the six physical fitness components cardiorespiratory endurance (i.e., 6-minute run), coordination (i.e., star-run), speed (i.e., 20-m
linear sprint), lower (i.e., standing long jump) and upper (i.e., ball-push test) limbs muscle power, and flexibility (i.e., stand-and-reach test). Physical education
teachers conducted the physical fitness tests, following a standard procedure (for more details, please see uni-potsdam.de/en/emotikon/projekt/methodik).
Prior to the physical fitness tests, children received a warm-up consisting of running exercises and games. Children were encouraged to achieve their best
performance in the physical fitness tests.
Cardiorespiratory endurance. The 6-min run assessed children’s cardiorespiratory endurance. For six minutes, children ran as far as possible around a field of the size 9 m x 18 m (≙ 54 m per lap). The field was marked by six pylons that were set at a 9-m distance from each other. If a child stopped between two pylons at the stop signal, they were allowed to continue to the next pylon. The total running distance up to that last pylon was recorded in meters and used for analysis. In children aged 7 to 11 years, the 6-minute run showed a test-retest reliability of r = .92 25.
Coordination. The star-run was used to assess coordination under time pressure. Children had to run a star-like pattern with a total distance of 50.912 m as fast as possible. The pattern was marked by five pylons, four of which were set at the corners of a 9 m x 9 m square and one pylon marking the center. Starting from the center, children had to run to each of the other four pylons, touch it by hand and run back to the center. The order of movement directions and movement forms (i.e., running forward, running backward, side-steps to the right side, side-steps to the left side) that children had to use to complete the course was standardized. Time was measured with a 1/10 s accuracy. Each child completed the star-run test twice, and the better result was used in analysis. In 8- to 10-year-old children, the star-run test showed a test-retest reliability (intra-class correlation coefficient, ICC) of .68 (95% CI: .53 − .79) 26.
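The stated total distance can be checked from the layout described above. As a sketch, assuming the center pylon sits exactly at the intersection of the square's diagonals and each of the four legs is run from the center to a corner and back:

```latex
\begin{align*}
d_{\text{center}\to\text{corner}} &= \tfrac{1}{2}\sqrt{9^2 + 9^2}\ \text{m} = \tfrac{9\sqrt{2}}{2}\ \text{m} \approx 6.364\ \text{m},\\
d_{\text{total}} &= 4 \times 2 \times \tfrac{9\sqrt{2}}{2}\ \text{m} = 36\sqrt{2}\ \text{m} \approx 50.912\ \text{m}.
\end{align*}
```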
Speed. The 20-m linear sprint tested linear sprint speed. Children stood in an upright position, one foot on the start line. After an acoustic start signal, they sprinted as fast as possible over a distance of 20 meters. Time was measured in seconds with a 1/10 s accuracy. Children had two trials; the faster trial was used for analysis. In children aged 7 to 11 years, the 20-m linear sprint test showed a test-retest reliability of r = .90 25.
Lower limbs muscle power (PowerLOW). The standing long jump tested muscle power of the lower limbs (PowerLOW). From a standing upright position with their feet parallel, children had to jump as far as possible. They had to jump with both legs and land with both feet together. Children were allowed to swing their arms before and during the jump but they were not allowed to touch the floor with their hands after landing. The jump distance between the starting line and their heels at landing was measured to the nearest centimeter. The children completed the standing long jump twice; the trial with the better jump distance was used for analysis. The standing long jump showed a test-retest reliability (ICC) of .94 (95% CI: .93 − .95) in children aged 6 to 12 years 27.
Upper limbs muscle power (PowerUP). The ball-push test assessed muscle power of the upper limbs (PowerUP). Children had to hold a 1 kg medicine ball in front of their chest with their arms bent and then push the ball as far as possible with both hands. The pushing distance was measured in meters with a 10 cm accuracy. Each child completed the ball-push test twice. The trial with the better pushing distance was used for analysis. In 8- to 10-year-old children, the ball-push test showed a test-retest reliability (ICC) of .81 (95% CI: .71 − .87) 26.
Flexibility. Children’s flexibility was tested using the stand-and-reach test. Children stood barefoot with their feet together on a box to which a centimeter scale was attached. 100 cm marked the edge of the box. They stretched out their arms and held them shoulder-wide above their head. On an exhale, children bent their upper body forward, their knees remaining straight. With their fingertips, they reached down as far as possible on the centimeter scale. Distance reached on the scale was measured to the nearest centimeter. Children had two trials; the better result was used for analysis. In children aged 7 to 11 years, the stand-and-reach test showed a test-retest reliability of r = .94 25.
Statistics
Data preprocessing and analysis were done using R (4.2.3) 28, the RStudio IDE 29, and Julia (Version 1.9) 30. For data preprocessing, we used the tidyverse packages 31 in R. Linear mixed models were fit using the MixedModels.jl 32 and MixedModelsMakie.jl 33 packages in Julia. Partial effects were computed with the MixedModelsExtras.jl package 34 in Julia.
Preprocessing was similar to the one reported in previous studies 8,12. Based on a Box-Cox distributional analysis 35, a reciprocal transformation of the star-run
and the 20-m sprint scores brought model residuals in line with a normal distribution. For the stand-and-reach test, test scores were squared.
The original unit of the star-run and the 20-m sprint test was seconds. We transformed their units into meters per second by multiplying the reciprocal scores
(1/seconds) of the star-run with 50.912 (distance in meters of the star-run) and the reciprocal scores of the 20-m sprint with 20 (distance in meters of the 20-m
sprint). Consequently, for all six fitness tests, higher scores indicate a better performance.
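The unit conversion described above can be illustrated with a minimal Python sketch. The function name and the example times are hypothetical and chosen only for illustration; the study's own preprocessing was done in R and is not reproduced here.

```python
STAR_RUN_DISTANCE_M = 50.912  # total star-run distance stated in the text
SPRINT_DISTANCE_M = 20.0      # distance of the 20-m linear sprint


def time_to_speed(seconds: float, distance_m: float) -> float:
    """Convert a completion time (s) into mean speed (m/s).

    Implements distance * (1 / seconds), i.e. the reciprocal score
    multiplied by the test distance.
    """
    if seconds <= 0:
        raise ValueError("completion time must be positive")
    return distance_m * (1.0 / seconds)


# Hypothetical times, for illustration only (not data from the study).
star_run_speed = time_to_speed(25.0, STAR_RUN_DISTANCE_M)
sprint_speed = time_to_speed(4.5, SPRINT_DISTANCE_M)
```

After this transformation, a shorter time maps to a higher speed, so the timed tests point in the same direction as the distance-based tests.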
We first calculated z-scores for each physical fitness test, separately for boys and girls. Scores outside of a ± 3 SD range were defined as outliers and excluded from data analysis. We then recalculated z-scores for each physical fitness test aggregated over boys and girls, to keep sex-related differences in the data.
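The two-step standardization can be sketched as follows for a single fitness test. This is an illustrative Python reading of the procedure, not the study's R code; the function names, the record layout, and the use of the sample standard deviation are assumptions.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and SD 1 (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def trim_and_standardize(records, limit=3.0):
    """records: list of (sex, score) tuples for ONE fitness test.

    Step 1: z-score within each sex and drop scores with |z| > limit.
    Step 2: re-standardize the retained scores pooled over both sexes,
            so that sex differences remain visible in the z-scores.
    Returns (sex, score, pooled_z) tuples, grouped by sex.
    """
    kept = []
    for sex in sorted({s for s, _ in records}):
        scores = [x for s, x in records if s == sex]
        for x, z in zip(scores, z_scores(scores)):
            if abs(z) <= limit:
                kept.append((sex, x))
    pooled = z_scores([x for _, x in kept])
    return [(s, x, z) for (s, x), z in zip(kept, pooled)]
```

Because the second standardization pools boys and girls, a group with higher raw scores keeps a higher mean z-score, which is what allows the sex effect to be estimated in the model.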
Contrasts were similar to the ones specified in previous analyses 8,12,13. For the six-level factor physical fitness component, we specified five contrasts
comparing (1) cardiorespiratory endurance vs. coordination, speed and powerLOW (i.e., cardiorespiratory endurance vs. tests of acceleration, E vs. CSL), (2)
coordination vs. speed and powerLOW (C vs. SL), (3) speed vs. powerLOW (S vs. L), (4) cardiorespiratory endurance, coordination, speed and powerLOW vs.
powerUP (ECSL vs. U), and (5) cardiorespiratory endurance, coordination, speed and powerLOW vs. flexibility (ECSL vs. F). These contrasts were motivated by
the fact that the first four fitness components are positively correlated and indicative of the latent construct “physical fitness”, whereas correlations between
powerUP and the other fitness components are lower 8,12,13. Similarly, as flexibility is neither energetically determined nor information-oriented, but reflects a
passive system of energy transmission 36, correlations between flexibility (F) and the first four physical fitness components (ECSL) were also expected to be
low. Moreover, expected sex differences in flexibility are qualitatively different from the other tests (i.e., girls > boys) 10,11. The factor assessment was dummy
coded, with “first school semester” as reference category. A sequential difference contrast of the factor sex compared the physical fitness of boys and girls,
with positive estimates indicating a better performance of boys, and negative estimates indicating a better performance of girls. Age was centered at 8.5
years, and cohort was centered at 2010.5.
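The five component contrasts can be read as weighted comparisons of component means. The Python sketch below writes them as hypothesis weights; the sign convention (first-named group minus second-named group) and the input values are assumptions made for illustration, and the way such weights are handed to the model-fitting software is not shown.

```python
COMPONENTS = ["E", "C", "S", "L", "U", "F"]

# One row of weights per contrast, in the order of COMPONENTS.
# Assumed direction: first-named group minus second-named group.
CONTRASTS = {
    "E vs. CSL":  [1.0, -1 / 3, -1 / 3, -1 / 3, 0.0, 0.0],
    "C vs. SL":   [0.0, 1.0, -0.5, -0.5, 0.0, 0.0],
    "S vs. L":    [0.0, 0.0, 1.0, -1.0, 0.0, 0.0],
    "ECSL vs. U": [0.25, 0.25, 0.25, 0.25, -1.0, 0.0],
    "ECSL vs. F": [0.25, 0.25, 0.25, 0.25, 0.0, -1.0],
}


def apply_contrasts(values):
    """Return each contrast as a weighted sum of per-component values."""
    return {
        name: sum(w * values[c] for w, c in zip(weights, COMPONENTS))
        for name, weights in CONTRASTS.items()
    }


# Hypothetical standardized values per component (not study estimates).
example = {"E": 0.0, "C": 0.1, "S": 0.1, "L": 0.0, "U": 0.2, "F": -0.1}
result = apply_contrasts(example)
```

Each row of weights sums to zero, so a contrast is unaffected by a constant shift applied to all six components.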
A regression discontinuity design (RDD) 19,20 provides two statistical tests of theoretical relevance. First, we test whether there is an assessment effect at
2010.5 (i.e., at the date separating cohorts with assessment in second vs. first school semester). The effect is computed as the difference between the intercept of the forward-extrapolated secular linear trend of the 2009–2010 cohorts and the intercept of the backward-extrapolated secular linear trend of the 2015–2011 cohorts. The second RDD statistic is a test of the interaction between the dummy coded assessment factor and the linear trend across cohorts (centered at 2010.5). A significant positive interaction indicates that the slope across cohorts with assessment in the second school term (i.e., cohorts 2009 and 2010) was larger than the linear slope for cohorts with assessment in the first school term (i.e., cohorts 2011–2015). We also allowed for a quadratic trend across all cohorts. This RDD was specified in a linear mixed model (LMM) with child (N = 75,362) and school (N = 469) as random factors.
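The logic of the two RDD statistics can be illustrated with a deliberately simplified Python sketch that fits separate straight lines to cohort means on either side of the cutoff. It is not the model used in the study: it ignores age, sex, the quadratic cohort trend, and the random effects of child and school, and all names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def rdd_estimates(cohorts, means, cutoff=2010.5):
    """Return (jump at cutoff, slope difference), second minus first term.

    Cohorts are centered at the cutoff, so each fitted intercept is the
    trend extrapolated to the cutoff date.
    """
    centered = [c - cutoff for c in cohorts]
    before = [(c, m) for c, m in zip(centered, means) if c < 0]  # second-term cohorts
    after = [(c, m) for c, m in zip(centered, means) if c > 0]   # first-term cohorts
    a_before, b_before = fit_line(*zip(*before))
    a_after, b_after = fit_line(*zip(*after))
    return a_before - a_after, b_before - b_after


# Hypothetical cohort means in z-score units, for illustration only.
cohorts = [2009, 2010, 2011, 2012, 2013, 2014, 2015]
means = [0.375, 0.325, 0.11, 0.13, 0.15, 0.17, 0.19]
jump, slope_difference = rdd_estimates(cohorts, means)
```

In this sketch the jump plays the role of the assessment effect at 2010.5, and the slope difference plays the role of the interaction between assessment and the linear cohort trend.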
Parsimonious RDD-based LMM selection 37 is documented in script Assessment.qmd in the OSF repository (https://osf.io/4vj2q/). The goal was to fit an LMM that included all relevant variance components and correlation parameters without overparameterization. We started with a complex LMM including the fixed effects of assessment, sex, a second-order polynomial age trend, a third-order polynomial cohort trend and interactions between fixed effects, all nested under the factor levels of physical fitness component. A quadratic age and a cubic cohort effect and interactions between sex x age or sex x cohort indicated overparameterization and were dropped. The final LMM included the fixed effects for assessment, sex, age (linear), cohort (linear and quadratic), the interaction between assessment and sex, the interaction between assessment and age, and the interaction between assessment and cohort (linear), all nested under the levels of the factor physical fitness component. For the random factor school, we included variance components of physical fitness component, sex, age, and the interaction between assessment and cohort (linear), with assessment and cohort as well as their interaction nested under the levels of physical fitness component. Correlation parameters were included for all variance components except for sex. For the random factor child, we included physical fitness component-related variance components and correlation parameters. In line with earlier practice, we interpret effects with |z| ≥ 2.0 as statistically significant.
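For orientation, the fixed-effects part of the final model can be sketched for a single fitness component. The notation is illustrative shorthand and is not taken from the analysis script: A is the dummy-coded assessment factor (1 = second school term), a is age centered at 8.5 years, c is cohort centered at 2010.5, and s is the contrast-coded sex factor. The sketch shows only a random intercept for school and omits the nesting under physical fitness component, the child-related random effects, and the school-related random slopes.

```latex
\begin{align*}
z_{i} ={}& \beta_0 + \beta_1 A_{i} + \beta_2 a_{i} + \beta_3 A_{i} a_{i}
          + \beta_4 s_{i} + \beta_5 A_{i} s_{i} \\
         &+ \beta_6 c_{i} + \beta_7 A_{i} c_{i} + \beta_8 c_{i}^{2}
          + u_{\text{school}(i)} + \varepsilon_{i}
\end{align*}
```

Under this reading, the coefficients correspond to the fixed effects listed above in this order: assessment, age (linear), assessment x age, sex, assessment x sex, cohort (linear), assessment x cohort (linear), and cohort (quadratic).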
Results
The profile of results is visualized in Fig. 1, displaying physical fitness for the six fitness components by age and time of assessment (i.e., first vs. second
school term). A table of corresponding means and standard deviations in the original task metrics is available as Table S1 in the Supplement. LMM-based
inferential fixed effect estimates and associated standard errors and z-values are assembled in Table 2; variance components and correlation parameters
related to the random effects child and school are shown in Table 3. Figure 2, finally, provides visualizations of partial effect predictions based on LMM
parameters with a focus on RDD effects. In the following, we report results for each of the six components with reference to Fig. 1, Table 2, and Fig. 2.
Table 2
Fixed effect estimates, standard errors and z-values of the linear mixed model
b SE z
Intercept -0.079 0.016 -5.07
E vs. CSL 0.093 0.029 3.18
C vs. SL 0.059 0.038 1.56
S vs. L -0.024 0.032 -0.75
ECSL vs. U 0.079 0.028 2.83
ECSL vs. F -0.130 0.026 -5.08
Cardiorespiratory endurance (6-min run)
Assessment 0.040 0.041 0.97
Age 2011–2015 (linear) 0.097 0.014 6.85
Δ Age 2009–2010 (linear) -0.036 0.027 -1.33
Sex 0.514 0.008 61.50
Assessment x sex -0.007 0.015 -0.49
Cohort 2011–2015 (linear) -0.033 0.015 -2.15
Δ Cohort 2009–2010 (linear) 0.044 0.038 1.17
Cohort 2009–2015 (quadratic) 0.005 0.003 1.81
Coordination (star-run)
Assessment 0.144 0.046 3.17
Age 2011–2015 (linear) 0.317 0.014 22.56
Δ Age 2009–2010 (linear) -0.021 0.026 -0.79
Sex 0.237 0.008 28.62
Assessment x sex 0.005 0.015 0.35
Cohort 2011–2015 (linear) -0.023 0.016 -1.41
Δ Cohort 2009–2010 (linear) 0.023 0.043 0.55
Cohort 2009–2015 (quadratic) 0.004 0.003 1.57
Speed (20-m sprint)
Assessment 0.097 0.046 2.10
Age 2011–2015 (linear) 0.262 0.014 18.52
Δ Age 2009–2010 (linear) -0.050 0.027 -1.90
Sex 0.307 0.008 36.75
Assessment x sex -0.013 0.015 -0.90
Cohort 2011–2015 (linear) 0.032 0.016 2.09
Δ Cohort 2009–2010 (linear) -0.105 0.040 -2.66
Cohort 2009–2015 (quadratic) -0.004 0.003 -1.74
PowerLOW (standing long jump)
Assessment -0.006 0.037 -0.16
Age 2011–2015 (linear) 0.227 0.015 15.59
Δ Age 2009–2010 (linear) -0.003 0.027 -0.10
Sex 0.372 0.009 43.30
Assessment x sex 0.010 0.015 0.68
Cohort 2011–2015 (linear) 0.049 0.015 3.35
Δ Cohort 2009–2010 (linear) -0.189 0.032 -5.87
Cohort 2009–2015 (quadratic) -0.010 0.003 -3.78
PowerUP (ball-push test)
Assessment 0.222 0.037 5.98
Age 2011–2015 (linear) 0.519 0.013 38.78
Δ Age 2009–2010 (linear) -0.031 0.025 -1.23
Sex 0.645 0.008 81.12
Assessment x sex 0.053 0.014 3.76
Cohort 2011–2015 (linear) 0.028 0.014 1.97
Δ Cohort 2009–2010 (linear) -0.082 0.035 -2.33
Cohort 2009–2015 (quadratic) -0.004 0.002 -1.51
Flexibility (stand-and-reach test)
Assessment -0.048 0.033 -1.43
Age 2011–2015 (linear) -0.045 0.015 -3.03
Δ Age 2009–2010 (linear) -0.004 0.028 -0.14
Sex -0.429 0.009 -49.20
Assessment x sex -0.069 0.016 -4.41
Cohort 2011–2015 (linear) -0.033 0.015 -2.27
Δ Cohort 2009–2010 (linear) 0.019 0.031 0.61
Cohort 2009–2015 (quadratic) 0.006 0.003 2.25
Δ Cohort/Age 2009–2010 (linear) = change in linear slope from Cohort/Age 2011–2015. Endurance = cardiorespiratory endurance (i.e., 6-min run),
Coordination = star-run, Speed = 20-m linear sprint, PowerLOW = lower limbs muscle power (i.e., standing long jump), PowerUP = upper limbs muscle power (i.e.,
ball-push test), Flexibility = stand-and-reach test. E vs. CSL = cardiorespiratory endurance vs. coordination, speed and powerLOW, C vs. SL = coordination vs.
speed and powerLOW, S vs. L = speed vs. powerLOW, ECSL vs. U = cardiorespiratory endurance, coordination, speed and powerLOW vs. powerUP, ECSL vs. F =
cardiorespiratory endurance, coordination, speed and powerLOW vs. flexibility. Bold = |z| > 2.0, linear mixed model random factors: schools (469) and children
(75,362), observations = 440,139. For estimates of variance components and correlation parameters, see Table 3.
Assessment and cohort effects
Cardiorespiratory endurance. There was no evidence for a better 6-min run performance in the second compared to the first school term at the critical date for assessment (i.e., no significant assessment effect in 2010.5 for 8.5-year-old children). As expected and reported previously 8,18, there was a small but significant decline in the 6-min run performance for the 2011–2015 cohorts (b = -0.033, z = -2.15, see Fig. 2), with no evidence for an interaction of the cohort trend with assessment.
Coordination. As shown in Figs. 1 and 2, 8.5-year-old children exhibited better star-run performance when tested in the second compared to the first school term (b = 0.144, z = 3.17). Figure 2 depicts children’s physical fitness by cohort and shows a discontinuity of cohort trends at 2010.5 (i.e., between cohorts with assessment in first and second school term), indicating an assessment effect on test performance. Grey points are observed cohort means, black points are partial effects of physical fitness test, cohort, and assessment (i.e., without effects of age). As fixed effect estimates describe changes in units of standard deviation, b*SD translates these effects into their original test metric. For the star-run, the positive assessment effect translates to a performance increase of 0.042 m/s from first to second school term. As shown in Fig. 2, there was no evidence for linear or quadratic cohort trends of the star-run test performance.
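The conversion from standardized to original units is a simple rescaling. The sketch below uses the two rounded figures from this paragraph to back out the standard deviation they imply; the resulting value is an approximation inferred here, not a statistic reported in the text.

```latex
\begin{align*}
\Delta_{\text{original}} &= b \times SD, \\
SD_{\text{star-run}} &\approx \frac{0.042\ \text{m/s}}{0.144} \approx 0.29\ \text{m/s}.
\end{align*}
```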
Speed. Children tested in the second school term outperformed children tested in the first school term in the 20-m sprint (b = 0.097, z = 2.10), which translates to a performance difference of 0.04 m/s. Again, this effect is visible in the discontinuity of cohort trends at 2010.5 shown in Fig. 2. As expected, speed increased linearly in cohorts 2011–2015 (b = 0.032, z = 2.09), but the cohort effect differed between cohorts with assessment in second (2009 + 2010) and first (2011–2015) school term (b = -0.105, z = -2.66). A re-parameterized LMM with the linear cohort trend nested under the levels of assessment showed that 20-m sprint performance declined from 2009 to 2010 (b = -0.073, z = -2.04). Details on this LMM are reported in script Assessment.qmd in the OSF repository.
PowerLOW. As shown in Figs. 1 and 2, there was no evidence for a change of performance in the standing long jump between first and second school semester estimated at 2010.5 for 8.5-year-old children. Standing long jump performance was characterized by a linear increase (b = 0.049, z = 3.35) during cohorts 2011–2015 and an overall quadratic decline (b = -0.010, z = -3.78). Linear cohort trends differed between assessments (b = -0.189, z = -5.87). A re-parameterized LMM with the linear cohort trend nested under the levels of assessment (i.e., first and second school term) showed a linear decrease of standing long jump performance before 2010.5 (b = -0.140, z = -5.28).
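The slopes from the re-parameterized models agree with the Table 2 estimates. Assuming the dummy coding described in the Statistics section, the linear cohort slope for the second-term cohorts should equal the 2011–2015 slope plus the Δ Cohort 2009–2010 term. The check below uses only values reported above:

```latex
\begin{align*}
\text{Speed:}\quad & 0.032 + (-0.105) = -0.073,\\
\text{PowerLOW:}\quad & 0.049 + (-0.189) = -0.140.
\end{align*}
```

Both sums match the slopes reported for the 2009 and 2010 cohorts.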
PowerUP. Children’s performance in the ball-push test was approximately 17 cm better in the second compared to the first school term (b = 0.222, z = 5.98) when estimated at 2010.5. Linear and quadratic cohort trends were not significant, but there was an interaction between the linear cohort effect and assessment time (b = -0.082, z = -2.33). In a re-parameterized LMM with the linear cohort trend nested under the levels of assessment (i.e., first and second school semester), neither cohort trend was significant, although the trend was numerically decreasing for cohorts 2009 and 2010 (b = -0.055, z = -1.86) and numerically increasing between 2011 and 2015 (b = 0.027, z = 1.93).
Flexibility. There was no evidence for a significant main effect of assessment on stand-and-reach performance. As shown in Fig. 2, there was a small linear decline of the stand-and-reach test performance between 2011 and 2015 (b = -0.033, z = -2.27), followed by a plateau (b = 0.006, z = 2.25). There was no evidence for an interaction between linear cohort trend and assessment.
Age and sex effects and interactions with assessment. How do age and sex effects of this extended sample align with previously reported results 8,13? As shown in Fig. 1, performance increased linearly with age in cohorts 2011–2015 for the five physical fitness components cardiorespiratory endurance (b = 0.097, z = 6.85), coordination (b = 0.317, z = 22.56), speed (b = 0.262, z = 18.52), powerLOW (b = 0.227, z = 15.59), and powerUP (b = 0.519, z = 38.78), as reported in previous studies 8,13. Going beyond earlier results, there was no evidence that age-related development in these physical fitness components differed between first and second school term. Interestingly, flexibility was the only physical fitness component with a small negative age effect (b = -0.045, z = -3.03) in cohorts 2011–2015; this age effect also did not differ significantly between assessments. Boys outperformed girls in five of six physical fitness tests assessing cardiorespiratory endurance (b = 0.514, z = 61.50), coordination (b = 0.237, z = 28.62), speed (b = 0.307, z = 36.75), powerLOW (b = 0.372, z = 43.30), and powerUP (b = 0.645, z = 81.12) in cohorts 2011–2015. For the first four physical fitness components, there was no evidence that sex effects differed between assessments. For powerUP, however, there was a significant assessment x sex interaction (b = 0.053, z = 3.76), indicating that boys’ performance improved more than girls’ from first to second school semester. Flexibility was the only physical fitness component where girls outperformed boys (b = -0.429, z = -49.20, see Figure S1 in the Supplements) in cohorts 2011–2015. There was a significant assessment x sex interaction for flexibility (b = -0.069, z = -4.41), indicating that the girls’ performance advantage was slightly larger in the second compared to the first school term.
Differences between physical fitness components in their assessment effects. Are physical fitness components differently affected by time of assessment? A re-parameterized version of the LMM estimated the interactions of the physical fitness component contrasts with assessment time. Details on this re-parameterized LMM are reported in the OSF repository. The performance increase from first to second school term was (1) larger for speed than for powerLOW (S vs. L, b = 0.105, z = 2.03), (2) larger for powerUP than for the mean of cardiorespiratory endurance, coordination, speed, and powerLOW (ECSL vs. U, b = -0.151, z = -3.68), and (3) larger for the mean of cardiorespiratory endurance, coordination, speed, and powerLOW than for flexibility (ECSL vs. F, b = 0.115, z = 2.83). Differences in assessment effects between cardiorespiratory endurance and the mean of coordination, speed, and powerLOW (E vs. CSL), or between coordination and the mean of speed and powerLOW (C vs. SL) were not significant. There were two significant three-way interactions involving physical fitness contrasts, assessment, and sex (ECSL vs. U x assessment x sex: b = -0.054, z = -3.72; ECSL vs. F x assessment x sex: b = -0.068, z = 4.01). The first interaction indicates that for both boys and girls, the performance gain from first to second school semester was larger for powerUP than for the mean of cardiorespiratory endurance, coordination, speed, and powerLOW, but the difference in the performance increase between the fitness components was larger for boys. Similarly, the advantage in the increase from first to second semester of the mean of cardiorespiratory endurance, coordination, speed, and powerLOW over flexibility was slightly larger for boys than for girls.
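The component contrasts named above can be illustrated with a small sketch. It assumes that each contrast is computed as "first term minus second term" (a sign convention inferred from the reported signs, e.g. the negative ECSL vs. U estimate) and it ignores any scaling the fitted model may apply. Only the powerUP effect (0.222) comes from the text; all other values and the helper `mean_effect` are hypothetical placeholders.

```python
# Hypothetical assessment effects in standardized units. Only the powerUP value
# (0.222) appears in the text; all other numbers are invented placeholders.
effects = {"E": 0.02, "C": 0.10, "S": 0.12, "L": 0.02, "U": 0.222, "F": 0.00}


def mean_effect(components: str) -> float:
    """Average the effects of the components named by single letters."""
    return sum(effects[c] for c in components) / len(components)


# Each contrast is "first term minus second term" (assumed sign convention).
contrasts = {
    "E vs. CSL": effects["E"] - mean_effect("CSL"),
    "C vs. SL": effects["C"] - mean_effect("SL"),
    "S vs. L": effects["S"] - effects["L"],
    "ECSL vs. U": mean_effect("ECSL") - effects["U"],
    "ECSL vs. F": mean_effect("ECSL") - effects["F"],
}

for name, value in contrasts.items():
    print(f"{name}: {value:+.3f}")
```

With this convention, a negative ECSL vs. U value means that powerUP gained more than the average of the four correlated components, and a positive ECSL vs. F value means that those components gained more than flexibility.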
Variance components (VCs) and correlation parameters (CPs)
Table 3 shows child- and school-related VCs and CPs. Conceptually, CPs represent interactions between the random factor and its associated VCs, or interactions between two effects when adjusting for all fixed effects.
Replication of previous findings and their extension by flexibility. Variances of physical fitness test contrasts were larger for children (i.e., VCs ranged between 0.363 and 0.793) than for schools (i.e., VCs ranged between 0.149 and 0.519). CPs were in agreement with previous results that the first four tests represent a well-defined latent construct of physical fitness, while powerUP is correlated much more weakly with this cluster 8. The newly added component flexibility was also weakly correlated with the first four physical fitness components. In a re-parameterized LMM with physical fitness levels instead of contrasts in the random effect structure, cardiorespiratory endurance, coordination, speed and powerLOW correlated with CPs between 0.57 and 0.77 on the child level, but correlations of powerUP and flexibility with the other fitness components were lower (r between 0.24 and 0.49 for powerUP and between 0.20 and 0.34 for flexibility). Details on this LMM are reported in script Assessment.qmd in the OSF repository. As shown in Table 3, for both children and schools, the contrasts ECSL_U and ECSL_F correlated positively with the intercept (child: 0.25 and 0.21, school: 0.25 and 0.49, respectively), indicating that children and schools with higher average fitness estimated at 2010.5 showed better performance in ECSL (cardiorespiratory endurance, coordination, speed and powerLOW) than in powerUP or flexibility. Further, as reported previously, the schools’ intercept and their age effect correlated positively (r = 0.33) 8, indicating that fitter schools tended to exhibit larger cross-sectional age gains.
Differences between schools in their assessment effects and cohort trends. Schools differed in their assessment effects (i.e., VCs between 0.219 and 0.606) and in their linear cohort trends between 2011 and 2015 (i.e., VCs between 0.009 and 0.035). For both assessment effects and cohort trends, CPs were in line with the “law of diminishing returns” 38,39, indicating that schools with higher average performances in 2010.5 (1) exhibited smaller assessment effects and (2) were less likely to exhibit secular physical fitness gains between 2011 and 2015. Schools with lower average performances and possibly less active children may have had “more to gain” from an increased amount of structured exercise and thus exhibited larger assessment effects. There were also positive CPs between assessment effects and the linear cohort trends between 2011 and 2015 of the corresponding physical fitness components (CPs between 0.41 and 0.53), indicating that schools promoting larger assessment effects were more likely to exhibit positive cohort trends between 2011 and 2015. Finally, schools also differed in the magnitude of change in linear cohort slopes before and after 2010.5 (i.e., Δ Cohort 2009–2010 [linear], VCs between 0.139 and 0.467). The CPs between the change in cohort slope and assessment or cohort effects must be interpreted with caution. We had no explicit prediction of their direction; they may arise from an assessment effect limiting the range of an associated cohort effect (or vice versa). Further details of CPs presented in Table 3 are documented in the Supplementary Material of this article.
Table 3
Child- and school-related variance components and correlation parameters of the linear mixed model

             VC  CP
                   Int  E_CSL   C_SL    S_L ECSL_U ECSL_F  Assessment                                Cohort 2011–2015 (linear)
                                                               E      C      S     pL     pU      F      E      C      S
Child
Int       0.315
E_CSL     0.407  -0.15
C_SL      0.370  -0.09  +0.01
S_L       0.363  -0.09  +0.07  +0.05
ECSL_U    0.530  +0.25  +0.16  -0.07  +0.06
ECSL_F    0.793  +0.21  +0.02  -0.05  +0.11  +0.26
School
Int       0.061
E_CSL     0.273  -0.03
C_SL      0.519  +0.24  -0.15
S_L       0.337  +0.23  -0.08  -0.12
ECSL_U    0.229  +0.25  -0.04  +0.21  +0.21
ECSL_F    0.149  +0.49  +0.02  +0.25  +0.12  +0.42
Assessment
E         0.433  -0.22  -0.34  -0.02  +0.02  +0.01  -0.16
C         0.576  -0.20  +0.23  -0.36  +0.03  -0.11  -0.14  -0.08
S         0.606  -0.17  +0.13  +0.15  -0.37  -0.21  -0.11  +0.04  +0.19
pL        0.320  -0.07  +0.09  +0.23  +0.34  +0.12  -0.11  +0.10  +0.16  +0.15
pU        0.348  -0.10  +0.02  +0.16  +0.03  +0.41  +0.00  +0.01  +0.00  +0.01  +0.25
F         0.219  -0.03  -0.08  +0.08  -0.12  +0.10  +0.42  +0.10  -0.11  +0.04  -0.20  +0.08
Cohort 2011–2015 (linear)
E         0.020  -0.33  -0.64  -0.06  -0.06  -0.24  -0.31  +0.41  -0.07  -0.01  -0.04  -0.12  +0.03
C         0.035  -0.48  +0.29  -0.72  -0.01  -0.27  -0.39  +0.06  +0.45  +0.14  +0.01  -0.08  -0.07  +0.10
S         0.025  -0.36  +0.23  +0.29  -0.63  -0.22  -0.20  +0.03  +0.04  +0.53  +0.05  +0.05  +0.08  +0.05  +0.08
pL        0.011  -0.19  +0.21  +0.26  +0.40  -0.04  -0.14  +0.08  +0.04  +0.10  +0.55  +0.13  -0.06  -0.02  +0.05  +0.11
pU        0.016  -0.30  -0.07  +0.04  -0.04  +0.62  -0.05  +0.12  -0.01  -0.06  +0.12  +0.44  +0.11  +0.06  +0.08  +0.15
F         0.009  -0.16  +0.05  +0.02  -0.19  -0.02  +0.50  +0.02  -0.02  +0.16  -0.10  -0.04  +0.46  +0.01  +0.08  +0.22
Δ Cohort 2009–2010 (linear)
E         0.319  +0.13  +0.25  +0.07  +0.07  +0.21  +0.12  +0.57  -0.03  -0.01  +0.12  +0.01  +0.01  -0.33  -0.07  -0.03
C         0.467  +0.27  -0.06  +0.44  -0.00  +0.18  +0.29  -0.16  +0.44  +0.05  +0.21  +0.06  -0.08  -0.17  -0.43  +0.01
S         0.368  +0.12  -0.04  -0.09  +0.19  -0.03  +0.09  -0.03  +0.15  +0.58  +0.11  -0.08  +0.01  -0.01  +0.11  -0.20
pL        0.164  +0.21  -0.11  +0.06  +0.03  +0.16  +0.09  +0.07  +0.08  +0.04  +0.60  +0.17  -0.13  -0.02  -0.10  -0.06
pU        0.272  +0.23  +0.14  +0.12  +0.07  -0.14  +0.06  -0.17  +0.07  +0.06  +0.04  +0.54  +0.03  -0.21  -0.17  -0.05
F         0.139  +0.17  -0.12  +0.03  -0.01  +0.12  +0.07  +0.08  -0.19  -0.06  -0.17  +0.11  +0.67  +0.03  -0.11  -0.09
Age       0.003  +0.33  +0.24  -0.09  +0.28  +0.07  +0.23  -0.05  +0.24  +0.00  +0.20  -0.30  -0.09  +0.03  +0.18  -0.06
Sex       0.003      -      -      -      -      -      -      -      -
E = cardiorespiratory endurance (i.e., 6-min run), C = coordination (i.e., star-run), S = speed (i.e., 20-m linear sprint), pL = lower limbs muscle power (i.e., powerLOW, standing long jump), pU = upper limbs muscle power (i.e., powerUP, ball-push test), F = flexibility (i.e., stand-and-reach test). E_CSL = cardiorespiratory endurance vs. coordination, speed and powerLOW, C_SL = coordination vs. speed and powerLOW, S_L = speed vs. powerLOW, ECSL_U = cardiorespiratory endurance, coordination, speed and powerLOW vs. powerUP, ECSL_F = cardiorespiratory endurance, coordination, speed and powerLOW vs. flexibility. Assessment = Assessment effect estimated at 2010.5, Cohort 2011–2015 (linear) = Linear cohort trend between 2011 and 2015, Δ Cohort 2009–2010 (linear) = Change in linear cohort slope from linear cohort trend between 2011 and 2015. VC = variance component, CP = correlation parameter. Theoretically relevant correlations are set in bold. LMM random factors: schools (469) and children (75,362), observations = 440,139. VC for Residual = 0.192.
Discussion
We examined effects of time of assessment in the school year on children’s physical fitness using data from 75,362 German third-graders from seven cohorts. Children were tested either in the first or second school term of third grade in primary school. As time of assessment was confounded with age and cohort, we used a regression discontinuity design to estimate assessment effects while adjusting for quadratic cohort trends and linear age effects on physical fitness. Children’s coordination, speed, and upper limbs muscle power were higher in the second, compared to the first school term. Boys exhibited a larger improvement of upper limbs muscle power from first to second school term than girls. Upper limbs muscle power improved more from first to second school term than the mean of cardiorespiratory endurance, coordination, speed and lower limbs muscle power, four highly correlated physical fitness components. There was no reliable evidence for changes in cardiorespiratory endurance, powerLOW or flexibility from first to second school term.
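The logic of estimating an assessment effect as a discontinuity at 2010.5 can be sketched with noise-free toy data. This is a simplified illustration and not the fitted LMM: it works on hypothetical cohort means, omits age, sex, quadratic trends, and random effects, and assumes cohort is coded in years. The powerUP coefficients reported in the Results serve only as generating values, and the helpers `solve`, `ols`, and `design_row` are illustrative names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


CUTOFF = 2010.5  # point at which the assessment effect is estimated

# Toy generating values; the last three echo the reported powerUP estimates.
TRUE = {"intercept": 0.0, "assessment": 0.222,
        "slope_second": -0.055, "slope_first": 0.027}


def design_row(cohort):
    """Intercept, second-term indicator, and cohort slopes nested within term."""
    c = cohort - CUTOFF
    second = 1.0 if cohort < CUTOFF else 0.0  # cohorts 2009-2010: second term
    return [1.0, second, c * second, c * (1.0 - second)]


cohorts = list(range(2009, 2016))
rows = [design_row(c) for c in cohorts]
y = [sum(b * x for b, x in zip(TRUE.values(), row)) for row in rows]

estimates = dict(zip(TRUE, ols(rows, y)))

# Naive comparison of term means, ignoring the cohort trends
second_y = [v for c, v in zip(cohorts, y) if c < CUTOFF]
first_y = [v for c, v in zip(cohorts, y) if c > CUTOFF]
naive = sum(second_y) / len(second_y) - sum(first_y) / len(first_y)

print("discontinuity at cutoff:", round(estimates["assessment"], 4))
print("naive difference of means:", round(naive, 4))
```

In this toy setting the naive difference of term means departs from the generating assessment effect because it mixes in the cohort trends, whereas the jump at the cutoff recovers it.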
The most likely reason for better coordination, speed and upper limbs muscle power later in the school year, after adjusting for children’s ages, is that children tested in the second school term had on average been exposed to an additional half year of structured exercise in physical education classes compared to children tested in the first school term. Possibly, the intensity of physical activity during physical education classes in the present sample was not high enough to lead to a significant improvement in cardiorespiratory endurance from first to second school term. Further, tests that improved from first to second school term all assessed abilities that are partly information-oriented according to the classification by Bös et al. 36 (i.e., coordination under time pressure, speed, upper limbs muscle power), while cardiorespiratory endurance is energetically determined. According to Caspersen et al. 40, the three physical fitness components with better performance in the second than in the first school term are skill-related. The present findings may thus indicate that school-based physical education classes enhance information-oriented, skill-related abilities to a greater extent than mainly energetically determined physical fitness components like cardiorespiratory endurance. Reliable improvement for upper but not lower limbs muscle power could be due to the prevalence of ball games in physical education, which may improve performance on the ball-push test more than other activities benefit the standing long jump.
In line with the hypothesis that physical education classes and further school sports activities may primarily target information-oriented abilities, a previous study found better age- and sex-standardized performance of primary school children in tests assessing speed and coordination (i.e., 20-m sprint, backwards balancing, and sideways-jumping) at the end of the school year, compared to the beginning. However, the authors reported better 6-min run and standing long jump performance at the beginning of the school year after the summer holidays, suggesting an association with a summer-related increase of physical activity in their sample 16.
In the present study, coordination was assessed by the star-run, in which children had to memorize different directions and forms of movement in a specific order. As the star-run is associated with a high cognitive load, an improvement from first to second school term might not only reflect better fitness, but also improved executive function.
Schools differed in their assessment effects, likely related to differences in schools’ physical education lessons. In line with the “law of diminishing returns”, schools with lower average fitness at 2010.5 tended to exhibit larger assessment effects. We did not expect this result; the opposite effect could have been explained by assuming that schools with a higher average fitness conduct more effective physical education classes and are located in areas with more opportunities to be physically active, and thus may also promote larger fitness gains (i.e., assessment effects) within the school year.
Future research is needed to examine which specific factors are associated with larger effects of assessment time within the school year, and there is research that may provide some guidance: The effectiveness of physical education lessons can differ depending on their quantity and quality (e.g., teaching strategies used and physical activity intensity) 41. In the Federal State of Brandenburg, children usually receive around three physical education lessons per week, but the distribution of the lesson quota across school semesters and school grades can differ between schools 42. Further, in cases of teacher shortage, the number of physical education lessons per week might be temporarily reduced in some schools, and schools can thus differ in their exact amount of physical education classes. We do not have this information, but future studies may take into account school-specific amount and content of physical education lessons, teaching strategies 41, or time spent in moderate-to-vigorous activity during physical education class 43,44 to examine effects of time of assessment on children’s physical fitness.
Besides effects of assessment time on physical fitness, the present study tested effects of age and sex on children’s fitness levels. As data from cohorts 2011 until 2015 and five fitness tests (i.e., assessing cardiorespiratory endurance, coordination, speed, powerLOW and powerUP) have been analyzed and published previously 8, we expected to replicate age and sex differences in these fitness tests. As expected, boys outperformed girls in five fitness tests assessing cardiorespiratory endurance, coordination, speed, and muscle power. The better performance of boys in these tests is likely related to differences in body composition and activity levels. Pre-adolescent boys exhibit lower fat and higher lean mass than pre-adolescent girls 45,46. In fact, recent analyses have shown that after statistically adjusting for differences in body constitution (i.e., height-mass ratio), partial effects of sex on physical fitness no longer favored boys 13. There is also evidence that school-aged boys tend to exhibit higher activity levels than girls 17,47–49, indicated by higher daily step counts 49, more time in moderate-to-vigorous physical activity 17,48 and less sedentary behavior 47,48.
Our study included data from an additional sixth fitness test, namely the stand-and-reach test, assessing flexibility. In line with previous research 10,11,50,51, girls exhibited better flexibility than boys. In contrast to the other fitness components tested in the present study, performance in the stand-and-reach test does not depend on energetically determined or information-oriented abilities, but reflects a passive system of energy transmission and is mostly anatomically determined 36,52. The better flexibility of girls may be explained by higher body fat percentage and lower muscle mass 45,53 and resulting lower tissue density in girls. Behavioral aspects like gender-specific sports participation might contribute to the better flexibility of girls. For instance, girls may be more frequently encouraged to participate in dance or gymnastics, while certain sports that enhance muscle tone are more popular in boys 54–56. However, a study examining associations of sports type and motor skills in primary school children found no evidence for an advantage of children participating in dance or gymnastics in the sit-and-reach test 57.
In line with previous research 8,13, third-graders’ age effects were linear in all six fitness components, including the newly added flexibility. Age gains in cardiorespiratory endurance, coordination, speed, powerLOW and powerUP were of the same size as those reported previously 8,12,13, with the largest age gain for powerUP and the smallest for endurance. Interestingly, flexibility was the only one out of the six fitness components with a small negative age effect. This negative age effect on stand-and-reach performance may be explained by an age-related decline in sitting/standing height ratio, i.e., an increased leg length relative to trunk length 53. In line with this assumption, performance in the sit-and-reach test was negatively associated with body height in youth aged 11 to 17 years 58,59. Other factors possibly associated with the age-related decline in flexibility might be a larger femur growth rate relative to muscular and tendon growth, or joint-specific changes like increases in bone or cartilage mass around the hip joint 60. Other studies on the development of flexibility in children and adolescents yield inconsistent results. A cross-sectional study reported a decline of flexibility between the ages 11 and 17 years 58, while a longitudinal study assessing the fitness development in children between 9 and 12 years showed that in girls, flexibility increased linearly, whereas for boys there was no evidence for changes in flexibility during this period 11. Other research reported no evidence for changes in flexibility between the ages 4 and 17 55 or 7 and 11 years 50. In contrast to the studies mentioned above, our study included data from a large sample of children within a very small age window (i.e., 7.9 to 9.6 years). Age-related changes in flexibility during this period were small and may not have been detectable in studies with smaller samples and wider age ranges.
High correlations on the child level between tests assessing cardiorespiratory endurance, coordination, speed and powerLOW, indicating the latent construct of physical fitness, as well as lower correlations between the ball-push test assessing powerUP and the other fitness tests were also replicated 8. Correlations between performance in the stand-and-reach test and the other fitness tests were lower. As mentioned above, flexibility, unlike the other fitness components, is not energetically determined or information-oriented, but is classified as a passive system of energy transmission 36,52. Although flexibility has been classified as a component of health-related physical fitness by Caspersen and colleagues 40, researchers have argued that flexibility is less indicative of health than other fitness components 61,62, and is not part of the same “physical fitness” construct as tests assessing cardiorespiratory endurance, speed, muscle power/strength, and coordination 63; it may thus be assessed with lower priority 61. In a study investigating the association between overweight and performance in three fitness tests, performance in the sit-and-reach test was not associated with children’s weight status, in contrast to two tests assessing cardiorespiratory endurance and muscular strength 64. Similarly, BMI percentile of fourth-graders was negatively associated with performance in all fitness tests (i.e., including the 6-min run, standing long jump, and the 20-m sprint), except for the stand-and-reach test 51. Due to its lower association with children’s health status compared to the other physical fitness tests, the stand-and-reach test was removed from the EMOTIKON test battery in 2016 and replaced by the one-legged stance test assessing static balance.
Our study has limitations. We did not use experimental data, but tested effects of assessment time, age, sex, and cohort on children’s physical fitness using quasi-experimental observational data. Due to the lack of experimental control and randomization, one must be careful when interpreting results based on observational data, especially when deriving recommendations for practice 65. For instance, in the present study, age, time of assessment, and cohort were confounded. The dissociation of age, assessment time, and cohort effects is a well-known challenge in developmental science 66–68. In our study, age and assessment time are positively correlated (i.e., children are, on average, older in the second than in the first school term), but, as the age ranges of the two school terms overlap, effects of age and time of assessment could be dissociated. As shown in Fig. 2 in the article and S2 in the Supplements, cohort effects were rather small, whereas significant assessment effects of the three physical fitness components were larger. Another limitation relates to the fact that 2009 was the first cohort in which the EMOTIKON study was conducted state-wide in the Federal State of Brandenburg, Germany. Some of the performance differences between cohorts 2009 and 2010 may therefore be due to factors specifically associated with implementing the test protocol for the first time in cohort 2009, instead of due to secular physical fitness trends. If this is the case, using extrapolations of cohort effects from cohorts with assessment in second school term (2009–2010) to estimate the assessment effect may slightly over- or underestimate the effect.
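Why overlapping age ranges allow age and assessment time to be dissociated can be shown with a small worked example. It is a sketch under stated assumptions: the ages, the age slope, and the assessment effect are hypothetical, the data are noise-free, and the adjustment is a simple pooled within-term regression rather than the LMM used in the study.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


TRUE_AGE_SLOPE = 0.30   # hypothetical gain per year of age
TRUE_ASSESSMENT = 0.20  # hypothetical gain from first to second term

# Hypothetical ages in years: first term 8.00-8.95, second term 8.50-9.45,
# so the two age ranges overlap between 8.50 and 8.95.
ages = {
    0: [8.0 + 0.05 * i for i in range(20)],  # 0 = first school term
    1: [8.5 + 0.05 * i for i in range(20)],  # 1 = second school term
}
scores = {
    term: [TRUE_AGE_SLOPE * (a - 8.75) + TRUE_ASSESSMENT * term for a in ages[term]]
    for term in ages
}

# Pooled within-term age slope: uses only age variation inside each term
numerator = 0.0
denominator = 0.0
for term in ages:
    a_bar, y_bar = mean(ages[term]), mean(scores[term])
    numerator += sum((a - a_bar) * (y - y_bar)
                     for a, y in zip(ages[term], scores[term]))
    denominator += sum((a - a_bar) ** 2 for a in ages[term])
age_slope = numerator / denominator

raw_difference = mean(scores[1]) - mean(scores[0])
age_gap = mean(ages[1]) - mean(ages[0])
adjusted_difference = raw_difference - age_slope * age_gap

print("raw difference between terms:", round(raw_difference, 4))
print("age-adjusted difference:", round(adjusted_difference, 4))
```

Because age varies within each term, the age slope can be estimated without reference to the term difference, and subtracting the age-related part of the raw difference leaves the assessment effect.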
As children in our study were tested either in the first (i.e., fall) or second half (i.e., winter and spring) of the school year, assessment-time related fitness differences may be associated with seasonal variations in physical activity or anthropometric measures. There is evidence for associations of children’s activity levels with seasonal variables, like temperature 48,69,70, precipitation 48,69 and hours of daylight 48,70. Some research indicates that children tend to exhibit higher activity levels and less sedentary behavior in spring, compared to fall or winter 17,71, and there is evidence that children’s performance in several physical fitness tests is better in summer than in winter 50. Further, body composition may differ by season 72, and some studies have even suggested seasonal variations in children’s height gains 72–74. Future research on the effects of assessment time on children’s physical fitness may thus also assess body constitution or physical activity levels.
When testing children’s physical fitness, timing of assessment within the school year matters. Performance in tests assessing coordination, speed, and upper limbs muscle power was better in the second, compared to the first half of the school year, with boys exhibiting a larger increase of upper limbs muscle power than girls. We found no evidence of changes in cardiorespiratory endurance, lower limbs muscle power and flexibility from first to second school term. Timing of assessment of physical fitness should be considered when generating norm values and when comparing children’s physical fitness to such norms.
Declarations
Acknowledgements
We thank all physical education teachers who conducted the EMOTIKON tests and all children who participated in the assessments. We used extensions of the MixedModels.jl software by Phillip Alday, Douglas Bates, and Dave Kleinschmidt to fit the linear mixed models and to obtain partial effect predictions.
Author contributions
P.T., K.G., and R.K. contributed to conception and design and interpreted the results; K.G. organized data collection; P.T. and R.K. carried out data analysis; P.T. wrote the first draft of the manuscript; all authors were involved in iterative revisions; all authors provided final approval of the version to be published and agreed to be accountable for all aspects of the work.
Availability of data and materials
Data as well as R and Julia scripts are available in the Open Science Framework (OSF) repository: https://osf.io/4vj2q/.
Funding
The study was supported by the Ministry of Education, Youth, and Sport of the Federal State Brandenburg, Germany and the University of Potsdam, Germany.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
The EMOTIKON project is mandated and approved by the Ministry of Education, Youth and Sport of the Federal State of Brandenburg, Germany. According to the Brandenburg School Law, participation is mandatory for all public primary schools in the Federal State of Brandenburg, Germany 21. Written consent to participate is not required. Research was conducted in accordance with the latest Declaration of Helsinki 22 and the Brandenburg School Law 21.
References
1. García-Hermoso, A., Ramírez-Campillo, R. & Izquierdo, M. Is Muscular Fitness Associated with Future Health Benefits in Children and Adolescents? A Systematic Review and Meta-Analysis of Longitudinal Studies. Sports Med. Auckl. NZ 49, 1079–1094 (2019).
2. García-Hermoso, A., Ramírez-Vélez, R., García-Alonso, Y., Alonso-Martínez, A. M. & Izquierdo, M. Association of Cardiorespiratory Fitness Levels During Youth With Health Risk Later in Life: A Systematic Review and Meta-analysis. JAMA Pediatr. 174, 952–960 (2020).
3. Mintjens, S. et al. Cardiorespiratory Fitness in Childhood and Adolescence Affects Future Cardiovascular Risk Factors: A Systematic Review of Longitudinal Studies. Sports Med. Auckl. NZ 48, 2577–2605 (2018).
4. Raghuveer, G. et al. Cardiorespiratory Fitness in Youth: An Important Marker of Health: A Scientific Statement From the American Heart Association. Circulation 142, e101–e118 (2020).
5. Bermejo-Cantarero, A. et al. Relationship between both cardiorespiratory and muscular fitness and health-related quality of life in children and adolescents: a systematic review and meta-analysis of observational studies. Health Qual. Life Outcomes 19, 127 (2021).
6. Meijer, A. et al. Cardiovascular fitness and executive functioning in primary school-aged children. Dev. Sci. 24, e13019 (2021).
7. van der Niet, A. G., Hartman, E., Smith, J. & Visscher, C. Modeling relationships between physical fitness, executive functioning, and academic achievement in primary school children. Psychol. Sport Exerc. 15, 319–325 (2014).
8. Fühner, T., Granacher, U., Golle, K. & Kliegl, R. Age and sex effects in physical fitness components of 108,295 third graders including 515 primary schools and 9 cohorts. Sci. Rep. 11, 17566 (2021).
9. Ortega, F. B. et al. European fitness landscape for children and adolescents: updated reference values, fitness maps and country rankings based on nearly 8 million test results from 34 countries gathered by the FitBack network. Br. J. Sports Med. 57, 299–310 (2023).
10. Tomkinson, G. R. et al. European normative values for physical fitness in children and adolescents aged 9–17 years: results from 2 779 165 Eurofit performances representing 30 countries. Br. J. Sports Med. 52, 1445–1456 (2018).
11. Golle, K., Muehlbauer, T., Wick, D. & Granacher, U. Physical Fitness Percentiles of German Children Aged 9–12 Years: Findings from a Longitudinal Study. PLOS ONE 10, e0142393 (2015).
12. Teich, P. et al. Covid Pandemic Effects on the Physical Fitness of Primary School Children: Results of the German EMOTIKON Project. Sports Med. - Open 9, 77 (2023).
13. Bähr, F., Wöhrl, T., Teich, P., Puta, C. & Kliegl, R. Impact of Height-to-Mass Ratio on Physical Fitness of German Third-Grade Children. Unpubl. Prepr. (2023) doi:https://osf.io/ztyfp/.
14. Fu, Y., Brusseau, T. A., Hannon, J. C. & Burns, R. D. Effect of a 12-Week Summer Break on School Day Physical Activity and Health-Related Fitness in Low-Income Children from CSPAP Schools. J. Environ. Public Health 2017, 9760817 (2017).
15. Yin, Z., Moore, J. B., Johnson, M. H., Vernon, M. M. & Gutin, B. The Impact of a 3-Year After-School Obesity Prevention Program in Elementary School Children. Child. Obes. 8, 60–70 (2012).
16. Drenowatz, C., Ferrari, G. & Greier, K. Changes in Physical Fitness during Summer Months and the School Year in Austrian Elementary School Children—A 4-Year Longitudinal Study. Int. J. Environ. Res. Public. Health 18, 6920 (2021).
17. Hjorth, M. F. et al. Seasonal variation in objectively measured physical activity, sedentary time, cardio-respiratory fitness and sleep duration among 8–11 year-old Danish children: a repeated-measures study. BMC Public Health 13, 808 (2013).
18. Fühner, T., Kliegl, R., Arntz, F., Kriemler, S. & Granacher, U. An Update on Secular Trends in Physical Fitness of Children and Adolescents from 1972 to 2015: A Systematic Review. Sports Med. 51, 303–320 (2021).
19. Lee, D. S. & Lemieux, T. Regression Discontinuity Designs in Economics. J. Econ. Lit. 48, 281–355 (2010).
20. Thistlethwaite, D. L. & Campbell, D. T. Regression-discontinuity analysis: An alternative to the ex post facto experiment. J. Educ. Psychol. 51, 309–317 (1960).
21. Ministerium für Bildung, Jugend und Sport. Gesetz über die Schulen im Land Brandenburg. (2021).
22. World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA 310, 2191–2194 (2013).
23. Fühner, T., Granacher, U., Golle, K. & Kliegl, R. Effect of timing of school enrollment on physical fitness in third graders. Sci. Rep. 12, 7801 (2022).
24. Teich, P., Fühner, T., Granacher, U. & Kliegl, R. Physical fitness of primary school children differs depending on their timing of school enrollment. Sci. Rep. 13, 1–16 (2023).
25. Bös, K. Deutscher Motorik-Test 6 - 18 (DMT 6 - 18). (Czwalina, 2009).
26. Schulz, S. Die Reliabilität des Sternlaufs und des Medizinballstoßes im EMOTIKON-Test [The reliability of the star-coordination-run test and the 1-kg medicine ball-push test: Physical fitness tests used in the EMOTIKON study]. (University of Potsdam, 2013).
27. Fernandez-Santos, J. R., Ruiz, J. R., Cohen, D. D., Gonzalez-Montesinos, J. L. & Castro-Piñero, J. Reliability and Validity of Tests to Assess Lower-Body Muscular Power in Children. J. Strength Cond. Res. 29, 2277–2285 (2015).
28. R Core Team. R: The R Project for Statistical Computing. https://www.r-project.org/. (2023).
29. RStudio Team. RStudio: Integrated Development for R. http://www.rstudio.com/. (2023).
30. Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: A Fresh Approach to Numerical Computing. SIAM Rev. 59, 65–98 (2017).
31. Wickham, H. et al. Welcome to the Tidyverse. J. Open Source Softw. 4, 1686 (2019).
32. Bates, D. et al. JuliaStats/MixedModels.jl: v4.7.3. (2022) doi:10.5281/zenodo.7153199.
33. Alday, P. & Bates, D. palday/MixedModelsMakie.jl: v0.3.24. (2023) doi:10.5281/zenodo.8125544.
34. Alday, P. palday/MixedModelsExtras.jl: v1.1.0. (2023) doi:10.5281/zenodo.7979945.
35. Box, G. E. P. & Cox, D. R. An Analysis of Transformations. J. R. Stat. Soc. Ser. B Methodol. 26, 211–252 (1964).
36. Bös, K. Handbuch Sportmotorischer Tests. (Hogrefe, 1987).
37. Bates, D., Kliegl, R., Vasishth, S. & Baayen, H. Parsimonious Mixed Models. arXiv:1506.04967 (2015).
3. Haskell, W. L. J.B. Wolffe Memorial Lecture. Health consequences of physical activity: understanding and challenges regarding dose-response.
Med. Sci.
Sports Exerc.
26, 649–660 (1994).
39. Warburton, D. E. R., Taunton, J., Bredin, S. S. D. & Isserow, S. H. The risk-benet paradox of exercise.
Br. Columbia Med. J.
58, 210–218 (2016).
40. Caspersen, C. J., Powell, K. E. & Christenson, G. M. Physical activity, exercise, and physical tness: denitions and distinctions for health-related research.
Public Health Rep.
100, 126–131 (1985).
41. García-Hermoso, A. et al. Association of Physical Education With Improvement of Health-Related Physical Fitness Outcomes and Fundamental Motor Skills Among Youths: A Systematic Review and Meta-analysis. JAMA Pediatr. 174, e200223 (2020).
42. Ministerium für Bildung, Jugend und Sport. Verordnung über den Bildungsgang der Grundschule (Grundschulverordnung - GV). (2023).
43. Ha, A. S., Lonsdale, C., Lubans, D. R. & Ng, J. Y. Y. Increasing Students’ Activity in Physical Education: Results of the Self-determined Exercise and Learning For FITness Trial. Med. Sci. Sports Exerc. 52, 696–704 (2020).
44. Wong, L. S., Gibson, A.-M., Farooq, A. & Reilly, J. J. Interventions to Increase Moderate-to-Vigorous Physical Activity in Elementary School Physical Education Lessons: Systematic Review. J. Sch. Health 91, 836–845 (2021).
45. Kirchengast, S. Gender Differences in Body Composition from Childhood to Old Age: An Evolutionary Point of View. J. Life Sci. 2, 1–10 (2010).
46. Wells, J. C. K. Sexual dimorphism of body composition. Best Pract. Res. Clin. Endocrinol. Metab. 21, 415–430 (2007).
47. Burchartz, A. et al. Impact of weekdays versus weekend days on accelerometer measured physical behavior among children and adolescents: results from the MoMo study. Ger. J. Exerc. Sport Res. 52, 218–227 (2022).
48. Kharlova, I. et al. The Weather Impact on Physical Activity of 6-12 Year Old Children: A Clustered Study of the Health Oriented Pedagogical Project (HOPP). Sports Basel Switz. 8, 9 (2020).
49. Rahman, S., Maximova, K., Carson, V., Jhangri, G. S. & Veugelers, P. J. Stay in or play out? The influence of weather conditions on physical activity of grade 5 children in Canada. Can. J. Public Health 110, 169–177 (2019).
50. Augste, C. & Künzell, S. Seasonal variations in physical fitness among elementary school children. J. Sports Sci. 32, 415–423 (2014).
51. Drenowatz, C. et al. Association of Body Weight and Physical Fitness during the Elementary School Years. Int. J. Environ. Res. Public. Health 19, 3441 (2022).
52. Bös, K. & Mechling, H. Dimensionen Sportmotorischer Leistungen. (Hofmann-Verlag GmbH & Co. KG, 1983).
53. Malina, R. M., Bouchard, C. & Bar-Or, O. Growth, Maturation, and Physical Activity. (Human Kinetics, 2004).
54. Nogueira, H. et al. The environment contribution to gender differences in childhood obesity and organized sports engagement. Am. J. Hum. Biol. 32, e23322 (2020).
55. Woll, A., Kurth, B.-M., Opper, E., Worth, A. & Bös, K. The ‘Motorik-Modul’ (MoMo): physical fitness and physical activity in German children and adolescents. Eur. J. Pediatr. 170, 1129–1142 (2011).
56. Resaland, G. K. et al. Physical activity preferences of 10-year-old children and identified activities with positive and negative associations to cardiorespiratory fitness. Acta Paediatr. Oslo Nor. 1992 108, 354–360 (2019).
57. Opstoel, K. et al. Anthropometric Characteristics, Physical Fitness and Motor Coordination of 9 to 11 Year Old Children Participating in a Wide Range of Sports. PLOS ONE 10, e0126282 (2015).
58. Bustamante Valdivia, A., Maia, J. & Nevill, A. Identifying the ideal body size and shape characteristics associated with children’s physical performance tests in Peru. Scand. J. Med. Sci. Sports 25, e155–e165 (2015).
59. Nevill, A., Tsiotra, G., Tsimeas, P. & Koutedakis, Y. Allometric Associations between Body Size, Shape, and Physical Performance of Greek Children. Pediatr. Exerc. Sci. 21, 220–232 (2009).
60. Hawkins, D. & Metheny, J. Overuse injuries in youth sports: biomechanical considerations. Med. Sci. Sports Exerc. 33, 1701–1707 (2001).
61. Nuzzo, J. L. The Case for Retiring Flexibility as a Major Component of Physical Fitness. Sports Med. 50, 853–870 (2020).
62. Ruiz, J. R. et al. Predictive validity of health-related fitness in youth: a systematic review. Br. J. Sports Med. 43, 909–923 (2009).
63. Utesch, T. et al. Die Überprüfung der Konstruktvalidität des Deutschen Motorik-Tests 6-18 für 9- bis 10-Jährige. Z. Für Sportpsychol. 22, 77–90 (2015).
64. Casonatto, J. et al. Association between health-related physical fitness and body mass index status in children. J. Child Health Care 20, 294–303 (2016).
65. Brady, A. C., Griffin, M. M., Lewis, A. R., Fong, C. J. & Robinson, D. H. How Scientific Is Educational Psychology Research? The Increasing Trend of Squeezing Causality and Recommendations from Non-intervention Studies. Educ. Psychol. Rev. 35, 37 (2023).
66. Baltes, P. B. Longitudinal and Cross-Sectional Sequences in the Study of Age and Generation Effects. Hum. Dev. 11, 145–171 (1968).
67. Fosse, E. & Winship, C. Analyzing Age-Period-Cohort Data: A Review and Critique. Annu. Rev. Sociol. 45, 467–492 (2019).
68. Schaie, K. W. Beyond calendar definitions of age, time, and cohort: The general developmental model revisited. Dev. Rev. 6, 252–277 (1986).
69. Edwards, N. M. et al. Outdoor temperature, precipitation, and wind speed affect physical activity levels in children: a longitudinal cohort study. J. Phys. Act. Health 12, 1074–1081 (2015).
70. Harrison, F. et al. Weather and children’s physical activity; how and why do relationships vary between countries? Int. J. Behav. Nutr. Phys. Act. 14, 74 (2017).
71. Atkin, A. J., Sharp, S. J., Harrison, F., Brage, S. & Van Sluijs, E. M. F. Seasonal Variation in Children’s Physical Activity and Sedentary Time. Med. Sci. Sports Exerc. 48, 449–456 (2016).
72. Dalskov, S.-M. et al. Seasonal variations in growth and body composition of 8-11-y-old Danish children. Pediatr. Res. 79, 358–363 (2016).
73. Gelander, L., Karlberg, J. & Albertsson-Wikland, K. Seasonality in lower leg length velocity in prepubertal children. Acta Paediatr. 83, 1249–1254 (1994).
74. Mirwald, R. L. & Bailey, D. A. Seasonal height velocity variation in boys and girls 8–18 years. Am. J. Hum. Biol. 9, 709–715 (1997).
Figures
Figure 1
Physical fitness by age and time of assessment (first vs. second school term). Points are binned child means with 95% CIs. Endurance = cardiorespiratory
endurance (i.e., 6-min run), Coordination = star-run, Speed = 20-m linear sprint, PowerLOW = lower limbs muscle power (i.e., standing long jump), PowerUP =
upper limbs muscle power (i.e., ball-push test), Flexibility = stand-and-reach test. Age was centered at 8.5 years (indicated by vertical line).
Figure 2
Physical fitness by cohort and time of assessment. The vertical line at 2010.5 separates cohorts with assessment in second school term (i.e., 2009 and 2010)
from cohorts with assessment in first school term (i.e., cohorts 2011–2015). Grey points show zero-order cohort means with 95% CIs. Black points and lines
show partial effect predictions with effects of physical fitness test, cohort, and assessment. Differences between black partial effect predictions and grey
zero-order means in cohorts 2011–2015 are related to differences between children in physical fitness and between schools in physical fitness contrasts,
assessment and cohort effects. The main source of the difference between black partial effect predictions and grey zero-order means for the 2009–2010
cohorts is statistical adjustment for children’s older age in the 2009–2010 cohorts (i.e., assessment in second school term). Endurance = cardiorespiratory
endurance (i.e., 6-min run), Coordination = star-run, Speed = 20-m linear sprint, PowerLOW = lower limbs muscle power (i.e., standing long jump), PowerUP =
upper limbs muscle power (i.e., ball-push test), Flexibility = stand-and-reach test.
Supplementary Files
This is a list of supplementary files associated with this preprint.
AssessmentSupplements.pdf