A narrative review of stability and change in the mental health of children who grow up in family-based out-of-home care

  • Jeugdbescherming west


The present review sought to address the following questions: What evidence is there that long-term, family-based out-of-home care (OOHC) has a general, population-wide effect on children’s mental health such that it is generally reparative or generally harmful? Does entry into long-term OOHC affect children’s mental health, as evidenced by prospective changes over the first years in care? And, is the reparative potential of long-term, family-based OOHC moderated by children’s age at entry into care? Fourteen studies were identified for review. We found no consistent evidence that family-based OOHC exerts a general, population-wide effect on the mental health of children in care; or that entry into care has an initial effect on children’s mental health; or that children’s age at entry into care moderates their subsequent mental health trajectories. Instead, several longitudinal studies have found that sizable proportions of children in care manifest meaningful improvement in their mental health over both short- and long-term time frames and that similarly sizable proportions experience meaningful deterioration in their mental health. Rather than asking whether long-term, family-based care is generally reparative or harmful for the development of previously maltreated children, future investigations should instead focus on identifying the systemic and interpersonal characteristics of care that promote and sustain children’s psychological development throughout childhood—and those characteristics that are developmentally harmful (i.e., for which children is the experience of care beneficial, and for which children is it not?). The review concludes with recommendations for the design of improved cohort studies that can address these questions.
With increasing numbers of children growing up in long-term out-of-home care (OOHC), gov-
ernments and children’s agencies need better information about how the experience of growing up
in care by children with prior exposure to severe social adversity affects their psychological
development and well-being. The present article seeks to add to this information, by reviewing
longitudinal studies that measured changes in children’s mental health while residing in long-term,
family-based (i.e., foster and kinship) OOHC. Estimating the reparative and harm potentials of
long-term, family-based OOHC requires an understanding of the relative long-term impact of pre-
care and within-care experiences, including complex transactional mechanisms. The most critical
developmental consequence of children’s early exposure to severe social adversity is poor mental
health. Numerous cross-sectional studies have established that children in care manifest high mean
levels and rates of mental health difficulties. Though rates vary a little by survey and location, up to
half of such children have clinical-level mental health difficulties, and another 20–25% have
difficulties approaching clinical significance (Oswald, Heil, & Goldbeck, 2010).
In jurisdictions where children predominantly enter long-term care following severe and per-
sistent maltreatment, a child’s age at entry into care approximates their length of post-birth
exposure to chronic and severe maltreatment. Furthermore, a child’s “age at entry into care”
strongly predicts their subsequent mental health difficulties—with entry at younger age being
protective (Burge, 2007; Hukkanen, Sourander, Bergroth, & Piha, 1999a; Tarren-Sweeney,
2008). This is consistent with cumulative trauma exposure models (Charlotte, Viding, Fearon,
Glaser, & McCrory, 2017), neuroscience, and attachment theory. Regardless of prior conditions,
the attachment systems of infants who enter foster care have been found to be responsive to
changes in parenting style (Dozier, Stovall, Albus, & Bates, 2001). Conversely, a study of late-
placed children found that the severity of their pre-care maltreatment was associated with their
maternal and self-representations, which in turn predicted children’s subsequent representations of
their relationships with foster mothers, as well as their subsequent mental health (Milan & Pin-
derhughes, 2000). A range of psychological and neurobiological processes in early childhood that
are critical to human social functioning are impaired by early and prolonged exposure to traumatic
maltreatment and by the absence of nurturing, sensitive care. However, it is important to note that
early exposure to severe and/or chronic maltreatment need not result in irreparable harm. Rather,
there is emerging evidence that its effects manifest as latent vulnerabilities that are mitigated to
varying degrees by children’s subsequent experience of optimal developmental experiences
(McCrory & Viding, 2015).
What then is known of the developmental effects of children’s experiences in family-based
OOHC? Attachment theory predicts that the developmental effects of OOHC should vary accord-
ing to the characteristics of a child’s attachment development prior to their entry into care, notably
their internal working model of attachment, and to caregiver sensitivity and their ability to provide
a “secure base” (Bowlby, 1988; Schofield, 2002). Many such children are primed for insecurity
when they enter care, due to their compromised attachment development and distorted representa-
tions of caregivers and caregiving, as well as the loss of their parents and being placed with
274 Developmental Child Welfare 1(3)
unfamiliar carers (Milan & Pinderhughes, 2000; van den Dries, Juffer, van IJzendoorn, &
Bakermans-Kranenburg, 2009).
There is accumulating evidence that quality of caregiving, caregiver bonding, caregiver
commitment, and maltreatment in care are factors that directly influence children’s felt
security and psychological development and regulate their potential to recover from attach-
ment- and trauma-related psychopathology (Dozier, Grasso, Lindheim, & Lewis, 2007; Quir-
oga & Hamilton-Giachritsis, 2016; Tarren-Sweeney, 2008). Several longitudinal studies have
also identified that children incur further deterioration in their mental health following place-
ment disruptions, which is a common occurrence in OOHC (Aarons et al., 2010; Delfabbro &
Barber, 2003; Newton, Litrownik, & Landsverk, 2000; Villodas, Litrownik, Newton, & Davis, 2016).
Rationale for the present review
A starting point for examining the developmentally reparative versus harmful effects of long-
term, family-based OOHC is prospective measurement of stability and change in children’s
mental health. In comparison to the large number of cross-sectional mental health surveys
conducted with children in care, there have been relatively few prospective studies. Much of
the available prospective data are compromised by high sample attrition, short prospective
time frames, and small sample size. A recent series of meta-analyses pooled prospective mean
score changes in externalizing difficulties (21 studies), internalizing difficulties (24 studies),
and total difficulties (25 studies) (Goemans, van Geel, & Vedder, 2015). These meta-analyses
showed no statistically or clinically significant changes over time in children’s internalizing
(Hedges’ g = −.10, 95% confidence interval (CI) = [−.27, .07], p = .25, N = 1,984),
externalizing (g = −.04, 95% CI = [−.24, .15], p = .66, N = 1,729), or total behavior
problems (g = −.10, 95% CI = [−.28, .07], p = .24, N = 2,523). Various moderator analyses
failed to show effects when comparing studies on study length, sample size, publication type,
attrition, or mean age. Instead, the three meta-analyses identified considerable heterogeneity
across the various study findings, with some reporting large mean increases in mental health
scores over time, and others reporting large reductions (Goemans et al., 2015).
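As an illustrative check on these pooled estimates (not part of the original analyses), the standard error, z statistic, and two-sided p value can be recovered from a reported effect size and its 95% CI under a normal approximation. The sketch below assumes the interval for internalizing difficulties runs from −.27 to .07 around g = −.10; the function name `p_from_ci` is hypothetical.

```python
import math

def p_from_ci(g, lower, upper, z_crit=1.96):
    """Approximate two-sided p value from an effect size and its 95% CI.

    Assumes a symmetric, normal-theory interval (an assumption of this sketch).
    """
    se = (upper - lower) / (2 * z_crit)   # CI half-width divided by 1.96
    z = g / se                            # Wald z statistic
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Internalizing difficulties, values as reported from Goemans et al. (2015)
se, z, p = p_from_ci(g=-0.10, lower=-0.27, upper=0.07)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions the recovered p value is close to the reported .25, which is consistent with an interval that spans zero.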
What might account for this? One explanation is that family-based OOHC does not exert a
general, population-wide effect on children’s development. Group mean score changes are really
only informative if such changes reflect a general and largely uniform shift in the distribution of
mental health scores over time—that is, if children’s mental health generally improves or deterio-
rates while growing up in care. Secondly, discrepant findings may be accounted for by variability
in study design. A narrative review provides a vehicle for demystifying heterogeneous and dis-
crepant findings using scholarly reasoning—that can yield additional insights to those afforded by
The present review aims to address the following research questions with respect to children
(including infants and adolescents), who are placed into long-term, family-based OOHC, following
serious and/or chronic maltreatment.
1. What evidence is there that long-term, family-based care has a general, population-wide
effect on children’s mental health such that it is generally reparative or generally harmful?
2. Does entry into long-term OOHC affect children’s mental health, as evidenced by pro-
spective changes over the first years in care?
3. Is the reparative potential of long-term, family-based OOHC moderated by children’s age
at entry into care?
Review method
To perform the present review, we first conducted a literature search in PsycINFO, Medline, and
ERIC to identify studies published before July 2018, using the terms longitudinal, prospective, or
repeated measures, combined with the terms out-of-home care, looked after children, foster care,
or kinship care. This was supplemented by manual searches of article references, and cross-
checking studies included in the recent meta-analysis (Goemans et al., 2015). Subsequent checks
were made to identify relevant studies published after July 2018 and prior to the present article
being published.
Study inclusion/exclusion criteria
1. Sample predominantly experienced pre-care maltreatment. The review is focused toward
understanding how children who predominantly experience severe and/or chronic maltreat-
ment in their parents’ care subsequently develop in long-term, family-based care. Long-
itudinal studies of the development of orphans or abandoned infants placed in foster care
without previous history of maltreatment refer to a different population and were excluded
from the present review. Similarly, attempts were made to confirm that study samples were
predominantly placed into care following maltreatment, either from information provided
in the study publication or from published descriptions of OOHC systems in the study
locations. All of the studies located in the present literature search that met all other
selection criteria were carried out in locations where children predominantly enter court-
ordered care due to maltreatment.
2. Measures obtained while children resided in family-based OOHC. Studies were included in
the present review if mental health data were measured on at least two occasions when
children were residing in family-based OOHC. Four studies were excluded because some
or most of the study samples were residing in their birth parents’ care at follow-up (Havnen,
Breivik, & Jakobsen, 2014; Newton et al., 2000; Proctor, Skriner, Roesch, & Litrownik,
2010; Villodas et al., 2016).
3. Prospective measurement using the same type of informants. A further methodological
problem is estimating prospective change from reports provided by different types of
informants at different time points, such as comparing birth parent reports at baseline with
foster carer reports at follow-up. Three studies were excluded because they estimated
baseline mental health from parent-report scores and follow-up mental health from foster
carer-report scores (Berger, Bruch, Johnson, James, & Rubin, 2009; Havnen et al., 2014;
Rubin, O’Reilly, Luan, & Localio, 2007).
4. Reliable informants of children’s mental health. Two cohort studies estimated foster chil-
dren’s mental health from social worker reports. However, social workers do not have
sufficient proximal engagement with children in care to reliably inform on their mental
health (English & Graham, 2000; McCrae & Barth, 2008), and those studies are thus
excluded from the present review (Barber & Delfabbro, 2005; Fanshel & Shinn, 1978;
Frank, 1980). Similarly, it is doubtful that children’s birth parents are able to accurately
report on their children’s mental health when they are not residing in their care, and one
study that employed this method was excluded from the review (Linares, Li, Shrout, Brody,
& Pettit, 2007).
5. Sufficient sample size. Consideration was given to excluding studies with small samples
that were insufficient for identifying effect sizes that are clinically meaningful. We calculated
that the minimum sample size required for identifying a large effect size (defined as
d ≥ .80) with 95% CI is N = 25. Seven studies that retained very small samples (N < 25) at
follow-up were excluded from the present review (Bogart, 1988; Gonzalez, 1999; Haight,
Black, & Sheridan, 2010; Lawrence, Carlson, & Egeland, 2006; Leathers, Spielfogel,
McMeel, & Atkins, 2011; Rushton, Treseder, & Quinton, 1995; White, 1997).
6. Measures reported for common cohorts. Three studies reported baseline and follow-up data
for nonidentical groups such that the differences in mean scores do not reflect average
within-subject change (Fernandez, 2008; Hiller & St. Clair, 2018; Portwood et al., 2018).
7. Sufficient prospective time frame. The shorter the prospective time frame, the more likely it
is that a study sample includes children in temporary and short-term care. Given the present
review is concerned with the stability of children’s mental health while growing up in care,
three studies that employed a prospective time frame of less than 12 months were excluded
(Bogart, 1988; Damen & Pijnenburg, 2005; Portwood et al., 2018).
8. Population studies. To avoid conflating the effects of growing up in care with the effects of
clinical or foster care interventions, only population (i.e., not treatment or intervention)
studies were selected for review.
9. Mean baseline and follow-up scores reported. A 5-year prospective study was excluded
because, rather than reporting mean scores from each of the 12-month follow-up assess-
ments, it reported the mean of four annual follow-up mean scores (Kang, Woo, Chun, Nho,
& Chung, 2017).
Statistical approaches
Aggregate change. Most of the studies selected for review reported group mean scores and standard
deviations at two or more time points from which mean change scores can be calculated. The
present review reports mean change scores as standardized mean score differences (Cohen’s d)—
the difference in mean scores expressed as a proportion of the score standard deviation for the
aggregate sample. Around half of these studies reported mean raw scores, and the remainder
reported mean standardized T scores. Whereas raw scores are more precise than T scores,
age-standardized T scores account for normative, developmental shifts in the population distribution of
mental health difficulties, and thus provide a more appropriate metric for longitudinal studies that
span from preadolescence through to adolescence.
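The standardized mean difference described above can be sketched as follows. The choice of standardizer is an assumption of this sketch: it uses the root mean square of the baseline and follow-up standard deviations, which reproduces the effect sizes listed for the studies in Table 2 to two decimal places. The function name `cohens_d` is hypothetical.

```python
import math

def cohens_d(mean_base, sd_base, mean_follow, sd_follow):
    """Standardized mean difference between follow-up and baseline.

    Assumption of this sketch: the standardizer is the root mean square of
    the baseline and follow-up SDs (equal weighting of the two occasions).
    """
    pooled_sd = math.sqrt((sd_base ** 2 + sd_follow ** 2) / 2)
    return (mean_follow - mean_base) / pooled_sd

# CBCL raw total scores from Bastiaensen (2001), as listed in Table 2
d = cohens_d(35.5, 19.6, 33.3, 23.9)
print(round(d, 2))
```

On a problem scale, a negative value indicates a reduction in mean difficulties and a positive value indicates an increase.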
Within-subject change. While prospective changes in group mean scores estimate aggregate change,
they do not show how many children experience meaningful improvement, meaningful deterioration,
or no meaningful change in their mental health. This information is fundamental to understanding
the therapeutic and harmful effects of growing up in care. In practice, since children’s
mental health difficulties are experienced as continuously distributed phenomena, allocating
change scores to change categories will always be imperfect. In doing so, it is important to
distinguish between statistical significance, clinical significance, and a level of change that is
perceptible to children and/or families, and/or that has developmental and social meaning.
Some studies have defined meaningful change as scores shifting from one severity range to
another over time, such as from “normal” to “clinical,” or from “borderline” to “normal.” One of
the reviewed studies identified change trajectories from repeated measures obtained over an 8-year
time frame using a growth modeling procedure (Proctor et al., 2010). While this is useful for
identifying changes in the rates of children who require clinical interventions, it is otherwise an
inferior method for estimating meaningful change. This is because small, imperceptible changes in
symptomatology (as little as a single raw score point) can push scores across a single clinical
threshold (e.g., from borderline to normal).
Two of the reviewed studies reported rates of meaningful change using the Reliable Change
Index (RCI). It describes the magnitude of change that is statistically reliable, that is, larger than
what can be statistically attributable to internal measurement error (Jacobson & Truax, 1991).
Jacobson and Truax (1991) originally proposed that the RCI be used as part of a “twofold”
procedure for establishing a level of change that is clinically significant, without elaborating on
what other criterion might be used to define clinically significant change. In practice, however, the
RCI has since been employed in treatment evaluation as the sole criterion for defining clinically
significant change. This approach is problematic, since there is no logical connection between the
scale of score differences that is accounted for by measurement error, and that which represents
clinically and developmentally meaningful change.
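A minimal sketch of the RCI calculation, following the Jacobson and Truax (1991) formulation, may help make the point concrete: the threshold depends only on the scale's standard deviation and reliability, not on any clinical criterion. The standard deviation and reliability values below are hypothetical, as are the function names.

```python
import math

def reliable_change_index(score_base, score_follow, sd, reliability):
    """Reliable Change Index following Jacobson and Truax (1991)."""
    se_measure = sd * math.sqrt(1 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2 * se_measure ** 2)        # SE of the difference score
    return (score_follow - score_base) / s_diff

def classify_rci(rci, z_crit=1.96):
    """Label change on a problem scale, where lower scores mean fewer difficulties."""
    if rci <= -z_crit:
        return "improvement"
    if rci >= z_crit:
        return "deterioration"
    return "no change"

# Hypothetical values: T scores with SD = 10 and test-retest reliability = .90
rci = reliable_change_index(score_base=65, score_follow=55, sd=10, reliability=0.90)
print(round(rci, 2), classify_rci(rci))
```

With these hypothetical values, any change of roughly 9 T score points or more would be classed as reliable, regardless of where on the scale it occurs.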
With this in mind, two of the reviewed studies attempted to define change that is clinically
meaningful, and which is thus likely to be perceptible and meaningful to children and their
caregivers. The first defined change in Child Behavior Checklist (CBCL) scores as mean-
ingful if they were (1) statistically significant (using the RCI), and also (2) clinically signif-
icant, as defined by a shift from the normal range to either the borderline or clinical range, or
from the borderline or clinical range to the normal range (Vanderfaeillie, Van Holen,
Vanschoonlandt, Robberechts, & Stroobants, 2013). In clinical research, measuring change
is based on the assumption that all study participants have clinical-level difficulties requiring
intervention. However, population studies include participants who manifest normative mental
health at baseline. Furthermore, the lower a child’s symptom scores are at baseline, the less
scope they have to experience improvement in their mental health: there is a “floor” effect.
For these children, a lack of meaningful change equates to sustained mental health. With this
in mind, the second of the reviewed studies (Tarren-Sweeney, 2017) differentiated between
those children whose scores were in the normal ranges on both occasions, and other partici-
pants, with the former constituting a sustained mental health group. For the latter group,
meaningful change was then defined by the spread of scores traversing both borderline and
clinical range cut-points. The reasoning for this was that a shift in scores from a normal range
to a clinical range, and vice versa, which traverses the borderline range, is clinically mean-
ingful. This method yielded more conservative estimates of meaningful change than the RCI
method (Tarren-Sweeney, 2017).
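The first of these two approaches, a twofold criterion requiring change that is both statistically reliable and range-crossing, can be sketched as below. This is an illustration only, not the procedure used in either reviewed study: it takes a CBCL T score of 60 as the lower bound of the borderline range, and the standard deviation and reliability values are hypothetical placeholders, as is the function name `twofold_change`.

```python
import math

BORDERLINE_CUT = 60  # assumed lower bound of the borderline range (T score)

def in_normal_range(t_score, cut=BORDERLINE_CUT):
    """True when the T score falls below the borderline cut-point."""
    return t_score < cut

def twofold_change(t_base, t_follow, sd=10.0, reliability=0.90, z_crit=1.96):
    """Classify change using a twofold criterion: reliable AND range-crossing.

    sd and reliability are hypothetical placeholders, not values from any study.
    """
    s_diff = sd * math.sqrt(2 * (1 - reliability))  # SE of the difference score
    reliable = abs(t_follow - t_base) / s_diff >= z_crit
    crossed = in_normal_range(t_base) != in_normal_range(t_follow)
    if not (reliable and crossed):
        return "no meaningful change"
    return "improvement" if t_follow < t_base else "deterioration"

print(twofold_change(66, 55))  # reliable drop into the normal range
print(twofold_change(61, 59))  # crosses the cut-point, but change is not reliable
print(twofold_change(75, 64))  # reliable, but stays above the cut-point
```

The second call shows why threshold crossing alone is a weak criterion: a two-point shift crosses the cut-point but is well within measurement error.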
Thirty-five longitudinal studies that measured prospective changes in the mental health of
children in OOHC were located in the literature search. Of these, 21 studies were excluded
from the review. These are listed in Table 1, together with reason(s) for their exclusion. Of
the 14 studies selected for the present review, 10 measured children’s mental health over a
short time frame (≤3 years) and 4 were long-term studies (≥5 years). The child participants
in these 14 studies ranged in age from 2 to 18 years. Two of the studies reported aggregated
data for children residing in various types of care, but were included in the review because
more than 90% were in family-based care.
Table 1. Studies excluded from the review.
Study | Location | Reason(s) for exclusion
Ahmed et al. (2005) | Iraq | Orphans placed in foster care
Barber and Delfabbro (2005) | Australia | Respondents were social workers
Berger, Bruch, Johnson, James, and Rubin (2009) | U.S. | Mixed respondents: baseline scores = birth parents, follow-up = foster carers
Bogart (1988) | U.S. | Small sample size (N < 25); study interval < 12 months
Damen and Pijnenburg (2005); Damen and Veerman (2005) | The Netherlands | Study interval < 12 months
Fanshel and Shinn (1978); Frank (1980) | U.S. | Respondents were social workers
Fernandez (2008) | Australia | Not within-subjects analyses (unequal group N)
Gonzalez (1999) | U.S. | Small sample size (N < 25)
Haight, Black, and Sheridan (2010) | U.S. | Small sample size (N < 25); treatment study
Havnen, Breivik, and Jakobsen (2014) | Norway | Mixed respondents: baseline scores = birth parents, follow-up = foster carers and birth parents
Hiller and St. Clair (2018) | United Kingdom | Mean annual SDQ scores were reported for different combinations of the sample (large # missing values for each year). Therefore not within-subject comparisons
Kang, Woo, Chun, Nho, and Chung (2017) | South Korea | Reported baseline mean scores, and the mean of the annual follow-up mean scores for years 2 through to 5
Lawrence, Carlson, and Egeland (2006) | U.S. | Small sample size (N < 25)
Linares, Li, Shrout, Brody, and Pettit (2007) | U.S. | Respondents were birth parents
Leathers, Spielfogel, McMeel, and Atkins (2011) | U.S. | Small sample size (N < 25)
McAuley and Trew (2000) | United Kingdom | Small sample size (N < 25); study interval < 12 months
Minnis et al. (2006) | Scotland | Treatment study
LONGSCAN study: Newton, Litrownik, and Landsverk (2000); Proctor, Skriner, Roesch, and Litrownik (2010); Villodas, Litrownik, Newton, and Davis (2016) | U.S. | Not exclusively in care. Undisclosed number of children had returned to their parents
Portwood et al. (2018) | Canada | Study interval < 12 months
Rubin, O’Reilly, Luan, and Localio (2007) | U.S. | Mixed respondents: baseline scores = birth parents, follow-up = foster carers
Rushton, Treseder, and Quinton (1995) | England | Small sample size (N < 25)
White (1997) | U.S. | Small sample size (N < 25)
Note. SDQ = Strengths and Difficulties Questionnaire.
Question 1: What evidence is there that long-term, family-based care has a general,
population-wide effect on children’s mental health?
If growing up in OOHC has a general, population-wide effect on children’s mental health (i.e.,
where the experience is generally therapeutic, or is generally harmful), it should manifest as a
fairly constant mean rate of change across an entire population of children in care. Thus, assuming
that the aggregate developmental impact that care systems exert on a population remains the same
over time, the level of change measured over 1 year should be one fifth of that measured over 5
years. As time proceeds, however, longitudinal samples of children in care become progressively
less representative, due to high sample attrition and various survivor biases, as well as their
increasing age profile. Notwithstanding this critical limitation, it is useful to compare prospective
mental health data in relation to the time frames over which change was measured. Cohorts that are
representative of care populations, particularly with respect to the distributions of “age at entry into
care,” “time in care,” and “time in placement,” offer the possibility of estimating aggregate mental
health changes over defined periods.
Table 2 lists estimates of short-term stability and change from five studies that recruited
representative population samples, without respect to age at entry into care, time in care, and time
in placement. Three of the studies (which include the two largest) measured very small or negligible
(d = .06 to .09) 12-month changes in mean raw CBCL broadband scale scores among 8- to
13-year-old Dutch foster children (N = 53; Bastiaensen, 2001); mean caregiver-reported SDQ
(Strengths and Difficulties Questionnaire) internalizing and externalizing difficulties among 4- to
17-year-old Dutch foster children (N = 180; Goemans, van Geel, & Vedder, 2018); and mean
self-reported scores on scales measuring prosocial behavior, emotional disorder and anxiety, conduct
disorder and physical aggression, and relational aggression among 10- to 17-year-old Canadian
foster children (N = 201; Perkins, 2008).
The fourth study measured 12-month changes in socio-emotional difficulties and competence
among 56 Norwegian 2-year-olds in foster care (Jacobsen, Moe, Ivarsson, Wentzel-Larsen, &
Smith, 2013), using the Infant–Toddler Social and Emotional Assessment (ITSEA). While
baseline mean social-emotional difficulties T scores were unexpectedly low for children in
care, so too were T scores for a Norwegian comparison group, suggesting the T score distributions
(based on U.S. norms) are not valid for Norwegian toddlers. The foster children’s mean
baseline social-emotional difficulties T scores were only 4–8 points higher than those for the
comparison group. They manifested small to moderate 12-month increases in mean carer-reported
externalizing (d = .17) and internalizing (d = .40) difficulties, contrasting with a
small improvement in their mean competence scores (d = .18), while mean dysregulation
difficulties remained the same. However, the community sample’s mean internalizing difficulties
increased by the same number of T score points (d = .46).
The fifth study reported rates of statistical change (RCI) over a 2-year period for 49 preado-
lescent foster children (Vanderfaeillie et al., 2013). Eight (16%) showed statistical improvement in
CBCL total problems scores, 18 (37%) showed deterioration, and 23 (47%) showed no change.
Furthermore, the rate of children with CBCL total problems T scores ≥ 60 (borderline range cut-point)
increased from 24% to 41%. While the follow-up rate (41%) is reasonably consistent with
previous research estimates for this population (Oswald et al., 2010), the baseline rate (24%) falls
well short of those estimates. Without understanding why the baseline rate for the 49 surviving
participants was so low, it is difficult to interpret these findings.
Table 2. Estimates of short-term stability and change: Representative population samples recruited without reference to age at entry into care, time in care, or time in placement.
Study (location) | Type of care | Age range (years) | N | Attrition | Scale | Time frame (years) | Baseline mean (SD) | Follow-up mean (SD) | Effect size (Cohen’s d)
Bastiaensen (2001) (The Netherlands) | Foster | 8–13 | 53 | 50% | CBCL raw total | 2 | 35.5 (19.6) | 33.3 (23.9) | −.10
 | | | | | CBCL raw ext. | | 11.9 (8.1) | 12.0 (9.5) | +.01
 | | | | | CBCL raw int. | | 8.9 (6.2) | 8.5 (6.8) | −.06
Goemans, van Geel, and Vedder (2018) (The Netherlands) | Foster | 4–17 | 180 | 58% | SDQ raw ext. | 1 | 7.60 (4.64) | 7.14 (4.40) | −.10
 | | | | | SDQ raw int. | | 5.03 (3.83) | 5.02 (3.80) | 0
Jacobsen, Moe, Ivarsson, Wentzel-Larsen, and Smith (2013) (Norway) | Foster | 2 | 56 | 7% | ITSEA ext. T | 1 | 52 (11.7) | 54 (12.5) | +.17
 | | | | | ITSEA int. T | | 49 (10.2) | 53 (9.9) | +.40
 | | | | | ITSEA dys. T | | 46 (10.4) | 46 (13.7) | 0
 | | | | | ITSEA com. T | | 44 (11.7) | 46 (10.4) | +.18
Perkins (2008) (Canada) | Foster | 10–17 | 201 | 45% | AAR raw pro. | 1 | 12.72 (4.05) | 12.47 (4.10) | −.06
 | | | | | AAR raw emo. | | 4.84 (3.25) | 4.70 (3.29) | −.04
 | | | | | AAR raw agg. | | 2.21 (2.35) | 1.98 (2.44) | −.09
 | | | | | AAR raw relagg. | | 2.20 (2.21) | 2.02 (2.25) | −.08

Meaningful change rates
Study (location) | Type of care | Age range (years) | N | Attrition | Scale | Time frame (years) | Improvement | No change | Deterioration
Vanderfaeillie, Van Holen, Vanschoonlandt, Robberechts, and Stroobants (2013) | Foster | 6–12 | 49 | 36% | CBCL total T | 2 | 16.3% | 46.9% | 36.7%
 | | | | | CBCL ext. T | | 8.2% | 61.2% | 30.6%
 | | | | | CBCL int. T | | 10.2% | 73.5% | 16.3%
Note. ITSEA = Infant–Toddler Social and Emotional Assessment (carer-report); SDQ = Strengths and Difficulties Questionnaire (carer-report); AAR = Assessment and Action Record (AAR-C2; self-report); CBCL = Child Behavior Checklist (carer-report); RCI = Reliable Change Index; SD = standard deviation; ext. = externalizing; int. = internalizing; dys. = dysregulation; com. = competence; pro. = prosocial behavior; emo. = emotional disorder and anxiety; agg. = conduct disorder and physical aggression; relagg. = indirect aggression (relational aggression); T = T score; raw = raw score.
Meaningful change estimated by the RCI.
Table 3 lists estimates of long-term (≥5 years) stability and change from four studies that
recruited representative population samples, with respect to age at entry into care, time in care,
and time in placement. All of the long-term studies were afflicted by high sample attrition (33–
75%), thereby limiting the interpretability of their findings, and highlighting the need for more
definitive long-term cohort studies. The first study measured 5-year changes in depressive symptoms
for 10- to 13-year-old (at baseline) Croatian children residing in long-term foster care (N = 60;
Bulat, 2010). The foster care sample had small reductions in mean self-reported depressive symptoms
over 5 years, as measured by the Youth Self-Report (YSR) anxious-depressed subscale, and
the Children’s Depression Inventory (CDI), but a moderate increase in mean carer-reported CBCL
anxious-depressed scores. While the self-report mean baseline scores are consistent with prior
estimates of depressive symptoms for children in care, the mean carer-report score (2.73) is a little
low (see, for example, Simmel et al., 2014), suggesting that those carer-report scores may
be unreliable.
The second study reported 7- to 9-year mental health changes for a small (N = 85) sample
of preadolescent (at baseline) Australian children in foster and kinship care (Tarren-Sweeney,
2017). The study showed no changes in mean CBCL age-standardized internalizing and
externalizing T scores; small reductions in CBCL total problems (d = .20) and CBCL
Social-Attention-Thought (SAT) problems (d = .23); and a small reduction (d = .26) in
attachment- and trauma-related difficulties, as measured by items common to the Assessment
Checklist for Children (ACC; baseline measure) and the Assessment Checklist for Adoles-
cents (ACA; follow-up measure) (the ACC-ACA score). The study also reported rates of
meaningful change, as described in the method section. Around a third of the children
manifested sustained mental health (35% based on the CBCL total score; 38% on the
ACC-ACA score). Of the remaining 65% of children who had clinical or elevated total CBCL
scores at baseline, roughly the same proportions (approximately 40%) showed meaningful
improvement, and meaningful deterioration, with the remaining 20% showing no meaningful
change. The equivalent rates for those children who had clinical or elevated ACC-ACA scores
at baseline were 40%, 30%, and 30%, respectively. While this study had the longest prospective
time frame of the 14 studies, it also incurred the highest sample attrition (75%). However,
analyses showed that those children retained at follow-up were broadly representative of the
larger baseline sample (N = 347).
The third study reported rates of meaningful change in carer-reported SDQ total difficulties
scores over a 5-year period for a small (N = 60) sample of preadolescent (at baseline) English
foster children (Biehal, Ellison, Baker, & Sinclair, 2010), with similar rates of children manifesting
“some or marked” improvement (38%) and “some or marked” deterioration (40%) and the remaining
22% showing no meaningful change.
The fourth study reported rates of statistical change (RCI) in CBCL total scores over an 8-year
period for a small (N = 38) sample of preadolescent (at baseline) Norwegian children in foster and
kinship care (Vis, Handegård, Holtan, Fossum, & Thørnblad, 2016). Equal numbers of children
(N = 10, 26%) manifested statistically meaningful improvement and deterioration, while 18 (47%)
showed no meaningful change.
Conclusion on this question. While the evidence base is small and compromised by design limitations
(notably high sample attrition), most studies that recruited representative population samples (with
respect to age at entry into care, time in care, and time in placement) do not provide evidence that
OOHC exerts a general, population-wide effect on the mental health of children in care. We
conclude that there is no consistent evidence that growing up in care is generally reparative or
generally harmful for children who enter care following exposure to severe social adversity.

Table 3. Estimates of long-term stability and change: Representative population samples recruited without reference to age at entry into care, time in care, or time in placement.

| Study | Type of care | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Baseline mean (SD) | Follow-up mean (SD) | Effect size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bulat (2010) (Croatia) | Foster | 10–13 | 60 | 48% | CBCL anx-dep. | 5 | 2.73 (2.83) | 3.71 (3.88) | +.29 |
| | | | | | YSR anx-dep. | | 6.03 (4.06) | 5.67 (4.96) | -.08 |
| | | | | | CDI total | | 8.32 (5.97) | 7.53 (5.05) | -.14 |
| Tarren-Sweeney (2017) (Australia) | Foster and kinship | 4–11 | 85 | 75% | CBCL total T | 7–9 | 59.4 (12.5) | 56.9 (12.9) | -.20 |
| | | | | | CBCL ext. T | | 56.8 (12.1) | 57.3 (12.3) | +.04 |
| | | | | | CBCL int. T | | 52.7 (11.3) | 52.7 (11.5) | 0 |
| | | | | | CBCL SAT | | 17.2 (12.0) | 14.4 (12.1) | -.23 |
| | | | | | ACC/ACA | | 17.0 (15.2) | 14.4 (13.9) | -.26 |

Meaningful change rates: Tarren-Sweeney (2017)

| Scale | Sustained mental health | Improvement | No change | Deterioration |
| --- | --- | --- | --- | --- |
| CBCL total | 35.3% | 27.1% | 12.9% | 24.7% |
| ACC/ACA | 37.7% | 24.7% | 18.8% | 18.8% |

“Sustained mental health” defined as scores within normal range at baseline and follow-up. For all other scores: “improvement” defined as CBCL total score reduction >11, ACC-ACA shared-item score reduction >4; “no change” defined as CBCL total score change <12, ACC-ACA shared-item score change <5; “deterioration” defined as CBCL total score increase >11, ACC-ACA shared-item score increase >4.

Meaningful change rates: Biehal, Ellison, Baker, and Sinclair (2010)

| Study | Type of care | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Marked improvement | Some improvement | No change | Some deterioration | Marked deterioration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Biehal, Ellison, Baker, and Sinclair (2010) (England) | Foster | 4–11 | 60 | 33% | SDQ total | 5 | 20% | 18% | 22% | 25% | 15% |

Change categories defined as: “marked improvement” = reduction of 5 or more points; “some improvement” = reduction of 2–4 points; “no change” = <2 point change; “some deterioration” = increase of 2–4 points; “marked deterioration” = increase of 5 or more points. No justification provided for the score ranges.

Meaningful change rates: Vis, Handegård, Holtan, Fossum, and Thørnblad (2016)

| Study | Type of care | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Improvement | No change | Deterioration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vis, Handegård, Holtan, Fossum, and Thørnblad (2016) (Norway) | Foster and kinship | 4–9 | 38 | 52% | CBCL total | 8 | 26% | 48% | 26% |

Meaningful change estimated by the RCI (cutoff is 8 raw score points for the CBCL total problems scale).

Note. CBCL = Child Behavior Checklist (carer-report); YSR = Youth Self-Report (self-report); CDI = Children’s Depression Inventory (self-report); SDQ = Strengths and Difficulties Questionnaire (carer-report); anx-dep. = anxious-depressed; ext. = externalizing; int. = internalizing; SAT = nominal Social-Attention-Thought problems broadband scale; ACC/ACA = scale constructed from 64 items shared by the Assessment Checklist for Children (ACC) and Assessment Checklist for Adolescents (ACA); T = T score; SD = standard deviation; RCI = Reliable Change Index.
Question 2: Does entry into long-term, family-based OOHC affect children’s mental
health, as evidenced by prospective changes over the first years in care?
Table 4 lists estimates of short-term stability and change for the remaining five studies selected for
review. The cohorts were not representative population samples, but instead were recruited in
relation to their time in care or placements. Three studies recruited samples shortly after they
entered care, one study recruited young children who had been in care for 2 years or less (Symanzik
et al., 2019), and one study recruited a sample of “difficult to place” children following placement
with new foster families (Staines, 2012).
Any effects that growing up in OOHC have on children’s mental health may not be uniform
over time. Isolating nonlinear, time-related effects is better achieved by following cohorts that are
recruited at (or before) entry into care. Of the three studies that recruited cohorts at entry into care,
two were separate cohorts in the U.S. National Survey of Child and Adolescent Well-Being
(NSCAW), a nationally representative study conducted over five waves (baseline, 6-month, 18-
month, 36-month, and 6- to 7-year follow-up) (Administration for Children, Youth and Families,
2001). The NSCAW measured child and adolescent mental health from caregiver-reported CBCL
scores, as well as self-reported scores on the YSR (the self-report version of the CBCL; age 11+)
and the post-traumatic stress subscale of the Trauma Symptom Checklist for Children (TSCC; age 8+).
The NSCAW Child Welfare (CW) cohort comprised 5,501 children aged 1–16 years at baseline,
recruited to the study following child maltreatment notifications, including a sub-cohort who
resided in OOHC at each stage of the study. Three published analyses of age-limited prospective
data obtained for this sub-cohort are included in the present review. The first analysis compared
18- to 24-month mental health changes for 2- to 4-year-old children who were placed in care
(N = 152), remained with their parents with support services (N = 274), or remained with parents
without services (N = 221) (Stahmer et al., 2009). The in-care group had a sizable, though
nonsignificant, reduction in mean caregiver-reported CBCL total problems T scores (d = .44),
whereas children residing with their parents without support had increased scores (d = .44), and
those receiving services had a smaller, nonsignificant increase. The second analysis reported
3-year changes in self-reported YSR internalizing and externalizing raw scores for 234 children
between 11 and 14 years (Leonard & Gudiño, 2016). Mean baseline scores were within the range
of previously reported estimates. Although an effect size could not be calculated (standard
deviations not reported), participants reported a modest 3-year increase in mean externalizing scores,
and a corresponding decrease in mean internalizing scores. The third analysis reported 18-month
and 3-year changes in rates of CBCL disorders (internalizing and externalizing clinical ranges) for
a wide-age sample (2–15 years at baseline, NSCAW) (Aarons et al., 2010). Baseline rates were
within the range of previously reported estimates (Tarren-Sweeney & Hazell, 2006). The rate of
externalizing disorders fell from 33.9% to 29.1% after 18 months, and then to 27.3% after 3 years,
which is a sizable reduction. The rate of internalizing disorders fell from 21.6% to 16.9% after 18
months, but rose to 20.8% after 3 years.
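The standardized effect sizes cited in this section can be approximated from the group means and SDs listed in Table 4. The exact formula is not stated in this section, so the sketch below assumes Cohen's d computed as the mean change divided by the root mean square of the baseline and follow-up SDs. The function name `cohens_d` is illustrative. Under this assumption, the Stahmer et al. (2009) in-care group and the McWey et al. (2010) carer-reported externalizing scores yield magnitudes that round to the tabled .45 and .31.

```python
import math


def cohens_d(mean_base, sd_base, mean_follow, sd_follow):
    """Standardized mean change; negative values indicate lower (better) scores."""
    # Assumed denominator: root mean square of the two reported SDs
    pooled_sd = math.sqrt((sd_base ** 2 + sd_follow ** 2) / 2.0)
    return (mean_follow - mean_base) / pooled_sd


# Means and SDs as listed in Table 4
d_stahmer = cohens_d(56.8, 11.3, 51.6, 11.9)  # CBCL total T, 2- to 4-year-olds
d_mcwey = cohens_d(61.5, 11.6, 57.8, 12.1)    # CBCL ext. T, 13- to 16-year-olds

print(round(d_stahmer, 2), round(d_mcwey, 2))
```

Because the denominator ignores the correlation between baseline and follow-up scores, this is a between-occasion approximation rather than a paired-samples effect size.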
Table 4. Estimates of short-term stability and change: Samples recruited with reference to time in care or time in placement.

1. Samples recruited following entry into care

Mean scores

| Study | Type of care | Analysis | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Baseline mean (SD) | Follow-up mean (SD) | Effect size (Cohen’s d) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NSCAW, Cohort 1 (U.S.) | >90% foster and kinship | Stahmer et al. (2009) | 2–4 | 152 | Not stated | CBCL total T | 1.5–2 | 56.8 (11.3) | 51.6 (11.9) | -.45 |
| | | Leonard and Gudiño (2016) | 11–14 | 234 | | YSR raw ext. | 3 | 14.7 | 16.6 | SD not reported |
| | | | | | | YSR raw int. | | 14.0 | 11.6 | |
| NSCAW, Cohort 2 (U.S.) | >90% foster and kinship | Barboza, Dominguez, and Pinder (2017) | 8–15 | 280 | Unclear, but low | CBCL ext. T | 1.5 | 60.1 (12.5) | 59.4 (12.3) | -.04 |
| | | | | | | | 3 | | 58.4 (13.2) | -.13 |
| | | | | | | TSCC-PTS T | 1.5 | 48.0 (10.1) | 47.4 (9.3) | -.06 |
| | | | | | | | 3 | | 46.9 (9.5) | -.11 |
| | | McWey, Cui, and Holtrop (2014) | 11–16 | 180 | | YSR ext. T | 1.5 | 53.3 (12.4) | 54.7 (12.2) | +.11 |
| | | | | | | | 3 | | 55.2 (10.4) | +.17 |
| | | McWey, Cui, and Pazdera (2010) | 13–16 | 106 | | CBCL ext. T | 3 | 61.5 (11.6) | 57.8 (12.1) | -.31 |
| | | | | | | CBCL int. T | | 57.2 (12.9) | 56.5 (10.9) | -.06 |
| Strijker, van Oijen, and Knot-Dickscheit (2011) (The Netherlands) | Foster and ... | | 11–17 | 60 | 23% | CBCL raw total | 1.5 | 30.4 (20.1) | 39.9 (26.4) | +.41 |
| | | | | | | CBCL raw ext. | | 9.7 (7.5) | 14.2 (10.3) | +.51 |
| | | | | | | CBCL raw int. | | 9.3 (7.1) | 10.5 (8.0) | +.16 |
| | | | | | | YSR raw total | | 37.0 (18.4) | 31.5 (16.4) | -.32 |
| | | | | | | YSR raw ext. | | 12.2 (6.4) | 9.8 (5.7) | -.40 |
| | | | | | | YSR raw int. | | 10.7 (7.4) | 9.4 (6.1) | -.19 |

Clinical range rates

| Study | Type of care | Analysis | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Baseline clinical range rate | Follow-up clinical range rate | Change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NSCAW, Cohort 1 (U.S.) | >90% foster and kinship | Aarons et al. (2010) | 2–15 | 500 | | CBCL ext. clinical | 1.5 | 33.9% | 29.1% | -4.8% |
| | | | | | | | 3 | | 27.3% | -6.6% |
| | | | | | | CBCL int. clinical | 1.5 | 21.6% | 16.9% | -4.7% |
| | | | | | | | 3 | | 20.8% | -0.8% |

2. Children in care for 2 years or less at baseline

| Study | Type of care | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Baseline mean (SD) | Follow-up mean (SD) | Effect size (Cohen’s d) | Comm. (Cohen’s d) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Symanzik et al. (2019) (Germany) | Foster | 2–7 | 71 | 17% | ACC raw pseud. | 1 | 1.97 (2.11) | 1.71 (1.85) | -.13 | -.18 |
| | | | | | ACC raw insec. | | 3.50 (3.62) | 2.89 (3.12) | -.18 | -.21 |
| | | | | | ACC raw indis. | | 6.21 (3.46) | 5.26 (3.12) | -.29 | -.46 |
| | | | | | ACC raw total | | 11.68 (7.59) | 9.87 (6.49) | -.26 | -.35 |
| | | | 72 | 16% | RPQ raw disin. | | 3.46 (3.62) | 2.82 (3.46) | -.18 | -.28 |
| | | | | | RPQ raw inhib. | | 2.17 (2.53) | 1.47 (2.70) | -.27 | -.38 |
| | | | | | RPQ raw total | | 5.64 (5.23) | 4.29 (5.40) | -.25 | -.39 |

Comm. = effect size (Cohen’s d) for community comparison sample (n = 128–131).

3. Sample recruited following entry into new placement

Clinical range rates

| Study | Type of care | Baseline age range (years) | N | Attrition | Scale | Time frame (years) | Baseline range rate | Follow-up range rate | Change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Staines (2012) (England and Wales) | Foster | 5–14 | 220 | 2% | SDQ normal | 1 | 36% | 37% | +1% |
| | | | | | SDQ borderline | | 25% | 20% | -5% |
| | | | | | SDQ clinical | | 39% | 43% | +4% |

SDQ “normal,” “borderline,” and “clinical” refer to ranges.

Note. CBCL = Child Behavior Checklist (carer-report); YSR = Youth Self-Report (self-report); ext. = externalizing; int. = internalizing; T = T score; raw = raw score; TSCC PTS = post-traumatic stress subscale of the Trauma Symptom Checklist for Children (self-report); ACC = Assessment Checklist for Children–Short Form; pseud. = pseudomature interpersonal behavior; insec. = insecure interpersonal behavior; indis. = indiscriminate interpersonal behavior; RPQ = Relationship Problems Questionnaire; disin. = disinhibited behavior; inhib. = inhibited; SDQ = Strengths and Difficulties Questionnaire (carer-report); SD = standard deviation; NSCAW = National Survey of Child and Adolescent Well-Being.
Aggregate group mean scores and SDs estimated from means and SDs reported separately for each gender. The estimations are approximate.

The NSCAW Long-Term Foster Care (LTFC) cohort consisted of 727 older children and
adolescents who entered care approximately 1 year before baseline. At baseline, 91% of this
sample resided in family-based care (58% in non-kin foster care, 32% in kinship care), and 9%
were in group homes (i.e., small residential units) (Administration for Children, Youth and
Families, 2001). Three published analyses of age-limited prospective data obtained for this
sub-cohort are included in the present review (Barboza, Dominguez, & Pinder, 2017; McWey,
Cui, & Holtrop, 2014; McWey, Cui, & Pazdera, 2010). The first analysis identified small 18-month
and 3-year reductions in carer-reported CBCL externalizing scores and self-reported trauma
symptoms for 280 children between 8 and 15 years (Barboza et al., 2017). The second analysis
identified small increases in self-reported YSR externalizing T scores at 18-month and 3-year
follow-up for 180 children between 11 and 16 years (McWey et al., 2014). However, baseline and
follow-up mean externalizing scores were a little lower than expected for this population
(T = 53–55), suggesting the possibility that adolescents underreported their difficulties. It is also not clear
how the older adolescents were retained in the 3-year follow-up, by which time some would have
been 19 years old. A parallel analysis of an older subset of this sample, namely 106 adolescents
between 13 and 16 years (at baseline), identified 3-year changes in carer-reported CBCL
externalizing and internalizing T scores (McWey et al., 2010). Baseline scores (T = 57, 62) were within
the range of previously reported estimates. There was a moderate 3-year reduction (d = .31) in
carer-reported externalizing scores and a slight reduction in internalizing scores. Thus, adolescents
and their carers reported 3-year mental health changes in opposing directions, with adolescents on
average reporting modest deterioration and their carers reporting slight to moderate improvement.
The third study, which measured 18-month mental health changes among 60 Dutch adolescents,
also contrasted carer-reported and self-reported scores (Strijker, van Oijen, & Knot-Dickscheit,
2011). In this study, young people reported moderate 18-month improvement in their mental health
(d = .19–.40), while their carers reported moderate deterioration (d = .16–.51).
Conclusion on this question. The three studies present conflicting evidence on whether or not entry
into OOHC has an initial effect on children’s mental health, following their removal from mal-
treating families. The NSCAW is the best-designed longitudinal study of the mental health of
children in care carried out to date. Analyses of carer-reported scores for various age ranges of the
NSCAW samples identify small to moderate improvements in children’s mean mental health
scores during their first 3 years in care. However, the analysis by Aarons et al. (2010) suggests
that while children may generally benefit emotionally from being removed from abusive care, for
some children this effect is not sustained over the longer term. Conversely, in both NSCAW
cohorts adolescent self-reported mean internalizing and externalizing difficulties increased slightly
over the same time frames.
Question 3: Is the reparative potential of long-term, family-based OOHC moderated by
children’s age at entry into care?
Several surveys have identified that older children and adolescents in care have greater mental
health difficulties than younger children (Armsden, Pecora, Payne, & Szatkiewicz, 2000; Dubow-
itz, Zuravin, Starr, Feigelman, & Harrington, 1993; Heflinger, Simpkins, & Combs-Orme, 2000;
Meltzer, Corbin, Gatward, Goodman, & Ford, 2003). This age effect is largely an artifact of later-
placed children entering care with higher levels of pre-existing disturbance (Hukkanen, Sourander,
Bergroth, & Piha, 1999b; Tarren-Sweeney, 2008). Nevertheless, while older age at entry into care
is a marker for greater pre-care adversity, it might also moderate children’s subsequent response
and adjustment to care.
To what extent then do the prospective studies reviewed in this article shed light on the question
of whether children’s age at entry into care moderates the reparative potential of long-term care?
Two of the studies measured changes in very young (i.e., 2- to 4-year-old) children’s
socio-emotional development over short periods. The first study measured small to moderate mean
deterioration in internalizing (d = .17) and externalizing (d = .40) difficulties (but improved
competence, d = .18) over 12 months among 2-year-olds in care (Jacobsen et al., 2013). The
second measured a moderate mean improvement (d = .45) over 18–24 months following entry into
care (Stahmer et al., 2009). A third study measured modest 1-year reductions in attachment
disorder symptoms and interpersonal difficulties among a sample of 2- to 7-year-old German
children in foster care who had been in care for 2 years or less at baseline and entered care
following a history of maltreatment (Symanzik et al., 2019). However, neither of the carer-report
measures used in this study was designed or validated for children under 4 or 5 years of
age, and the study unexpectedly measured comparable 1-year reductions in these symptoms and
difficulties among a community sample of same-aged children. While this latter finding is difficult
to interpret, it is possible that the measures, which were designed for older children, do not take
account of normative relationship behaviors manifested by very young children.
Thus, while we know that younger age at entry into care predicts lower mental health difficul-
ties, these studies do not clarify whether children who enter care at a young age are also more likely
to experience improvement in their mental health. Similarly, longitudinal studies of the mental
health trajectories of adolescents following their “late arrival” into care yielded conflicting
findings (Leonard & Gudiño, 2016; McWey et al., 2010, 2014; Strijker et al., 2011). The two studies
that recruited representative adolescent foster care cohorts measured slight 1-year reductions in
mean self-reported difficulties (Perkins, 2008) and a slight 5-year reduction in self-reported
depressive symptoms (Bulat, 2010).
Conclusion on this question. None of the reviewed studies had a sufficiently robust design, or adequate
sample size and retention rates, to definitively address this question. Similarly, there have not been
enough prospective studies that recruited similar-age cohorts at entry into care to assess the
consistency of any evidence.
None of the research questions that we posed for this review are comprehensively answered by the
available evidence. Perhaps our most important conclusion is that, as yet, no cohort study (or
research program) has had adequate design, scale, or scope to provide a definitive understanding of
the development and well-being of children who grow up in statutory care, or to contrast their
developmental pathways with those of other high-risk child populations growing up in different
forms of care. Notwithstanding this uncertainty, the current research base provides no evidence
that OOHC exerts a general, population-wide effect on the mental health of children in care,
consistent with Goemans, van Geel, and Vedder’s (2015) meta-analysis. In other words, it
provides no evidence that growing up in care is generally reparative or generally harmful for
children who enter care following exposure to severe social adversity. Instead, several longitudinal
studies have demonstrated that sizable proportions of children in care manifest meaningful
improvement in their mental health over short- and long-term time frames, and similarly sizable
proportions experience meaningful deterioration.
Various developmental theories (including attachment and social learning theories), as well as
research into the neurodevelopmental effects of early maltreatment, would predict that the repara-
tive and harm potentials of long-term care are moderated by such factors as children’s age when
entering care, their carers’ commitment, the strength of their carers’ relationships to them, and the
stability of their placements. There are likely to be complex transactional mechanisms that shape
children’s developmental trajectories as they grow up in care. It is also important to understand that
developmental change within care is moderated by children’s earlier exposure to severe social
adversity. The English and Romanian Adoption study found that the developmental effects of more
than 6 months’ exposure to institutional deprivation in early childhood persist for many through
childhood and adolescence, despite being subsequently raised by adoptive families
(Sonuga-Barke et al., 2017). This supports the notion that recovery from some forms of psychopathology
caused by early severe adversity tends to follow a long developmental trajectory even where a
child’s developmental conditions have markedly improved. There is even some evidence that early
chronic maltreatment incurs a delayed “sleeper effect” on later development, regardless of the
quality of intervening care (Li & Godinet, 2014).
Therefore, rather than asking whether long-term care is generally beneficial or harmful for the
development of previously maltreated children, future investigations should instead focus on the
questions “... what are the systemic and interpersonal characteristics of OOHC that promote and
sustain children’s psychological development throughout childhood, and what characteristics are
developmentally harmful? and ... for which children is OOHC beneficial, and for which children
is it not?” The answers to these questions are critical for improving policy and practice within
children’s services and for designing more effective clinical interventions for this population. This
knowledge will also help address the bigger question of whether our present OOHC systems can be
remedied to the point that they adequately facilitate children’s psychological development, or
whether they should be abandoned. We know that large numbers and proportions of children
placed into care effectively grow up without close and enduring familial relationships (Howard
& Berzin, 2011; Reimer & Schäfer, 2015). Yet humans are a social species that evolved such that
close and enduring familial relationships are essential for their psychosocial development. The
absence of historical and ethnographic precedents for children growing up in impermanent
caregiving systems (Boswell, 1988) implies that this experience lies outside the boundaries of human
adaptation; in other words, that being raised without a semblance of a permanent family is both
developmentally harmful and contrary to human evolution.
As stated above, no research programs have had adequate scale or scope to address
these questions. To do so will require large and ambitious cohort studies that overcome some major
design obstacles, notably achieving adequate participant retention and reliable and valid measure-
ment. Given children’s dynamic care trajectories (including planned and unplanned placement
changes, restoration to parental care, and shifts to permanent guardianship and adoption), adequate
retention can only be feasibly attained by following children through various care arrangements,
including restoration and permanent orders. This approach also offers scope for comparing the
developmental trajectories of severely maltreated children who remain in their parents’ care versus
growing up in OOHC versus growing up in permanent guardianship/adoption versus subsequent
restoration. However, broadening the scope in this way greatly increases the sample size required
for these and other stratified analyses. Implementing any long-term cohort study of this type would
also require considerable resourcing and expertise to sustain an acceptable participation rate,
especially for children experiencing rapid placement changes. The method section of the present
review highlights some critical limitations in measuring this population’s mental health prospec-
tively, including scope for systematic respondent biases, poor inter-rater reliability, and needing to
employ different informants as children move through placements. Conducting a study that recruits
information from different caregivers at different times in a child’s life (such as parents, foster
carers, and adoptive parents) amplifies the risk of measurement error. With this in mind, we need to
consider whether alternative, non-psychometric measures might yield additional, more accurate
and reliable estimates of children’s mental health in large-scale population cohort studies—includ-
ing neurometric, biometric, and observational methods. Finally, given that many “within-care”
experiences that have developmental significance are systemically driven, and thus vary somewhat
across child welfare jurisdictions, these questions need to be more definitively addressed through
cross-jurisdictional and cross-national studies.
Developmental Child Welfare 1(3)
