
Towards Early Identification of Mental Health Problems in Children's Social Care

February 2023


DOI:10.13140/RG.2.2.10757.63205/1

Affiliation: University of Cambridge

Authors:

Katherine Parkin

University of Cambridge

Ryan Crowley

NYU Langone Medical Center

Efthalia Lina Massou

University of Cambridge

Marcos Del Pozo Baños

Swansea University


The purpose of this study is to advance the development of early identification tools for young people’s mental health problems in social care settings. Almost all young people with social care contact are likely to experience some kind of mental health problem, yet only a small proportion of them are thought to have formal diagnoses, and even fewer receive treatment. Due to poor integration of datasets between services for young people, there remain significant problems in identifying young people with mental health problems in social care settings. Failure to identify risk factors and mental health-associated problems early can delay treatment; if accurate early identification tools could be developed, social care services could deliver more timely support for vulnerable young people. One valuable approach to exploring this is data linkage. Given the multi-factorial nature of mental ill health, this study suggests that building accurate models to identify mental health problems will require access to large, representative datasets of multi-domain data that reflect a broad range of biological, psychological and social factors. This project, conducted by researchers from the University of Cambridge, involved creating a linked database of education, health and social care data to measure childhood mental health problems and their associated risk factors. Further details, including the protocol and research summary, can be accessed here: https://whatworks-csc.org.uk/research-report/towards-early-identification-of-mental-health-problems-in-childrens-social-care/



Acknowledgements

This study makes use of anonymised data held in the Secure Anonymised Information Linkage (SAIL)

Databank. We would like to acknowledge all the data providers who make anonymised data available for

research. This work was supported by the Adolescent Mental Health Data Platform (ADP). The ADP is

funded by MQ Mental Health Research Charity (Grant Reference MQBF/3 ADP). ADP and the author(s)

would like to acknowledge the data providers who supplied the datasets enabling this research study. The

views expressed are entirely those of the authors and should not be assumed to be the same as those of ADP

or MQ Mental Health Research Charity. We would also like to thank Professor Pietro Liò, Emma Rocheteau,

Dr Angela Wood, and Professor Zoe Kourtzi for providing supervision and guidance for the machine learning

aspects of this project. Finally, we would like to thank our funders, without whom this work would not have

been possible.

Authors

Katherine Parkin, Ryan Crowley, Efthalia Massou, Marcos Del Pozo Baños, Yasmin Friedmann, Ann John,

Anna Moore

Funding

This research was funded as part of the WWCSC Spark Grant Scheme.

All research at the Department of Psychiatry in the University of Cambridge is supported by the NIHR

Cambridge Biomedical Research Centre (BRC-1215-20014) and NIHR Applied Research Collaboration East of

England. The views expressed are those of the authors and not necessarily those of the NIHR or the

Department of Health and Social Care.

Katherine Parkin is funded by the National Institute for Health and Care Research (NIHR) School for Public

Health Research (SPHR) (Grant Reference Number PD-SPH-2015) and the NIHR Applied Research

Collaboration (ARC) East of England.

Dr Anna Moore is funded through an NIHR Clinical Lectureship funded by Anna Freud National Centre for

Children and Families (AFC). The Delphi Study, which provided foundational work for this project, was

funded by MRC Adolescent Engagement Awards MR/T046430/1. Dr Moore also holds grants from the Alan

Turing Institute and UKRI/DARE UK. Data linkage within SAIL was carried out by Dr Yasmin Friedmann at

the Adolescent Mental Health Data Platform (ADP), and was funded by Cambridgeshire and Peterborough

NHS Foundation Trust (CPFT).

Dr Marcos Del Pozo Baños, Dr Yasmin Friedmann and Professor Ann John are funded through the ADP,

which is funded by MQ Mental Health Research Charity (Grant Reference MQBF/3 ADP). The views

expressed are entirely those of the authors and should not be assumed to be the same as those of ADP or

MQ Mental Health Research Charity.

The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, the

Department of Health and Social Care, the AFC, the MRC, the Alan Turing Institute, the UKRI, the ADP

or MQ Mental Health Research Charity.

No competing interests declared.

About What Works for Early Intervention and Children’s Social Care

What Works for Children’s Social Care (WWCSC) and the Early Intervention Foundation (EIF) are merging.

The new organisation is operating initially under the working name of What Works for Early Intervention

and Children’s Social Care.

Our new single What Works centre will cover the full range of support for children and families from

preventative approaches, early intervention and targeted support for those at risk of poor outcomes,

through to support for children with a social worker, children in care and care leavers.

To find out more visit our website at: www.whatworks-csc.org.uk

About the Timely Project (Department of Psychiatry, University of Cambridge)

The Timely project, led by Dr Anna Moore, is seeking to develop digital tools to identify young people’s

mental health problems early. The project is using linked health, social care, education and genetic data,

and it explores the value of machine learning and artificial intelligence approaches.

Spark Grant Scheme

This research was funded as part of the WWCSC Spark Grant Scheme. The purpose of the scheme is to

fund new and innovative research in children’s social care, conducted by researchers who may struggle to

secure funding through other routes, particularly Early Career Researchers (ECRs) and/or researchers

from underrepresented, minoritised groups. This work is an important part of our mission to develop

capacity within the research community and generate high-quality evidence in children's social care.

If you’d like this publication in an alternative format such as Braille,

large print or audio, please contact us at: info@wweicsc.org.uk

CONTENTS

Executive summary

Background

Objectives and research questions

Design

Findings

Conclusions

Ongoing work

1. Introduction

2. Methods

3. Findings

4. Limitations

5. Discussion

6. Conclusions

7. Recommendations and implications

References

Appendices

Executive summary

Background

Due to poor integration of data held about young people, there remain significant problems

in identifying young people with mental health problems in social care settings. Young

people and families can suffer for prolonged periods without suitable mental health support

(DfE, 2020, 2021). Almost all young people with social care contact are likely to experience

some kind of mental health problem, yet only a small proportion of them are thought to have

a formal diagnosis and even fewer receive treatment (Berridge et al., 2020; Care Leavers’

Association, 2017; The Child Safeguarding Practice Review Panel, 2021). Failure to identify

risk factors and mental health-associated problems early can delay treatment and lead to

limited interventions failing to address significant causes of a young person’s difficulties

(Allen, 2011; DHSC & DfE, 2018). If accurate early identification tools could be developed,

this would help young people in children’s social care receive more timely support.

Routinely collected data from health, social care and education settings contain broad-

ranging information which can be used to understand an individual’s exposure to risk factors

for mental health problems. Thus, using these data may facilitate the prediction of mental

health outcomes. Machine learning methods could benefit from the large amount of data

available in routinely collected datasets; these algorithms can use this information to learn

from existing data and discover patterns which are then used to predict the outcome of

future observations. These machine learning approaches could supplement the standard

statistical approaches which are often used. We can then compare their performance and

investigate the relative benefits of both methods (traditional statistical approaches vs

machine learning approaches), in terms of the amount of predictive utility they offer, their

interpretability, and the relative challenges associated with their implementation.

Previous machine learning models have not been able to reach the performance needed for

clinical use in children’s social care settings (Clayton et al., 2020). In part, this may be

explained by the datasets used to build these models to date. The evidence describing the

mechanisms underlying mental ill health is rapidly evolving to reveal the role of biological

factors (e.g. physical health, immunology, inflammation and genetics). These interact with

early life experiences and the environment to confer resilience and susceptibility to mental

health problems. Therefore, given this multi-factorial nature of mental ill health, we

hypothesise that building accurate models to identify mental health problems will require

access to large, representative datasets of multi-domain data that reflect this broad range of

bio–psycho–social factors. In previous work (not yet published), we developed a framework

of risk factors for mental health problems based on Bronfenbrenner and Morris’s 2006

bioecological model (Bronfenbrenner & Morris, 2006), with emphasis on identifying risk

factors relevant to underserved populations. This work resulted in the identification of 287

risk factors which we grouped into a framework of eight domains. In this report, we present

work from the next stage of the study, in which we explore the utility of routinely collected

datasets for predicting young people’s mental health problems. To do this, we created and

characterised a linked multi-agency database (containing datasets from health, social care

and education) relating to most children in Wales and including the broad range of variables

identified in our earlier work. We used this linked database to measure the prevalence of

mental health problems within the cohort. This database was then used to explore various

machine learning methods to identify mental health problems in children in social care

settings.

Objectives and research questions

In this study, we aimed to create a linked database of health, social care and education data

in order to measure childhood mental health problems and their associated risk factors. A

linked database is useful for this purpose as it allows mental health problems and associated

risk factors to be measured from different agencies with which young people come into

contact, as opposed to limiting measurement to a single agency (such as GP or A&E); this

should allow for more comprehensive and holistic measurement of mental health problems

and their associated risk factors.
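
To illustrate the idea of linkage described above, the following is a minimal sketch of how records from several agencies might be grouped under one anonymised identifier. It is not the study's code: the `person_id` field, the `link_records` helper and the toy rows are all hypothetical, and real linkage within a secure databank involves governance and matching steps that are not shown here.

```python
from collections import defaultdict


def link_records(**agency_tables):
    """Group rows from several agency tables by a shared anonymised identifier.

    Each keyword argument is one agency's table: a list of dicts that all
    carry a hypothetical "person_id" linking key.
    """
    linked = defaultdict(dict)
    for agency, rows in agency_tables.items():
        for row in rows:
            # Keep every row per agency, since one person may have many contacts.
            linked[row["person_id"]].setdefault(agency, []).append(row)
    return dict(linked)


# Invented toy rows, for illustration only.
health = [{"person_id": "a1", "diagnosis": "mood disorder"}]
social_care = [
    {"person_id": "a1", "referral": "child in need"},
    {"person_id": "b2", "referral": "looked after"},
]
education = [{"person_id": "b2", "absence_days": 30}]

linked = link_records(health=health, social_care=social_care, education=education)

# People with records in more than one agency: their information would be
# fragmented if each agency's data were analysed separately.
multi_agency = sorted(pid for pid, by_agency in linked.items() if len(by_agency) > 1)
```

In this toy example both individuals appear in two agencies, so a single-agency view would see only part of each person's record.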

In addition to this measurement, we aimed to develop prototype models for early

identification of mental health problems of young people in social care settings. This work

was carried out within the Secure Anonymised Information Linkage (SAIL) Databank, but will

also inform the development of a similar linked database in Cambridgeshire and

Peterborough, known as CADRE (Child and Adolescent Data REsource; [formerly known as

Cam-CHILD]).

The study research questions were:

1. What is the best method of measuring mental health problems and risk factors for

young people’s mental health problems in linked administrative datasets?

2. What is the prevalence and distribution of mental health-associated problems and their

risk factors? How do patterns of mental health-associated problems vary between

social care, health and educational settings? How do they vary across Wales, UK?

3. What is the unrecognised mental health need in social care settings?

4. What are the relationships between risk factors and mental health problems?

5. What are the best methods for building predictive risk models and early identification

tools for young people’s mental health problems for use in social care settings?

6. Can findings and methods be replicated across databases (i.e. translated to CADRE)?

Design

For the measurement of mental health problems and associated risk factors, a retrospective

cohort study design was used, with cross-sectional analysis. For machine learning

approaches, the same cohort and particular elements of it (i.e. site-level data) were

used, with the data being split into training, test and validation sets.
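
As a rough illustration of the split mentioned above, the sketch below divides a list of person identifiers into training, validation and test sets. The 70/15/15 proportions, the seed and the `split_cohort` helper are assumptions made for the example; the proportions actually used in the study are not stated in this section.

```python
import random


def split_cohort(ids, train_frac=0.7, val_frac=0.15, seed=0):
    """Split person identifiers into train, validation and test lists.

    Splitting by person (rather than by record) keeps all of one person's
    records in a single set. The fractions are illustrative assumptions.
    """
    ids = sorted(ids)                  # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # whatever remains goes to the test set
    return train, val, test


train, val, test = split_cohort(range(1000))
```

Fixing the seed means the same individuals land in the same set on every run, which makes model comparisons repeatable.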

Our cohort of young people was defined as anyone who was aged 0–17 years in the period

between 1 January 2013 and 31 March 2020; all retrospective and subsequent data for

these individuals were included even if it fell outside this time period. The final cohort

consisted of 1.1 million young people in Wales, of whom 46,704 had social care data and

were thus used in sub-sample analysis for early identification model prototyping. Though the

overall cohort comprises 1.1 million young people, sample sizes differ quite substantially

between datasets (as shown below in Table 3.2).
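
The cohort definition above can be read as a simple eligibility rule: a person qualifies if they were aged 0–17 on at least one day of the study window. The sketch below expresses that rule under the assumption that an exact date of birth is available (anonymised data may hold only an approximate one); the function names and the handling of 29 February birthdays are illustrative choices, not taken from the report.

```python
from datetime import date

STUDY_START = date(2013, 1, 1)
STUDY_END = date(2020, 3, 31)


def eighteenth_birthday(dob):
    """Return the date on which a person turns 18."""
    try:
        return dob.replace(year=dob.year + 18)
    except ValueError:
        # 29 February birthdays: 18 years later is never a leap year,
        # so 1 March is used instead (an assumption of this sketch).
        return date(dob.year + 18, 3, 1)


def in_cohort(dob):
    """True if the person was aged 0-17 on at least one day of the study window."""
    born_by_end = dob <= STUDY_END
    under_18_at_start = eighteenth_birthday(dob) > STUDY_START
    return born_by_end and under_18_at_start
```

Under this rule, someone born on 2 January 1995 is included (still 17 on 1 January 2013), whereas someone born on 1 January 1995 is not (already 18 on the first day of the window).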

Findings

When measured in the Welsh GP dataset (WLGP), 14.85% of our cohort had at least one

mental or psychological health condition of interest, with mood disorders being most

common (12.96%) and severe mental illness (SMI) such as schizophrenia and bipolar disorder being

least common (0.11%). When measured in the Patient Episode Dataset for Wales (PEDW),

the prevalence of any mental health condition of interest was 4.78%, with mood disorders

still being most common (2.73%) and SMI being least common (0.13%). In the rest of the

datasets we used, the prevalence of any mental health conditions of interest was between

<0.00% and 1.33%.

With regards to the measurement of risk factors, we found risk factors fell on a spectrum of

measurability, ranging from “directly measurable” to “derivable” to “measurable by proxy”

(defined in Table 3.3). We focused our efforts on the former two, and found important factors

associated with childhood mental health problems were spread across different data

sources, rather than being confined to any one particular database. Of 287 risk factors of

interest, 101 (35.19%) were measurable, of which 48 (16.72%) were directly measurable

and 53 (18.47%) were derivable. Of 101 risk factors of interest relating specifically to

underserved populations, 37 (36.63%) were measurable; this was broken down into 26

(25.74%) which were directly measurable and 11 (10.89%) which were derivable.
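
The proportions reported above follow directly from the counts, and can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the `pct` helper is introduced for illustration. The final value is obtained by subtraction and covers risk factors that were neither directly measurable nor derivable, which this section does not break down further.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# All risk factors of interest
TOTAL = 287
direct, derivable = 48, 53
measurable = direct + derivable              # 101

# Risk factors relating specifically to underserved populations
UNDERSERVED_TOTAL = 101
u_direct, u_derivable = 26, 11
u_measurable = u_direct + u_derivable        # 37

overall = (pct(measurable, TOTAL), pct(direct, TOTAL), pct(derivable, TOTAL))
underserved = (
    pct(u_measurable, UNDERSERVED_TOTAL),
    pct(u_direct, UNDERSERVED_TOTAL),
    pct(u_derivable, UNDERSERVED_TOTAL),
)

# By subtraction: neither directly measurable nor derivable.
remaining = TOTAL - measurable
remaining_pct = pct(remaining, TOTAL)
```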

For the prototype early identification model in social care settings, we developed simple

statistical models and both basic Neural Network models with Rectified Linear Unit (ReLU)

activations and Graph Neural Networks (GNNs). The best-performing GNN model achieved

an AUROC (Area Under the Receiver Operating Characteristics Curve) of .815 and the best-

performing Neural Network achieved an AUROC of .800. These results indicate that the

GNN approach may provide a promising method for identifying young people with a mental

health diagnosis. However, greater accuracy and further validation are required prior to

considering clinical implementation. In comparison, standard logistic regression models

achieved an AUROC of .803.
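
For readers unfamiliar with the metric, AUROC can be interpreted as the probability that a model assigns a higher score to a randomly chosen case with the outcome than to a randomly chosen case without it, so 0.5 corresponds to chance and 1.0 to perfect ranking. The sketch below computes it directly from that definition; the labels and scores are invented for illustration and have no connection to the study data or models.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition.

    Counts, over every (positive, negative) pair, how often the positive case
    has the higher score; ties count as half. This is quadratic in the number
    of cases, so it suits small illustrations rather than large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 3 cases with the outcome, 4 without.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
example = auroc(labels, scores)
```

Here the positive cases outrank the negative ones in 10 of the 12 possible pairs, giving an AUROC of about 0.83.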

Conclusions

This work in the SAIL Databank demonstrated that it was possible to link together multi-

agency data from social care, health and education settings. With this linked data, we were

then able to measure the prevalence of different mental health conditions and their

associated risk factors. By creating a linked multi-agency database, we were able to

measure different bio–psycho–social risk factors which would not have been measurable in

single-agency data. We could then include these in prototype early identification models.

Though these models’ performance was not sufficient for clinical use, they provide a solid

foundation to improve on. Mental health problems have bio–psycho–social causes and

correlates; thus, if we are to build accurate and implementable early identification tools, bio–

psycho–social databases from routine sources are likely to be required. This project

highlights the worth of bringing together rich data from different organisations with which

young people interact.

In summary, linkage of multi-agency data offers a promising way of developing early

identification tools because early warning signs for mental health problems which may be

missed in single-agency data can be combined, leading to a stronger signal for detecting

developing problems. Early identification of potential problems means that young people and

their families can be offered more timely and proportionate support, instead of waiting in

distress for problems to worsen and meet service thresholds. Furthermore, as more robust

early identification tools are developed, staff in contexts such as social care can use them as

an adjunct for decision-making to help them identify young people who may have additional

needs and to support smoother care pathways for young people and their families.

Ongoing work

This study is currently ongoing. As such, in this report, we present the findings to date. Thus

far, we have: gained approval to access all 18 desired databases; linked 18 databases in

SAIL; characterised our 18 databases of interest; characterised our cohort of interest;

mapped 287 risk factors to SAIL metadata and explored their measurability (Research

Question 1); measured mental health problems in health datasets (Research Question 1);

explored relationships between risk factors and mental health outcomes (Research

Questions 4 and 5); and developed an early prototype of a risk prediction tool for social

care settings (Research Question 5).

We have successfully applied to extend our access to the SAIL Databank and will continue

our analysis in the linked database we have created as part of this work. Ongoing work

involves: measuring the prevalence and distribution of risk factors for mental health

problems (Research Question 2); measuring the prevalence of mental health problems in

the health datasets when linked together (Research Question 2); measuring the distribution

of mental health problems by region (Research Question 2); measuring unrecognised

mental health need in social care (Research Question 3); improving the accuracy of the

prototype risk prediction tool (before it could be considered for clinical use) (Research

Question 5); and replicating this work in the CADRE database (Research Question 6).

1. Introduction

Background and problem statement

There are high levels of mental health need in children’s social care settings (Berridge et al.,

2020; DfE, 2020, 2021; Maguire et al., 2019). However, the data available to estimate the actual level

of need are very poor, and existing figures are likely to be vast underestimates. In particular,

poor integration of information held about young people makes it difficult to accurately

estimate mental health need in this population. Access to childhood mental health support

can be challenging, and there are even more barriers to access for young people within

children’s social care settings (What Works for Children’s Social Care, 2016). It is important

to provide suitable mental health support in a timely manner to those who need it. However,

the current system is not set up to do this well, because it is unclear which interventions are

most useful and there is no clear way to effectively identify young people who have mental

health needs in social care settings. Moreover, young people in these settings have distinct

mental health needs (Care Leavers’ Association, 2017; The Child Safeguarding Practice

Review Panel, 2021), and there is some evidence that standard mental health interventions

may be harmful for some young people with a history of social care contact, for example,

looked-after children (Fong et al., 2015). As such, it is critically important to provide risk

factor-informed interventions to this population (for example, specific trauma-informed and

non-stigmatising interventions). At present, without this approach, outcomes for young

people with mental health problems in social care settings are poor, including high levels of

deliberate self-harm, crises, behavioural difficulties, difficulties accessing education, long-

term placements and NEET (i.e. Not in Education, Employment or Training), all of which can

lead to poor long-term health and social outcomes (Sanders, 2020).

In summary, without an effective means of early identification, young people and their

families can suffer for prolonged periods without suitable mental health support (DfE, 2020,

2021). Furthermore, a failure to identify risk factors and mental health-associated problems

early can delay treatment and lead to limited interventions failing to address significant

causes of a young person’s difficulties (Allen, 2011; DHSC & DfE, 2018). If accurate early

identification tools could be developed, this could help young people in children’s social care

receive more timely support. Machine learning methods offer one potential way to learn from

existing data on risk factors for young people’s mental health problems in order to build

effective predictive models for early identification of mental health problems. Previous

machine learning models have not been able to reach the performance needed for clinical

use in children’s social care settings (Clayton et al., 2020). In order to build accurate risk

prediction models suitable for clinical implementation, we hypothesise that an approach

using linked, multi-agency data with a large number of observations is required, effectively

linking risk factor data from social care, education and healthcare datasets.

To address the aforementioned problems, we suggest that we need to:

• Accurately understand the prevalence and distribution of mental health problems and

associated risk factors in social care settings across different geographical regions

• Understand the specific relationships between risk factors and mental health

outcomes

• Provide this information to commissioners so that they can match service funding to

the specific needs of the local populations, and make evidence-based and targeted

commissioning decisions, which offer a more effective use of the limited funds and

resources (including staff) available for service provision

• Develop reliable early identification tools, which do not rely on already overstretched

Child and Adolescent Mental Health Services (CAMHS).

Our hope is that this will lead to:

• Quicker access to assessment and intervention

• Enablement of research into interventions for young people with mental health

problems in social care

• Clarity for social workers about young people who are challenging to diagnose and

signpost

• Facilitation of conversations about access to mental health services.

In turn, this will improve outcomes and experiences for young people and their families through

better integration of, and access to, mental health services.

Study aims

1. Expedite the build of a linked administrative database in Cambridgeshire and

Peterborough by using the ADP/SAIL database to refine methods to:

a. Operationalise the measurement of mental health problems and risk factors

within multi-agency data

b. Develop methods to map the prevalence and distribution of mental health

problems and associated risk factors in multi-agency data

c. Estimate unidentified mental health need within social care.

2. Explore relationships between exposure to risk factors and mental health outcomes.

3. Explore the best methods for developing accurate and usable child and adolescent

mental health risk prediction algorithms.

4. Begin applying these methods to the CADRE database, in order to test validity of the

database and generalisability of the risk prediction algorithms.

Research questions