Development of a Suicide Early Warning System for Hong Kong Using Statistical Process Control and Nowcasting Based on Media Reported Suicide Deaths

Development of a suicide early warning system for Hong Kong using statistical process
control and nowcasting based on media reported suicide deaths
Yu Cheng Hsu 1, PhD
Ingrid D. Lui 1,2, MPH
Tsz Mei Lam 1, BSc
Paul Siu Fai Yip 1,2, PhD
1 The Hong Kong Jockey Club Centre for Suicide Research and Prevention, Department of
Social Work and Social Administration, Faculty of Social Sciences, The University of Hong
Kong, 2/F, The University of Hong Kong Jockey Club Building for Interdisciplinary
Research, 5 Sassoon Road, Pokfulam, Hong Kong SAR, China
2 Department of Social Work and Social Administration, Faculty of Social Sciences, The
University of Hong Kong, 5/F, The Jockey Club Tower, The Centennial Campus, The
University of Hong Kong, Hong Kong SAR, China
Corresponding author:
Prof. Paul Siu Fai Yip
Address: The Hong Kong Jockey Club Centre for Suicide Research and Prevention, 2/F, The
University of Hong Kong Jockey Club Building for Interdisciplinary Research, 5 Sassoon
Road, Pokfulam, Hong Kong SAR, China
Email: sfpyip@hku.hk
Telephone: +852 2831 5232
This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=4957509
ABSTRACT
Background
Timely monitoring of suicide trends is a difficult task because suicide surveillance research
often relies on data published by the government or health authorities, which are usually
delayed due to investigative and administrative processes. The practice of nowcasting using
recently published media reports to predict real-time trends provides new opportunities for
timely suicide monitoring.
Methods
We used a public dataset that collected local suicide news reporting in Hong Kong from 1
January 2019 to 31 December 2020. These reporting cases were summarized daily and
categorized by age, gender, district, place of suicide, and suicide methods. The summarized
daily suicide news information was regressed to the exact suicide count on the same day
based on the data provided by the Coroner’s Court, which is responsible for death
ascertainment of all unnatural causes of death (including suicide) to become the official
record of suicide numbers for Hong Kong. In addition, a cumulative sum (CUSUM) and
XmR control chart based on the predicted suicide count was proposed to monitor the suicide
trend and alert the community by issuing warnings in a timely manner.
Outcomes
The results showed that using elastic net regression yielded the best performance at a mean
absolute error of 0·896. The CUSUM chart achieved a specificity of 0·974 and the XmR
chart a specificity of 0·963 when compared with monitoring based on the actual suicide
count.
Interpretation
This study showed that using local suicide news could allow for accurate and timely
nowcasting to monitor real-time suicide trends. Such information could prove crucial for the
development and implementation of suicide prevention interventions, particularly in response
to new and emerging trends.
Funding
Quality Education Fund and the Hong Kong Jockey Club Charities Trust.
KEYWORDS
suicide prediction; media reporting; monitoring and surveillance; early warning system
INTRODUCTION
As in many other places around the world, suicide prevention has become an increasingly
important issue for Hong Kong. Based on data obtained from the Coroner’s Court, the age-
standardized suicide rate for 2022 was estimated to be 10·6 per 100,000 population,1 which is
slightly higher than the World Health Organization’s 2019 global estimate of 9·0 per 100,000
population.2 It has also been estimated that there are eight times as many suicide attempts as
there are completed suicides,3 thus early identification of population-level suicide risk is crucial
to formulate effective suicide prevention measures at times of heightened risk. The use of
surveillance data to monitor occurrences of suicide within communities allows relevant
stakeholders to get a sense of current suicide case numbers, ongoing or emerging trends (e.g.,
in high-risk populations or suicide methods), and the effectiveness of current interventions in
reducing suicides. However, the collection of timely surveillance data remains a challenge in
Hong Kong and around the world.
Suicide surveillance typically relies on two major data sources: death records (e.g., Coroner’s
Court reports,4 police records,5 or health records6) and government statistics (e.g., vital
statistics7). However, both data sources have limitations that hinder the collection of
timely data. First, official statistics from both sources are often published late because of
lengthy investigation and ascertainment processes. For example, in Hong Kong, all deaths
reported to the Coroners will be investigated by the police, yet the Coroner’s Court estimates
that the submission of police reports can take around one year, if not longer.8 Second, accessing
death records specifically usually involves accessing personal information, yielding privacy
concerns for both the deceased and their families. Taken together, these limitations make it
hard for relevant authorities to take preventive measures in a timely manner, and policies and
interventions that are based on outdated data may not truly reflect the current situation.
Obtaining more timely estimates of suicide events is of interest to researchers and frontline
helping professionals, and finding indicators that are updated more frequently than
official records is a major research trend. For example, one editorial described how several
countries used police reports of suspected suicides as a proxy for actual suicide death records,9
while another used registered records on suspected suicide deaths as an indicator of real-time
suicide death numbers.10 Nonetheless, these data are still not always available in many
countries and are hard to gather, so alternative sources of data are needed to predict, rather than
monitor, suicide trends. Preliminary research has suggested that nowcasting methods, which
estimate “the number of events that have occurred when a certain proportion has not yet been
reported,” can estimate trends in suicide to make up for the time lag in data reporting mentioned
previously.11 Two main data sources have previously been used to nowcast suicide trends:
online keyword searching trends,12,13 and media reports.14–16 As an example that used both data
sources, Chai et al.17 used Google Trends search terms data and suicide-related media reporting
data to predict possible outbreaks within the following week. The present study will focus on
the latter data source, and in particular, modify the concept developed by Zeng et al.16 to
nowcast the suicide count based on the characteristics of the cases with regularized regression.
Beyond real-time estimation of suicide events, continuous monitoring and taking action as
early as possible is the ultimate goal for suicide prevention. One group of researchers tried to
gather the surveillance data to create a dashboard summarizing these records.18 Publishing
estimated suicide case data on its own can make it difficult to discern specific trends and sudden
outbreaks; some suicide researchers have therefore suggested taking a statistical process
control perspective to detect variations in suicide rates.10 First proposed by Shewhart,19 control
charts have long been used in industrial settings to detect abnormal variations in production
quality, but the approach has since been applied to epidemiological monitoring as well.10 In
suicide surveillance research, Chai et al.17 previously adopted the cumulative sum (CUSUM)
control chart method to detect the change point of the suicide patterns based on the internet
searching frequency. Spittal et al.10 proposed using Shewhart charts and exponentially weighted
moving average (EWMA) charts to monitor special-cause variation in Australia based on
preliminary register records.
Extending from the existing literature, we sought to develop a suicide case nowcasting and
statistical process control system based on the suicide cases reported in the local news. This
approach integrated previous works in the estimation of suicide deaths and continuous
monitoring and aimed to provide more accurate suicide count nowcasting with real-time
statistical process control monitoring. This approach could help to overcome the inherent time
delay in official reporting data for suicide incidence and serve as a supplement to official
suicide reports and an alternative to suicide monitoring systems to provide timely, albeit less
accurate estimations in Hong Kong. By leveraging the timeliness of news reporting to nowcast
suicide trends, this work aims to create a real-time suicide early warning system to alert relevant
stakeholders to adverse suicide trends, which could enable them to respond in a timely manner.
METHODS
This study developed a statistical learning method to nowcast the actual suicide counts through
the information revealed in the local Chinese-language news from 1 January 2019 to 31
December 2020. The data in the full year of 2019 was used to train and select the best model
for nowcasting suicide cases, and the data in 2020 was used to evaluate the model performance
as the reporting of all suicide cases that occurred in 2020 by the Coroner’s Court would have
been completed by 2023.
Data Source
Two datasets were used in this study: Coroner’s Court reports and local online news reports.
According to the Coroners Ordinance,20 all unnatural deaths (including suspected suicides)
must be investigated by the Coroner’s Court. The process involves both police investigations
and Coroner’s inquests, making the Coroner’s Court dataset the most comprehensive source
available to identify the actual number of deaths by suicide that occurred in Hong Kong.
Suicide is determined as the cause of death only if the Coroner's Court can reasonably eliminate
the possibility of accident, homicide, or other causes, and suicide cases are subsequently
classified using the intentional self-harm codes X60-X84 in the International Classification of
Diseases 10th Revision.21 With consent from the Coroner's Court, we obtained all suicide death
case reports issued from 1 January 2019 to 31 December 2020. A total of 946 and 1,102 suicide
cases were identified in 2019 and 2020, respectively. However, the exact date of death could
not be determined for 249 cases in 2019 and 203 cases in 2020 even after the Coroner’s
investigation, and they were subsequently removed from our analysis. Ethics approval was
obtained from the Human Research Ethics Committee of the University of Hong Kong
(EA210305).
Local suicide news reports were obtained from an online public database.22 The database has
collected suicide news from five major local Chinese-language news media outlets since 2019,
and recorded the news report itself as well as information about the death. This included the
deceased’s gender, age, the district of suicide (categorized into 18 districts according to the
Hong Kong government district planning), the place of suicide (categorized as either a ‘public’
or ‘private’ area), and the method of suicide (categorized into ‘hanging,’ ‘jumping,’ ‘poisoning,’
or ‘others’). We extracted all local suicide news reports published between 1 January 2019 and
31 December 2020, identifying 1,456 reports. Cases were classified using the same data labels
as the online public database except for age, which was condensed into three age groups (‘24
years and below,’ ‘25 to 64 years,’ or ‘65 years and above’), and no record was excluded.
Nowcasting daily suicide cases
The daily suicide count was estimated from the suicide news reports published in the
local Chinese-language news on the same day. Suicide news data were aggregated into a daily
summary. Three regression methods were tested: elastic net, XGBoost, and adaptive
boosting. The model to be estimated can be described as:
$$y_t = \beta_g G_t + \beta_a A_t + \beta_d D_t + \beta_p P_t + \beta_m M_t + \epsilon$$

where $y_t$ is the exact suicide count on date $t$; $G_t$, $A_t$, $D_t$, $P_t$, and $M_t$ are
the counts by gender, age group, district of suicide, place of suicide, and method of suicide
on the same date from the local news reports; and $\epsilon$ is an error term. For example,
there were two suicide cases reported on 1 January 2019: one was a 30-year-old male and the
other was a 70-year-old female, both of whom died in the North District. Thus, the
corresponding gender and age group columns will each take the value 1, and the North District
column will take the value 2. This setting is also beneficial to understanding the local
media’s pattern in reporting suicide news because the importance (or weight) of the variable
indicates the news preference for reporting this type of suicide.
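The worked example above can be sketched in a few lines of Python. This is only an illustration of the daily aggregation step, not the authors' code: the record layout, the field names, and the `gender=`/`age=`/`district=` feature labels are assumptions, and place and method of suicide are left out because the example in the text does not state them.

```python
from collections import Counter
from datetime import date

# Hypothetical record layout; the field names are illustrative only.
reports = [
    {"date": date(2019, 1, 1), "gender": "male", "age": 30, "district": "North"},
    {"date": date(2019, 1, 1), "gender": "female", "age": 70, "district": "North"},
]


def age_group(age):
    """Map an age in years to the three age groups used in the study."""
    if age <= 24:
        return "<=24"
    if age <= 64:
        return "25-64"
    return ">=65"


def daily_features(records):
    """Return {date: Counter}, where each Counter holds per-category counts."""
    by_day = {}
    for rec in records:
        counts = by_day.setdefault(rec["date"], Counter())
        counts["gender=" + rec["gender"]] += 1
        counts["age=" + age_group(rec["age"])] += 1
        counts["district=" + rec["district"]] += 1
    return by_day


features = daily_features(reports)
row = features[date(2019, 1, 1)]
print(dict(row))
```

Each gender and age-group column holds 1 and the North District column holds 2, matching the description in the text; categories with no report on that day read as 0 because a `Counter` returns 0 for missing keys.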
News reports from 2019 were used for training the model and obtaining the best model using
mean absolute error (MAE) as the evaluation metric. MAE is the mean absolute difference
between the predicted suicide count ($\hat{y}_t$) and the actual suicide count ($y_t$) at the
date $t$ and can be expressed as:

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n}$$
News reports from 2020 were used to test the performance of the proposed model. The models
were built in Python 3·9 using the elastic net, XGBoost, and adaptive boosting
implementations in the scikit-learn package.
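As a small illustration of the evaluation metric, the MAE can be computed directly from paired daily counts. The function name and the toy numbers below are assumptions made for the sketch; they are not taken from the study data.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy daily counts (illustrative values only).
actual = [2, 3, 1, 4]
predicted = [1.5, 3.5, 2.0, 3.0]
mae = mean_absolute_error(actual, predicted)
print(mae)
```

An MAE below 1 on daily data means the nowcast is, on average, off by less than one case per day.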
Developing the continuous monitoring of suicide through statistical quality control
Two monitoring schemes were implemented to monitor the suicide news trend: one for instant
surges in suicide events and one for short-period increasing trends. Instant surges were
monitored with an XmR chart, which detects statistical outliers, and short-period increasing
trends were monitored with a CUSUM control chart.
Monitoring instant surge
An XmR chart was implemented to monitor outliers. The monitoring statistic is:

$$X_t = (y_t + y_{t+1})/2$$

The control limits are defined as $\bar{X}_t \pm E_2\,\overline{MR}$, where $E_2$ is the
control chart constant (2·659 in our case), and $\overline{MR}$ is the average of the moving
range $MR$, where

$$MR = |y_t - y_{t+1}|/2$$
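A minimal sketch of XmR control limits is given below. It uses the conventional textbook form of the individuals chart, in which the plotted value is the daily count itself and the moving range is the absolute difference between consecutive observations; this is an assumption and may differ from the exact configuration used in the study. The function names and the toy series are hypothetical, and only upper exceedances are flagged because surges are the quantity of interest here.

```python
def xmr_limits(baseline, e2=2.659):
    """Return (lower, centre, upper) limits estimated from a baseline series."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    centre = sum(baseline) / len(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return centre - e2 * mr_bar, centre, centre + e2 * mr_bar


def flag_surges(series, upper):
    """Indices of observations above the upper control limit."""
    return [i for i, x in enumerate(series) if x > upper]


# Toy data (illustrative values only).
baseline = [2, 3, 2, 4, 3, 2, 3, 3]
lower, centre, upper = xmr_limits(baseline)
new_days = [3, 2, 9, 4]
surges = flag_surges(new_days, upper)
print(lower, centre, upper, surges)
```

Estimating the limits on a calibration period and applying them to later data mirrors the design described in the Results, where the first year is used for calibration and the second for monitoring.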
Monitoring short-period increasing trend
A CUSUM chart was used to monitor the short-period suicide trends. The CUSUM monitor
process can be described as:
$$S_0 = 0$$

$$S_t = \max[0,\ S_{t-1} + Z_t - k]$$

where $S_t$ is the monitoring statistic, which starts at 0 and is updated over time, $Z_t$ is
the normalized nowcast suicide count, and $k$ is the slack value that the CUSUM tolerates. In
this study, we empirically set the upper control limit at 3·5 standard errors and the slack
value at 0·6 standard errors of the data. This specification was chosen so that the in-control
average run length is about one year and the average run length to detect an out-of-control
pattern is 6 observations.23 Specificity and the F1 score were used to compare the
out-of-control points identified using the actual suicide count with those identified using
the nowcast suicide count. Both the CUSUM and XmR charts were built using R 4·2·1 with the
“qcc” package.24
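The recursion above can be sketched as a one-sided upper CUSUM. This is an illustrative Python version rather than the "qcc" implementation used in the study. It assumes that counts are normalized with the mean and sample standard deviation of a calibration period and that the statistic is not reset after an alarm; the function name and toy data are hypothetical.

```python
from statistics import mean, stdev


def cusum_alarms(baseline, series, k=0.6, h=3.5):
    """One-sided upper CUSUM on counts standardized against a baseline.

    k is the slack value and h the upper control limit, both expressed in
    standard-deviation units of the baseline period.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    s = 0.0
    stats, alarms = [], []
    for x in series:
        z = (x - mu) / sigma       # normalized count Z_t
        s = max(0.0, s + z - k)    # S_t = max[0, S_{t-1} + Z_t - k]
        stats.append(s)
        alarms.append(s > h)
    return stats, alarms


# Toy data (illustrative values only).
baseline = [2, 3, 2, 4, 3, 2, 3, 3]
series = [3, 3, 4, 4, 5, 5]
stats, alarms = cusum_alarms(baseline, series)
print(alarms)
```

Because the statistic accumulates only the excess above the slack value, a run of moderately elevated days can trigger an alarm even when no single day would be an outlier on the XmR chart.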
RESULTS
The demographic information of suicide cases reported in the local Chinese-language news
media is presented in Table 1. Statistical tests showed no statistically significant
difference in the composition of news-reported suicide cases between 2019 and 2020. Based
on this observation, we used the suicide news reported in 2019 as the training dataset and
tested the model on the 2020 data.
Table 1. Summary of suicide cases reported in local Chinese news

                              2019 (N = 701)    2020 (N = 755)    P
                              n (%)             n (%)
Gender                                                            0.254
  Female                      238 (34.0)        279 (37.0)
  Male                        463 (66.0)        476 (63.0)
Age Group                                                         0.542
  ≤24                         63 (9.0)          60 (7.9)
  25-64                       447 (63.8)        472 (62.5)
  ≥65                         191 (27.2)        223 (29.5)
District of Suicide                                               0.970
  Central & Western           39 (5.6)          32 (4.2)
  Eastern                     49 (7.0)          53 (7.0)
  Islands                     15 (2.1)          20 (2.6)
  Kowloon City                32 (4.6)          29 (3.8)
  Kwai Tsing                  54 (7.7)          64 (8.5)
  Kwun Tong                   73 (10.4)         83 (11.0)
  North                       26 (3.7)          23 (3.0)
  Sai Kung                    27 (3.9)          40 (5.3)
  Sha Tin                     62 (8.8)          64 (8.5)
  Sham Shui Po                49 (7.0)          44 (5.8)
  Southern                    28 (4.0)          22 (2.9)
  Tai Po                      24 (3.4)          29 (3.8)
  Tsuen Wan                   28 (4.0)          31 (4.1)
  Tuen Mun                    41 (5.8)          46 (6.1)
  Wan Chai                    21 (3.0)          21 (2.8)
  Wong Tai Sin                44 (6.3)          54 (7.2)
  Yau Tsim Mong               42 (6.0)          43 (5.7)
  Yuen Long                   47 (6.7)          57 (7.5)
Living in Public Housing      202 (28.8)        201 (26.6)        0.381
Method of Suicide                                                 0.734
  Carbon monoxide             69 (9.8)          72 (9.5)
  Hanging                     125 (17.8)        140 (18.5)
  Jumping                     432 (61.6)        457 (60.5)
  Injury                      63 (9.0)          78 (10.3)
  Poisoning                   12 (1.7)          8 (1.1)
The suicide news reported in 2019 was used to train the model and tune hyperparameters using
five-fold cross-validation. The suicide news reported in 2020 was used to compute the test
MAE, which was 0·896 for elastic net, 1·066 for XGBoost, and 0·962 for adaptive boosting.
The result (Figure 1) indicated that, with the elastic net or adaptive boosting method,
nowcasts based on suicide news deviate from the actual daily suicide count by less than one
case on average.
Figure 1: Actual suicide count and the nowcast suicide count using elastic net
Table 2 shows the variable importance for each regression method. A greater variable
importance value indicates that the corresponding type of suicide case is less reported.
All regression methods agree that suicide by jumping has the greatest variable importance.
Elastic net and adaptive boosting also suggest that suicide among older adults was the
second most important feature.
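For readers less familiar with regularized regression, the elastic net objective can be written as below. This is the parameterization documented for scikit-learn's `ElasticNet`, shown here as a reference sketch. The symbols $\alpha$ (overall penalty strength), $\rho$ (the L1 mixing ratio), and $x_t$ (the stacked vector of daily news counts $G_t$, $A_t$, $D_t$, $P_t$, $M_t$) are notation introduced for illustration, and the tuned values used in the study are not reported in the text.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\sum_{t=1}^{n}\bigl(y_t - x_t^{\top}\beta\bigr)^2
  + \alpha\rho\,\lVert\beta\rVert_1
  + \frac{\alpha(1-\rho)}{2}\,\lVert\beta\rVert_2^2
```

The L1 term can shrink coefficients exactly to zero, which is consistent with the many zero entries in the elastic net column of Table 2.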
Table 2: Variable importance measure of different regression methods

                              Elastic net    XGBoost     Adaptive boosting
Gender
  Female                      Baseline       Baseline    Baseline
  Male                        0.024          0.017       0.035
Age Group
  ≤24                         Baseline       Baseline    Baseline
  25-64                       0.166          0.024       0.041
  ≥65                         0.264          0.031       0.109
District of Suicide
  Central & Western           Baseline       Baseline    Baseline
  Eastern                     0.092          0.040       0.010
  Islands                     0              0.068       0.004
  Kowloon City                0              0.031       0
  Kwai Tsing                  0              0.026       0.003
  Kwun Tong                   0              0.025       0
  North                       0              0.041       0.009
  Sai Kung                    0              0.052       0
  Sha Tin                     0              0.017       0
  Sham Shui Po                0              0.015       0
  Southern                    0              0.022       0
  Tai Po                      0              0.019       0
  Tsuen Wan                   0              0.051       0.009
  Tuen Mun                    0              0.014       0.001
  Wan Chai                    0              0.026       0.029
  Wong Tai Sin                0              0.030       0.011
  Yau Tsim Mong               0              0.031       0.006
  Yuen Long                   0.012          0.026       0.001
Place of Suicide
  Private                     Baseline       Baseline    Baseline
  Public                      0              0.050       0.075
Method of Suicide
  Carbon monoxide             Baseline       Baseline    Baseline
  Hanging                     0.282          0.040       0.089
  Jumping                     0.515          0.231       0.564
  Injury                      0              0.029       0.002
  Poisoning                   0              0.041       0
Using predicted suicide count to monitor suicide trends
A CUSUM control chart and an XmR chart were used to examine the capability of using the
nowcast suicide counts to monitor the change in suicide trends in sudden surges of suicide
cases and short-term increasing trends. CUSUM control charts (Figure 2) and XmR chart
(Figure 3) for the actual suicide count and the nowcast suicide counts were presented to show
the applicability of statistical quality control on monitoring suicide cases. The first year (1
January 2019 to 31 December 2019) data were used for the calibration and construction of the
control limits of both CUSUM and XmR chart, and the second year of the nowcast data was
used to report the performance compared with the statistical monitoring results based on the
actual suicide counts. The CUSUM chart on the actual suicide count showed that the process
had been out of control since 7 April 2020, which was during the first wave of the COVID-19
outbreak. The CUSUM control chart using suicide news reports correctly identified the
out-of-control period at an early stage of its development, with a specificity of 0·974 and
an F1 score of 0·606. The XmR chart was also able to detect sudden surges, with a
specificity of 0·963 and an F1 score of 0·500.
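The comparison between the two sets of alarms can be illustrated as follows. In this sketch the alarms raised on the actual counts are treated as the reference and the alarms raised on the nowcast counts as the prediction. The function name and the toy alarm sequences are assumptions, not the study's data or code.

```python
import math


def specificity_and_f1(reference, predicted):
    """Specificity and F1 of predicted alarms against reference alarms."""
    if len(reference) != len(predicted):
        raise ValueError("sequences must have the same length")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif pred:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else math.nan
    return specificity, f1


# Toy alarm sequences (True = out-of-control day); illustrative only.
reference = [False, False, False, True, True, True, False, False]
predicted = [False, False, True, False, True, True, False, False]
spec, f1 = specificity_and_f1(reference, predicted)
print(spec, f1)
```

Specificity depends only on the in-control days, so it directly reflects how often the nowcast-based chart raises a false alarm.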
Figure 2: CUSUM chart for actual suicide counts and nowcasted suicide count.
Red points are days that are out-of-control limits.
Figure 3: XmR chart for actual suicide counts and nowcasted suicide count.
Red points are days that are out-of-control limits.
DISCUSSION
This study improved on the work of Zeng et al.16 by using local news to nowcast the exact
suicide counts, and extended the work of Chai et al.17 by monitoring not only short-term
shifts in the suicide trend but also outliers through a statistical process control approach.
There are two main advantages of this study. The first advantage is that it can provide a timely
estimation of suicide counts and capture the suicide trend for a short period with little time
delay. The other advantage is that the data comes from publicly available news reports, which
raises fewer privacy concerns in accessing individual information compared with using medical
records. Based on these advantages, this work could help to improve the accuracy and
timeliness of monitoring systems that support the implementation and evaluation of suicide
prevention strategies and policy-making.
The proposed model could accurately nowcast suicide counts of the day, proactively monitor
suicide cases, and identify possible outbreaks of suicide through news reporting. The elastic
net method outperformed the non-linear methods (i.e., XGBoost and adaptive boosting), which
suggests an approximately linear relationship between the news-reported suicide cases and
the actual suicide counts. In Figures 2 and 3, the CUSUM and XmR charts
have shown the capability of monitoring outliers and short-period suicide trends. Therefore,
the nowcast suicide counts are accurate enough to act as an early warning indicator to show if
there is an abnormal suicide trend or large-scale mental-health crisis.
Evaluating the performance of statistical process control in suicide surveillance is still a
challenging topic. Statistical process control assumes that the monitored characteristic
comes from a stable process, which means there is no temporal trend or change in variance
within the monitoring period. We would argue this is a strong assumption in the context of disease
surveillance. Our CUSUM chart suggests a systematic increase after April 2020, and the
alarm persisted because the suicide counts never returned to the 2019 level. The other concern
is selecting the parameters of the CUSUM and XmR charts. This work selected the parameters
based on the statistical properties of these charts, but practitioners are encouraged to adjust
the parameters with their own justification. This in turn raises another research question
on how to evaluate the performance of control charts. Because out-of-control situations are
usually rare and sparse, classification metrics such as the F1 score, which balance the
positive and negative cases, might not be appropriate. In this work, specificity was adopted
as the major performance metric because false alarms might cause alert fatigue among users.
Using an appropriate proxy to approximate the real-time suicide count is crucial for accurate
estimation and monitoring. One previous study found that online search trends could predict
suicide cases at younger ages better than those in other age groups,17 while another from the
US found that combining online search trends with offline health service utilization data
could improve the prediction.12 These two studies suggest that online search trends might
only represent a specific group of suicide cases, and some other groups of people may be
under-represented among online search trends. Consequently, we believe that using reported
suicide cases from the local news is more representative of the whole Hong Kong population
than search engine trends, given that Hong Kong is an aging society with a higher suicide
rate among older adults than among younger people.
This approach could contribute to suicide research in that we could identify what kinds of
suicide cases are less likely to be reported in the news. Suicide news plays an important role in
public perception of suicide methods.25 The variables with higher importance in the model
indicate that these kinds of characteristics are more likely to be underreported. Older adults’
suicide and suicides by jumping are the most under-reported suicide cases according to the
models. The under-reporting of suicide deaths among older adults can be seen as a kind of
ageism. The models show that media in Hong Kong are more likely to under-report suicide
cases depending on age and suicide method. A possible explanation is that the suicide rate
and population size vary across age groups,26–28 as does the prevalence of suicide
methods.29 Our models indicated that under-reported cases are more likely to come from
groups with more suicide cases or to involve more prevalent methods. Consequently,
reporters might consider such cases less newsworthy. This selective reporting preference
could misinform the community about the risk of suicide for certain vulnerable groups.
Compared with a previous study studying the same topic in Hong Kong in 2015-2016,30 we
could observe a change in the reporting patterns, and this change might be an indicator of the
change in media professionals’ perspectives. First, the geographic location of the reported cases
in our study showed little association with the actual suicide count as only two out of eighteen
districts carry coefficients not equal to zero. In contrast, the previous study showed that the
under-reporting patterns were significantly associated with different geographic locations in
seven districts, and these seven districts are more deprived areas.31 This might imply the
stigmatization of deprived areas was being reduced over time. The other difference from the
previous study is the under-reporting pattern of suicide methods. Our study found an
under-reporting pattern for hanging and jumping, whereas the previous study reported that
jumping and charcoal-burning suicides were more likely to be reported. We suppose that,
because hanging and jumping are the two major suicide methods (accounting for 80% of suicide
deaths in Hong Kong), they carry less news value to the public.32 In summary, the reporting
focus shifted from geographical location to the demographic attributes of the deceased and
less commonly used suicide methods.
This study has a number of limitations worth noting. First, our method assumed that news
reporting preferences for suicide cases were stable over the study period. This assumption was
evaluated within the study period in Table 1, and the results showed no statistically significant
difference between the two studied years. Nonetheless, there is no guarantee that this
assumption will hold in the future. An adaptive approach to data collection and model
updating should be incorporated to sustain the model's performance in the long
run. Likewise, the statistical monitoring process requires regular revision of the monitoring
statistics and recalibration of the monitoring range. Second, this study excluded suicide cases for
which the Coroner’s Court could not determine the date of death. Although around 20% of
suicide cases were excluded for this reason, we believe this would not affect monitoring over
short time intervals, because cases without an exact time of death cannot be assigned to any
such interval. Furthermore, it seems
reasonable to assume that if the Coroner’s Court is still unable to date a suicide case
after its lengthy investigation process, it is highly likely that news reports would not be able
to identify the exact date of death either. Third, the high accuracy stems from the high
rate of suicide reporting in the news, because Cantonese journalists tend to raise discussion of
certain social problems through suicide cases.32 In areas with a lower reporting rate,
news-reported cases might not reflect the actual suicide counts.
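As an illustration of the regular recalibration of the monitoring range described above, the following minimal Python sketch re-estimates XmR (individuals and moving range) limits from a trailing window of nowcast daily counts. It is not the implementation used in this study: the function names, the 60-day window, and the flooring of the lower limit at zero are assumptions made for illustration. The constant 2.66 is the standard XmR scaling factor (3 divided by 1.128).

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower, center, upper) limits of an individuals (X) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Average moving range between consecutive observations.
    mr_bar = mean(abs(b - a) for a, b in zip(values, values[1:]))
    # 2.66 = 3 / d2, with d2 = 1.128 for moving ranges of size 2.
    half_width = 2.66 * mr_bar
    # Assumption: daily counts cannot be negative, so floor the lower limit at 0.
    return max(0.0, center - half_width), center, center + half_width


def rolling_xmr_signals(series, window=60):
    """Flag each day against limits re-estimated from the preceding `window` days.

    `window` is a hypothetical recalibration horizon, not a value from the study.
    Returns a list of (day_index, is_out_of_control) pairs.
    """
    signals = []
    for t in range(window, len(series)):
        lower, _, upper = xmr_limits(series[t - window:t])
        signals.append((t, series[t] < lower or series[t] > upper))
    return signals
```

Because the limits are recomputed from recent data only, a gradual drift in reporting behavior shifts the monitoring range with it; the trade-off is that a slow, sustained rise may be absorbed into the baseline, so the window length would need to be chosen with care.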
Future work could consider several directions. First, an adaptive online learning and modeling
strategy could dynamically adjust the model and the monitoring process as exact
suicide statistics are updated. This would facilitate continuous prediction and monitoring under
changing news reporting behavior and suicide rates. Second, a variety of
social surveillance signals, e.g., sentiment on social media, and immediate social information such as
celebrity suicides, public holidays, and weather could be incorporated into the model to improve its performance.
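One possible form of such an adaptive update is sketched below in Python. It is a simplified stand-in, not the model fitted in this study: it applies stochastic subgradient descent to a squared-error loss with an elastic-net penalty, taking one step each time an exact daily count becomes available. The class name, the learning rate, and the penalty settings are hypothetical, and the feature vector is assumed to hold the summarized daily news counts.

```python
class OnlineElasticNet:
    """Linear model updated one observation at a time (illustrative only)."""

    def __init__(self, n_features, alpha=0.01, l1_ratio=0.5, lr=0.01):
        self.w = [0.0] * n_features  # coefficients of the news features
        self.b = 0.0                 # intercept (not penalized)
        self.alpha = alpha           # overall penalty strength (assumed value)
        self.l1_ratio = l1_ratio     # mix between L1 and L2 penalties
        self.lr = lr                 # step size (assumed value)

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Take one subgradient step and return the pre-update error."""
        err = self.predict(x) - y
        for j, xj in enumerate(x):
            # Subgradient of |w_j|; taken as 0 when w_j is exactly 0.
            sign = (self.w[j] > 0) - (self.w[j] < 0)
            grad = err * xj + self.alpha * (
                self.l1_ratio * sign + (1 - self.l1_ratio) * self.w[j]
            )
            self.w[j] -= self.lr * grad
        self.b -= self.lr * err
        return err
```

In use, `predict` would produce the nowcast from the day's news summary, and `update` would be called later, once the exact count for that day is released. The returned errors could themselves be monitored to detect a change in reporting behavior.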
This study presented a new approach to nowcasting suicide case counts in a population and
demonstrated how to use the nowcast numbers to monitor suicide trends over short periods
and without much delay. Nevertheless, the performance of our model in other settings
would need to be evaluated, as the reporting rate of suicide deaths could be an important factor in
nowcasting accuracy. This approach can provide valuable information for issuing early
warnings of suicide risk, raising public awareness about suicide, and providing timely interventions
to reduce suicide-related deaths, particularly in places
where a real-time suicide monitoring system does not currently exist.
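For readers in settings without such a system, the monitoring step can be sketched with a one-sided upper tabular CUSUM applied to nowcast daily counts. This is a minimal Python illustration under stated assumptions, not the configuration used in this study: `target` (the in-control mean), `k` (the allowance), and `h` (the decision interval) are hypothetical inputs that would have to be chosen for each setting, and the statistic is not reset after an alarm.

```python
def upper_cusum(values, target, k, h):
    """Return the upper CUSUM path and the indices at which it exceeds h."""
    stat = 0.0
    path, alarms = [], []
    for t, x in enumerate(values):
        # Accumulate only deviations above target + k; never drop below zero.
        stat = max(0.0, stat + (x - target - k))
        path.append(stat)
        if stat > h:
            alarms.append(t)
    return path, alarms


# Hypothetical nowcast counts: two days at baseline, then a sustained rise.
path, alarms = upper_cusum([2, 2, 5, 5, 5], target=2, k=0.5, h=4)
print(path)    # [0.0, 0.0, 2.5, 5.0, 7.5]
print(alarms)  # [3, 4]
```

The example shows why a CUSUM suits early warning: no single day is extreme, yet the accumulated excess crosses the decision interval on the second day of the sustained rise.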
CONTRIBUTORS
YCH, IDL and PSFY conceptualized the study. YCH and IDL developed the methodology.
TML conducted the data curation and formal analysis. YCH validated the results, wrote the
original draft of the manuscript, and prepared the visualizations. IDL reviewed and edited the
manuscript. PSFY provided supervision, managed project administration, and acquired funding.
All authors had full access to all the data in the study and approved the final version.
DATA SHARING
Coroner’s Court data are available on request from the corresponding author; the data are not
publicly available due to privacy issues. The local suicide news reports database is publicly
available at https://hkspd.siuyeong.com/. The final model is implemented and visualized at
https://suicideearlywarning.hku.hk/eng/.
DECLARATION OF INTERESTS
The authors declare no competing interests.
ACKNOWLEDGEMENTS
This study is funded by the Quality Education Fund and the Hong Kong Jockey Club Charities
Trust.
REFERENCES
1. Hong Kong Jockey Club Centre for Suicide Research and Prevention. Latest Figures
show Youth Suicides are on the Rise CSRP together with SPS Propose Four Initiatives
Urging Society to “Co-Create Hope” through Action [Press Release]. 2023.
https://csrp.hku.hk/content/uploads/2023/09/2023-WSPD-Press-Con_Press-
Release_20230908.pdf.
2. World Health Organization. Suicide worldwide in 2019: global health estimates.
Geneva: World Health Organization, 2021.
3. Lo WH. Suicide and Attempted Suicide in Hong Kong - with a Note on Prevention.
Hong Kong J Ment Health 2017; 43(1): 58–67.
4. Wong PWC, Cheung DYT, Conner KR, Conwell Y, Yip PSF. Gambling and
Completed Suicide in Hong Kong: A Review of Coroner Court Files. Prim Care Companion
CNS Disord 2010; 12(6): e1–e7.
5. Li F, Lu X, Ou Y, Yip PSF. The influence of undetermined deaths on suicides in
Shanghai, China. Soc Psychiatry Psychiatr Epidemiol 2019; 54: 111–9.
6. Simon GE, Johnson E, Lawrence JM, et al. Predicting Suicide Attempts and Suicide
Deaths Following Outpatient Visits Using Electronic Health Records. Am J Psychiatry 2018;
175(10): 951–60.
7. Hedegaard H, Curtin SC, Warner M. Suicide Mortality in the United States, 1999–
2019. Hyattsville, MD: National Center for Health Statistics, 2021.
8. Coroner's Court. Coroners' Report 2022. Hong Kong: Hong Kong Judiciary, 2023.
9. Baran A, Gerstner R, Ueda M, Gmitrowicz A. Implementing Real-Time Data Suicide
Surveillance Systems. Crisis 2021; 42(5): 321–7.
10. Spittal MJ, Roberts L, Clapperton A. Using Real-Time Suicide Monitoring Systems
to Inform Policy and Practice. Crisis 2023; 44(6): 445–50.
11. Rossen LM, Hedegaard H, Warner M, Ahmad FB, Sutton PD. Early provisional
estimates of drug overdose, suicide, and transportation-related deaths: Nowcasting methods
to account for reporting lags. Hyattsville, MD: National Center for Health Statistics, 2021.
12. Choi D, Sumner SA, Holland KM, et al. Development of a Machine Learning Model
Using Multiple, Heterogeneous Data Sources to Estimate Weekly US Suicide Fatalities.
JAMA Netw Open 2020; 3(12): e2030932.
13. Sumner SA, Alic A, Law RK, Idaikkadar N, Patel N. Estimating national and state-
level suicide deaths using a novel online symptom search data source. J Affect Disord 2023;
342: 63–8.
14. Cui JS, Yip PSF, Chau PH. Estimation of reporting delay and suicide incidence in
Hong Kong. Stat Med 2004; 23(3): 467–76.
15. Harris KM, Thandrayen J, Samphoas C, et al. Estimating Suicide Rates in Developing
Nations: A Low-Cost Newspaper Capture-Recapture Approach in Cambodia. Asia Pac J
Public Health 2016; 28(3): 262–70.
16. Zeng XY, Chau PH, Yip PSF. Improving the monitoring of suicide incidence by
estimating the probability of news reporting. Stat Med 2019; 38(26): 5103–12.
17. Chai Y, Luo H, Zhang Q, Cheng Q, Lui CSM, Yip PSF. Developing an early warning
system of suicide using Google Trends and media reporting. J Affect Disord 2019; 255: 41–9.
18. Benson R, Brunsdon C, Rigby J, et al. The development and validation of a dashboard
prototype for real-time suicide mortality data. Front Digit Health 2022; 4: 909294.
19. Shewhart WA. Economic Quality Control of Manufactured Product. Bell Syst Tech J
1930; 9(2): 364–89.
20. Coroners Ordinance, Cap. 504 (2024).
21. World Health Organization. International Statistical Classification of Diseases and
Related Health Problems 10th Revision. 2019. https://icd.who.int/browse10/2019/en#/X60-
X84.
22. Yeong B. 香港自殺報道資料庫 [Hong Kong Suicide Report Database]. 2022.
https://hkspd.siuyeong.com/.
23. Cox MAA. Beyond the Nomogram: Rapid Selection of Parameters for a CUSUM
Chart. Qual Eng 2003; 16(1): 1–12.
24. Scrucca L. qcc: An R package for quality control charting and statistical process
control. R News 2004; 4: 11–7.
25. Cheng Q, Yip PSF. Media Representation of Suicide in Various Societies: A Critical
Review. In: Kumar U, ed. Suicidal Behaviour: Underlying dynamics. London: Routledge;
2014: 186–201.
26. Wong PWC, Caine ED, Lee CKM, Beautrais A, Yip PSF. Suicides by jumping from a
height in Hong Kong: a review of coroner court files. Soc Psychiatry Psychiatr Epidemiol
2014; 49: 211–9.
27. Wu KCC, Chen YY, Yip PSF. Suicide Methods in Asia: Implications in Suicide
Prevention. Int J Environ Res Public Health 2012; 9(4): 1135–58.
28. Yang CT, Yip PSF. Changes in the epidemiological profile of suicide in Hong Kong:
a 40-year retrospective decomposition analysis. China Popul Dev Stud 2021; 5: 153–73.
29. Yeung CY, Men VY, Guo Y, Yip PSF. Spatial–temporal analysis of suicide clusters
for suicide prevention in Hong Kong: a territory-wide study using 2014–2018 Hong Kong
Coroner's Court reports. Lancet Reg Health West Pac 2023; 39: 100820.
30. Zeng X, Chau P, Yip PS. Improving the monitoring of suicide incidence by estimating
the probability of news reporting. Stat Med 2019; 38(26): 5103–12.
31. Hsu C-Y, Chang S-S, Lee ES, Yip PS. Geography of suicide in Hong Kong: spatial
patterning, and socioeconomic correlates and inequalities. Soc Sci Med 2015; 130: 190–203.
32. Cheng Q, Fu K-w, Caine E, Yip PS. Why do we report suicides and how can we
facilitate suicide prevention efforts? Crisis 2014.