
Automated Underwriting in Life Insurance: Predictions and Optimisation


(Industry Track)
Rhys Biddle, Shaowu Liu, Guandong Xu
Advanced Analytics Institute, University of Technology Sydney.
Abstract. Underwriting is an important stage in the life insurance process and is concerned with accepting individuals into an insurance fund and on what terms. It is a tedious and labour-intensive process for both the applicant and the underwriting team. An applicant must fill out a large survey containing thousands of questions about their life. The underwriting team must then process this application and assess the risks posed by the applicant and offer them insurance products as a result. Our work implements and evaluates classical data mining techniques to help automate some aspects of the process to ease the burden on the underwriting team as well as optimise the survey to improve the applicant experience. Logistic Regression, XGBoost and Recursive Feature Elimination are proposed as techniques for the prediction of underwriting outcomes. We conduct experiments on a dataset provided by a leading Australian life insurer and show that our early-stage results are promising and serve as a foundation for further work in this space.
1 Introduction
The concept of an insurance fund is to create a pool of wealth such that an unfortunate loss incurred by the few can be compensated by the wealth of the many [9]. The success of this concept therefore relies on accurately distinguishing between the few, those deemed to have a high chance of claiming, and the many, those deemed to have a low chance of claiming. The process of making such decisions is called underwriting.
The goal of underwriting from the perspective of the insurer is to accurately assess the risk posed by individual applicants, where risk in life insurance can be considered as the likelihood of injury, sickness, disease, disability or mortality. A direct outcome of the underwriting process is the decision to accept or decline an individual's access to the insurance fund and what financial cost they should incur in exchange for access. Most individuals incur a standard cost for access to the fund, but some may incur a penalty, known as a loading, that should ideally reflect the level of risk they pose and their likelihood of claiming. In addition to a loading, an individual may be granted access to the fund with certain claiming constraints attached, known as exclusions. An exclusion prevents a specific individual from claiming as a result of a particular event, for example a back injury, while still allowing them access to the fund and the right to claim for other events they are not excluded from. Correctly identifying risks and applying the relevant loadings and exclusions during the underwriting process is fundamental to maintaining the wealth of a life insurance fund.
Underwriting is a tedious and labour-intensive process for both the applicant and the underwriter. An applicant must fill out a highly personal questionnaire that delves into almost all aspects of their life; it can be up to 100 pages long and contain over two and a half thousand questions, an imposing amount of paperwork that can discourage individuals from pursuing insurance cover. The industry is well aware of the growing market group of millennials, forecast to make up 75 percent of the workforce by 2025, who prioritise fast and seamless digital user experiences [1]. Current underwriting processes do not allow for these kinds of digital experiences, and insurers are aware that significant time and cost must be allocated to transforming current practices in order to capture the attention of this growing market [1]. In addition to being tedious for the applicant, the questionnaire must be closely examined by a team of skilled underwriters who follow guidelines mixed with intuition to arrive at a decision, resulting in a process that takes many weeks to complete. This mixture of guidelines and intuition is reflected in the common industry saying that underwriting is both an art and a science [3]. Industry trend analyses have reported a need to improve the quantitative methods that make up the science of underwriting in order for the industry to remain relevant. Fortunately, recent advances in machine learning and pattern recognition make it possible to significantly improve the decision-making process.
Existing research on automated underwriting in the life insurance sector lacks both breadth and depth. A combination of a neural network and fuzzy logic is presented in [11] without any experimental validation, and the strengths of the proposed approach are only informally justified, with no proofs. The prototyping and implementation of a knowledge-based system, given requirements gathered from a survey of 23 individuals from Kenyan life insurance companies, was proposed in [9]; however, no experimental validation was provided. We believe that this lack of depth and coverage is due to the difficulty of gaining access to real-world life insurance datasets and the proprietary nature of any endeavour to implement such a system. The literature also draws a distinction between the life and non-life insurance industries due to the differing complexity and sensitivity of the information involved. It has been shown in [13] that classical data mining algorithms, specifically Support Vector Regression (SVR) and Kernel Logistic Regression (KLR), can successfully classify risk and accurately predict insurance premiums in the automotive industry using real-world data. In this paper, we propose a prediction framework based on state-of-the-art machine learning algorithms and evaluate it on nine years of life insurance data collected by a major insurance provider in Australia.
The rest of the paper is organised as follows. In Section 2, we introduce the basic concepts of automated underwriting, followed by the problem formulation. Section 3 describes our methodology. Experimental results are presented in Section 4, and Section 5 concludes.
2 Preliminary
This section briefly summarises the background on automated underwriting and the problem formulation that form the basis of this paper.
2.1 Automated Underwriting
Automation of the underwriting process can assist in a number of ways and benefit all parties involved: the timeliness of the process can be significantly improved, instances of human error can be reduced, and misunderstandings or gaps in the underwriters' knowledge can be filled. The current completion time frame of weeks can be reduced significantly with the assistance of automated decision-making tools. Most applications pass through underwriting with no exclusion or loading applied; underwriters spend a lot of time on these cases, which could be streamlined so that the time is instead spent on the more complex cases. In some instances, rule-based expert systems have been crafted to identify and process these simple applications, but they are complex and cumbersome to update in light of new information [3]. The breadth and detail covered by the thousands of questions in the questionnaire demand a deep and wide knowledge base to understand the answers and their implications for risk. Beyond gaining a thorough understanding of these numerous knowledge areas, an ambitious task in itself, there is the added difficulty of identifying the complex relationships between the diverse knowledge areas and how they can be used to forecast risk. Machine learning and pattern recognition tools can assist underwriters in broadening their knowledge base and identifying these complex relationships.
2.2 Underwriting Outcome
One of the most important underwriting outcomes is identifying exclusions. An exclusion prevents an individual from making a specific claim based on information gathered from the applicant's questionnaire. The reasons for exclusions are numerous, in the thousands, and highly specific. This specificity is necessary when evaluating any claim made by an individual. For example, if an applicant has a history of left knee medical issues and experiences frequent pain or limitations as a result, then they may be excluded from making any claims that relate to their left knee. As well as targeting specific claims, exclusions may carry a temporal condition, such as a 90-day or 12-month exclusion from a particular claim. Exclusions allow an insurance fund to tailor products to each individual applicant by choosing which specific risks it is willing to accept and cover and which it is not.
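To make the structure of an exclusion concrete, the sketch below models one as a small record holding the kind of claim it blocks and an optional duration. It is a minimal sketch under stated assumptions: the `Exclusion` type, its field names and the codes are hypothetical, and it assumes a temporary exclusion runs from the policy start date, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Exclusion:
    """Hypothetical exclusion record (not the insurer's actual schema)."""
    code: str                     # illustrative exclusion code
    claim_type: str               # the kind of claim that is blocked, e.g. "left knee"
    duration_days: Optional[int]  # None = permanent; 90 = a 90-day exclusion


def claim_is_excluded(exclusions: Iterable[Exclusion], claim_type: str,
                      policy_start: date, claim_date: date) -> bool:
    """Return True if any exclusion blocks a claim of this type on this date.

    Assumption: a temporary exclusion applies from the policy start date
    for `duration_days` days.
    """
    for ex in exclusions:
        if ex.claim_type != claim_type:
            continue  # this exclusion targets a different kind of claim
        if ex.duration_days is None:
            return True  # permanent exclusion
        if claim_date < policy_start + timedelta(days=ex.duration_days):
            return True  # still inside the temporary exclusion window
    return False


start = date(2017, 1, 1)
applicant_exclusions = [
    Exclusion("EX-KNEE-L", "left knee", None),   # permanent
    Exclusion("EX-BACK-90", "back injury", 90),  # 90-day exclusion
]
print(claim_is_excluded(applicant_exclusions, "left knee", start, date(2019, 5, 1)))    # True
print(claim_is_excluded(applicant_exclusions, "back injury", start, date(2017, 2, 1)))  # True
print(claim_is_excluded(applicant_exclusions, "back injury", start, date(2017, 6, 1)))  # False
print(claim_is_excluded(applicant_exclusions, "right knee", start, date(2017, 2, 1)))   # False
```

The claim for the other knee is not blocked, which is the point of keeping exclusions specific: cover is withdrawn only for the risk that was identified.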
2.3 Exclusions Prediction Problem
The prediction of an exclusion can be approached as a supervised binary classification problem. We have a dataset $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i$ is a feature vector for applicant $i$ and $y_i \in \{0, 1\}$ is a binary label indicating the presence of a particular exclusion for applicant $i$. The feature vector $x_i$ consists of the responses to all binary questions filled out by applicant $i$, together with some continuous features such as age and sum insured. The questionnaire covers a large range of information about each applicant, including family and medical history, occupation details, finances and leisure activities. In the current underwriting process, a team of experts assigns $y_i$ for each exclusion. We propose to learn a function $f$ that accurately predicts $y_i$ given $x_i$, using the labels provided by the expert underwriters to evaluate the performance of $f$. A few properties of this problem make it an interesting supervised classification problem. Firstly, the questionnaire has been designed and refined over the years to catch as many risky applicants as possible while remaining streamlined for the applicant. This results in a questionnaire that contains conditional branching, i.e., unique pathways through the questionnaire are created depending on the responses to particular questions. As a result of this conditional branching, the responses to the questionnaire are considerably sparse, because only a small subset of the questions needs to be answered by all applicants; i.e., for a typical applicant $i$, $x_i^j = 0$ for the majority of questions $j$. Questions are designed to catch exclusions, so for any given exclusion we expect a small subset of the features in $x_i$ to be very strong features for the predictive task and the large majority to be redundant. In addition to this sparsity, we face class imbalance due to the specificity and rarity of exclusions. As mentioned previously, exclusions must be detailed enough for the insurer to cover themselves at claim time, resulting in thousands of different and highly specific exclusion types.
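The sparsity created by conditional branching can be illustrated with a toy encoding of $D$. In this minimal sketch the question names are hypothetical, the continuous features (age, sum insured) are omitted, and it assumes that both a "no" answer and a question skipped by branching are encoded as 0, so only "yes" answers need to be stored.

```python
from typing import Dict, List, Tuple

# Hypothetical question identifiers; the real questionnaire has thousands.
QUESTIONS = ["smoker", "knee_pain", "knee_surgery", "back_pain", "scuba_diving"]
INDEX = {q: j for j, q in enumerate(QUESTIONS)}

SparseVector = Dict[int, int]  # feature index j -> x_i^j; zeros are implicit


def encode(responses: Dict[str, bool]) -> SparseVector:
    """Encode one applicant's binary answers as a sparse vector x_i.

    Questions never shown to the applicant are simply absent from
    `responses`; like "no" answers, they are stored implicitly as 0.
    """
    return {INDEX[q]: 1 for q, yes in responses.items() if yes}


def density(x: SparseVector, n_features: int) -> float:
    """Fraction of positions j with x_i^j != 0."""
    return len(x) / n_features


# D = {(x_i, y_i)}: y_i = 1 if the (hypothetical) left-knee exclusion was applied.
# Only applicant 0 answered "yes" to knee_pain, so only they saw knee_surgery.
D: List[Tuple[SparseVector, int]] = [
    (encode({"smoker": False, "knee_pain": True, "knee_surgery": True}), 1),
    (encode({"smoker": True, "knee_pain": False}), 0),
]
for x, y in D:
    print(x, y, density(x, len(QUESTIONS)))
# {1: 1, 2: 1} 1 0.4
# {0: 1} 0 0.2
```

With thousands of questions and most applicants answering under 10 percent of them, a dictionary (or a library sparse matrix) stores only the handful of non-zero entries per applicant.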
3 Methodology
We propose to address the problems identified in the underwriting process and the gaps in the existing research by implementing two learning models and evaluating them on the problem of exclusion prediction using a real-world dataset. This work has two key goals: the prediction of exclusions and the provision of recommendations for questionnaire optimisation. In building our methodology, predictive accuracy and interpretability of results are equally important. This constrains our choice of data preparation methods and learning algorithms, as discussed in the following sections.
3.1 Feature Selection
Reducing the size of the feature space is an important first step in learning problems and provides many benefits. A reduction in the size of feature vectors decreases the learning time, can improve accuracy and can help avoid overfitting [8]. The decrease in learning time is due to the smaller size of the training data once the feature space is reduced. Overfitting is a well-known pitfall that occurs when a model learns the training data so closely that its predictive performance on new, unseen data suffers [14]. A large feature space with numerous redundant features can lead to overfitting, and removing these redundant features is one strategy to combat this pitfall [14]. In addition, a model trained on a large feature space is complex and can be difficult to interpret.
There are two main approaches to feature space reduction: transformation-based and selection-based methods. Transformation-based methods transform the initial input feature space into a smaller space [12, 7], whereas selection-based methods seek an optimal subset of the original feature space [14].
Transformation-based methods are unsuitable for our work because they would destroy the one-to-one relationship between features and question responses. Preserving this one-to-one relationship is key to assessing the impact of individual questions and the respective responses provided by applicants.
Feature selection methods fall into three broad families. Filter methods rank features under a chosen criterion and specify a threshold below which features are removed from the feature space before training. Wrapper methods use the prediction results of a learning algorithm to identify and select the features that the learning algorithm deems important. In embedded methods, feature selection is part of the learning algorithm itself, and the two processes are difficult to separate. We have chosen to implement a wrapper method in our learning pipeline.
For the wrapper method, we have chosen Recursive Feature Elimination (RFE) [8, 6]. RFE is an iterative wrapper method that trains a classifier on a sequence of shrinking feature subsets and provides feature rankings for each subset. Starting from the full feature set $F$, RFE repeats three steps: i) train a classifier using feature set $F$; ii) obtain the feature importances from the trained classifier and rank them; iii) remove a subset of the lowest-ranked features from $F$. Two main parameters must be set for RFE: the size of the desired feature subset at the end of the algorithm and the number of features to remove at each iteration. The size of the desired feature subset can be found via cross-validation: RFE is fit on each training fold in the cross-validation loop, and the subset size that gives the best averaged results across all testing folds is selected as optimal.
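The three RFE steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the scikit-learn implementation used in the experiments: the wrapped classifier is an unpenalised logistic regression fitted by gradient ascent, importance is the absolute weight, the toy data are made up, and `n_keep` is passed in directly rather than chosen by cross-validation.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def fit_logistic(X: Matrix, y: Sequence[int], lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Unpenalised logistic regression fitted by batch gradient ascent.

    Returns the per-feature weights (the intercept is fitted but not
    returned, because RFE only ranks features).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # residual_i = y_i - sigmoid(w . x_i + b)
        resid = [yi - 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
                 for xi, yi in zip(X, y)]
        for j in range(d):
            w[j] += lr * sum(r * xi[j] for r, xi in zip(resid, X)) / n
        b += lr * sum(resid) / n
    return w


def rfe(X: Matrix, y: Sequence[int], n_keep: int, step: int = 1,
        fit: Callable[[Matrix, Sequence[int]], List[float]] = fit_logistic) -> List[int]:
    """Recursive Feature Elimination; returns one rank per feature (1 = kept)."""
    remaining = list(range(len(X[0])))
    rounds: List[List[int]] = []
    while len(remaining) > n_keep:
        # i) train the classifier on the current feature subset only
        w = fit([[row[j] for j in remaining] for row in X], y)
        # ii) rank the current features by |weight|, weakest first
        order = sorted(range(len(remaining)), key=lambda k: abs(w[k]))
        # iii) drop up to `step` of the weakest features
        n_drop = min(step, len(remaining) - n_keep)
        dropped = sorted(remaining[k] for k in order[:n_drop])
        rounds.append(dropped)
        remaining = [j for j in remaining if j not in dropped]
    ranking = [1] * len(X[0])
    # features dropped in the last round get rank 2, earlier rounds higher ranks
    for rank, dropped in enumerate(reversed(rounds), start=2):
        for j in dropped:
            ranking[j] = rank
    return ranking


# Toy data: feature 0 encodes the label; features 1 and 2 are balanced noise.
y = [1, 1, 1, 1, 0, 0, 0, 0]
noise_a = [1, -1, 1, -1, 1, -1, 1, -1]
noise_b = [1, 1, -1, -1, 1, 1, -1, -1]
X = [[1 if yi else -1, a, b] for yi, a, b in zip(y, noise_a, noise_b)]
print(rfe(X, y, n_keep=1))  # [1, 3, 2]
```

Because the classifier is refit on every shrinking subset, a feature's rank reflects its usefulness alongside the features that survive, not in isolation. Swapping `fit` for any function returning one importance score per feature, such as tree-based importances, gives an RFE-XGB style variant of the same loop.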
3.2 Prediction Models
Logistic Regression and Regularisation. Logistic regression was chosen as a baseline method because linear models are a favoured tool in the insurance sector owing to their simple implementation, interpretability and connection with traditional statistics [10]. Logistic regression is a popular statistical method for modelling binary classification problems; it assigns a weight to every input feature and separates the two classes with a linear decision boundary. Feature selection is not inherent in the construction of a logistic regression model; however, adding $\ell_1$ regularisation addresses this. Logistic regression with an $\ell_1$ penalty term is referred to here as Lasso Regression, by analogy with the lasso for linear regression. Adding this penalty term performs feature selection because it shrinks the weights of unimportant features to exactly zero.
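How the $\ell_1$ penalty drives weights to exactly zero can be seen in proximal gradient descent (ISTA), one standard way to fit $\ell_1$-penalised logistic regression. The sketch below is illustrative only: the toy data are made up, the function names are hypothetical, and it does not reproduce the solver used in the paper's experiments. The soft-thresholding step returns exactly 0.0 whenever a weight's gradient step lands within `lr * lam` of zero.

```python
import math
from typing import List, Sequence


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink v towards 0, returning exactly 0.0 inside [-t, t]."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X: List[List[float]], y: Sequence[int], lam: float,
                    lr: float = 0.5, epochs: int = 500) -> List[float]:
    """l1-penalised logistic regression via proximal gradient descent (ISTA).

    Minimises (1/n) * sum_i logloss(y_i, sigmoid(w . x_i + b)) + lam * sum_j |w_j|.
    The intercept b is not penalised. Returns the feature weights w.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        p = [1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b))) for xi in X]
        grad_w = [sum((pi - yi) * xi[j] for pi, yi, xi in zip(p, y, X)) / n for j in range(d)]
        grad_b = sum(pi - yi for pi, yi in zip(p, y)) / n
        # gradient step on the smooth log-loss, then shrink every weight by lr * lam
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


# Toy data: feature 0 encodes the label, features 1 and 2 are noise.
y = [1, 1, 1, 1, 0, 0, 0, 0]
X = [[1 if yi else -1, a, c]
     for yi, a, c in zip(y, [1, -1, 1, -1, 1, -1, 1, -1], [1, 1, -1, -1, 1, 1, -1, -1])]
print(fit_l1_logistic(X, y, lam=0.1))  # noise weights are exactly 0.0
print(fit_l1_logistic(X, y, lam=1.0))  # [0.0, 0.0, 0.0]: the penalty removes every feature
```

The second call shows why the penalty strength has to be tuned: too strong a penalty discards the informative feature as well, which is why it is selected by cross-validation.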
Gradient Boosting Trees. Gradient boosting methods [5, 2] are tree-based ensemble methods for supervised learning founded on the hypothesis that combining numerous weak learners provides more accurate predictions than a single learner [10]. A weak learner in this context is a simple model that performs only slightly better than random guessing. The simplest way to combine the predictions of individual learners into a single prediction is a voting procedure, in which each weak learner casts a vote and the label predicted by the most weak learners is chosen. Gradient boosting instead builds the ensemble sequentially: each new tree is fitted to the negative gradient of the loss with respect to the current ensemble's predictions, and the final score is the sum of the trees' scaled outputs (passed through a logistic function for binary classification). A motivation for using tree-based ensemble methods in insurance is that the decision-making process is made up of a large number of simple conditional rules (e.g., if an applicant ticks “yes” to question A but “no” to question B, then accept), which can be learnt by the many different weak learners in the ensemble [10]. The interpretability of gradient boosting methods, compared with other learning techniques of similar power and complexity such as Neural Networks and Support Vector Machines, is another motivation for using them in our work. Gradient boosting methods provide clear and intuitive metrics for each input feature that indicate its importance to the resulting prediction, which aligns with our goal of providing recommendations for questionnaire optimisation. During tree construction in gradient boosting, every variable is a candidate for splitting and is evaluated, so feature selection is inherent in the ensemble construction and the method can cope with redundant features. In this work, the XGBoost [2] implementation is employed.
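The residual-fitting construction of gradient boosting and the idea of crediting splits to features can be illustrated with a deliberately small sketch. It is not XGBoost: it uses first-order (Friedman-style) updates with depth-1 trees on binary features and no regularisation, and the importance it reports (total squared-error reduction per feature) is only analogous to XGBoost's gain-based importance. The toy data and function names are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature, value if x_j == 0, value if x_j == 1, gain)


def fit_stump(X: List[List[int]], r: Sequence[float]) -> Optional[Stump]:
    """Best single split on a binary feature, by squared error against targets r."""
    mean = sum(r) / len(r)
    base = sum((v - mean) ** 2 for v in r)
    best: Optional[Stump] = None
    for j in range(len(X[0])):
        left = [v for xi, v in zip(X, r) if xi[j] == 0]
        right = [v for xi, v in zip(X, r) if xi[j] == 1]
        if not left or not right:
            continue  # a constant feature (e.g. a never-answered question) cannot split
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or base - sse > best[3]:
            best = (j, lm, rm, base - sse)
    return best


def boost(X: List[List[int]], y: Sequence[int], n_rounds: int = 20, lr: float = 0.5):
    """First-order gradient boosting for log-loss with depth-1 trees (stumps)."""
    prior = sum(y) / len(y)
    f0 = math.log(prior / (1 - prior))  # start from the base-rate log-odds
    F = [f0] * len(y)
    importance: Dict[int, float] = defaultdict(float)
    for _ in range(n_rounds):
        # negative gradient of log-loss w.r.t. each score F_i is y_i - sigmoid(F_i)
        r = [yi - 1 / (1 + math.exp(-f)) for yi, f in zip(y, F)]
        stump = fit_stump(X, r)
        if stump is None:
            break
        j, left_value, right_value, gain = stump
        importance[j] += gain  # credit the split's error reduction to feature j
        F = [f + lr * (right_value if xi[j] == 1 else left_value) for f, xi in zip(F, X)]
    return f0, F, dict(importance)


def log_loss(y: Sequence[int], F: Sequence[float]) -> float:
    """Mean log-loss of raw scores F (log-odds)."""
    return sum(math.log1p(math.exp(-f)) if yi == 1 else math.log1p(math.exp(f))
               for yi, f in zip(y, F)) / len(y)


# Toy questionnaire: q0 = "knee pain" (informative), q1 = noise, q2 = never answered.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0],
     [0, 1, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
f0, F, importance = boost(X, y)
print(log_loss(y, [f0] * len(y)), "->", log_loss(y, F))  # training loss goes down
print(importance)  # q2 never appears: it cannot split
```

A question that is never answered is constant, so it can never be chosen for a split and receives no importance; this is the mechanism by which the tree ensemble ignores uninformative features without a separate selection step.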
3.3 Proposed Pipelines
We propose four separate feature selection and classification pipelines for implementation and evaluation. The first is RFE with a standard Logistic Regression model as the learning algorithm for the RFE process; cross-validation is used to select the ideal number of features, and the Logistic Regression model is then fit on the reduced subset produced by RFE. The second pipeline consists of Lasso Regression, with cross-validation used to select the ideal strength of the $\ell_1$ penalty term. The third is XGBoost, with cross-validation used to select the ideal number of weak estimators and the learning rate. The last is RFE with XGBoost as the learning algorithm.
4 Experiment
In this section, we introduce the experimental settings and a large-scale dataset collected by an Australian insurer, followed by the experimental results and discussion.
4.1 Dataset Preparation
We have been given access to data from a leading insurer in the Australian life insurance sector dating from 2009 to 2017. As with any real-world dataset, considerable effort was needed to prepare the data for modelling. Several key issues needed to be addressed before modelling could be performed on the entire dataset. Firstly, the questionnaire data and the underwriting data had been stored and maintained by separate entities due to privacy concerns. In addition, the data had changed hands several times over this period due to organisational takeovers and vendor changes. The questionnaire data was most affected by these changes and underwent four major changes in this period. Numerous changes were made to how the applicant data was stored, such as different attributes and data types, with no master data dictionary available to reconcile them. Over the same period, the questionnaire itself also changed through the addition, removal and modification of questions. These issues are currently being resolved, and as a result only the latest version of the questionnaire data, covering 2014-2017, has been used in this work.
Fig. 1. Histogram of response proportion on questionnaire
A straightforward match between the questionnaire data and the underwriting data was not possible for privacy reasons, so we had to devise a process to merge the separate data stores. We used three attributes of the applicant found in both the application and underwriting data, relating to the applicant's suburb, age and gender. In such a large dataset, numerous applicants share these traits, so we used the date on which each applicant was entered into the two separate systems to resolve ambiguous cases. Through this process, we were able to identify approximately 60 thousand individuals from the 2014-2017 period.
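The merge described above can be sketched as blocking on the shared attributes followed by date-based tie-breaking. This is a minimal sketch under assumptions the paper does not state: the record fields (`id`, `suburb`, `age`, `gender`, `entered`) are hypothetical, ties are broken greedily by nearest entry date, and no maximum date gap is enforced. The suburbs and dates are made up.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


def link_records(applications: List[dict], underwriting: List[dict]) -> Dict[str, str]:
    """Match application and underwriting records on (suburb, age, gender).

    Ambiguous matches are resolved by choosing the candidate whose entry
    date is closest; each underwriting record is used at most once.
    Returns {application_id: underwriting_id}.
    """
    by_key = defaultdict(list)
    for u in underwriting:
        by_key[(u["suburb"], u["age"], u["gender"])].append(u)
    matches: Dict[str, str] = {}
    used = set()
    for a in sorted(applications, key=lambda rec: rec["entered"]):
        key = (a["suburb"], a["age"], a["gender"])
        candidates = [u for u in by_key[key] if u["id"] not in used]
        if not candidates:
            continue  # no partner left: leave this application unmatched
        best = min(candidates, key=lambda u: abs((u["entered"] - a["entered"]).days))
        matches[a["id"]] = best["id"]
        used.add(best["id"])
    return matches


apps = [
    {"id": "A1", "suburb": "Parramatta", "age": 34, "gender": "F", "entered": date(2015, 3, 2)},
    {"id": "A2", "suburb": "Parramatta", "age": 34, "gender": "F", "entered": date(2015, 7, 10)},
    {"id": "A3", "suburb": "Newtown", "age": 51, "gender": "M", "entered": date(2016, 1, 5)},
]
uw = [
    {"id": "U1", "suburb": "Parramatta", "age": 34, "gender": "F", "entered": date(2015, 7, 14)},
    {"id": "U2", "suburb": "Parramatta", "age": 34, "gender": "F", "entered": date(2015, 3, 5)},
    {"id": "U3", "suburb": "Newtown", "age": 51, "gender": "M", "entered": date(2016, 1, 20)},
]
print(link_records(apps, uw))  # {'A1': 'U2', 'A2': 'U1', 'A3': 'U3'}
```

A1 and A2 share all three attributes, so only the entry dates separate them; without the date step, both would be ambiguous matches for U1 and U2.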
As can be seen in Fig. 1, response rates to questions are considerably low: due to the conditional-branching structure of the questionnaire, the majority of applicants fill out less than 10 percent of the entire questionnaire. This results in sparse feature vectors for the majority of applicants. As well as sparse feature vectors, the data exhibits sparsity in the application of exclusions, resulting in extreme class imbalance when predicting the binary problem of exclusion application. Over one thousand different exclusions are applied in the dataset. Many of these exclusions are extremely rare, with single-digit frequency counts, and are thus not included in the experiments. The most frequently applied exclusion is applied to only 1.5 percent of all applications, see Fig. 2.
Fig. 2. Histogram of application rate for the 30 most common exclusion codes in descending order
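Selecting the exclusions to model amounts to counting how often each code is applied and discarding those with single-digit counts. A minimal sketch, assuming each application's exclusions are available as a set of codes; the codes and counts are made up, and the threshold of 10 simply encodes "single-digit counts". Even the most common exclusion, at 1.5 percent, leaves roughly 66 negative applications for every positive one.

```python
from collections import Counter
from typing import Dict, List, Set


def exclusion_prevalence(labels: List[Set[str]], min_count: int = 10) -> Dict[str, float]:
    """Share of applications carrying each exclusion code, most common first.

    `labels[i]` is the set of exclusion codes applied to application i.
    Codes seen fewer than `min_count` times (single-digit counts) are dropped.
    """
    counts = Counter(code for codes in labels for code in codes)
    n = len(labels)
    kept = {code: c / n for code, c in counts.items() if c >= min_count}
    return dict(sorted(kept.items(), key=lambda kv: kv[1], reverse=True))


# Made-up example: 1000 applications, three exclusion codes.
labels = [{"A"}] * 15 + [{"B"}] * 12 + [{"C"}] * 3 + [set()] * 970
prevalence = exclusion_prevalence(labels)
print(prevalence)                  # {'A': 0.015, 'B': 0.012}; 'C' (3 cases) is dropped
top_codes = list(prevalence)[:20]  # the codes a per-exclusion experiment would model
```

Each retained code then defines its own binary target, which is why one classifier per exclusion code is trained and evaluated.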
4.2 Experimental Settings
We ran our proposed pipelines on the 20 most frequent exclusion codes. Our experiments were conducted using the scikit-learn library for the Python programming language. For the two pipelines that use RFE as a feature selection step, we implemented a nested cross-validation (CV) loop, with RFE with CV as the outer loop and grid search with CV as the inner loop to optimise the hyper-parameters of the learning algorithm. The embedded feature selection approaches required no such nested loop, as no feature selection step is needed before prediction. CV was set to 5 stratified folds for all experiments, and the random seed was kept the same for all experiments to ensure identical dataset splits. For all learning algorithms, the sampling procedure was weighted to account for the class imbalance.
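Two of these settings, stratified 5-fold splits with a fixed seed and weighting for class imbalance, are sketched below in plain Python to show the idea; the experiments themselves used scikit-learn. The round-robin dealing is a simplification of how scikit-learn's `StratifiedKFold` assigns samples, and the weighting formula is the one scikit-learn uses for `class_weight="balanced"`. The paper does not state which weighting scheme was used, so treat that choice as an assumption; the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def stratified_kfold(y: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)  # a fixed seed gives identical splits in every experiment
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return [sorted(f) for f in folds]


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """w_c = n / (n_classes * n_c), so every class carries the same total weight."""
    classes = sorted(set(y))
    return {c: len(y) / (len(classes) * sum(1 for yi in y if yi == c)) for c in classes}


y = [1] * 10 + [0] * 90  # 10 percent positives
folds = stratified_kfold(y, k=5, seed=42)
print([sum(y[i] for i in f) for f in folds])  # [2, 2, 2, 2, 2] positives per fold
print(balanced_class_weights(y))  # positives weighted 9 times more than negatives
```

Stratification matters here because, with exclusion rates around 1.5 percent, an unstratified split could leave a test fold with very few positives, making its AUC unreliable.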
4.3 Evaluation Metrics
We used the area under the Receiver Operating Characteristic (ROC) curve [4] to evaluate the performance of our approach. An ROC curve plots the true positive rate (the rate of correctly predicting a positive example) against the false positive rate (the rate of predicting a positive label when the example is in fact negative) across all thresholds on the model's prediction score. The area under the ROC curve (AUC) is a single metric that can be interpreted as the probability that the model ranks a randomly chosen positive example as more likely to be positive than a randomly chosen negative example. AUC is a common tool for comparing models in supervised binary classification problems.
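The ranking interpretation of AUC can be computed directly from its definition. A minimal sketch in plain Python (the function name is hypothetical); in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg); it is
    meant to show the definition, not to be efficient on large datasets.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on how positives and negatives are ordered, not on a chosen threshold or on the class ratio itself, it suits exclusions that are applied to only a small percentage of applications.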
4.4 Results and Discussions
Fig. 3. Prediction results on the 20 most common exclusion codes, ranked in descending order. The figure shows results from four pipelines: i) RFE-LOG: Recursive Feature Elimination with Logistic Regression; ii) LOG: Logistic Regression with $\ell_1$ regularisation; iii) XGB: Extreme Gradient Boosting; iv) RFE-XGB: Recursive Feature Elimination with Extreme Gradient Boosting.
Predictions. The prediction results vary considerably between exclusion codes, see Fig. 3. Averaged across all models, the AUC for the worst-predicted exclusion is 0.83 and for the best-predicted exclusion is 0.96. In all but five of the top 20 exclusions, setting a strong $\ell_1$ regularisation penalty on Logistic Regression provides greater or equal predictive performance compared with using RFE with CV as a feature selection phase before Logistic Regression with no $\ell_1$ penalty, as shown in Fig. 3. However, the mean difference in AUC between Logistic Regression with $\ell_1$ and RFE with unpenalised Logistic Regression is only 0.006, which is negligible. The mean difference in AUC between XGBoost and RFE with XGBoost is smaller still, at only 0.0006. XGBoost and Logistic Regression with $\ell_1$ regularisation handle feature selection adequately on their own, with no need for a prior feature selection step. Logistic Regression with $\ell_1$ is the best-performing model, with an average AUC 0.0035 greater than that of XGBoost, the next best model. There is therefore little to separate these models in terms of predictive performance. However, the number of relevant features needed by each model shows a clear gap between them: XGBoost uses far fewer features to achieve similar performance, as shown in Fig. 4. This has implications for our recommendations for questionnaire optimisation. On average, Logistic Regression uses 4 times as many features as XGBoost with a similar prediction accuracy. Our recommendations for questionnaire optimisation are therefore based on the feature importance given by the XGBoost model.
Fig. 4. Number of features used by the four modelling pipelines: i) RFE-LOG: Recursive Feature Elimination with Logistic Regression; ii) LOG: Logistic Regression with $\ell_1$ regularisation; iii) XGB: Extreme Gradient Boosting; iv) RFE-XGB: Recursive Feature Elimination with Extreme Gradient Boosting
Question Optimisation using Feature Importance. We further explore the trained models to gain insight into feature importance, i.e., the importance each feature (question) has for each exclusion. The result is shown as a heatmap in Fig. 5, where the x-axis shows the selected exclusions and the y-axis shows the features. Each cell is coloured from blue to red, with red indicating that the feature is highly relevant to deciding the corresponding exclusion. For example, the heatmap shows that the question “Alcohol” is commonly used for deciding the exclusion “loss of income”. In addition to the red cells, the blue cells are also important for optimising the questions. For example, the question “Asthma Medication” shows no relevance to any of the exclusions, which suggests that it is potentially redundant. Note that due to the large number of questions and exclusions, only a small fragment of the full heatmap is shown here.
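Reading the heatmap for redundant questions amounts to finding rows that are blue everywhere, and reading it per exclusion amounts to sorting one column. A minimal sketch, assuming the per-exclusion importances have been collected into a nested dictionary; the function names are hypothetical and the scores are made up for illustration, not values from Fig. 5.

```python
from typing import Dict, List

Importance = Dict[str, Dict[str, float]]  # question -> {exclusion -> importance}


def redundant_questions(importance: Importance, threshold: float = 0.0) -> List[str]:
    """Questions whose importance never exceeds `threshold` for any exclusion."""
    return sorted(q for q, row in importance.items()
                  if max(row.values(), default=0.0) <= threshold)


def top_questions(importance: Importance, exclusion: str, k: int = 3) -> List[str]:
    """The k questions with the highest non-zero importance for one exclusion."""
    scored = [(row.get(exclusion, 0.0), q) for q, row in importance.items()]
    return [q for s, q in sorted(scored, reverse=True) if s > 0][:k]


# Toy, made-up scores shaped like one slice of the heatmap (not the paper's values).
toy = {
    "Alcohol":           {"loss of income": 0.42, "back injury": 0.05},
    "Asthma Medication": {"loss of income": 0.0,  "back injury": 0.0},
    "Back Pain":         {"loss of income": 0.03, "back injury": 0.57},
}
print(redundant_questions(toy))              # ['Asthma Medication']
print(top_questions(toy, "loss of income"))  # ['Alcohol', 'Back Pain']
```

A zero score only means the fitted models did not use a question, so a flagged question is a candidate for review rather than proof of redundancy; it may still matter for exclusions outside the modelled set.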
Fig. 5. Feature importance as heatmap
5 Conclusions
In this paper, we implemented and evaluated a number of different machine learning algorithms and feature selection methods to predict the application of exclusions in life insurance applications. The results show that this simple approach performs well and can add value to the insurer. XGBoost is the preferred model because it needs significantly fewer features to produce similar accuracy. For future work, we would like to investigate a cost-sensitive approach to the prediction of exclusions. Data on claims made by applicants, along with the current data, would be needed to fully understand the cost of underwriting decisions. We have not yet processed enough of the dataset to utilise the claims data, making this approach infeasible at the moment: with only the last three years of usable data available at present, the number of claims for this period is too small to be of any use. Another direction for future work is the incorporation of the free-text responses provided by applicants into the feature set.
References
1. Howlette B., Rajan M., and S. P. Chieng. Future of life insurance in Australia. Technical report, PricewaterhouseCoopers, 2017.
2. Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
3. Gandhi D. and Kaul R. Life and health - future of life underwriting. Asia Insurance Review, pages 76–77, Jun 2016.
4. Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.
5. Jerome H. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
6. Pablo M. Granitto, Cesare Furlanello, Franco Biasioli, and Flavia Gasperi. Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2):83–90, 2006.
7. Qinghua Hu, Jinfu Liu, and Daren Yu. Mixed feature selection based on granulation and approximation. Knowledge-Based Systems, 21(4):294–304, 2008.
8. Guyon I., Weston J., Barnhill S., and Vapnik V. Gene selection for cancer classification using support vector machines. Machine Learning, pages 389–422, 2002.
9. Joram M. K., Harrison B. K., and Joseph K. N. A knowledge-based system for life insurance underwriting. International Journal of Information Technology and Computer Science, pages 40–49, 2017.
10. Guelman L. Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, pages 3659–3667, 2012.
11. Arora N. and Vij S. A hybrid neuro-fuzzy network for underwriting of life insurance. International Journal of Advanced Research in Computer Science, pages 231–236, 2012.
12. Jensen R. and Shen Q. Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches. IEEE Transactions on Knowledge and Data Engineering, pages 1457–1471, 2004.
13. Kacelan V., Kacelan L., and Buric M. N. A nonparametric data mining approach for risk prediction in car insurance: a case study from the Montenegrin market. Economic Research-Ekonomska Istraživanja, pages 545–558, 2017.
14. Rodriguez-Galiano V.F., Luque-Espinar J.A., Chica-Olmo M., and Mendes M.P. Feature selection approaches for predictive modelling of groundwater nitrate pollution: An evaluation of filters, embedded and wrapper methods. Science of the Total Environment, pages 661–672.
... This determines whether the insurance applicant should be excluded from making a specific type of insurance claim based on their historical records. Statistically, this can be seen as a supervised classification problem [163]. ...
... Supervised learning methods: Both logistic regression and gradient boosted trees have been shown to provide good predictive performance for this problem [163]. However, gradient boosted trees generally require fewer features than logistic regression to generate similar predictive performance for this task. ...
Full-text available
Financial risk management avoids losses and maximizes profits, and hence is vital to most businesses. As the task relies heavily on information-driven decision making, machine learning is a promising source for new methods and technologies. In recent years, we have seen increasing adoption of machine learning methods for various risk management tasks. Machine-learning researchers, however, often struggle to navigate the vast and complex domain knowledge and the fast-evolving literature. This paper fills this gap, by providing a systematic survey of the rapidly growing literature of machine learning research for financial risk management. The contributions of the paper are four-folds: First, we present a taxonomy of financial-risk-management tasks and connect them with relevant machine learning methods. Secondly, we highlight significant publications in the past decade. Thirdly, we identify major challenges being faced by researchers in this area. And finally, we point out emerging trends and promising research directions.
... Feature interaction and importance is also useful in assessing risk across a wide range of insurance activities and informing underwriting and pricing of premiums. Biddle et al. (2018) add to literature on automated underwriting in life insurance applications using the XAI method of Feature Interaction and Importance. Recursive Feature Elimination is used to reduce the feature space through iteratively wrapping and training a classifier on several feature subsets and then providing feature rankings for each subset. ...
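The Recursive Feature Elimination loop described in the snippet above can be illustrated with a minimal, dependency-free sketch. It is not the implementation used by Biddle et al. It makes the following assumptions:

- The wrapped classifier is a plain logistic regression fitted by batch gradient descent.
- Features are on comparable scales, so absolute coefficients can be compared.
- One feature (the one with the smallest absolute coefficient) is dropped per round.

The helper names (`fit_logistic`, `rfe_ranking`) and the toy data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights and bias by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe_ranking(X, y, n_keep=1):
    """Recursive Feature Elimination wrapped around the logistic regression above.

    Each round refits on the surviving features and drops the one with the
    smallest absolute coefficient. Returns {feature index: rank}; surviving
    features get rank 1, and features eliminated earlier get larger ranks.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        eliminated.append(remaining.pop(weakest))
    ranking = {j: 1 for j in remaining}
    for rank, j in enumerate(reversed(eliminated), start=2):
        ranking[j] = rank
    return ranking


# Hypothetical toy data: three standard-normal features, label driven mostly by feature 0.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(100)]
y = [1 if row[0] + 0.1 * row[1] > 0 else 0 for row in X]
ranking = rfe_ranking(X, y)
print(ranking)
```

Eliminating one feature per round is the simplest choice. Practical RFE tools usually let the caller remove several features per step to save refits.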
Explainable Artificial Intelligence (XAI) models allow for a more transparent and understandable relationship between humans and machines. The insurance industry represents a fundamental opportunity to demonstrate the potential of XAI, given the industry's vast stores of sensitive data on policyholders and its centrality to societal progress and innovation. This paper analyses current Artificial Intelligence (AI) applications in insurance industry practices and insurance research to assess their degree of explainability. Using search terms representative of (X)AI applications in insurance, 419 original research articles were screened from IEEE Xplore, ACM Digital Library, Scopus, Web of Science, Business Source Complete and EconLit. The resulting 103 articles (published between 2000 and 2021), representing the current state of the art of XAI in the insurance literature, are analysed and classified, highlighting the prevalence of XAI methods at the various stages of the insurance value chain. The study finds that XAI methods are particularly prevalent in claims management, underwriting and actuarial pricing practices. Simplification methods, namely knowledge distillation and rule extraction, are identified as the primary XAI techniques used within the insurance value chain. This is important because distilling large models into a smaller, more manageable model with distinct association rules helps build XAI models that are readily understandable. XAI is an important evolution of AI that helps ensure trust, transparency and moral values are embedded within the system's ecosystem. Assessing these XAI foci in the context of the insurance industry proves a worthwhile exploration of the unique advantages of XAI, highlighting to industry professionals, regulators and XAI developers where particular focus should be directed in the further development of XAI.
This is the first study to analyse XAI’s current applications within the insurance industry, while simultaneously contributing to the interdisciplinary understanding of applied XAI. Advancing the literature on adequate XAI definitions, the authors propose an adapted definition of XAI informed by the systematic review of XAI literature in insurance.
... Technological developments have had a crucial impact on the insurance market, as on any other financial industry. Along these lines, machine learning (ML) algorithms receive special attention from researchers for addressing the following issues: insurance fraud detection [2][3][4], insurance premium prediction [5], the underwriting process [6], claim analysis [7], risk prediction [8], sales forecasting [9], customer churn [10], and insurance tariff plans [11], among others. ...
Considering the large size of the agricultural sector in Romania, increasing the crop insurance adoption rate and identifying the factors that drive adoption are of real interest to the Romanian market. The main objective of this research was to assess the performance of machine learning (ML) models in predicting Romanian farmers' purchase of crop insurance based on crop-level and farmer-level characteristics. The data set contains 721 responses to a survey administered to Romanian farmers in September 2021, and includes both crop-related characteristics and farmer-level socio-demographic attributes, perception of risk, perception of insurers and knowledge about agricultural insurance. Various ML algorithms were implemented, and among them, the Multi-Layer Perceptron Classifier (MLP) and the Linear Support Vector Classifier (SVC) outperformed the others in terms of overall accuracy. Tree-based ensembles were used to identify the most prominent features, which included the farmer's general perception of risk, their likelihood of engaging in risky behaviour, and their level of knowledge about crop insurance. The models implemented in this study could be a useful tool for insurers and policymakers for predicting potential crop insurance ownership.
Technological advancement has resulted in the production of large amounts of unprocessed data. Data can be collected, processed, analyzed, and stored relatively inexpensively. This capability has enabled innovation in banking, insurance, financial transactions, health care, communications, the Internet, and e-commerce. Risk management is an integral part of the insurance industry, as risk levels determine the premiums of insurance policies. Insurance companies usually charge higher premiums to policyholders with higher risk factors, so the more accurately an applicant's risk is evaluated, the more accurately the premium can be priced. Risk classification of customers therefore plays an important role in insurance companies: customers are grouped by risk levels calculated by applying machine learning algorithms to historical data. Evaluating risk and calculating premiums are functions of underwriters.
The insurance industry is essential to the global economy and to the stability and sustainability of the economic system. However, the industry continues to face challenges related to customer service, market competition and the re-engineering of processes. Like other industries, insurance is affected by ever-changing customer requirements and demands and by fast-growing digitalization. Almost every organization wants to be part of this new revolution because of its ability to improve process efficiency, productivity and performance. Just as the industrial revolution did more than a century ago, it continues to reshape the way humans live and perform their daily duties. As a result, organizational processes, customer expectations and the adoption of new channels, products, and services have all undergone significant change, forcing a reconsideration of business models. Many businesses have embraced digital technology because it can transform business operations, and it has the potential to change the insurance industry as it has changed others. The sparse use of digital technology is the main hindrance to the South African insurance sector's success. The main goal of this paper was to examine the extent to which digitalization has been explored in the South African long-term insurance industry compared to other industries. This was accomplished through a thorough PRISMA-based critical analysis of the current literature. Sixty-two accredited journal articles were assessed to understand the contribution made by digital models in previous research, to identify gaps, and to contribute to the body of knowledge by introducing effective techniques to close the identified gaps.
The first finding was that digitalization has not been thoroughly explored in the insurance industry and there is a dearth of research about digitalization in this industry, particularly on the African continent. Secondly, the reviewed literature revealed that the application of the Internet of Things and big data has the potential to enhance customer databases in the insurance business, because these digital tools can help insurers obtain customer data in real time. This has been highlighted as a solution to the problems the long-term insurance sector is currently experiencing.
Keywords: Company performance; Digitalization; Insurance; Long-term insurance
This study attempts to give the health insurance underwriting process a methodological structure by applying Multi-criteria Decision-making (MCDM) analysis. This is done by assigning a score to each health insurance applicant, which can be used to determine whether he or she is accepted, rejected, or accepted with special terms and conditions (such as exclusions, additional waiting periods and/or surcharges). Introducing MCDM approaches into health insurance underwriting enables the selection criteria to be quantified, the process to be further standardized and automated, and the process to be aligned, through quantitative indicators, with the insurer's risk tolerance and risk appetite; therein lie the novelties of this research. The proposed methodology can be readily implemented by insurers, adding value to the underwriting, risk management and distribution (sales and marketing) functions, as well as to the company's profitability or the level of premium paid by the insured.
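The following is a rough illustration of "assign a score, then map it to accept / reject / accept with special terms". It uses simple additive weighting, one of the most basic MCDM methods; the study itself may use a different MCDM technique. The criteria names, weights and cut-offs (`CRITERIA_WEIGHTS`, `ACCEPT_AT`, `SPECIAL_TERMS_AT`) are hypothetical placeholders, not values from the paper.

```python
# Hypothetical criteria and weights. Each criterion score is assumed to be
# pre-normalised to [0, 1], where 1 is the most favourable (lowest-risk) value.
CRITERIA_WEIGHTS = {"age": 0.2, "bmi": 0.3, "medical_history": 0.4, "lifestyle": 0.1}
ACCEPT_AT = 0.75        # hypothetical cut-off for standard acceptance
SPECIAL_TERMS_AT = 0.5  # hypothetical cut-off for acceptance with special terms


def applicant_score(criteria):
    """Simple additive weighting: weighted mean of the normalised criterion scores."""
    total = sum(CRITERIA_WEIGHTS.values())
    return sum(w * criteria[name] for name, w in CRITERIA_WEIGHTS.items()) / total


def underwriting_decision(criteria):
    """Map the score to one of the three outcomes named in the abstract."""
    score = applicant_score(criteria)
    if score >= ACCEPT_AT:
        return "accept"
    if score >= SPECIAL_TERMS_AT:
        return "accept with special terms"
    return "reject"


print(underwriting_decision({"age": 0.9, "bmi": 0.8, "medical_history": 0.9, "lifestyle": 1.0}))
```

Thresholds expressed on a single score are what make the process easy to align with a stated risk appetite. Moving `ACCEPT_AT` up or down tightens or loosens acceptance without touching the criteria.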
Adverse selection (AS) is one of the significant causes of market failure worldwide. Analysis of the Australian life insurance market shows the existence of adverse activities undertaken to gain financial benefits, resulting in losses to insurance companies. Understanding the behavior of policyholders is essential to improving business strategies and overcoming fraudulent claims. However, analyzing policyholders' behavior is a complex process. It usually involves several factors that depend on their preferences and on the nature of the data, such as:

- data missing useful private information,
- asymmetric information about policyholders,
- anomalous information at the cell level rather than the data-instance level, and
- a lack of quantitative research.

This study aims to analyze life insurance policyholders' behavior to identify adverse behavior (AB). We present a novel association rule learning-based approach, 'ARLAS', to detect the AS behavior of policyholders. In addition to the original data, we created a synthetic AS dataset by randomly flipping the attribute values of 10% of the records in the test set. Experimental results on 31,800 Australian life insurance users show that the proposed approach achieves comparatively significant gains in performance.
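The abstract does not describe ARLAS's rule-mining procedure. The sketch below is therefore only a generic illustration of association rule learning. It computes the standard support and confidence of a candidate rule over records encoded as sets of `attribute=value` items. It also lists records that match a rule's antecedent but not its consequent, which is one assumed (not confirmed) way rules could surface unusual policyholder records. The item names and the `violations` helper are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    ante_support = support(antecedent, transactions)
    if ante_support == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / ante_support


def violations(antecedent, consequent, transactions):
    """Indices of records that match the antecedent but not the consequent."""
    a, c = set(antecedent), set(consequent)
    return [i for i, t in enumerate(transactions) if a <= t and not c <= t]


# Hypothetical records encoded as sets of "attribute=value" items.
records = [
    {"smoker=yes", "cover=high", "disclosed_condition=yes"},
    {"smoker=yes", "cover=high", "disclosed_condition=yes"},
    {"smoker=yes", "cover=high", "disclosed_condition=yes"},
    {"smoker=yes", "cover=high", "disclosed_condition=no"},
    {"smoker=no", "cover=low", "disclosed_condition=no"},
]
antecedent, consequent = {"smoker=yes", "cover=high"}, {"disclosed_condition=yes"}
print(support(antecedent, records), confidence(antecedent, consequent, records))
print(violations(antecedent, consequent, records))
```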
The purpose of this work is to enhance the life insurance underwriting process by building a knowledge-based system for life insurance underwriting. The knowledge-based system would be useful for organizations that want to serve their clients better and to promote the capture, retention, and reuse of expertise in the organization. The paper identifies the main input factors and output decisions that life insurance practitioners consider and make on a daily basis. Life underwriting knowledge was extracted through interviews at a leading insurance company in Kenya. The knowledge is incorporated into a prototype knowledge-based system, designed and implemented to demonstrate the potential of this technology in the life insurance industry. The Unified Modelling Language and Visual Prolog were used in the design and development of the prototype, respectively. The system's knowledge base was populated with sample knowledge obtained from the life insurance company, and results were generated to illustrate how the system is expected to function.
For risk prediction in car insurance, we used nonparametric data mining techniques such as clustering, support vector regression (SVR) and kernel logistic regression (KLR). The goal of these techniques is to classify risk and predict claim size based on data, thus helping the insurer to assess the risk and calculate actual premiums. We showed that these data mining techniques can predict claim sizes and their occurrence, based on the case study data, with better accuracy than the standard methods. This represents the basis for calculating the net risk premium. The article also discusses the advantages of data mining methods compared to standard methods for risk assessment in car insurance, as well as the specificities of the obtained results that stem from a small insurance market such as Montenegro's.
Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current “pseudo”-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
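The stochastic gradient boosting procedure described above can be sketched for squared-error loss, L = ½(y − F)², where the pseudo-residuals reduce to ordinary residuals y − F(x). This is a minimal sketch under several assumptions:

- inputs are one-dimensional,
- regression stumps serve as the base learner,
- updates are scaled by a shrinkage factor `learning_rate`, and
- each round fits on a subsample fraction `subsample` drawn without replacement.

All names are illustrative, and this is not Friedman's reference implementation.

```python
import random


def fit_stump(xs, rs):
    """Least-squares regression stump on 1-D inputs.

    Returns (threshold, left_value, right_value); x <= threshold goes left.
    """
    pairs = sorted(zip(xs, rs))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # every x identical: fall back to a constant fit
        mean = sum(rs) / len(rs)
        return float("inf"), mean, mean
    return best[1], best[2], best[3]


def stochastic_gradient_boosting(xs, ys, n_rounds=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error loss with stump base learners."""
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)  # constant that minimises squared error
    stumps = []

    def predict(x):
        return f0 + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

    n_sub = max(2, int(subsample * len(xs)))
    for _ in range(n_rounds):
        # Draw the subsample at random WITHOUT replacement.
        idx = rng.sample(range(len(xs)), n_sub)
        # Pseudo-residuals: negative gradient of 0.5 * (y - F(x))**2, i.e. y - F(x).
        residuals = [ys[i] - predict(xs[i]) for i in idx]
        # Fit the base learner to the pseudo-residuals by least squares on the subsample only.
        stumps.append(fit_stump([xs[i] for i in idx], residuals))
    return predict


xs = [i / 10 for i in range(50)]
ys = [1.0 if x > 2.5 else 0.0 for x in xs]
model = stochastic_gradient_boosting(xs, ys)
mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(round(mse, 4))
```

Setting `subsample=1.0` recovers ordinary (deterministic) gradient boosting. Smaller fractions make each round cheaper and inject the randomness the abstract credits with better accuracy and robustness.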
Recognising the various sources of nitrate pollution and understanding system dynamics are fundamental to tackling groundwater quality problems. A comprehensive GIS database of twenty parameters regarding hydrogeological and hydrological features and driving forces was used as input for predictive models of nitrate pollution. Additionally, key variables extracted from remotely sensed Normalised Difference Vegetation Index (NDVI) time series were included in the database to provide indications of agroecosystem dynamics. Many approaches can be used to evaluate feature importance related to groundwater pollution caused by nitrates. Filters, wrappers and embedded methods are used to rank feature importance according to the probability of occurrence of nitrates above a threshold value in groundwater. Machine learning algorithms (MLA) such as Classification and Regression Trees (CART), Random Forest (RF) and Support Vector Machines (SVM) are used as wrappers considering four different sequential search approaches: sequential backward selection (SBS), sequential forward selection (SFS), sequential forward floating selection (SFFS) and sequential backward floating selection (SBFS). Feature importance obtained from RF and CART was used as an embedded approach. RF with SFFS had the best performance (mmce=0.12 and AUC=0.92) and good interpretability. Three features related to groundwater polluted areas were selected:

i) the rating of industries and facilities according to their production capacity and total nitrogen emissions to water within a 3 km buffer,
ii) the rating of livestock farms by manure production within a 5 km buffer, and
iii) cumulated NDVI for the post-maximum month, used as a proxy for vegetation productivity and crop yield.
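Of the wrapper search strategies listed above, sequential forward selection (SFS) is the simplest. The sketch below greedily adds the feature that most improves a caller-supplied `score(subset)` function and stops when no remaining feature improves it. The scoring function in the example is a hypothetical stand-in for a cross-validated classifier metric such as AUC. It is not the study's setup, and the feature names are placeholders.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedy SFS: start from the empty set and add the single best feature each round."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate the wrapped model once per candidate extension of the current subset.
        candidate_scores = [(score(selected + [f]), f) for f in remaining]
        cand_score, cand = max(candidate_scores, key=lambda pair: pair[0])
        if cand_score <= best_score:
            break  # no remaining feature improves the wrapped model's score
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


# Hypothetical per-feature usefulness minus a per-feature cost, standing in for a CV metric.
USEFULNESS = {"a": 0.30, "b": 0.25, "c": 0.20, "d": 0.01}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)


print(sequential_forward_selection(["a", "b", "c", "d"], toy_score))
```

The floating variants (SFFS, SBFS) extend this loop with a conditional backward step after each addition, which lets them undo earlier greedy choices.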
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
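The "sparsity-aware" idea can be made concrete with the sketch below, which follows the spirit of the XGBoost paper's split finding for a single feature. Missing entries are not enumerated. Instead, both default directions for them are tried, and the one with the higher gain is kept. The gain uses the second-order structure score with L2 penalty `lam` (λ) and split penalty `gamma` (γ):

Gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − (G_L+G_R)²/(H_L+H_R+λ)] − γ

Here G and H are sums of first- and second-order gradients. This is a simplified illustration, not the library's implementation:

- midpoint thresholds are used, and
- candidates are found by exact enumeration rather than the weighted quantile sketch.

```python
def split_gain(gl, hl, gr, hr, lam=1.0, gamma=0.0):
    """Gain of splitting a node, from gradient (g) and Hessian (h) sums of each child."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


def sparsity_aware_split(xs, grads, hess, lam=1.0, gamma=0.0):
    """Exact split search on one feature where None marks a missing value.

    Returns (gain, threshold, default_left): rows with x <= threshold go left,
    and missing rows go left if default_left is True, otherwise right.
    """
    G, H = sum(grads), sum(hess)
    present = sorted((x, g, h) for x, g, h in zip(xs, grads, hess) if x is not None)
    best = (0.0, None, None)
    # Pass 1: missing rows default RIGHT, so accumulate the left child in ascending x.
    gl = hl = 0.0
    for i in range(len(present) - 1):
        x, g, h = present[i]
        gl += g
        hl += h
        if x == present[i + 1][0]:
            continue  # cannot split between equal values
        gain = split_gain(gl, hl, G - gl, H - hl, lam, gamma)
        if gain > best[0]:
            best = (gain, (x + present[i + 1][0]) / 2, False)
    # Pass 2: missing rows default LEFT, so accumulate the right child in descending x.
    gr = hr = 0.0
    for i in range(len(present) - 1, 0, -1):
        x, g, h = present[i]
        gr += g
        hr += h
        if x == present[i - 1][0]:
            continue
        gain = split_gain(G - gr, H - hr, gr, hr, lam, gamma)
        if gain > best[0]:
            best = (gain, (present[i - 1][0] + x) / 2, True)
    return best


# Squared loss at an initial prediction of 0: g = prediction - y = -y and h = 1.
xs = [1.0, 2.0, 3.0, 4.0, None, None]
ys = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
gain, threshold, default_left = sparsity_aware_split(xs, [-y for y in ys], [1.0] * len(ys))
print(gain, threshold, default_left)
```

In the toy data, the missing rows behave like the low-x rows, so the learned default direction sends them left. The cost of the search grows with the number of present values only.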
In this paper we apply the recently introduced Random Forest-Recursive Feature Elimination (RF-RFE) algorithm to the identification of relevant features in the spectra produced by Proton Transfer Reaction-Mass Spectrometry (PTR-MS) analysis of agroindustrial products. The method is compared with the more traditional Support Vector Machine-Recursive Feature Elimination (SVM-RFE), extended to allow multiclass problems, and with a baseline method based on the Kruskal–Wallis statistic (KWS). In particular, we apply all selection methods to the discrimination of nine varieties of strawberries and six varieties of typical cheeses from Trentino Province, North Italy. Using replicated experiments we estimate unbiased generalization errors. Our results show that RF-RFE outperforms SVM-RFE and KWS on the task of finding small subsets of features with high discrimination levels on PTR-MS data sets. We also show how selection probabilities and features co-occurrence can be used to highlight the most relevant features for discrimination.
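The Kruskal–Wallis baseline mentioned above is a filter method: each feature is scored on its own by how strongly its ranks differ between classes. The statistic is

H = 12 / (N(N+1)) · Σ_i R_i² / n_i − 3(N+1)

where R_i is the sum of ranks in class i, n_i is the size of class i, and N is the total number of samples. The sketch below computes H and ranks features by it. As simplifying assumptions, it omits the usual tie correction and uses hypothetical data; it is not the authors' code.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one feature (no tie correction)."""
    n = len(values)
    rank_sums, counts = {}, {}
    for r, c in zip(average_ranks(values), labels):
        rank_sums[c] = rank_sums.get(c, 0.0) + r
        counts[c] = counts.get(c, 0) + 1
    return 12.0 / (n * (n + 1)) * sum(rank_sums[c] ** 2 / counts[c] for c in counts) - 3 * (n + 1)


def rank_features_by_kw(X, labels):
    """Feature indices sorted from most to least discriminative by H."""
    scores = [kruskal_wallis_h([row[j] for row in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical data: feature 0 separates the two classes, feature 1 does not.
X = [[1, 5], [2, 1], [3, 6], [4, 2], [5, 4], [6, 3]]
labels = ["a", "a", "a", "b", "b", "b"]
print(rank_features_by_kw(X, labels))
```

Because H ignores the other features, a filter like this cannot detect features that are useful only in combination. That is the gap wrapper methods such as RF-RFE and SVM-RFE are meant to address.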
Feature subset selection presents a common challenge for applications where data with tens or hundreds of features are available. Existing feature selection algorithms are mainly designed for dealing with numerical or categorical attributes. However, real-world data usually come in a mixed format. In this paper, we generalize Pawlak's rough set model into a δ neighborhood rough set model and a k-nearest-neighbor rough set model. In these models, objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations. The induced information granules are then used to approximate the decision with lower and upper approximations. We compute the lower approximations of the decision to measure the significance of attributes. Based on the proposed models, we define the significance of mixed features and construct a greedy attribute reduction algorithm. We compare the proposed algorithm with others in terms of the number of selected features and classification performance. Experiments show the proposed technique is effective.
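The δ neighborhood idea for mixed data can be illustrated with the compact sketch below, which follows the concepts described above:

1. A sample's neighborhood requires equal values on categorical attributes and values within δ on numeric attributes.
2. The dependency γ_B(D) is the fraction of samples whose whole neighborhood shares their decision (the positive region).
3. The significance of an attribute is the increase in dependency it brings.
4. A greedy loop adds the most significant attribute until none helps.

The following are assumptions, not details from the paper: the per-attribute (Chebyshev-style) distance, a single `delta` radius, numeric values pre-normalised to [0, 1], and the toy data. This is not the authors' algorithm.

```python
def neighborhood(i, rows, attrs, numeric, delta):
    """Indices of samples in the neighborhood of sample i under attribute subset attrs.

    Categorical attributes must match exactly (equivalence relation); numeric
    attributes must each lie within delta (i.e. Chebyshev distance <= delta).
    """
    xi = rows[i]
    return [
        j
        for j, xj in enumerate(rows)
        if all(abs(xi[a] - xj[a]) <= delta if a in numeric else xi[a] == xj[a] for a in attrs)
    ]


def dependency(rows, decisions, attrs, numeric, delta):
    """Fraction of samples in the positive region: those lying in the lower
    approximation of their own decision class."""
    positive = sum(
        all(decisions[j] == decisions[i] for j in neighborhood(i, rows, attrs, numeric, delta))
        for i in range(len(rows))
    )
    return positive / len(rows)


def greedy_reduct(rows, decisions, numeric, delta=0.1):
    """Forward greedy reduction: repeatedly add the attribute with the largest
    significance (increase in dependency) until no attribute increases it."""
    candidates = list(range(len(rows[0])))
    reduct, current = [], 0.0
    while candidates:
        scored = [(dependency(rows, decisions, reduct + [a], numeric, delta), a) for a in candidates]
        best_dep, best_attr = max(scored, key=lambda pair: pair[0])
        if best_dep - current <= 0:
            break
        reduct.append(best_attr)
        candidates.remove(best_attr)
        current = best_dep
    return reduct, current


# Hypothetical mixed data: column 0 numeric, column 1 categorical, column 2 numeric noise.
rows = [
    [0.10, "yes", 0.5],
    [0.15, "yes", 0.9],
    [0.80, "no", 0.5],
    [0.85, "no", 0.1],
]
decisions = ["A", "A", "B", "B"]
print(greedy_reduct(rows, decisions, numeric={0, 2}))
```

The radius `delta` controls granularity. A larger value merges more samples into each neighborhood, which shrinks the positive region and makes the selection more conservative.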