
Fraud Detection and Analysis for Insurance Claim using Machine Learning

Authors:
978-1-6654-4940-3/22/$31.00 ©2022 IEEE
1st Abhijeet Urunkar
Dept. of Computer Science and Engineering
Walchand College of Engineering
Sangli, India
abhijeet.urunkar@walchandsangli.ac.in
2nd Amruta Khot
Dept. of Information Technology
Walchand College of Engineering
Sangli, India
amruta.khot@walchandsangli.ac.in
3rd Rashmi Bhat
Dept. of Computer Engineering
St. John College of Engineering and Management
Palghar, India
rashmib@sjcem.edu.in
4th Nandinee Mudegol
Dept. of Computer Science and Engineering
Walchand College of Engineering
Sangli, India
nandinee.mudegol@walchandsangli.ac.in
Abstract—Insurance companies, operating as commercial enterprises, have for years been experiencing fraud across all types of claims. The amounts claimed fraudulently are significant and can cause serious problems, so governments and other organizations are also working to detect and reduce such activity. Such fraud occurs in all areas of insurance claims, with high severity; claims in the auto sector are among the most widely and prominently targeted, for example through fake accident claims. We therefore aim to develop a project that works on an insurance claim dataset to detect fraudulent claims and claim amounts. The project implements machine learning algorithms to build models that label and classify claims, and presents a comparative study of the classification algorithms using confusion-matrix metrics such as accuracy, precision, and recall. For validating fraudulent transactions, a machine learning model is built using the PySpark Python library.
Keywords—Machine Learning Algorithms, PySpark, Fraud Detection, Classification
I. INTRODUCTION
Insurance fraud is a claim made to obtain money improperly from an insurance company or other underwriter, beyond the amount actually owed. Motor and health insurance are two prominent segments that have seen a spurt in fraud. Frauds can be classified by source or by nature. Sources can be the customer, an intermediary, or internal staff, with the latter two being more critical from a control-framework point of view.
Frauds cover a range of improper activities that an individual may commit in order to obtain a favorable outcome from an underwriter. By nature, frauds can be classified as, for example, application fraud, inflation, identity fraud, fabrication, contrived claims, and induced accidents. These can range from staging an incident and misrepresenting the circumstances, including the people involved and the cause and extent of the damage, to covering up a situation that was not covered by the insurance; misrepresenting the context of an event, which might include shifting blame for incidents where the insured is responsible or failing to take approved safety measures; exaggerating the impact of the incident; and inflating the measure of the loss by adding unrelated losses or attributing inflated prices to the actual losses [1][2][3].
II. PROBLEM STATEMENT
The traditional method of detecting fraud depends on the development of heuristics around fraud indicators. Based on these, the decision on fraud is made in one of two ways: in certain situations, rules indicate whether the case should be interrogated for further examination; in other cases, a list is prepared with scores for the various indicators of fraud. The factors for deciding the measures, and the thresholds, are tested statistically and periodically recalibrated. An aggregation of these scores together with the value of the claim then determines whether the case needs to be sent for further examination. The challenge with the above strategies is that they deliberately rely on manual intervention, which can result in the following restrictions:
1. Inability to perceive the context-specific relationships between the parameters (geography, customer segment, insurance sales process), which may not mirror the typical picture.
2. Being constrained to operate with a restricted set of known parameters based on heuristic knowledge, while being aware that a number of the other attributes might also influence the decisions.
3. Recalibration of the given model is a manual exercise that has to be conducted periodically to react to changing behavior, and to ensure that the model incorporates feedback from the examinations. The flexibility to manage this calibration is hard to achieve.
4. The incidence of fraud is low: generally less than 1 percent of claims are classified as fraudulent.
5. Consultations with business specialists point out that there is no typical model that matches every context exactly.
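Because fraudulent claims are such a small fraction of the data (limitation 4 above), the class imbalance is usually addressed before training, for instance by random under-sampling of the majority class. The sketch below is illustrative only; the column name fraud_reported and the toy data are assumptions, not taken from the paper's dataset.

```python
import pandas as pd

def undersample(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Randomly down-sample the majority (non-fraud) class to the minority class size."""
    minority = df[df[label_col] == 1]
    majority = df[df[label_col] == 0]
    majority_down = majority.sample(n=len(minority), random_state=seed)
    # Concatenate and shuffle so class order does not leak into training.
    return pd.concat([minority, majority_down]).sample(frac=1, random_state=seed)

# Toy data: roughly 1 fraud case per 99 legitimate claims.
claims = pd.DataFrame({
    "claim_amount": range(200),
    "fraud_reported": [1 if i % 100 == 0 else 0 for i in range(200)],
})
balanced = undersample(claims, "fraud_reported")
print(balanced["fraud_reported"].value_counts().to_dict())
```

Under-sampling discards information, so in practice it is often compared against over-sampling or class-weighting before a method is chosen.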
A. Motivation
Ideally, businesses should seek responses that prevent fraud from happening or, if that is out of the question, detect it before significant damage is done. In most companies, fraud is discovered only after it happens; measures are then enforced to prevent it from happening again. Fraud detection is therefore the approach best suited to removing fraud from the environment and preventing its recurrence.
2022 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES) | DOI: 10.1109/SPICES52834.2022.9774071
Authorized licensed use limited to: Walchand College of Eng. Downloaded on May 20,2022 at 09:54:59 UTC from IEEE Xplore. Restrictions apply.
B. Significance of the Problem
Knowing a risk is the first step in prevention, and an intensive assessment provides the insight that is needed. This is typically performed using various techniques, such as interviews, surveys, focus groups, anonymously conducted feedback, and detailed study of records and analysis to spot traffic pumpers, service abusers, subscription scams, and other fraudulent cases. The Association of Certified Fraud Examiners offers a detailed guide to follow. Although this is usually considered a preventive methodology, fraud analysis and detection is a natural consequence of an intensive risk evaluation. Recognized and classified fraud threats in the information technology and telecommunications sector typically take the shape of patterns such as:
• Records showing inflated call rates at an unrealistic time of day, to an uncertain location or a known fraud location.
• Unusual dialing patterns showing one number being called by external numbers more often than it calls out.
• More calls made in a day than the minutes allotted per day, which might indicate that an account has been hacked or shared.
C. Major Contribution
• To compare machine learning algorithms: LR, XGB, DT, RF, and SVM.
• To construct a model that predicts, with high accuracy, which transactions could be fraudulent.
• To detect whether an insurance claim is fraudulent or not.
• To analyze the performance of the fraud detection algorithms.
III. LITERATURE REVIEW
Machine learning is usually abbreviated as ML. The study of machine learning gives computers the implicit ability to learn without being explicitly programmed. ML focuses on the development of computer programs that can adapt when exposed to new data. ML algorithms are generally classified into three main divisions: supervised learning, unsupervised learning, and reinforcement learning. Data mining, a field related to machine learning, has advanced considerably in recent years. Data mining focuses on analysing the whole of the data obtained and attempts to find realistic patterns in it. In contrast to extracting knowledge for human understanding, machine learning applications use that knowledge to detect patterns in data and adjust program actions accordingly. In supervised machine learning, the objective is to infer a mapping from the labels on the training data. The training data consist of a set of training samples, where each instance is a pair made up of an input object (typically a vector) and an output value that acts as the label for training the model. A supervised learning algorithm first analyses the training data and then constructs an inferred function with which it can map new input vectors. Supervised learning algorithms are prominently employed in a wide variety of application areas. In the ideal setting, the algorithm accurately assigns class labels to unseen instances; the supervised learning algorithm thus aspires to generalize from the training data to unseen objects in a reasonable way [4][5][6].
The literature review in tabulated form is as follows:
TABLE I. MACHINE LEARNING ALGORITHM COMPARISON [7]

Algorithm | Problem Type | Average Prediction Accuracy | Training Speed | Prediction Speed | Performs well with small number of observations? | Features might need scaling?
KNN | Either | Lower | Fast | Depends | No | Yes
Logistic Regression | Classification | Lower | Fast | Fast | Yes | No
Support Vector Machine | Either | Lower | Fast (excluding feature extraction) | Fast | Yes | No
Decision Trees | Either | Lower | Fast | Fast | No | No
Random Forests | Either | Higher | Fast | Moderate | No | No
XGB | Classification | Higher | Fast | Fast | No | Yes
IV. PROPOSED METHOD/ALGORITHM
The following is the proposed method of model development:
• Different models are tested on the dataset once it is obtained and cleaned.
• On the basis of the initial model performance, different features of the model are engineered and tested again.
• Once all the features are engineered, the model is built and run using different values and different iteration procedures.
• A predictive model is created that predicts whether an insurance claim is fraudulent or not.
• A binary classification task takes place, giving an answer of YES or NO. This report deals with classification algorithms to detect fraudulent transactions.
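The comparison step above can be sketched as a loop over candidate classifiers. This is a hypothetical minimal example on synthetic data using scikit-learn only; GradientBoostingClassifier stands in for XGB, since the paper does not publish its training code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced "claims" data: ~10% positive (fraud) class.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "GB (XGB stand-in)": GradientBoostingClassifier(random_state=0),
}
# Fit each model and score it on the held-out test set with F1.
scores = {name: f1_score(y_te, m.fit(X_tr, y_tr).predict(X_te)) for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: F1 = {s:.2f}")
```

F1 is used as the ranking metric here because, as the problem statement notes, plain accuracy is misleading when fraud is under 1% of claims.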
A. Proposed System
The influence of feature engineering, feature selection, and parameter modification is explored with the aim of achieving superior predictive performance and accuracy. Assorted machine learning techniques are utilized to improve detection accuracy on imbalanced samples. In this system, the data are divided into three segments: training, testing, and validation.
The algorithm is trained on a partial set of the data and parameters, which are later tuned on a validation set. The model is then evaluated for performance on the held-out testing dataset. The best-performing models are finally tested with various random splits of the data, which helps to confirm the consistency of the results. The approach discussed above comprises three layers.
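The three-segment division described above can be obtained with two successive calls to scikit-learn's train_test_split. The 60/20/20 proportions below are an assumption for illustration; the paper does not state its split ratios.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and labels standing in for the claims dataset.
X = np.arange(1000).reshape(-1, 1)
y = np.zeros(1000)

# First carve out the 20% test set, then split the remainder 75/25
# into training and validation (i.e. 60%/20% of the original data).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```

Hyperparameters are tuned against the validation set only; the test set is touched once, at the end, so the reported performance is not biased by tuning.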
Fig. 1. Layers for Model Building
B. System Architecture
The machine learning model is built with different algorithms and trained on the provided dataset so that it predicts the class of a new claim as “fraud” or “not fraud”. The algorithms build models trained on historical data that predict unseen data by matching features. The model is then tested and validated to evaluate its performance, after which the algorithms are compared.
For automobile insurance fraud detection, logistic regression shows the best accuracy. Logistic regression evaluates the relationship between the label Y and the features X by estimating probabilities using a logistic function. The model predicts a probability, which is then used to predict the label category. The underlying linear model and its logistic (log-odds) form are:

y = b0 + b1x1 + b2x2 + ... + bnxn  (1)

log(y / (1 − y)) = b0 + b1x1 + b2x2 + ... + bnxn  (2)

where y is the predicted probability of a fraudulent claim, x1, ..., xn are the feature values, b0 is the intercept, and b1, ..., bn are the feature coefficients.
To implement logistic regression using Python, we follow these steps:
• Data pre-processing: the data are prepared so that they can be used in code efficiently. The dependent and independent variables are extracted from the given dataset, which is then split into training and test sets using the train_test_split module from the sklearn library. Feature scaling is applied so as to get accurate prediction results.
• Fitting logistic regression to the training set: the LogisticRegression class of the sklearn library is used. A classifier object is created and used to fit the logistic regression model.
• Predicting the test result: once the model is trained on the training set, the results are predicted using the test set data.
• Testing the accuracy of the result: a confusion matrix is used to evaluate the test accuracy. In this fraud detection model, the prediction is checked to see whether a fraudulent transaction is reported as fraudulent and vice versa.
• Visualizing the test set result: adjust the model-fitting parameters, the features, or the machine learning algorithm, and repeat the tests.
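The steps above can be sketched end to end as follows. This is a minimal illustrative pipeline on synthetic data; the paper's real dataset (1000 records, 35 features) and its specific preprocessing are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Pre-processing: synthetic stand-in with the same shape as the claims data.
X, y = make_classification(n_samples=1000, n_features=35, weights=[0.75], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 2. Fit LogisticRegression on the training set.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 3. Predict on the test set.
y_pred = clf.predict(X_test)

# 4. Evaluate with a confusion matrix: rows = actual class, columns = predicted.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("accuracy:", accuracy_score(y_test, y_pred))
```

Fitting the scaler on the training split alone, then reusing it on the test split, avoids leaking test-set statistics into training.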
The methodology of this project is illustrated in the figure below:
Fig. 2. Prediction Model
C. Implementation Model
PySpark:
Fraud detection in car insurance claims is performed using the Python module PySpark MLlib, a machine learning library. It is a wrapper over PySpark Core for doing data analysis using machine learning algorithms. It works on distributed systems and is scalable. Implementations of classification, clustering, linear regression, and other machine learning algorithms are found in PySpark MLlib, and they are ready for large CSV-file processing in standalone mode. In particular, the use of the ‘spark.ml’ module was favored, because the RDD-based MLlib library is going to be deprecated.
Scikit-Learn:
Scikit-learn is one of the most helpful libraries for machine learning in Python. The sklearn library contains a lot of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
V. RESULTS AND ANALYSIS
Different measures can be used to evaluate and analyse the model performance. Some of the measures used in this project are:
TABLE II. PRECISION ANALYSIS

Model | Recall | Precision | F1 Score
Logistic Regression | 79 | 90 | 83
XGB | 74 | 89 | 81
Decision Tree | 66 | 79 | 71.86
KNN | 65 | 75 | 68
Random Forest | 45 | 77 | 56
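As a consistency check, the F1 score is the harmonic mean of precision and recall, F1 = 2PR/(P + R). Recomputing it from the precision and recall columns approximately reproduces the reported F1 values; small differences (e.g. 84.1 vs. 83 for logistic regression) suggest the table entries were computed from unrounded precision and recall.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision/recall pairs taken from Table II (as percentages).
table = {"Logistic Regression": (90, 79), "XGB": (89, 74), "KNN": (75, 65)}
for model, (p, r) in table.items():
    print(f"{model}: F1 = {f1(p, r):.1f}")
```

For XGB, precision 89 and recall 74 give F1 ≈ 80.8, matching the reported 81.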
In this analysis, several factors were identified that can help make an accurate distinction between fraudulent and non-fraudulent transactions, which helps to predict the presence of fraud in the given transactions. When different input datasets were used, the machine learning models performed at varying performance levels. Model rankings were obtained by considering the average F1 score: the higher the F1 score, the better the performance of the model. The analysis indicates that the adjusted random forest algorithm and a modified random under-sampling algorithm provide the best-performing models. However, it cannot be assumed that this order of predictive quality would be replicated; it might differ for other datasets. It was also observed that, within the dataset samples, the models trained on feature-rich datasets perform well.
A. Training and Testing Phase
The trends obtained during the training and testing phase are analyzed with respect to various features and are then used to decide the best model among the machine learning classifiers considered. The following figures give a graphical representation of some of the results.
Fig. 3. Fraud Reported
Fig. 4. Total Claim
B. Feature Selection
Fig. 5. Age Bins
Fig. 6. Incident Severity
The following figures give an idea of the feature selection. Here we drop the columns insured education level, insured occupation, and authorities contacted, since they have a very high number of unique values, which would lead to a higher number of independent states.
Fig. 7. Feature Selection
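The column-dropping step can be expressed in pandas. The miniature DataFrame below is a hypothetical stand-in for the claims table; only the three dropped column names follow the text above, and the other names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of the claims table, including the three
# high-cardinality columns the paper drops.
df = pd.DataFrame({
    "months_as_customer": [12, 48, 7, 30],
    "insured_education_level": ["MD", "PhD", "JD", "High School"],
    "insured_occupation": ["craft-repair", "sales", "tech-support", "exec"],
    "authorities_contacted": ["Police", "Fire", "Other", "Ambulance"],
    "fraud_reported": [1, 0, 0, 1],
})

high_cardinality = ["insured_education_level", "insured_occupation", "authorities_contacted"]
df = df.drop(columns=high_cardinality)
print(list(df.columns))  # ['months_as_customer', 'fraud_reported']
```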
C. One-Hot Encoding:
The following figure gives an idea of converting all categorical values to numerical ones using one-hot encoding. One-hot encoding allows the representation of categorical data to be more expressive. It is required because many machine learning algorithms are unable to work with categorical data and give the expected results.
Fig. 8. Data Transformation into Numerical Data
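A common way to perform the transformation shown in Fig. 8 is pandas.get_dummies, which expands each categorical column into one indicator column per category. The incident_severity values below are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "incident_severity": ["Major Damage", "Minor Damage", "Total Loss", "Minor Damage"],
})
# One indicator column per category; exactly one indicator is set per row.
encoded = pd.get_dummies(df, columns=["incident_severity"])
print(encoded.shape)  # (4, 3): three categories -> three indicator columns
```

Each row now contains a single 1 (or True) in the column matching its original category, so the data are purely numerical and usable by any classifier.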
Output of the model:
Fig. 9. Output for Logistic Regression Model
Input: an auto insurance fraud detection dataset containing 1000 records and 35 features.
Output:
• A table comparing the actual results and the predicted results of the model, where 0.0 stands for ’no fraud’ and 1.0 stands for ’fraud’.
• The accuracy of the logistic regression model.
VI. CONCLUSION AND FUTURE WORK
The machine learning models discussed here, applied to these datasets, were able to identify most of the fraudulent cases with a low false-positive rate, which implies reasonable precision. Certain datasets had severe data quality challenges, resulting in comparatively poor levels of prediction. Given the inherent characteristics of the various datasets, it would not be sensible to define a single optimal algorithmic technique or feature engineering process for better performance; the models should instead be tuned for the specific business context and user priorities. This helps loss control units to focus on new fraud scenarios and ensures that the models adapt to identify them. However, based on the model performance in back-testing and the ability to identify new frauds, it is reasonable to suggest that this set of models forms a suitable suite for use in the area of insurance claims fraud detection.
VII. REFERENCES
[1] K. Ulaga Priya and S. Pushpa, “A Survey on Fraud
Analytics Using Predictive Model in Insurance
Claims,” Int. J. Pure Appl. Math., vol. 114, no. 7, pp.
755–767, 2017.
[2] E. B. Belhadji, G. Dionne, and F. Tarkhani, “A
Model for the Detection of Insurance Fraud,” Geneva
Pap. Risk Insur. Issues Pract., vol. 25, no. 4, pp. 517–
538, 2000, doi: 10.1111/1468-0440.00080.
[3] “Predictive Analysis for Fraud Detection.” https://www.wipro.com/analytics/comparative-analysis-of-machine-learning-techniques-for-detectin/.
[4] F. C. Li, P. K. Wang, and G. E. Wang, “Comparison
of the primitive classifiers with extreme learning
machine in credit scoring,” IEEM 2009 - IEEE Int.
Conf. Ind. Eng. Eng. Manag., vol. 2, no. 4, pp. 685–
688, 2009, doi: 10.1109/IEEM.2009.5373241.
[5] V. Khadse, P. N. Mahalle, and S. V. Biraris, “An
Empirical Comparison of Supervised Machine
Learning Algorithms for Internet of Things Data,”
Proc. - 2018 4th Int. Conf. Comput. Commun.
Control Autom. ICCUBEA 2018, pp. 1–6, 2018, doi:
10.1109/ICCUBEA.2018.8697476.
[6] S. Ray, “A Quick Review of Machine Learning Algorithms,” Proc. Int. Conf. on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon) 2019, pp. 35–39, 2019, doi: 10.1109/COMITCon.2019.8862451.
[7] “Comparing supervised learning algorithms.” https://www.dataschool.io/comparing-supervised-learning-algorithms/.
... Unlike conventional ML models that provide information for human decision-making, Agentic AI systems can execute end-to-end processes with minimal human intervention, learning and adapting from their experiences through reinforcement learning [1] The integration of AAI has fundamentally altered traditional insurance workflows, particularly in claims processing. Research conducted by Wang et al. demonstrates that modern AI-powered systems can process and evaluate claims with an accuracy rate of 89.3% when handling standard cases, representing a substantial improvement over traditional manual processing [2]. These systems employ sophisticated deep learning models that can analyze complex documentation, including images and text, reducing the average claims processing time from 8-12 days to just 24-48 hours. ...
... The implementation of fraud detection mechanisms, powered by advanced pattern recognition algorithms and neural networks, has shown particularly promising results in realworld applications. According to detailed case studies presented by Wang et al., AI-driven fraud detection systems have demonstrated the capability to identify potentially fraudulent claims with an accuracy rate of 94.2%, while maintaining a false positive rate of only 3.8% [2]. These systems utilize sophisticated temporal-spatial analysis techniques to process historical claims data, identifying patterns that would be impossible to detect through traditional methods. ...
... The validation and quality assurance processes for AAI systems have evolved to incorporate sophisticated human-in-the-loop mechanisms. Studies show that hybrid systems, combining AI capabilities with human expertise, achieve optimal results with a 96.4% accuracy rate in complex cases requiring subjective assessment [2]. These systems employ advanced machine learning models that continuously learn from human expert decisions, creating a feedback loop that enhances system performance over time. ...
Article
This technical article explores the transformative impact of Agentic Artificial Intelligence (AAI) systems within the insurance industry, focusing on key operational domains including claims processing, underwriting, and fraud detection. The article explores how modern insurance providers are leveraging advanced technologies such as deep learning, computer vision, and natural language processing to optimize their operations. The implementation of these AI-driven systems has revolutionized traditional workflows, from automated damage assessment in claims processing to sophisticated risk evaluation in underwriting, while maintaining robust security and compliance standards. The article also highlights the critical role of human-in-the-loop architectures and bias mitigation frameworks in ensuring accurate and equitable insurance operations. Through comprehensive analysis of system architectures, implementation strategies, and performance metrics, this article provides insights into how AAI systems are reshaping the insurance landscape while addressing challenges related to system integration, security, and quality assurance.
... Unlike conventional ML models that provide information for human decision-making, Agentic AI systems can execute end-to-end processes with minimal human intervention, learning and adapting from their experiences through reinforcement learning [1] The integration of AAI has fundamentally altered traditional insurance workflows, particularly in claims processing. Research conducted by Wang et al. demonstrates that modern AI-powered systems can process and evaluate claims with an accuracy rate of 89.3% when handling standard cases, representing a substantial improvement over traditional manual processing [2]. These systems employ sophisticated deep learning models that can analyze complex documentation, including images and text, reducing the average claims processing time from 8-12 days to just 24-48 hours. ...
... The implementation of fraud detection mechanisms, powered by advanced pattern recognition algorithms and neural networks, has shown particularly promising results in realworld applications. According to detailed case studies presented by Wang et al., AI-driven fraud detection systems have demonstrated the capability to identify potentially fraudulent claims with an accuracy rate of 94.2%, while maintaining a false positive rate of only 3.8% [2]. These systems utilize sophisticated temporal-spatial analysis techniques to process historical claims data, identifying patterns that would be impossible to detect through traditional methods. ...
... The validation and quality assurance processes for AAI systems have evolved to incorporate sophisticated human-in-the-loop mechanisms. Studies show that hybrid systems, combining AI capabilities with human expertise, achieve optimal results with a 96.4% accuracy rate in complex cases requiring subjective assessment [2]. These systems employ advanced machine learning models that continuously learn from human expert decisions, creating a feedback loop that enhances system performance over time. ...
Article
Full-text available
This technical article explores the transformative impact of Agentic Artificial Intelligence (AAI) systems within the insurance industry, focusing on key operational domains including claims processing, underwriting, and fraud detection. The article Vasudev Daruvuri https://iaeme.com/Home/journal/IJCET 2036 editor@iaeme.com explores how modern insurance providers are leveraging advanced technologies such as deep learning, computer vision, and natural language processing to optimize their operations. The implementation of these AI-driven systems has revolutionized traditional workflows, from automated damage assessment in claims processing to sophisticated risk evaluation in underwriting, while maintaining robust security and compliance standards. The article also highlights the critical role of human-in-the-loop architectures and bias mitigation frameworks in ensuring accurate and equitable insurance operations. Through comprehensive analysis of system architectures, implementation strategies, and performance metrics, this article provides insights into how AAI systems are reshaping the insurance landscape while addressing challenges related to system integration, security, and quality assurance.
... Unlike conventional ML models that provide information for human decision-making, Agentic AI systems can execute end-to-end processes with minimal human intervention, learning and adapting from their experiences through reinforcement learning [1] The integration of AAI has fundamentally altered traditional insurance workflows, particularly in claims processing. Research conducted by Wang et al. demonstrates that modern AI-powered systems can process and evaluate claims with an accuracy rate of 89.3% when handling standard cases, representing a substantial improvement over traditional manual processing [2]. These systems employ sophisticated deep learning models that can analyze complex documentation, including images and text, reducing the average claims processing time from 8-12 days to just 24-48 hours. ...
... The implementation of fraud detection mechanisms, powered by advanced pattern recognition algorithms and neural networks, has shown particularly promising results in realworld applications. According to detailed case studies presented by Wang et al., AI-driven fraud detection systems have demonstrated the capability to identify potentially fraudulent claims with an accuracy rate of 94.2%, while maintaining a false positive rate of only 3.8% [2]. These systems utilize sophisticated temporal-spatial analysis techniques to process historical claims data, identifying patterns that would be impossible to detect through traditional methods. ...
... The validation and quality assurance processes for AAI systems have evolved to incorporate sophisticated human-in-the-loop mechanisms. Studies show that hybrid systems, combining AI capabilities with human expertise, achieve optimal results with a 96.4% accuracy rate in complex cases requiring subjective assessment [2]. These systems employ advanced machine learning models that continuously learn from human expert decisions, creating a feedback loop that enhances system performance over time. ...
Article
Full-text available
This technical article explores the transformative impact of Agentic Artificial Intelligence (AAI) systems within the insurance industry, focusing on key operational domains including claims processing, underwriting, and fraud detection. The article Vasudev Daruvuri https://iaeme.com/Home/journal/IJCET 2036 editor@iaeme.com explores how modern insurance providers are leveraging advanced technologies such as deep learning, computer vision, and natural language processing to optimize their operations. The implementation of these AI-driven systems has revolutionized traditional workflows, from automated damage assessment in claims processing to sophisticated risk evaluation in underwriting, while maintaining robust security and compliance standards. The article also highlights the critical role of human-in-the-loop architectures and bias mitigation frameworks in ensuring accurate and equitable insurance operations. Through comprehensive analysis of system architectures, implementation strategies, and performance metrics, this article provides insights into how AAI systems are reshaping the insurance landscape while addressing challenges related to system integration, security, and quality assurance.
... Unlike conventional ML models that provide information for human decision-making, Agentic AI systems can execute end-to-end processes with minimal human intervention, learning and adapting from their experiences through reinforcement learning [1] The integration of AAI has fundamentally altered traditional insurance workflows, particularly in claims processing. Research conducted by Wang et al. demonstrates that modern AI-powered systems can process and evaluate claims with an accuracy rate of 89.3% when handling standard cases, representing a substantial improvement over traditional manual processing [2]. These systems employ sophisticated deep learning models that can analyze complex documentation, including images and text, reducing the average claims processing time from 8-12 days to just 24-48 hours. ...
... The implementation of fraud detection mechanisms, powered by advanced pattern recognition algorithms and neural networks, has shown particularly promising results in realworld applications. According to detailed case studies presented by Wang et al., AI-driven fraud detection systems have demonstrated the capability to identify potentially fraudulent claims with an accuracy rate of 94.2%, while maintaining a false positive rate of only 3.8% [2]. These systems utilize sophisticated temporal-spatial analysis techniques to process historical claims data, identifying patterns that would be impossible to detect through traditional methods. ...
... The validation and quality assurance processes for AAI systems have evolved to incorporate sophisticated human-in-the-loop mechanisms. Studies show that hybrid systems, combining AI capabilities with human expertise, achieve optimal results with a 96.4% accuracy rate in complex cases requiring subjective assessment [2]. These systems employ advanced machine learning models that continuously learn from human expert decisions, creating a feedback loop that enhances system performance over time. ...
... The authors of Ref. [20] investigated the integration of explainable AI (XAI) techniques in banking fraud detection, emphasizing the need for transparency and interpretability to balance performance with regulatory compliance. Similarly, Ref. [21] compared several machine learning algorithms (logistic regression, extreme gradient boosting (XGB), decision tree, KNN, and random forest) for detecting fraudulent claims. Logistic regression achieved an F1 score of 83; however, when tested with new datasets, random forest delivered the best performance. ...
... The authors noted challenges related to data quality in some datasets, which negatively impacted prediction performance, and suggested refining models for different fraud cases to adapt to evolving fraud patterns. A follow-up study, Ref. [22], extended the work presented in Ref. [21] by incorporating linear discriminant analysis (LDA), which achieved a superior F1 score of 87, concluding that LDA outperforms the other methods. Additionally, Ref. [23] explored the use of clustering algorithms for automating fraud detection in group life insurance claims during audits. ...
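The F1 comparisons quoted above rest on precision and recall derived from the confusion matrix. A minimal pure-Python sketch of scoring two models this way; the ground truth and predictions are hypothetical, not data from the cited studies:

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 from confusion-matrix counts: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Hypothetical ground truth (1 = fraudulent) and two models' predictions.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 0, 0, 1, 1, 0]   # stand-in for, e.g., logistic regression
model_b = [1, 0, 1, 1, 0, 0, 0, 0]   # stand-in for, e.g., random forest
scores = {"model_a": f1_score(y_true, model_a),
          "model_b": f1_score(y_true, model_b)}
```

Reporting F1 rather than raw accuracy matters here because fraud labels are rare: a classifier that predicts "legitimate" everywhere scores high accuracy but zero F1.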
Article
The weighted K-means clustering algorithm is widely recognized for its ability to assign varying importance to features in clustering tasks. This paper introduces an enhanced version of the algorithm, incorporating a bi-partitioning strategy to segregate feature sets, thus improving its adaptability to high-dimensional and heterogeneous datasets. The proposed bi-partition weighted K-means (BPW K-means) clustering approach is tailored to address challenges in identifying patterns within datasets with distinct feature subspaces, such as those in insurance claim fraud detection. Experimental evaluations on real-world insurance datasets highlight significant improvements in both clustering accuracy and interpretability compared to the classical K-means, achieving an accuracy of approximately 91%, representing an improvement of about 38% over the classical K-means algorithm. Moreover, the method’s ability to uncover meaningful fraud-related clusters underscores its potential as a robust tool for fraud detection. Beyond insurance, the proposed framework applies to diverse domains where data heterogeneity demands refined clustering solutions. The application of the BPW K-means method to multiple real-world datasets highlights its clear superiority over the classical K-means algorithm.
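The feature-weighting idea behind the BPW K-means abstract above can be illustrated with a plain feature-weighted K-means. This sketch omits the paper's bi-partitioning strategy entirely; the toy data, weights, and iteration count are all assumptions for illustration:

```python
import random

def weighted_kmeans(points, k, weights, iters=20, seed=0):
    """K-means where each feature contributes to the squared distance in
    proportion to its weight (a simplified, single-partition version of
    feature-weighted clustering)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, c))
                     for c in centers]
            clusters[dists.index(min(dists))].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster emptied out
                centers[i] = tuple(sum(col) / len(cl) for col in zip(*cl))
    return centers, clusters

# Two well-separated toy groups in the first feature; the second feature
# is noise and is almost ignored via a low weight.
points = [(0.0, 5.0), (0.2, -4.0), (0.1, 9.0),
          (5.0, 4.0), (5.2, -6.0), (5.1, 0.0)]
centers, clusters = weighted_kmeans(points, k=2, weights=(1.0, 0.01))
```

With equal weights, the noisy second feature would dominate the distances and scramble the grouping; down-weighting it recovers the two clusters defined by the first feature, which is the intuition behind learning per-feature weights.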
... Feature selection also reduces the complexity of the model. While analyzing insurance claims using machine learning for fraud detection, [52] observed that feature selection reduces the number of independent states arising from features with very many unique values, and that converting categorical values to numerical ones improves the algorithm's results. ...
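The two observations in that context (drop features with very high cardinality, label-encode the remaining categoricals) can be sketched as follows; the column names and the max_unique cutoff are hypothetical:

```python
def encode_and_select(rows, max_unique=10):
    """Label-encode categorical columns and drop any column whose number
    of unique values exceeds max_unique (e.g. claim IDs), which would
    otherwise explode the model's state space."""
    kept, encoders = {}, {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        uniques = sorted(set(values), key=str)
        if len(uniques) > max_unique:
            continue  # high-cardinality column: drop it
        mapping = {v: i for i, v in enumerate(uniques)}
        encoders[col] = mapping
        kept[col] = [mapping[v] for v in values]
    return kept, encoders

# Hypothetical claim records: claim_id is unique per row and gets dropped.
rows = [
    {"claim_id": f"C{i:04d}", "vehicle": v, "severity": s}
    for i, (v, s) in enumerate([("car", "minor"), ("truck", "major"),
                                ("car", "major"), ("bike", "minor")] * 3)
]
features, encoders = encode_and_select(rows, max_unique=5)
```

Returning the encoders alongside the features matters in practice: the same mapping must be reused at prediction time so that, say, "truck" encodes to the same integer it did during training.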
Article
Fraudulent claims have been a major drawback in motor insurance despite the insurance industry holding vast amounts of motor claims data. Analyzing this data can lead to a more efficient way of detecting reported fraudulent claims. The challenge is how to extract insightful information and knowledge from this data and use it to model a fraud detection system. Due to the constant evolution and dynamic nature of fraudsters, some approaches utilized by insurance firms, such as impromptu audits, whistle-blowing, and staff rotation, have become infeasible. Machine learning techniques can aid in fraud detection by training a prediction model on historical data. The performance of the models is affected by class imbalance and by the determination of the features most relevant to detecting fraud. In this paper we examine various fraud detection techniques and compare their performance efficiency. We then summarize the techniques' strengths and weaknesses in identifying claims as either fraudulent or non-fraudulent, and finally propose a fraud detection framework based on an ensemble model trained on a dataset balanced using SMOTE and restricted to relevant features only. This proposed approach would improve performance and reduce false positives.
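The SMOTE balancing step in that proposed framework synthesizes minority-class samples by interpolating between real ones. A simplified pure-Python sketch of the core idea, using random minority pairs instead of SMOTE's k-nearest-neighbour sampling, on hypothetical feature vectors:

```python
import random

def smote_like_oversample(majority, minority, seed=0):
    """Generate synthetic minority points on segments between random pairs
    of real minority points until the classes are balanced (a simplified
    stand-in for SMOTE's nearest-neighbour interpolation)."""
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < len(majority):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + lam * (y - x) for x, y in zip(a, b)))
    return minority + synthetic

# Hypothetical feature vectors: 6 legitimate claims vs 2 fraudulent ones.
majority = [(1.0, 0.2), (0.9, 0.3), (1.1, 0.1),
            (0.8, 0.2), (1.0, 0.4), (0.95, 0.25)]
minority = [(3.0, 2.0), (3.4, 2.4)]
balanced_minority = smote_like_oversample(majority, minority)
```

Because the synthetic points lie between real fraud cases rather than being exact duplicates, the classifier sees a denser but still plausible fraud region, which is why SMOTE tends to reduce overfitting compared with naive duplication.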
... Many studies have attempted to detect insurance fraud using ML algorithms such as Naive Bayes (NB), random forest (RF), logistic regression (LR), support vector machine (SVM), decision tree (DT), AdaBoost, and neural network (NN) models. Some studies show that RF and DT algorithms perform better than other methods for detecting fraud in automobile insurance [4], [15], [19], [20], [21], [22], [23], [24], [25], [26], [27]. Other studies implemented ensemble models to recognize insurance fraud, such as Bagged Ensemble Convolutional Neural Networks in [28] and deep boosting decision trees in [29]. ...
Article
Insurance fraud is a prevalent issue that insurance companies must face, particularly in the realm of automobile insurance. This type of fraud has significant cost implications for insurance firms and can have a long-term impact on pricing strategies and insurance rates. As a result, accurately predicting and detecting insurance fraud has become a crucial challenge for insurers. Fraud datasets are usually imbalanced, as the number of fraudulent instances is much smaller than the number of legitimate instances, and they often contain missing values. Prior research has employed machine learning methods to address class imbalance, but there has been limited effort on handling the imbalance present in insurance fraud datasets specifically. Moreover, we could not find an overfitting analysis for the relevant predictive models. This paper addresses these two limitations by employing two car insurance datasets, namely an Egyptian real-life dataset and a standard benchmark dataset. We propose different methods for addressing the missing data and class imbalance problems. The predictive models are then trained on the processed datasets to predict insurance fraud as a classification problem, and the classifiers are evaluated on several metrics. Moreover, we propose, to our knowledge, the first overfitting analysis for insurance fraud classifiers. The results show that addressing class imbalance in the insurance fraud dataset has a significant positive effect on the performance of the predictive model, while addressing missing values has only a slight effect. The proposed methods also outperform all existing methods on the accuracy metric.
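Two of the preprocessing steps discussed in that abstract (imputing missing values and counteracting class imbalance) can be sketched with mean imputation and inverse-frequency class weights. The data are illustrative, and the n / (n_classes * count) weighting is one common convention, not necessarily the paper's exact method:

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def class_weights(labels):
    """Inverse-frequency weights n / (n_classes * count_c), so the rare
    (fraud) class gets a proportionally larger weight in the loss."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}

ages = [30, None, 50, 40, None]   # hypothetical claimant ages with gaps
labels = [0, 0, 0, 0, 1]          # 1 = fraudulent (the rare class)
clean_ages = impute_mean(ages)
weights = class_weights(labels)
```

Weighting the loss is an alternative to oversampling: instead of duplicating or synthesizing fraud cases, each fraud example simply counts for more during training, which avoids changing the dataset itself.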
Article
The aim of this article is to develop a model to aid insurance companies in their decision-making and to ensure that they are better equipped to fight fraud. This tool is based on the systematic use of fraud indicators. We first propose a procedure to isolate the indicators which are most significant in predicting the probability that a claim may be fraudulent. We applied the procedure to data collected in the Dionne–Belhadji study (1996). The model allowed us to observe that 23 of the 54 indicators used were significant in predicting the probability of fraud. Our study also discusses the model's accuracy and detection capability. The detection rates obtained by the adjusters who participated in the study constitute the reference point of this discussion. As shown in Caron–Dionne (1998), these rates may underestimate the true level of fraud. JEL codes: D81, G14, G22.
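The idea of isolating the most predictive fraud indicators can be illustrated with a crude correlation screen over binary indicators. This is not the paper's statistical procedure, and the indicator names, data, and threshold are all hypothetical:

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def screen_indicators(indicators, fraud, threshold=0.3):
    """Keep indicators whose |correlation| with the fraud label exceeds
    the threshold (a crude stand-in for formal significance testing)."""
    return {name: round(pearson(col, fraud), 3)
            for name, col in indicators.items()
            if abs(pearson(col, fraud)) > threshold}

# Hypothetical binary indicators over 8 claims; fraud label below.
fraud = [1, 1, 1, 0, 0, 0, 0, 0]
indicators = {
    "late_report":   [1, 1, 1, 0, 1, 0, 0, 0],  # tracks fraud closely
    "has_witness":   [0, 1, 0, 1, 0, 1, 1, 1],  # anti-correlated with fraud
    "weekend_claim": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated noise
}
significant = screen_indicators(indicators, fraud)
```

Note that an indicator can be retained for a strong negative correlation (such as the presence of a witness): what matters for screening is the magnitude of the association, not its sign.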
Conference Paper
With the rapid growth of the credit industry, credit scoring classifiers are being widely used for credit admission evaluation. Effective classifiers are a critical topic, with the relevant departments striving to collect huge amounts of data to avoid making wrong decisions. Finding an effective classifier is important because it helps decision-makers act on objective evidence rather than intuition alone. This study applies two well-known classifiers, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM), to find the highest-accuracy classifier without feature selection. Two credit data sets from the University of California, Irvine (UCI) are chosen to evaluate the accuracy of the various classifiers. The results are compared, and the nonparametric Wilcoxon signed-rank test is performed to show whether there is any significant difference between the classifiers. The KNN classifier performs better in only one data set, and not significantly, whereas the SVM classifier is significantly superior to the Extreme Learning Machine (ELM) classifier in the German data set. The results of this study suggest that the primitive classifiers did not achieve satisfactory classification results. Combining them with effective feature selection approaches to find optimal feature subsets is a promising direction in the field of credit scoring.
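The Wilcoxon signed-rank test used in that comparison can be sketched in pure Python. This computes only the test statistic W (the smaller of the two signed-rank sums), omitting the p-value; the paired accuracy differences are hypothetical:

```python
def wilcoxon_statistic(diffs):
    """Signed-rank statistic for paired differences: drop zeros, rank the
    |differences| (average ranks on ties), sum the ranks of positive and
    negative differences separately, and return the smaller sum W."""
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        while (j < len(order)
               and abs(nonzero[order[j]]) == abs(nonzero[order[i]])):
            j += 1
        avg = (i + j + 1) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return min(w_plus, w_minus)

# Paired accuracy differences (SVM minus KNN) over 8 hypothetical folds.
diffs = [0.02, 0.01, 0.03, -0.01, 0.02, 0.04, 0.01, 0.0]
w = wilcoxon_statistic(diffs)
```

A small W (most of the rank mass on one side) is evidence of a systematic difference between the paired classifiers; the statistic would then be compared against the test's critical values or converted to a p-value.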
Article
The Geneva Papers on Risk and Insurance (2000) 25, 517–538. doi:10.1111/1468-0440.00080
K. Ulaga Priya and S. Pushpa, "A Survey on Fraud Analytics Using Predictive Model in Insurance Claims," Int. J. Pure Appl. Math., vol. 114, no. 7, pp. 755–767, 2017.