Article

Managing a pool of rules for credit card fraud detection by a Game Theory based approach


Abstract

In automatic credit card transaction classification there are two phases: in the Real-Time (RT) phase the system decides quickly, based on the bare transaction information, whether to authorize the transaction; in the subsequent Near-Real-Time (NRT) phase, the system enacts a slower ex-post evaluation, based on a larger information context. The classification rules in the NRT phase trigger alerts on suspicious transactions, which are transferred to human investigators for final assessment. The management criteria used to select the rules to be kept operational in the NRT pool are traditionally based mostly on the performance of individual rules, considered in isolation; this approach disregards the non-additivity of the rules (aggregating rules with high individual precision does not necessarily yield a high-precision pool). In this work, we propose to apply to the rule selection for the NRT phase an approach that assigns a normalized score to each individual rule, quantifying the rule's influence on the overall performance of the pool. As a score we propose to use a power index developed within Coalitional Game Theory, the Shapley Value (SV), which summarizes the performance of a rule in collaboration. This score has two main applications: (1) it can be used, within the periodic rule assessment process, to support the decision of whether to keep or drop a rule from the pool; (2) it can be used to select the k top-ranked rules, so as to work with a more compact rule set. Using real-world credit card fraud data containing approximately 300 rules and 3×10^5 transaction records, we show that: (1) this score fares better, in preserving the performance of the pool, than one assessing the rules in isolation; (2) the same performance as that of the whole pool can be achieved keeping only one tenth of the rules, namely the top-k SV-ranked rules. We observe that the latter application can be reframed as a Feature Selection (FS) task for a classifier: we show that our approach is comparable to benchmark FS algorithms, but argue that it presents an advantage for management, consisting in the assignment of a normalized score to each individual rule. This is not the case for most FS algorithms, which focus only on yielding a high-performance feature-set solution.
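For illustration, the sketch below shows how such a Shapley-based rule score can be estimated by Monte Carlo permutation sampling, since exact computation is exponential in the number of rules. This is a minimal sketch, not the authors' implementation: the names `triggers` and `is_fraud`, and the choice of the F1 score over the OR-aggregated alerts as the pool's characteristic function, are illustrative assumptions.

```python
# Minimal sketch: Monte Carlo Shapley Value estimation for a pool of
# detection rules. Assumptions: `triggers` is a boolean matrix
# (n_transactions x n_rules) marking which rules fire on each transaction,
# `is_fraud` labels each transaction, and pool performance is the F1 score
# of the OR-aggregated alerts. All names are illustrative.
import numpy as np
from sklearn.metrics import f1_score

def pool_score(triggers, is_fraud, coalition):
    """Performance of a coalition of rules: OR their alerts, score vs. labels."""
    if len(coalition) == 0:
        return 0.0
    alerts = triggers[:, list(coalition)].any(axis=1)
    return f1_score(is_fraud, alerts, zero_division=0)

def shapley_estimates(triggers, is_fraud, n_permutations=200, seed=0):
    """Estimate each rule's Shapley Value by sampling rule permutations."""
    rng = np.random.default_rng(seed)
    n_rules = triggers.shape[1]
    phi = np.zeros(n_rules)
    for _ in range(n_permutations):
        order = rng.permutation(n_rules)
        coalition, prev = [], 0.0
        for rule in order:
            coalition.append(rule)
            curr = pool_score(triggers, is_fraud, coalition)
            phi[rule] += curr - prev   # marginal contribution of `rule`
            prev = curr
    return phi / n_permutations

# phi = shapley_estimates(triggers, is_fraud)
# top_k = np.argsort(phi)[::-1][:30]   # keep only the top-k SV-ranked rules
```

With the scores in hand, the two applications named in the abstract reduce to thresholding the per-rule score (keep/drop decisions) and taking the k highest-ranked rules.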


... While various verification methods have been implemented, the number of credit card fraud cases has not decreased significantly [6]. The potential for substantial monetary gains, combined with the ever-changing nature of financial services, creates a wide range of opportunities for fraudsters [7]. Funds from payment card fraud are often used in criminal activities that are hard to prevent, e.g., to support terrorist acts [8]. ...
... This can be done through two main steps: training and testing. AI is employed to build systems for fraud detection, such as classification-based systems [19,6,7,8], clustering-based systems [17,20,21], neural network-based systems [18,22,23], and support vector machine-based systems [9]. Although AI-based systems can perform well, they suffer from some critical issues. ...
... In this context, data mining tasks, such as classification, clustering, applying association rules, and using neural networks, are employed [2]. In addition, AI is employed to build systems for fraud detection, such as classification-based systems [19,6,7,8], clustering-based systems [17,20,21], neural network-based systems [18,22,23] and support vector machine-based systems [9]. The techniques employed to construct credit card fraud detection systems using AI can be categorized into four main groups. ...
... The feasibility of the overall approach relies on the assumption that one can compute the power indices α_p with little effort. Indeed, although the definition (2) suggests that the computation of the power indices has exponential complexity, in practice, a sufficiently useful estimate of the index can be obtained in polynomial time by sampling a reasonable number of coalitions [10]: in fact one does not have to discover the exact value of the indices, but for the proposed approach one just needs to find out which ones are the highest k indices. ...
... The approach of the present work bears some analogy with the problems of classification rule selection and rule pool management in fraudulent credit card transaction detection as it is addressed in the work [10] (that work uses the Shapley Value as a scoring index). However, the very nature of the involved objects (classification rules on one side and information streams on the other) makes the actual details of the methods rather different. ...
... In short, an event, or more generally the truth to be discovered, is a richly structured phenomenon whose description is very different from that of credit card transactions. Even so, should one manage to formulate the performance metric in terms of a single real number, the formalism could be the same as that used in reference [10]. However, the two settings are also distinguished by a structural difference, which forces the adoption of different modeling tools. ...
... We also investigated the importance of the input variables in predicting the target variable. To this end, we adopted a game-theory-based methodology relying on the Shapley value of the input features [56][57][58][59], which does not evaluate the predictive power of each feature in isolation but takes into account the features' interactivity by considering all the different feature coalitions and averaging each feature's added value. To this end, we used the SHAP (SHapley Additive exPlanations, v.0.46.0) Python library implementation of the algorithm (https://shap.readthedocs.io/, ...
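Since the context above names the SHAP library explicitly, a minimal usage sketch may help; the XGBoost model and synthetic data below are placeholders standing in for the study's actual model and features.

```python
# Minimal SHAP usage sketch (https://shap.readthedocs.io/); the model and
# data are placeholders, not the study's actual pipeline.
import shap
import xgboost
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = xgboost.XGBClassifier(n_estimators=50).fit(X, y)

explainer = shap.Explainer(model)   # picks a suitable algorithm (TreeExplainer here)
shap_values = explainer(X)          # per-sample, per-feature attributions

shap.plots.bar(shap_values)         # global importance: mean |SHAP| per feature
```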
Article
Full-text available
Background/Objectives: This study addresses the gap in methodological guidelines for neuroergonomic attention assessment in safety-critical tasks, focusing on validating EEG indices, including the engagement index (EI) and beta/alpha ratio, alongside subjective ratings. Methods: A novel task-embedded reaction time paradigm was developed to evaluate the sensitivity of these metrics to dynamic attentional demands in a more naturalistic multitasking context. By manipulating attention levels through varying secondary tasks in the NASA MATB-II task while maintaining a consistent primary reaction-time task, this study successfully demonstrated the effectiveness of the paradigm. Results: Results indicate that both the beta/alpha ratio and EI are sensitive to changes in attentional demands, with beta/alpha being more responsive to dynamic variations in attention, and EI reflecting the overall effort required to sustain performance, especially in conditions where maintaining attention is challenging. Conclusions: The potential for predicting attention lapses through the integration of performance metrics, EEG measures, and subjective assessments was demonstrated, providing a more nuanced understanding of dynamic fluctuations of attention in multitasking scenarios, mimicking those in real-world safety-critical tasks. These findings provide a foundation for advancing methods to monitor attention fluctuations accurately and mitigate risks in critical scenarios, such as train-driving or automated vehicle operation, where maintaining a high attention level is crucial.
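For readers unfamiliar with the two EEG indices named above, the sketch below computes them from a single-channel signal. It is a hedged illustration assuming the common Pope-style definition EI = β/(α+θ); the study's exact formulation may differ, and `eeg` and `fs` are placeholders.

```python
# Hedged sketch: beta/alpha ratio and engagement index from one EEG channel,
# assuming EI = beta / (alpha + theta) (the common Pope-style formula).
import numpy as np
from scipy.signal import welch

def band_power(freqs, psd, lo, hi):
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[mask], freqs[mask])   # integrate PSD over the band

def attention_indices(eeg, fs=256):
    freqs, psd = welch(eeg, fs=fs, nperseg=fs * 2)
    theta = band_power(freqs, psd, 4, 8)
    alpha = band_power(freqs, psd, 8, 13)
    beta = band_power(freqs, psd, 13, 30)
    return {"beta_alpha": beta / alpha,
            "engagement_index": beta / (alpha + theta)}

# eeg = np.random.randn(256 * 60)   # one minute of synthetic "signal"
# print(attention_indices(eeg))
```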
... Application fraud refers to fraud committed during the application process for a new credit card, where the applicant provides fake identity information that the issuer accepts. Behaviour fraud, by contrast, happens after the credit card has been issued and manifests in the card's transactions [11]. Credit card fraud recognition is an imperative problem for financial organizations and credit card users. ...
... In [11], a Convolutional Neural Network (CNN) was designed for the detection of credit card fraud. In this method, intrinsic patterns were learned from labelled data to determine fraud behaviour. ...
Article
Full-text available
One of the most important contributors to the expansion and progression of a nation's economy is its banking and financial industry. In particular, over the recent past, there has been a significant increase in the utilization of credit and debit cards, whereby customers conduct transactions either digitally over the internet or physically at the stores. Here, the customers, banking institutions, and financial organizations are all being put in a difficult position by fraudulent actors. Because more recent technology is now readily available, internet banking has become an important avenue for commercial transactions. Fake banking activities and fraudulent transactions are a serious problem that affects both users' sense of safety and their trust in the system. In addition, fraudulent activities result in enormous losses because of the proliferation of sophisticated, advanced frauds such as virus infections, scams, and fake websites. This study makes three contributions toward the prevention of fraudulent activity involving credit card transactions.
... These rules are then tested for every new transaction in order to trigger a signal indicating that a SIF has been detected. This process is similar to the Near-Real Time (NRT) fraud detection phase classically used in credit card fraud detection systems, as described in [Gia+20]. In the context of the SiS platform, as of today 224 such rules have been declared in the rule engine. ...
... However, rule engines as fraud detection systems suffer from several drawbacks. First of all, as discussed in [Gia+20], they are difficult to maintain. For example, due to the dynamic nature of fraud, new rules have to be added when fraudsters discover new ways to cheat the system. ...
Thesis
Supplier Impersonation Fraud (SIF) is a kind of fraud occurring in a Business-To-Business (B2B) context, where a fraudster impersonates a supplier in order to trigger an illegitimate payment from a company. Most existing systems focus solely on a single, "intra-company" approach to detect this kind of fraud. However, companies are part of an ecosystem where multiple agents interact, and such interactions have yet to be integrated into existing detection techniques. In this thesis we propose to use state-of-the-art Machine Learning techniques to build a detection system for such frauds, based on a model elaborated from historical transactions of both the targeted companies and the relevant other companies in the ecosystem (contextual data). We detect anomalous transactions when a significant change in the payment behavior of a company is observed. Two ML-based systems are proposed in this work: ProbaSIF and GraphSIF. ProbaSIF uses a probabilistic approach (urn model) to assess the probability of occurrence of the account used in a transaction and thereby assert its legitimacy. We use this approach to assess the differences yielded by the integration of contextual data into the analysis. GraphSIF uses a graph-based approach to model the interactions between client and supplier companies as graphs, and then uses these graphs as training data in a Self-Organizing Map clustering model. The distance between a new transaction and the center of the cluster is used to detect changes in the behavior of a client company. These two systems are compared with a real-life fraud detection system to assess their performance.
... Modern approaches have recently been exploited to identify CCF within regular case streams. A game-theory-inspired technique is proposed, based on the design and implementation of a rule pool management method [35]. The proposed solution distributes suspicious cases for manual investigation while bypassing the step of isolating the individual rules. ...
Article
Full-text available
Recently, tremendous growth in e-business has resulted in an increasing number of online transactions. This widespread adoption of e-payments has gone along with an increase in deceitful activities, which results in tremendous losses in the financial sector. Traditional techniques fail to provide a secure medium for online transactions, which has led to a novel research paradigm using statistical and data-driven techniques to detect anomalies and fraud. Consequently, building a credit card fraud (CCF) detector is essential for secure online operations. Based on the abovementioned constraints, this paper presents a comprehensive study incorporating heterogeneous machine learning (ML) techniques for CCF detection. The proposed framework utilizes a multi-stage classification system that employs multiple classifiers, i.e., logistic regression, support vector machine (SVM), XGBoost, Random Forest, K-Nearest Neighbors (KNN), and Deep Neural Network (DNN). Furthermore, to address the severe class imbalance, the proposed technique uses a sampling technique with an internal feature selection step based on voting among different methods. The key finding indicates that the proposed model surpasses the existing DNN, simple voting, and traditional stacking frameworks with a fraud recall value of 0.901, a legitimate recall value of 0.995, and a model cost value of 0.421. INDEX TERMS Credit Card Fraud (CCF), Machine Learning (ML), Deep Learning (DL), Feature Selection, multi-stage classification.
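A minimal sketch of the voting-style internal feature selection described above, under the assumption (not stated in the abstract) that each method nominates its top-k features and a majority vote decides which are kept:

```python
# Sketch: feature selection by voting among several methods; the methods,
# k, and the vote threshold are illustrative assumptions.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
k = 8

votes = Counter()
for scorer in (f_classif, mutual_info_classif):
    sel = SelectKBest(scorer, k=k).fit(X, y)
    votes.update(np.where(sel.get_support())[0])      # nominated feature indices

rf = RandomForestClassifier(random_state=0).fit(X, y)
votes.update(np.argsort(rf.feature_importances_)[::-1][:k])

# Keep features nominated by at least two of the three methods
selected = sorted(f for f, v in votes.items() if v >= 2)
print(selected)
```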
... Fraud detection is the process of recognizing fraudulent behavior [8]. The credit card industry encourages the deployment of fraud detection mechanisms; thus, fraud detection may become a preventive approach in the future [9]. ...
Article
Full-text available
Over time, with the growth of credit cards and financial data, credit models are needed to support banks in making financial decisions. Hence, developing an efficient fraud detection system is essential to avoid fraud in Internet transactions, which has increased with the growth of technology. Deep learning techniques are superior to other machine learning techniques in predicting the behavior of credit card customers based on the probability that they will miss a payment. A bidirectional long short-term memory (BiLSTM) model is proposed, trained on a Taiwanese non-transactional bank credit card dataset, to decrease the losses of banks. The BiLSTM reached an accuracy of 98% in credit fraud detection, compared with other machine learning techniques.
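A minimal Keras sketch of a BiLSTM classifier of the kind described above; the layer sizes, dropout, and class weights are illustrative assumptions, not the authors' configuration.

```python
# Sketch: a BiLSTM binary classifier over sequences of customer records;
# all hyperparameters are illustrative assumptions.
import tensorflow as tf

def build_bilstm(timesteps, n_features):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, n_features)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # default / no-default
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.Recall()])
    return model

# model = build_bilstm(timesteps=12, n_features=23)
# model.fit(X_train, y_train, epochs=10, class_weight={0: 1.0, 1: 20.0})
```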
... Furthermore, the number of fraudulent transactions has recently increased [1]. According to the Nilson Report, approximately $28.65 billion and $35 billion in global business losses were caused by credit card fraud in 2019 and 2020, respectively [2]. Due to fraudsters' use of some technical means (such as Trojan horses), credit card-related organizations and financial institutions need to be proactive to detect credit card fraud efficiently and accurately. ...
Article
Full-text available
The advancement of technologies and the proliferation of new payment services have improved our lives, offering limitless opportunities for individuals and companies in each country to develop their businesses through credit card transactions as a payment method. Consequently, continuous improvement is crucial for these systems, particularly in the classification of fraud transactions. Numerous studies are required in the realm of automated and real-time fraud detection. Due to their advantageous properties, recent studies have utilized different deep learning architectures to create well-fitting models to identify fraudulent transactions. Our proposed solution aims to exploit the robust capabilities of deep learning approaches to identify abnormal transactions. The solution can be presented as follows: To address the imbalanced data set issue, we applied an autoencoder combined with the support vector machine model (ASVM). For the classification phase, we utilize an attention-long short-term memory neural network as a weak learner for the gradient boosting algorithm (GB_ALSTM), comparing it with various techniques, including artificial neural networks (ANNs), convolutional neural networks (CNNs), long short-term memory neural networks (LSTMs), attention-long short-term memory neural networks (ALSTMs), and bidirectional long short-term memory neural networks (BLSTMs). We conducted several experiments on a real-world dataset, revealing promising results in detecting abnormal transactions and highlighting the dominance of our suggested solution over competing models.
... According to Mardiasmo (2018), such challenges call for improved accounting systems and management control mechanisms that would ensure timely and accurate preparation of financial reports to enable objective performance assessment and informed decision-making. In addition, as indicated by Gianini et al. (2020), the combination of abnormality detection methods can provide greater reliability to the financial data as grounds for continuous improvement. ...
Article
Full-text available
Fiscal decentralisation is fundamental capital for reaching better governance, regional development, and fiscal autonomy. This paper discusses financial management barriers and opportunities in the Gorontalo Province, a region with low fiscal autonomy, as shown by its index and heavy dependency on central government transfers. This qualitative research, using data from interviews and document studies, found various inefficiencies in the province, such as incongruence between expenditures and revenues, lack of coordination among stakeholders, non-compliance by taxpayers, and delays in financial reporting. Drawing on the theoretical work of Schick (1998) for a framework for public expenditure management and using empirical evidence, this paper proposes an integrated strategy toward a better fiscal management system: diversification of revenue sources, enhancements in planning techniques, introduction of comprehensive accounting mechanisms, and stakeholder participation. This paper contributes to the debate on fiscal decentralisation by giving practical suggestions on how to achieve fiscal autonomy and promote sustainable development in economically less-developed regions.
... The work by Baumann [105] also proposes a similar method in the context of insurance claim fraud with an extension through combinations of association rules where the weights for the rules are generated using a genetic algorithm. Gianini et al. [106] propose a game theory-based approach. Here the selection of the rules operating in near real time is carried out through a normalized score using the Shapley value. ...
Article
Full-text available
As a consequence of the ubiquity of online payments, there has been an accompanying surge in fraudulent activity in recent times, leading to billions of dollars of financial losses. As payment providers aim to tackle this with various preventive mechanisms, fraudsters also continuously evolve their methods to remain indistinguishable from genuine actors. This necessitates sophisticated fraud detection tools to supplement these security mechanisms. As the volume of transactions taking place per day is in the millions, relying solely on human investigation is expensive and ultimately unfeasible, leading to an emergence of research into data-driven or machine learning methods for fraud detection. Over the last decade, this research has evolved to tackle the various particularities of the domain. These include the skewed nature of the data, the evolving user and fraud behaviour, and learning representations of the context in which a transaction takes place. This work aims to provide the community with an in-depth overview of the different directions on which recent research on online fraud detection has focused. We develop a taxonomy of the domain based on these directions and organize our analysis according to them. For each area we focus on significant methodological advancements and highlight limitations or gaps in the current state-of-the-art solutions. Through our analysis, it emerges that one of the primary limiting factors many researchers face is the lack of availability of high-quality credit card data. Therefore, we provide a first step in addressing this in the form of a data generation framework using generative adversarial networks (GANs). We hope that this survey serves as a foundation for researchers who want to address the multi-faceted problem of credit card fraud detection.
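As a rough illustration of GAN-based transaction data generation, the sketch below wires up a vanilla GAN in Keras; all dimensions, architectures, and training details are assumptions for illustration, not the survey's actual framework.

```python
# Highly simplified GAN sketch for generating synthetic transaction
# vectors; sizes and training details are illustrative assumptions.
import tensorflow as tf

latent_dim, n_features = 16, 28

generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(n_features),               # a synthetic transaction
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # real vs. synthetic
])

bce = tf.keras.losses.BinaryCrossentropy()
g_opt, d_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal((tf.shape(real_batch)[0], latent_dim))
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(noise, training=True)
        d_real = discriminator(real_batch, training=True)
        d_fake = discriminator(fake, training=True)
        d_loss = bce(tf.ones_like(d_real), d_real) + bce(tf.zeros_like(d_fake), d_fake)
        g_loss = bce(tf.ones_like(d_fake), d_fake)   # generator tries to fool D
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss

# for real_batch in transaction_dataset:   # batches of genuine transactions
#     train_step(real_batch)
```

In practice, tabular fraud data calls for extra care (categorical encodings, mode collapse on the rare fraud class), which is what dedicated generation frameworks are designed to address.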
... We also investigated the importance of the input variables in predicting the target variable. To this purpose we adopted the Game Theory based methodology relying on the Shapley Value of the input features [57][58][59][60], which does not evaluate the predictive power of each feature in isolation but takes into account the features' interactivity by considering all the different feature coalitions and averaging each feature's added value. To this purpose we used the SHAP (SHapley Additive exPlanations) Python library implementation of the algorithm (https://shap.readthedocs.io/). ...
Preprint
Full-text available
This study addresses the gap in methodological guidelines for neuroergonomic attention assessment in safety-critical tasks, focusing on validating EEG indices, including the Engagement Index (EI) and Beta/Alpha ratio, alongside subjective ratings. A novel task-embedded reaction time paradigm was developed to evaluate the sensitivity of these metrics to dynamic attentional demands in a more naturalistic multitasking context. By manipulating attention levels through varying secondary tasks in the NASA MATB-II task while maintaining a consistent primary reaction-time task, the study successfully demonstrated the effectiveness of the paradigm. Results indicate that both the Beta/Alpha ratio and EI are sensitive to changes in attentional demands, with Beta/Alpha being more responsive to dynamic variations in attention, and EI reflecting the overall effort required to sustain performance, especially in conditions where maintaining attention is challenging. The potential for predicting attention lapses through the integration of performance metrics, EEG measures, and subjective assessments was demonstrated, providing a more nuanced understanding of dynamic fluctuations of attention in multitasking scenarios, mimicking those in real-world safety-critical tasks. These findings provide a foundation for advancing methods to accurately monitor attention fluctuations and mitigate risks in critical scenarios, such as train-driving or automated vehicle operation, where maintaining a high attention level is crucial.
... Moreover, the manual nature of audits makes them time-consuming and susceptible to human error. Rule-based fraud detection systems operate on predefined rules and thresholds to identify suspicious activities (Gianini et al., 2020). For example, a rule might flag transactions above a certain amount as potentially fraudulent. ...
Article
Full-text available
The integration of artificial intelligence (AI) into accounting has significantly transformed the landscape of fraud detection. Traditional methods, while effective to some extent, often struggle with the increasing complexity and volume of financial data. AI, with its advanced analytical capabilities, offers a promising solution to these challenges. This review provides an overview of the techniques used in AI-driven fraud detection in accounting and highlights case studies that demonstrate the practical applications and benefits of these technologies. AI techniques for fraud detection in accounting primarily involve machine learning (ML), natural language processing (NLP), and data mining. Machine learning algorithms, such as supervised and unsupervised learning models, are employed to identify patterns and anomalies in financial data that could indicate fraudulent activity. Supervised learning involves training a model on a labelled dataset containing examples of both fraudulent and non-fraudulent transactions, enabling the model to learn the distinguishing features of fraud. Unsupervised learning, on the other hand, is used to detect anomalies without prior labeling, identifying outliers that deviate from the norm. Natural language processing (NLP) is utilized to analyze textual data, such as emails and financial documents, to uncover suspicious activities and hidden relationships. This is particularly useful in forensic accounting, where vast amounts of unstructured data must be examined for signs of fraud. Data mining techniques are also critical, enabling the extraction of useful information from large datasets and the identification of trends and patterns that may not be immediately apparent. Several case studies illustrate the effectiveness of AI in enhancing fraud detection in accounting. One notable example is the use of AI by major financial institutions to combat credit card fraud. By implementing ML algorithms, these institutions have significantly improved their ability to detect fraudulent transactions in real-time. The algorithms analyze transaction patterns and flag those that deviate from a customer's typical behavior, allowing for immediate investigation and action. Another case study involves a large multinational corporation that integrated NLP and data mining techniques into its internal audit processes. The company utilized AI to analyze thousands of financial documents and emails, uncovering a complex fraud scheme that had previously gone undetected. The AI system identified unusual communication patterns and financial discrepancies, leading to a comprehensive investigation and the eventual prosecution of the perpetrators. A further example is found in the public sector, where government agencies have employed AI to detect and prevent procurement fraud. By analyzing historical procurement data, AI systems can identify anomalies and potential red flags, such as unusually high bids or frequent contract awards to the same vendor. This proactive approach has enabled these agencies to save millions of dollars and improve the transparency and integrity of their procurement processes. The application of AI in fraud detection within accounting represents a significant advancement over traditional methods. Techniques such as machine learning, natural language processing, and data mining offer powerful tools for identifying and mitigating fraudulent activities. 
The case studies discussed highlight the practical benefits and successes achieved through AI-driven fraud detection, demonstrating its potential to enhance the accuracy, efficiency, and effectiveness of fraud prevention efforts. As the complexity and volume of financial transactions continue to grow, the role of AI in fraud detection will become increasingly vital. Continued advancements in AI technology, coupled with its integration into accounting practices, promise to further strengthen the fight against financial fraud, safeguarding the integrity of financial systems and promoting trust and confidence among stakeholders. Keywords: Fraud, Detection, Accounting, Artificial Intelligence, Case Studies.
... A study [12] of intrusion detection utilizes reactive and proactive game theory; the study overcomes the presence of malicious nodes to achieve a 42% improvement in packet delivery rate, but suffers from a high FPR. Beyond the typical individual rule assessment identified in the literature, the study [41] suggests applying concepts from coalitional game theory and the Shapley Value to evaluate a rule's collaborative performance; the study identifies the most important features for fraud detection. According to the study [9], game theory offers a framework for investigating systems that provide financial fraud detection in a metaverse setting; the study shows that using a metaverse environment can improve financial analysts' ability to identify fraud, offering useful information for future collaboration tactics in corporate finance. ...
Preprint
Full-text available
New bank account fraud is a significant problem causing financial losses in banking and finance. Existing statistical and machine-learning methods have been used to detect fraud and thereby prevent financial losses. However, most studies do not consider the dynamic behavior of fraudsters and often produce a high False Positive Rate (FPR). This study proposes the detection of new bank account fraud in the context of simultaneous game theory (SGT) with Neural Networks; the SGT involves two players, a fraudster and bank officials, attacking each other through Bayesian probability in a zero-sum game. The influence of outliers within the SGT was tackled by adding a context feature for effective simulation of the dynamic behavior of fraudsters. The Neural Networks layer uses the simulated features for fraud context learning. The study is validated using the Bank Account Fraud (BAF) Dataset on different machine-learning models. The Radial Basis Function Networks achieved an FPR of 0.0% and 8.3% for the fraud and non-fraud classes, respectively, while achieving a True Positive Rate (TPR) of 91.7% and 100.0% for the fraud and non-fraud classes, respectively. An improved Radial Basis Function Network detects fraud by revealing fraudulent patterns and dynamic behaviors in higher-dimensional data. The findings will enhance fraud detection and reduce customer attrition.
... Each of these sequences is modeled with an HMM, and the likelihood associated with each HMM is used as an additional feature in the Random Forest classifier for fraud detection [75]. Gianini et al. used a game theory-based approach for detecting credit card fraud by managing a pool of rules [129]. ...
Preprint
Full-text available
As the number of credit card users has increased, detecting fraud in this domain has become a vital issue. Previous literature has applied various supervised and unsupervised machine learning methods to find an effective fraud detection system. However, some of these methods require an enormous amount of time to achieve reasonable accuracy. In this paper, an Asexual Reproduction Optimization (ARO) approach was employed, which is a supervised method to detect credit card fraud. ARO refers to a kind of reproduction in which one parent produces offspring. By applying this method and sampling just from the majority class, the effectiveness of the classification is increased. A comparison to Artificial Immune Systems (AIS), which is one of the best methods implemented on current datasets, has shown that the proposed method is able to remarkably reduce the required training time and, at the same time, increase the recall, which is important in fraud detection problems. The obtained results show that ARO achieves the best cost in a short time, and consequently, it can be considered a real-time fraud detection system.
... The stability of this approach was high, thereby enhancing detection performance, but the F-measure value was not maintained over longer periods. For credit card fraud identification, Gianini et al. 38 developed a Game Theory-based Shapley Value (SV) approach. In this technique, rule evaluation and selection were applied to eliminate redundancy. ...
Article
Full-text available
Digital transactions based on credit cards are steadily increasing due to their expediency. The number of fraudulent transactions has grown sharply in recent times, because of the fast development of e-services, namely e-finance, mobile payments, and e-commerce, as well as the promotion of credit cards. Criminal fraud behaviors and users' payment behaviors are frequently varying, so improving the performance and stability of fraud identification methods is a challenging process. The Shuffled Shepherd Political Optimization-based Deep Residual network (SSPO-based DRN) scheme is established for credit card fraud identification in this research. The SSPO is developed by merging Political Optimization (PO) and the Shuffled Shepherd Optimization Algorithm (SSOA). The quantile normalization model is an effective preprocessing technique, which normalizes the data for effective detection. Moreover, the Fisher score and class information gain effectively select the required features. Data augmentation is employed to increase the data size, thereby improving detection performance. The Deep Residual Network (DRN), trained by the devised SSPO algorithm, is employed for credit card fraud recognition. The SSPO-based DRN approach achieved enhanced performance with a testing sensitivity of 0.9279, specificity of 0.9023, and accuracy of 0.9120.
... Very good results have been obtained when applying power indices in multi-attribute decision-making [3]. Power indices have been used for determining the importance of decision rules [4]. Other uses of power indices can be found in the papers [6,7]. ...
Article
Full-text available
In this paper, a new method of fusing predictions obtained from dispersed data is proposed. In the method, a power index is used. This approach allows calculating the real power of prediction vectors generated from local data using the k-nearest neighbors classifier. The use of two power indices, the Shapley-Shubik and the Banzhaf-Coleman power index, is analyzed. The influence of the k-parameter value and of the value of the quota in the simple game on the classification accuracy is also studied. The obtained results are compared with the approach in which no power index is used. It was found that the proposed method of using the power index improves the classification accuracy. Moreover, both analyzed power indices generate comparable results.
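Both indices compared above are easy to state concretely. The sketch below computes them exactly for a small weighted majority game; the weights and quota are made-up values for illustration, not the paper's experimental settings.

```python
# Exact Shapley-Shubik and normalized Banzhaf-Coleman indices for a small
# weighted majority game; weights and quota are illustrative.
from itertools import permutations, combinations
from math import factorial

weights = [4, 2, 1, 1]   # "voting" weight of each local classifier
quota = 5                # a coalition wins if its total weight >= quota
n = len(weights)

def wins(coalition):
    return sum(weights[i] for i in coalition) >= quota

# Shapley-Shubik: fraction of orderings in which a player is pivotal.
ss = [0] * n
for order in permutations(range(n)):
    running = 0
    for player in order:
        running += weights[player]
        if running >= quota:        # `player` tips the coalition to winning
            ss[player] += 1
            break
ss = [c / factorial(n) for c in ss]

# Normalized Banzhaf-Coleman: each player's share of all swings.
swings = [0] * n
for r in range(1, n + 1):
    for coalition in combinations(range(n), r):
        for player in coalition:
            if wins(coalition) and not wins(set(coalition) - {player}):
                swings[player] += 1
bz = [s / sum(swings) for s in swings]

print(ss, bz)
```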
... With the development of technology, credit card payment has become a popular payment mode worldwide. However, transaction frauds often occur since fraudsters can utilize some technological means (e.g., Trojan horse and credential stuffing attacks) to embezzle card accounts for the use of unauthorized funds [1]. According to the Nilson Report, the fraud-related losses worldwide were about $35 billion in 2020 [2]. ...
Article
Credit card fraud detection is a challenging task since fraudulent actions are hidden in massive legitimate behaviors. This work aims to learn a new representation for each transaction record based on the historical transactions of users in order to capture fraudulent patterns accurately and, thus, automatically detect a fraudulent transaction. We propose a novel model by improving long short-term memory with a time-aware gate that can capture the behavioral changes caused by consecutive transactions of users. A current-historical attention module is designed to build up connections between current and historical transactional behaviors, which enables the model to capture behavioral periodicity. An interaction module is designed to learn comprehensive and rational behavioral representations. To validate the effectiveness of the learned behavioral representations, experiments are conducted on a large real-world transaction dataset provided to us by a financial company in China, as well as a public dataset. Experimental results and the visualization of the learned representations illustrate that our method delivers a clear distinction between legitimate behaviors and fraudulent ones, and achieves better fraud detection performance compared with the state-of-the-art methods.
... SMOTE is a way of oversampling that generates synthetic examples instead of achieving oversampling by repetition or replacement alone. Furthermore, the technique can also progressively improve the learning process of the fraud detection algorithm [32]. ...
Article
Full-text available
With the continuous expansion of banks' credit card businesses, credit card fraud has become a serious threat to banking financial institutions, so automatic, real-time credit card fraud detection is a meaningful research task. Machine learning, with its non-linearity, automation, and intelligence, can improve the efficiency and accuracy of fraud detection. In view of this, this paper proposes a credit card fraud detection model based on heterogeneous ensemble, namely CUS-RF (cluster-based under-sampling boosting and random forest), built on clustering-based under-sampling and the random forest algorithm. The CUS-RF-based credit card fraud detection model has the following advantages. Firstly, the CUS-RF model can better overcome the issue of data imbalance. Secondly, based on the idea of heterogeneous ensemble learning, the clustering under-sampling method and random forest model are fused to achieve better performance for credit card fraud detection. Finally, through verification on a real credit card fraud dataset, the CUS-RF model proposed in this paper achieves better performance in credit card fraud detection compared with the benchmark models.
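A minimal sketch of the cluster-based under-sampling idea behind CUS-RF: draw the retained majority-class samples evenly from k-means clusters, then train a random forest on the rebalanced set. The cluster count and sampling ratio are illustrative assumptions, not the paper's settings.

```python
# Sketch: k-means-guided under-sampling of the majority (legitimate) class,
# followed by a random forest; all sizes are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def cluster_undersample(X_majority, n_keep, n_clusters=10, seed=0):
    """Under-sample the majority class by drawing evenly from k-means clusters."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_majority)
    rng = np.random.default_rng(seed)
    kept, per_cluster = [], n_keep // n_clusters
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        take = min(per_cluster, len(idx))
        kept.extend(rng.choice(idx, size=take, replace=False))
    return X_majority[np.array(kept)]

# X_legit, X_fraud: majority / minority transaction matrices
# X_down = cluster_undersample(X_legit, n_keep=10 * len(X_fraud))
# X_bal = np.vstack([X_down, X_fraud])
# y_bal = np.r_[np.zeros(len(X_down)), np.ones(len(X_fraud))]
# clf = RandomForestClassifier(n_estimators=200).fit(X_bal, y_bal)
```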
... The method is based on Fisher Discriminant Analysis. A Game Theory based analysis for credit card fraud detection was proposed by Gianini et al. [5]. A credit card fraud detection model utilizing Big Data techniques was proposed by Vaughan [6]. ...
Article
Full-text available
Fraud detection in credit card transactions is one of the major requirements of the current business scenario due to the huge losses associated with the domain. This work presents a multi-level model that can provide highly effective fraud detection in credit card transactions. The model is based on the amount for which the transaction is committed. The proposed MLFD model identifies the nature of the transaction and, depending on the significance level of the transaction, selects the appropriate learning model. Experiments were performed with standard benchmark data, and comparisons were made with an existing model in the literature. Results show that the proposed model exhibits high performance compared to the existing model.
... Fraud detection mechanisms lie on a broad spectrum of approaches (Kou et al., 2004; Bhattacharyya et al., 2011; Le Borgne et al., 2022). On one end, we have rule-based approaches, devised and updated by domain experts (Gianini et al., 2020). This requires human time, effort and maintenance, and cannot model very complex patterns. ...
Preprint
Full-text available
Fraud detection systems (FDS) mainly perform two tasks: (i) real-time detection while the payment is being processed and (ii) posterior detection to block the card retrospectively and avoid further frauds. Since human verification is often necessary and the payment processing time is limited, the second task manages the largest volume of transactions. In the literature, fraud detection challenges and algorithms performance are widely studied but the very formulation of the problem is never disrupted: it aims at predicting if a transaction is fraudulent based on its characteristics and the past transactions of the cardholder. Yet, in posterior detection, verification often takes days, so new payments on the card become available before a decision is taken. This is our motivation to propose a new paradigm: posterior fraud detection with "future" information. We start by providing evidence of the on-time availability of subsequent transactions, usable as extra context to improve detection. We then design a Bidirectional LSTM to make use of these transactions. On a real-world dataset with over 30 million transactions, it achieves higher performance than a regular LSTM, which is the state-of-the-art classifier for fraud detection that only uses the past context. We also introduce new metrics to show that the proposal catches more frauds, more compromised cards, and based on their earliest frauds. We believe that future works on this new paradigm will have a significant impact on the detection of compromised cards.
... According to the Nilson Report, global business losses caused by credit card fraud in 2019 and 2020 were about $28.65 billion and $35 billion, respectively [1]. Cardholders' accounts are at great risk of being stolen or used without permission, since fraudsters usually employ technical means (like Trojan horses and credential stuffing attacks) to use unauthorized funds [2]. Therefore, it is imperative for credit card-related enterprises and financial institutions to detect credit card fraud in a timely and accurate fashion. ...
Article
With the popularity of credit cards worldwide, timely and accurate fraud detection has become critically important to ensure the safety of their user accounts. Existing models generally utilize original features or manually aggregated features as their transactional representations, while they fail to reveal the hidden fraudulent behaviors. In this work, we propose a novel model to extract transactional behaviors of users and learn new transactional behavioral representations for credit card fraud detection. Considering the characteristics of transactional behaviors, two time-aware gates are designed in a recurrent neural net unit to learn long- and short-term transactional habits of users, respectively, and to capture behavioral changes of users caused by different time intervals between their consecutive transactions. A time-aware-attention module is proposed and employed to extract the behavioral information from their consecutive historical transactions with time intervals, which enables the proposed model to capture behavioral motive and periodicity inside their historical transactional behaviors. An interaction module is designed to learn more comprehensive and rational representations. To prove the effectiveness of the learned transactional behavioral representations, experiments are conducted on a large real-world transaction dataset and a public one. The results show that the learned representation can well distinguish fraudulent behaviors from legitimate ones, and the proposed method can improve the performance of credit card fraud detection in terms of various evaluation criteria over the state-of-the-art methods.
... A system was proposed [30] that accounts for the non-additivity of the composition of rules in the pool, which selection criteria based on individual rules ignore. The authors suggested a method for estimating every rule's contribution to the pool's performance by using the Shapley value (SV). ...
Article
Full-text available
Online sales and purchases are increasing daily, and they generally involve credit card transactions. This not only provides convenience to the end-user but also increases the frequency of online credit card fraud. In recent years, in some countries, this growth in fraud has driven an exponential increase in work on credit card fraud detection, which has become increasingly important for addressing this security issue. Recent studies have proposed machine learning (ML)-based solutions for detecting fraudulent credit card transactions, but their detection scores still need improvement due to the class imbalance in any given dataset. Few approaches have achieved exceptional results on different datasets. In this study, the Kaggle dataset was used to develop a deep learning (DL)-based approach to solve the text data problem. A novel text2IMG conversion technique is proposed that generates small images. The images are fed into a CNN architecture with class weights, computed by the inverse frequency method, to resolve the class imbalance issue. DL and ML approaches were applied to verify the robustness and validity of the proposed system. An accuracy of 99.87% was achieved by Coarse-KNN using deep features of the proposed CNN.
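The inverse-frequency class weighting mentioned above is a one-liner; a small sketch (names illustrative) of computing weights usable as `class_weight` in Keras-style training:

```python
# Sketch: inverse-frequency class weights, w_c = N / (K * n_c);
# names and numbers are illustrative.
import numpy as np

def inverse_frequency_weights(y):
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array([0] * 9900 + [1] * 100)    # heavily imbalanced labels
print(inverse_frequency_weights(y))     # {0: ~0.505, 1: ~50.0}
# model.fit(X_img, y, class_weight=inverse_frequency_weights(y))
```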
... The DEAL is highly adaptive and robust towards data imbalance and latent transaction patterns. Gianini et al. (2020) managed a set of rules using game theory, and Zhu et al. (2020) proposed the WELM algorithm to achieve high fraud detection performance. ...
Chapter
Machine learning (ML) has proven to be an emerging technology from small-scale to large-scale industries. One of the important industries is banking, where ML is being adopted all over the world through online banking. Online banking uses ML techniques to detect fraudulent transactions, as in credit card fraud detection. Hence, in this chapter, a Credit card Fraud Detection (CFD) system is devised using Luhn's algorithm and k-means clustering. Moreover, the CFD system is also developed using Fuzzy C-Means (FCM) clustering instead of k-means clustering. The performance of CFD using both clustering techniques is compared using precision, recall and f-measure; FCM gives better results than k-means clustering. Further, other evaluation metrics such as fraud catching rate, false alarm rate, balanced classification rate, and Matthews correlation coefficient are also calculated to show how well the CFD system works in the presence of skewed data.
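Luhn's algorithm, the checksum test the CFD system above uses as a card-number validity filter, is compact enough to show in full:

```python
# Luhn checksum: double every second digit from the right, subtract 9 from
# results above 9, and require the total to be divisible by 10.
def luhn_valid(card_number: str) -> bool:
    digits = [int(d) for d in card_number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

print(luhn_valid("4539 1488 0343 6467"))  # True: a classic test number
print(luhn_valid("4539 1488 0343 6468"))  # False: last digit corrupted
```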
... The paper describes (consistent with our industry partners' experience) that as the ruleset grows, the effort to maintain a transaction monitoring system also increases and, consequently, the accuracy of fraud detection decreases. An interesting approach, which assigns a normalized score to the individual rule, quantifying the rule's influence on the pool's overall performance, is described in [39]. ...
Article
Full-text available
Online shopping, already on a steady rise, was propelled even further with the advent of the COVID-19 pandemic. Of course, credit cards are a dominant way of doing business online. The credit card fraud detection problem has become relevant more than ever as the losses due to fraud accumulate. Most research on this topic takes an isolated, focused view of the problem, typically concentrating on tuning the data mining models. We noticed a significant gap between the academic research findings and the rightfully conservative businesses, which are careful when adopting new, especially black-box, models. In this paper, we took a broader perspective and considered this problem from both the academic and the business angle: we detected challenges in the fraud detection problem such as feature engineering and unbalanced datasets and distinguished between more and less lucrative areas to invest in when upgrading fraud detection systems. Our findings are based on the real-world data of CNP (card not present) fraud transactions, which are a dominant type of fraud transactions. Data were provided by our industrial partner, an international card-processing company. We tested different data mining models and approaches to the outlined challenges and compared them to their existing production systems to trace a cost-effective fraud detection system upgrade path.
... Many other approaches have been used recently in the identification of credit card fraud. Gianini et al. [36] proposed a method of rule pool management based on game theory in which the system distributes suspicious transactions for manual investigation while avoiding the need to isolate the individual rules. Based on generative adversarial networks, Fiore et al. [37] proposed a method to generate simulated fraudulent transaction samples to improve the effectiveness of classification models. ...
Article
Full-text available
Credit card fraud detection (CCFD) is important for protecting the cardholder’s property and the reputation of banks. Class imbalance in credit card transaction data is a primary factor affecting the classification performance of current detection models. However, prior approaches are aimed at improving the prediction accuracy of the minority class samples (fraudulent transactions), but this usually leads to a significant drop in the model’s predictive performance for the majority class samples (legal transactions), which greatly increases the investigation cost for banks. In this paper, we propose a heterogeneous ensemble learning model based on data distribution (HELMDD) to deal with imbalanced data in CCFD. We validate the effectiveness of HELMDD on two real credit card datasets. The experimental results demonstrate that compared with current state-of-the-art models, HELMDD has the best comprehensive performance. HELMDD not only achieves good recall rates for both the minority class and the majority class but also increases the savings rate for banks to 0.8623 and 0.6696, respectively.
... The transaction volumes are massive, reflecting a variety of different transaction types, and so are the outlier detection queries for detecting anomalous transaction patterns in contexts changing over time (e.g., sudden above-market payments when entering a pandemic lockdown, indicating hoarding activities; recurring micro-payments for items typically paid at once, indicating a possible tax evasion attempt). Many queries meant to detect different fraudulent activities are run at the same time, and the queries keep changing as the need for information and the accuracy changes [1,9]. ...
... The experimental implementation of the presented system depicted enhanced results in comparison to other decision-modeling techniques. Gianini et al. (2020) applied an intelligent approach for assigning the quantified score in credit card transactions. Specifically, the authors proposed a Coalition Game theory model for summarizing the performance of the users in the context of online fraud. ...
Article
Full-text available
Innovations in the Internet of Things (IoT) technology have revolutionized several industrial domains for smart decision-modeling. The capacity to perceive data about ubiquitous instances has resulted in numerous innovations in sensitive sectors like national security and police departments. In this paper, an extensive IoT-based framework is introduced for assessing the integrity of police personnel based on their performance. The work introduced in this research is centered around analyzing several activities of police personnel to assess their integral behavior. In particular, the Probabilistic Measure of Integrity (PMI) is formalized based on professional data analysis, with classification based on a Bayesian model. Moreover, a 2-player game model is presented to assess the performance of police personnel for efficient decision-making. For validation purposes, the presented framework is deployed over challenging datasets acquired from the online repository of UCI. Based on a comparative analysis with state-of-the-art decision-making models, the presented approach registers enhanced performance in terms of temporal delay, classification, prediction, reliability, and stability.
... The dataset contains a very large number of features. A large number of techniques are available for feature selection [21,39]. We selected the features by applying a nature-inspired evolutionary optimization algorithm, the modified crow search algorithm (MCSA) developed by Gupta et al. [25]. ...
Article
Full-text available
Healthcare organizations and Health Monitoring Systems generate large volumes of complex data, which offer the opportunity for innovative investigations in medical decision making. In this paper, we propose a beetle swarm optimization and adaptive neuro-fuzzy inference system (BSO-ANFIS) model for heart disease and multi-disease diagnosis. The main components of our analytics pipeline are the modified crow search algorithm, used for feature extraction, and an ANFIS classification model whose parameters are optimized by means of a BSO algorithm. The accuracy achieved in heart disease detection is 99.1% with 99.37% precision. In multi-disease classification, the accuracy achieved is 96.08% with 98.63% precision. The results from both tasks prove the comparative advantage of the proposed BSO-ANFIS algorithm over the competitor models.
Article
Full-text available
The digital invasion of the banking and financial sectors has made life simple and easy. Traditional machine learning models have been studied in credit card fraud detection, but these models often prove ineffective on unseen patterns. This study proposes a combined framework of deep learning and machine learning models. A long short-term memory autoencoder (LSTMAE) with an attention mechanism is developed to extract high-level features and avoid overfitting. The extracted features serve as input to the powerful ensemble model XGBoost to classify legitimate and fraudulent transactions. As the focus of fraud detection is to increase the recall rate, an adaptive threshold technique is proposed to estimate an optimal threshold value to enhance performance. The experiment was done with the IEEE-CIS fraud detection dataset available on Kaggle. The proposed model with the optimal threshold shows improved prediction of fraudulent transactions. The research findings are compared with conventional ensemble techniques to assess the generalization of the model. The proposed LSTMAE-XGB w/ attention method attained a good precision and recall of 94.2% and 90.5%, respectively, at the optimal threshold of θ = 0.22. The experimental results proved that the proposed approach is better at finding fraudulent transactions than other cutting-edge models.
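A minimal sketch of choosing an operating threshold from the precision-recall trade-off, in the spirit of the adaptive threshold step above; the max-F1 selection criterion here is an assumption, not necessarily the authors' rule.

```python
# Sketch: pick the classifier threshold that maximizes F1 on held-out
# scores; the criterion is an illustrative choice.
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_threshold(y_true, y_scores):
    precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
    f1 = 2 * precision * recall / np.clip(precision + recall, 1e-12, None)
    return thresholds[np.argmax(f1[:-1])]   # last P/R point has no threshold

# y_scores = model.predict_proba(X_test)[:, 1]
# theta = best_threshold(y_test, y_scores)
# y_pred = (y_scores >= theta).astype(int)
```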
Article
Full-text available
Economic Credit Scoring (CS), which aids in calculating the creditworthiness of both individuals and companies, is regarded as one of the major research issues in the finance field. In the banking industry, data mining techniques are believed to be helpful since they help designers and developers create appropriate goods or services for customers with the fewest possible risks. Losses and loan cancellations, which are the major sources of hazards in the banking industry, are related to credit risks. A Support Vector Machine based architecture is presented in the current study for the financial credit score prediction system. However, existing work tends to have increased computational overhead and requires complete data to attain the required accuracy. To address this, the suggested research introduces the Fuzzy Support Vector Machine based Outlier Detection System (FSVM-ODS). In the first stage, data items are grouped using a hybrid genetic algorithm with the K-Means clustering algorithm (HKGA); this gradually reduces the dataset size and likewise decreases computation time. In the second stage, the Enhanced Z-score (EZS) outlier detection (OD) technique is employed to identify outliers in the dataset. Then, a customized beaver searching method is used for selection from the database, and a fuzzy support vector machine is utilised for classification of the datasets. The whole study is carried out in the Matlab simulation environment, and the suggested technique is shown to achieve a higher outlier identification rate than current methods.
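Since the "Enhanced Z-score" above is the paper's own variant, the sketch below uses the standard median/MAD modified z-score as a stand-in to illustrate robust outlier flagging of this kind:

```python
# Sketch: robust outlier detection via the modified z-score
# (Iglewicz-Hoaglin), a stand-in for the paper's EZS variant.
import numpy as np

def modified_zscore_outliers(x, threshold=3.5):
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))      # median absolute deviation
    mz = 0.6745 * (x - med) / mad         # 0.6745 ~ Phi^{-1}(0.75)
    return np.where(np.abs(mz) > threshold)[0]

x = np.r_[np.random.default_rng(0).normal(size=200), 12.0]  # one planted outlier
print(modified_zscore_outliers(x))        # should include index 200
```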
Article
Credit card fraud detection (CCFD) is an important issue concerned by financial institutions. Existing methods generally employ aggregated or raw features as their representations to train their detection models. Yet such features tend to fall short of effectively exposing the characteristics of various frauds. In this work, we propose a spatial-temporal gated network (STGN) to automatically learn new informative transactional representations containing users’ transactional behavioral information for CCFD. A gated recurrent neural net unit is specifically constructed with a time-aware gate and location-aware gate to extract users’ spatial and temporal transactional behaviors. A spatial-temporal attention module is designed to expose the transaction motive of users in their historical transactional behaviors, which allows the proposed model to better extract the fraudulent characteristics from successive transactions with time and location information. A representation interaction module is offered to make rational decisions and learn compositive transactional representations. A real-world transaction dataset is used in experiments to verify the efficacy of the learned new representations. The results demonstrate that our proposed model outperforms the state-of-the-art ones, thus greatly advancing the field of CCFD. Note to Practitioners —The features of transaction records reflect the characteristics of users’ transactional behaviors. Therefore, effective features are critical for accurate CCFD. However, fraudsters often pretend to be legitimate users during transactions to deceive the CCFD system. As a result, fraudulent behaviors become concealed within legitimate ones, signifying that original features are inadequate for accurate CCFD. Thus, it is imperative for researchers and practitioners to extract new features that can well expose fraud characteristics. While existing methods employing some transaction aggregation strategies can spot certain fraudulent behaviors, they fail to clearly cluster all the anomalous behaviors and distinguish them from legitimate behaviors. Therefore, this work is driven by the urgent demand to extract new informative features for CCFD. Its primary focus is to unveil the aggregation of fraudulent transactional behaviors from both temporal and spatial perspectives, enabling more accurate CCFD. Specifically, this work introduces a new STGN model that automatically learns new transactional representations incorporating users’ transactional behavioral information for CCFD. By comprehensively considering the time interval and location interval of consecutive user transactions, we thoroughly reveal the temporal and spatial aggregation of fraudulent behavior, which provides valuable insights for CCFD practitioners: 1) employing features that integrate the behavioral characteristics of fraudsters instead of the original features can enhance the model’s capability to identify frauds, and 2) taking into account the time and location intervals of users’ consecutive historical transactions can better uncover the behavioral characteristics of fraudsters.
Chapter
As the number of organizations and their complexity have increased, a tremendous amount of manual effort has to be invested to detect financial fraud. Therefore, powerful machine learning methods have become a critical factor to reduce the workload of financial auditors. However, as most machine learning models have become increasingly complex over the years, a significant need for transparency of artificial intelligence systems in the accounting domain has emerged. In this paper, we propose a novel approach using Shapley additive explanations to improve the transparency of models in the field of financial fraud detection. Our information systems engineering procedure follows the cross industry standard process for data mining including a systematic literature review of machine learning methods in fraud detection, a systematic development process and an explainable artificial intelligence analysis. By training a downstream Logistic Regression, Support Vector Machine and eXtreme Gradient Boosting classifier on a dataset of publicly traded companies convicted of financial statement fraud by the United States Securities and Exchange Commission, we show how the key items for financial statement fraud detection and their directionality can be identified using Shapley additive explanations. Finally, we contribute to the current state of research with this work by increasing model transparency and by generating insights on important financial statement fraud detection variables.
Preprint
Full-text available
The literature on fraud analytics and fraud detection has seen a substantial increase in output in the past decade. This has led to a wide range of research topics and, overall, little organization of the many aspects of fraud analytical research. The focus of academics ranges from identifying fraudulent credit card payments to spotting illegitimate insurance claims. In addition, there is a wide range of methods and research objectives. This paper aims to provide an overview of fraud analytics in research and to organize the discipline and its many subfields more narrowly. We analyze a sample of almost 300 records on fraud analytics published between 2011 and 2020. In a systematic way, we identify the most prominent domains of application, challenges faced, performance metrics, and methods used. In addition, we build a framework for fraud analytical methods and propose a keywording strategy for future research. One of the key challenges in fraud analytics is access to public datasets. To further aid the community, we derive from our analysis eight requirements for suitable research datasets. We structure our sample of the literature in an online database, which is available for fellow researchers to investigate and potentially build upon.
Article
Artificial intelligence (AI) has accelerated the advancement of financial services by identifying hidden patterns from data to improve the quality of financial decisions. However, in addition to commonly desired attributes, such as model accuracy, financial services demand trustworthy AI with properties that have not been adequately realized. These properties of trustworthy AI are interpretability, fairness and inclusiveness, robustness and security, and privacy protection. Here, we review the recent progress and limitations of applying AI to various areas of financial services, including risk management, fraud detection, wealth management, personalized services, and regulatory technology. Based on this progress and these limitations, we introduce FinBrain 2.0, a research framework toward trustworthy AI. We argue that we are still a long way from having a truly trustworthy AI in financial services, and call for the communities of AI and the financial industry to join in this effort.
Article
Full-text available
In this paper, an efficient and effective fraud detection technique, termed the SpiHWO-based Deep RNN technique, is developed. At first, data transformation is performed using the Yeo-Johnson transformation. After that, the effective features are selected based on a wrapper method, where the best features are retained for further processing. Then, using the selected features, fraud detection is executed with a Deep RNN classifier, which is trained by the developed SpiHWO technique. The proposed SpiHWO algorithm is newly developed by combining the SMO algorithm and the HWO algorithm. Furthermore, the performance of the developed method is computed using performance metrics, namely sensitivity, specificity and accuracy. The developed method achieved improved performance, with an accuracy of 0.951, a sensitivity of 0.985 and a specificity of 0.792.
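For illustration, the preprocessing front-end described here (a Yeo-Johnson transformation followed by wrapper-style feature selection) can be sketched with standard scikit-learn components; the paper's Deep RNN classifier and SpiHWO trainer are paper-specific, so a plain logistic regression is assumed as a stand-in for them.

```python
# Sketch of the preprocessing front-end described in the abstract: a
# Yeo-Johnson transformation followed by wrapper-style feature selection.
# The paper's Deep RNN classifier and SpiHWO trainer are replaced here by a
# plain logistic regression, purely for illustration.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)

clf = LogisticRegression(max_iter=1000)
pipe = make_pipeline(
    PowerTransformer(method="yeo-johnson"),                        # data transformation
    SequentialFeatureSelector(clf, n_features_to_select=5, cv=3),  # wrapper FS
    clf,
)
pipe.fit(X, y)
print("train accuracy:", pipe.score(X, y))
```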
Article
Anomaly detection approaches have become critically important for enhancing decision-making systems, especially for reducing risks to the economic performance of an organisation and to consumer costs. Previous studies on anomaly detection have examined mainly abnormalities that translate into fraud, such as fraudulent credit card transactions or fraud in insurance systems. However, anomalies represent irregularities in system pattern data, which may arise from deviations, adulterations or inconsistencies; their study encompasses not only fraud, but also any behavioural abnormalities that signal risk. This paper proposes a literature review of methods and techniques to detect anomalies in diverse financial systems, using a five-step technique. In our proposed method, we created a classification framework that uses codes to systematize the main techniques and knowledge on the subject, in addition to identifying research opportunities. Furthermore, the statistical results reveal several research gaps, among which three main ones should be explored to develop this area: a common database, tests with different dimensional sizes of data, and indicators of the detection models' effectiveness. The proposed framework is therefore pertinent for comprehending the existing scientific knowledge base, and it signals important gaps for a research agenda on the topic of anomalies in financial systems.
Conference Paper
Full-text available
A cognitive relay network is an integrated technique that combines the cognitive radio network (CRN) with wireless relay technology. This combination faces a range of challenges, some of which are addressed in this paper. The power allocation, subcarrier allocation, and relay selection are formulated as a joint optimization problem for an OFDM-based cognitive radio network assisted by a two-way amplify-and-forward (AF) relay, under the interference and power constraints considered in this paper, with the goal of maximizing the total transmission rate of the secondary network. To solve this problem, a genetic algorithm (GA) is proposed as the optimization technique, finding the subcarrier-pairing matrix A, the relay-selection matrix B, and all power values, which are quantized over 256 levels. The results show the ability of this algorithm to effectively determine the frequency and power values of the system.
Conference Paper
Very recently, the ESEAP mutual authentication protocol was designed to avoid the drawbacks of the protocol of Wang et al., and its authors claim, using informal analysis, that it protects against all kinds of security threats. This work investigates the ESEAP protocol from a security point of view and shows that the scheme is not fully protected against the stolen-verifier attack and does not provide user anonymity. Furthermore, the protocol has user identity issues, i.e., the server cannot determine the user's identity during the authentication phase. We also discuss inconsistencies in the security analysis of ESEAP presented by RESEAP.
Chapter
Given a class label y assigned by a classifier to a point x in feature space, the counterfactual generation task, in its simplest form, consists of finding the minimal edit that moves the feature vector to a new point x', which the classifier maps to a pre-specified target class y' ≠ y. Counterfactuals provide a local explanation to a classifier model, by answering the question "Why did the model choose y instead of y': what changes to x would make the difference?". An important aspect in classification is ambiguity: typically, the description of an instance is compatible with more than one class. When ambiguity is too high, a suitably designed classifier can map an instance x to a class set Y of alternatives, rather than to a single class, so as to reduce the likelihood of wrong decisions. In this context, known as set-based classification, one can discuss set-based counterfactuals. In this work, we extend the counterfactual generation problem – normally expressed as a constrained optimization problem – to set-based counterfactuals. Using non-singleton counterfactuals, rather than singletons, makes the problem richer under several aspects, related to the fact that non-singleton sets allow for a wider spectrum of relationships among them: (1) the specification of the target set-based class Y' is more varied, (2) the target solution x' that ought to be mapped to Y' is not granted to exist, and, in that case, (3) since one might end up with the availability of a number of feasible alternatives to Y', one has to include the degree of partial fulfillment of the solution into the loss function of the optimization problem.

Keywords: Set-based classification · Counterfactual explanations
Chapter
Online transactions are trending worldwide and will keep growing in future. A large amount of transactional data is generated in domains such as networking, the stock market, telecommunications, and weather forecasting. This data can be classified for knowledge extraction and learning. Credit cards are nowadays a very easy method for physical and online transactions. Transactions using a credit/debit card have both advantages and flaws. Some of the problems with credit card transactions are also highlighted here; the paper then focuses on an extensive study of the various learning methods used by different authors on the imbalanced data stream of credit card transactions.

Keywords: Credit card transaction · Online transaction · Imbalanced data streaming · Learning methods
Chapter
Full-text available
In the cloud environment, an enormous amount of data is shared on the server so that it is available to the employees or customers related to the organization. Two main issues are generally faced when data is shared in a cloud environment: first, authenticating the users who may access the data, and second, securing the data itself. To address this concern, we propose a hybrid approach that involves role-based security as well as TPA-based security. For role-based security, users are first authenticated by graphical authentication: an image is selected and segmented into image blocks, and the pattern formed by selecting those blocks is used for authentication; afterwards, when a user shares a file, he/she specifies the role that may access it.

Keywords: Cloud environment · Cloud security · TPA · Role-based access
Preprint
With the advent of the Internet of Things (IoT) era, more and more devices are connected to the IoT. Under the traditional centralized cloud-thing management mode, the transmission of massive data faces many difficulties, and data reliability is hard to guarantee. As emerging technologies, blockchain and edge computing (EC) have attracted the attention of academia for improving the reliability, privacy and immutability of IoT technology. In this paper, we combine the characteristics of EC and blockchain to ensure the reliability of data transmission in the IoT. First, we propose a blockchain-based data transmission mechanism, which uses the distributed architecture of the blockchain to ensure that data are not tampered with; second, we introduce the three-tier structure of the architecture; finally, we introduce the four working steps of the mechanism, which are similar to the working mechanism of a blockchain. Simulation results show that the proposed scheme can, to a great extent, ensure the reliability of data transmission in the IoT.
Article
Full-text available
Using an efficient and scientific tool for planning and scheduling a project is a crucial process. Two approaches have been applied to find the total completion time of a project: the program evaluation and review technique (the probabilistic PERT network) and the binomial cumulative distribution function (CDF). The CDF approach assumes that activity time is a random variable following a discrete (binomial) distribution. The coefficient of variation (c.v.), which depends on the standard deviation S and the mean X̄, has been calculated to quantify the uncertainty of activity completion at each stage of the project; its values lie between 0 and 0.103, which is very low and indicates that most activities proceed as planned. The final results show that the cumulative-function method is more accurate than the traditional method (PERT), decreasing the wasted time by around 4 days: the total project completion time is 33 days using PERT versus 29 days using the cumulative-function method.
Article
Purpose The best algorithm previously implemented on this Brazilian dataset was the artificial immune system (AIS) algorithm, but its time and cost are high. Using the asexual reproduction optimization (ARO) algorithm, the authors achieved better results in less time, and hence at lower cost. Their framework addresses problems such as the high costs and long training times in credit card fraud detection. This simple and effective approach has achieved better results than the best techniques implemented on the dataset so far. The purpose of this paper is to detect credit card fraud using ARO. Design/methodology/approach In this paper, the authors used the ARO algorithm to classify bank transactions into fraudulent and legitimate. ARO is inspired by asexual reproduction, a kind of reproduction in which one parent produces offspring identical to herself. In the ARO algorithm, an individual is represented by a vector of variables; each variable is considered a chromosome, and a chromosome is represented by a binary string consisting of genes. It is supposed that every generated answer exists in the environment and that, because of limited resources, only the best solution can remain alive. The algorithm starts with a random individual in the answer scope. This parent reproduces an offspring named a bud. Either the parent or the offspring can survive: the one which performs better on the fitness function remains alive. If the offspring has suitable performance, it becomes the next parent and the current parent becomes obsolete; otherwise, the offspring perishes and the present parent survives. The algorithm recurs until the stop condition occurs. Findings Results showed that ARO increased the AUC (i.e. the area under the receiver operating characteristic (ROC) curve), sensitivity, precision, specificity and accuracy by 13%, 25%, 56%, 3% and 3%, respectively, in comparison with AIS. The authors achieved a high precision value, indicating that if ARO detects a record as fraudulent, it is, with high probability, a fraud. Supporting a real-time fraud detection system is another vital issue: ARO not only outperforms AIS in the mentioned criteria, but also decreases the training time by 75% in comparison with AIS, which is a significant figure. Originality/value In this paper, the authors implemented ARO for credit card fraud detection and compared the results with those of AIS, which was one of the best methods ever implemented on the benchmark dataset. The chief focus of fraud detection studies is finding algorithms that can distinguish legitimate transactions from fraudulent ones with high detection accuracy, in the shortest time and at low cost. ARO meets all these demands.
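The ARO loop described in the design section can be captured in a few lines. The sketch below uses a toy fitness function as a stand-in for the paper's classifier-based fitness; everything else (single parent, mutated bud, survival of the fitter) follows the description above.

```python
# Minimal sketch of the asexual reproduction optimization (ARO) loop described
# above: a single parent (a binary chromosome) repeatedly produces a mutated
# offspring ("bud"), and whichever of the two has the better fitness survives.
# The fitness function here is a toy stand-in; in the paper it would score a
# fraud/legitimate classifier built from the chromosome.
import random

random.seed(0)

def fitness(chromosome):
    # Toy objective (count of 1-genes); replace with a classifier's AUC/accuracy.
    return sum(chromosome)

def reproduce(parent, mutation_rate=0.1):
    # Asexual reproduction: the bud is a copy of the parent with random gene flips.
    return [g ^ (random.random() < mutation_rate) for g in parent]

def aro(n_genes=32, n_iterations=500):
    parent = [random.randint(0, 1) for _ in range(n_genes)]
    for _ in range(n_iterations):
        bud = reproduce(parent)
        if fitness(bud) >= fitness(parent):   # the fitter individual survives
            parent = bud
    return parent

best = aro()
print("best fitness:", fitness(best))
```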
Chapter
Communication methods, along with online payment transactions, are growing every day, and with them the financial frauds linked to these transactions are also escalating. Among the several kinds of financial fraud, credit card fraud is the most common and most hazardous one, owing to the widespread use of credit cards. To detect these illegitimate transactions, a credit card fraud detection system is required. In this paper, we propose to design a user behaviour-based accurate detection scheme for credit card frauds. In this technique, user behaviours over a banking website are collected and analysed. The main features related to user behaviour are the frequently visited pages, the time spent on each page, clicking, etc. From these details, the page reachability and page utility are determined, and from these, the normal and abnormal behaviours of users are classified using a web Markov skeleton process (WMSP) model.
Article
Internet of Things (IoT) technology backed by Artificial Intelligence (AI) techniques has been increasingly utilized for the realization of the Industry 4.0 vision. Notably, this work provides a novel notion of the smart sports industry for provisioning efficient services in the sports arena. Specifically, an IoT-inspired framework is proposed for real-time analysis of athlete performance. IoT data are utilized to quantify athlete performance in terms of the probability parameters of the Probabilistic Measure of Performance (PMP) and the Level of Performance Measure (LoPM). Moreover, a two-player game-theory-based mathematical framework is presented for efficient decision modeling by the monitoring officials. The presented model is validated experimentally by deployment in a District Sports Academy (DSA) for 60 days over four players. Based on a comparative analysis with state-of-the-art decision-modeling approaches, the proposed model acquired enhanced performance in terms of temporal delay, classification efficiency, statistical efficacy, correlation analysis, and reliability.
Article
This paper proposes to design a majority vote ensemble classifier for accurate detection of credit card frauds. In this technique, the behavioural, operational and transactional features of users are combined into a single feature set. The user behaviours over a banking website are collected, so that the normal and abnormal behaviours of users can be classified using a Web Markov Skeleton Process (WMSP) model. The operational and transactional features of users are collected and classified using a Random Forest (RF) classifier and a Support Vector Machine (SVM), respectively. Finally, the classification results of WMSP, RF and SVM are passed on to the majority-voting-based ensemble (MVE) classifier, which accurately predicts fraudulent users. Experimental results show that the MVE classifier achieves a higher detection rate with good accuracy.
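A minimal sketch of such a majority-vote ensemble, using scikit-learn's VotingClassifier on synthetic imbalanced data: since the WMSP behavioural model is paper-specific, a logistic regression is assumed as a stand-in for it alongside the RF and SVM actually named.

```python
# Sketch of a hard (majority-vote) ensemble over heterogeneous base
# classifiers, as in the MVE scheme above. The WMSP behavioural model is
# paper-specific and not available off the shelf; a logistic regression
# stands in for it here, alongside the RF and SVM actually named.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

mve = VotingClassifier(
    estimators=[("behaviour", LogisticRegression(max_iter=1000)),  # stand-in for WMSP
                ("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC())],
    voting="hard",  # each model casts one vote; the majority label wins
)
mve.fit(X_tr, y_tr)
print("test accuracy:", mve.score(X_te, y_te))
```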
Article
Full-text available
Supervised learning techniques are widely employed in credit card fraud detection, as they make use of the assumption that fraudulent patterns can be learned from an analysis of past transactions. The task becomes challenging, however, when it has to take account of changes in customer behavior and fraudsters’ ability to invent novel fraud patterns. In this context, unsupervised learning techniques can help the fraud detection systems to find anomalies. In this paper we present a hybrid technique that combines supervised and unsupervised techniques to improve the fraud detection accuracy. Unsupervised outlier scores, computed at different levels of granularity, are compared and tested on a real, annotated, credit card fraud detection dataset. Experimental results show that the combination is efficient and does indeed improve the accuracy of the detection.
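The core idea (feeding unsupervised outlier scores into a supervised learner) can be sketched as follows; an IsolationForest is assumed here purely as an illustrative unsupervised scorer, not as the authors' exact choice.

```python
# Sketch of the hybrid idea above: unsupervised outlier scores are computed
# first and appended as extra input features for a supervised classifier.
# IsolationForest is used as the (illustrative) unsupervised scorer.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

iso = IsolationForest(random_state=0).fit(X_tr)  # trained without labels
# score_samples: the lower the score, the more anomalous the transaction.
X_tr_aug = np.column_stack([X_tr, iso.score_samples(X_tr)])
X_te_aug = np.column_stack([X_te, iso.score_samples(X_te)])

clf = RandomForestClassifier(random_state=0).fit(X_tr_aug, y_tr)
print("test accuracy with outlier-score feature:", clf.score(X_te_aug, y_te))
```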
Conference Paper
Full-text available
Machine learning and data mining techniques have been used extensively in order to detect credit card frauds. However, most studies consider credit card transactions as isolated events and not as a sequence of transactions. In this article, we model a sequence of credit card transactions from three different perspectives, namely (i) does the sequence contain a Fraud? (ii) Is the sequence obtained by fixing the card-holder or the payment terminal? (iii) Is it a sequence of spent amount or of elapsed time between the current and previous transactions? Combinations of the three binary perspectives give eight sets of sequences from the (training) set of transactions. Each one of these sets is modelled with a Hidden Markov Model (HMM). Each HMM associates a likelihood to a transaction given its sequence of previous transactions. These likelihoods are used as additional features in a Random Forest classifier for fraud detection. This multiple perspectives HMM-based approach enables an automatic feature engineering in order to model the sequential properties of the dataset with respect to the classification task. This strategy allows for a 15% increase in the precision-recall AUC compared to the state of the art feature engineering strategy for credit card fraud detection.
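A down-scaled sketch of one of the eight perspectives: an HMM fitted on "genuine" amount sequences assigns a log-likelihood to a card-holder's recent window of transactions, which can then be appended as a feature for a downstream classifier. This assumes the third-party hmmlearn package and toy lognormal data.

```python
# Sketch of the HMM-likelihood feature idea: an HMM is fitted on sequences of
# genuine spent amounts, and the log-likelihood it assigns to a card-holder's
# recent window of transactions becomes an extra feature for a downstream
# classifier (a Random Forest in the paper). Requires `pip install hmmlearn`.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Toy "genuine" training data: 50 card-holder sequences of 30 amounts each.
sequences = [rng.lognormal(3.0, 0.5, size=30) for _ in range(50)]
X_train = np.concatenate(sequences).reshape(-1, 1)
lengths = [len(s) for s in sequences]

hmm = GaussianHMM(n_components=3, n_iter=50, random_state=0)
hmm.fit(X_train, lengths)

def likelihood_feature(window_of_amounts):
    """Log-likelihood of the recent window under the 'genuine' HMM."""
    return hmm.score(np.asarray(window_of_amounts).reshape(-1, 1))

genuine_window = rng.lognormal(3.0, 0.5, size=10)
odd_window = rng.lognormal(6.0, 0.5, size=10)  # unusually large amounts
print("genuine window:", likelihood_feature(genuine_window))
print("anomalous window:", likelihood_feature(odd_window))
```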
Patent
Full-text available
The invention relates to a system and method for managing financial transactions, the system including at least one database containing the transaction information, at least one fraud detection device comprising at least one rule evaluation module and a rule selection module, connected to each other and comprising means for implementing the method, the evaluation module making it possible to calculate an estimate of the contribution of each rule of the set of rules relative to a parameter representing the overall performance of a set of rules stored in a first memory of the detection device, and an evaluation report file analysed by the selection module to select a subset of rules from among the evaluated rules of the set of rules, the selected rules being stored in a second memory of the fraud detection device to be used for transaction control. https://worldwide.espacenet.com/publicationDetails/biblio?II=0&ND=3&adjacent=true&locale=en_EP&FT=D&date=20181025&CC=WO&NR=2018193085A1&KC=A1#
Article
Full-text available
Artificial Neural Networks have shown impressive success in very different application cases. Choosing a proper network architecture is a critical decision for a network's success, usually made in a manual manner. As a straightforward strategy, large, mostly fully connected architectures are selected, thereby relying on a good optimization strategy to find proper weights while at the same time avoiding overfitting. However, large parts of the final network are redundant. In the best case, large parts of the network become simply irrelevant for later inferencing. In the worst case, highly parameterized architectures hinder proper optimization and allow the easy creation of adversarial examples fooling the network. A first step in removing irrelevant architectural parts lies in identifying those parts, which requires measuring the contribution of individual components such as neurons. In previous work, heuristics using the weight distribution of a neuron as a contribution measure have shown some success, but do not provide a proper theoretical understanding. Therefore, in our work we investigate game-theoretic measures, namely the Shapley value (SV), in order to separate relevant from irrelevant parts of an artificial neural network. We begin by designing a coalitional game for an artificial neural network, where neurons form coalitions and the average contribution of a neuron to coalitions yields its Shapley value. In order to measure how well the Shapley value captures the contribution of individual neurons, we remove low-contributing neurons and measure the impact on the network performance. In our experiments we show that the Shapley value outperforms other heuristics for measuring the contribution of neurons.
Article
Full-text available
Credit card fraud detection is a very challenging problem because of the specific nature of transaction data and the labeling process. The transaction data are peculiar because they are obtained in a streaming fashion, and they are strongly imbalanced and prone to non-stationarity. The labeling is the outcome of an active learning process, as every day human investigators contact only a small number of cardholders (associated with the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and we investigate their fraud detection accuracies. We compare different criteria (supervised, semi-supervised and unsupervised) to query unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature.
Article
Full-text available
Detecting frauds in credit card transactions is perhaps one of the best testbeds for computational intelligence algorithms. In fact, this problem involves a number of relevant challenges, namely: concept drift (customers' habits evolve and fraudsters change their strategies over time), class imbalance (genuine transactions far outnumber frauds), and verification latency (only a small set of transactions are timely checked by investigators). However, the vast majority of learning algorithms that have been proposed for fraud detection rely on assumptions that hardly hold in a real-world fraud-detection system (FDS). This lack of realism concerns two main aspects: 1) the way and timing with which supervised information is provided and 2) the measures used to assess fraud-detection performance. This paper has three major contributions. First, we propose, with the help of our industrial partner, a formalization of the fraud-detection problem that realistically describes the operating conditions of FDSs that everyday analyze massive streams of credit card transactions. We also illustrate the most appropriate performance measures to be used for fraud-detection purposes. Second, we design and assess a novel learning strategy that effectively addresses class imbalance, concept drift, and verification latency. Third, in our experiments, we demonstrate the impact of class unbalance and concept drift in a real-world data stream containing more than 75 million transactions, authorized over a time window of three years.
Conference Paper
Full-text available
Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
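In practice, SHAP attributions for a tree-based fraud classifier can be obtained with the shap package. Below is a minimal sketch, assuming shap is installed and using synthetic data; as noted in the comments, the exact result format varies across shap versions.

```python
# Sketch of computing SHAP feature attributions for a tree-based fraud
# classifier, assuming the `shap` package (pip install shap) is available.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)      # fast, exact for tree ensembles
shap_values = explainer.shap_values(X)     # per-feature additive attributions
# Depending on the shap version the result is a list (one array per class)
# or a single 3-D array; normalize to the positive-class attributions.
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
print("mean |SHAP| per feature:", np.abs(sv).mean(axis=0).round(3))
```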
Conference Paper
Full-text available
Global card fraud losses amounted to 16.31 billion US dollars in 2014 [18]. To recover this huge amount, automated Fraud Detection Systems (FDS) are used to deny a transaction before it is granted. In this paper, we start from a graph-based FDS named APATE [28]: this algorithm uses a collective inference algorithm to spread fraudulent influence through a network, starting from a limited set of confirmed fraudulent transactions. We propose several improvements to this approach from the network data analysis literature [16] and from semi-supervised learning [9]. Furthermore, we redesigned APATE to fit the reality of the e-commerce field. These improvements have a high impact on performance, multiplying Precision@100 by three, on both fraudulent card and fraudulent transaction prediction. The new method is assessed on a three-month real-life e-commerce credit card transaction data set obtained from a large credit card issuer.
Conference Paper
Full-text available
Fraud detection is a critical problem affecting large financial companies, and it has grown with the increase in credit card transactions. This paper presents a new method for automatic detection of frauds in credit card transactions based on non-linear signal processing. The proposed method consists of the following stages: feature extraction, training and classification, decision fusion, and result presentation. Discriminant-based classifiers and an advanced non-Gaussian mixture classification method are employed to distinguish between legitimate and fraudulent transactions. The posterior probabilities produced by the classifiers are fused by means of order-statistics digital filters. Results from data mining of a large database of real transactions are presented. The feasibility of the proposed method is demonstrated for several datasets, using parameters derived from receiver operating characteristic analysis and key performance indicators of the business.
Article
Full-text available
The Shapley value is arguably the most central normative solution concept in cooperative game theory. It specifies a unique way in which the reward from cooperation can be "fairly" divided among players. While it has a wide range of real world applications, its use is in many cases hampered by the hardness of its computation. A number of researchers have tackled this problem by (1) focusing on classes of games where the Shapley value can be computed efficiently, or (2) proposing representation formalisms that facilitate such efficient computation, or (3) approximating the Shapley value in certain classes of games. However, given the classical characteristic function representation, the only attempt to approximate the Shapley value for the general class of games is due to Castro et al. While this algorithm provides a bound on the approximation error, this bound is asymptotic, meaning that it only holds when the number of samples increases to infinity. On the other hand, when a finite number of samples is drawn, an unquantifiable error is introduced, meaning that the bound no longer holds. With this in mind, we provide non-asymptotic bounds on the estimation error for two cases: where (1) the variance, and (2) the range, of the players' marginal contributions is known. Furthermore, for the second case, we show that when the range is significantly large relative to the Shapley value, the bound can be improved (from O(r·√(1/m)) to O(√r·√(1/m))). Finally, we propose, and demonstrate the effectiveness of, using stratified sampling to improve the bounds.
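The baseline estimator that these bounds apply to is simple permutation sampling: average each player's marginal contribution over m random orderings. A minimal sketch with a toy characteristic function:

```python
# Minimal sketch of sampling-based Shapley value estimation: draw m random
# permutations and average each player's marginal contribution to the
# coalition of players preceding it. The bounds discussed above relate m to
# the variance or range of these marginal contributions.
import random

random.seed(0)
players = [0, 1, 2, 3]

def value(coalition):
    # Toy characteristic function; replace with e.g. a rule pool's precision.
    return len(coalition) ** 2 if 0 in coalition else len(coalition)

def shapley_estimate(players, value, m=2000):
    est = {p: 0.0 for p in players}
    for _ in range(m):
        perm = random.sample(players, len(players))
        coalition = set()
        prev = value(coalition)
        for p in perm:
            coalition.add(p)
            cur = value(coalition)
            est[p] += cur - prev   # marginal contribution of p in this order
            prev = cur
    return {p: s / m for p, s in est.items()}

print(shapley_estimate(players, value))
```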
Article
Full-text available
Working with multiple regression analysis, a researcher usually wants to know the comparative importance of predictors in the model. However, the analysis can be made difficult by multicollinearity among regressors, which produces biased coefficients and negative inputs to multiple determination from presumably useful regressors. To solve this problem we apply a tool from cooperative game theory, the Shapley Value imputation. We demonstrate the theoretical and practical advantages of the Shapley Value and show that it provides consistent results in the presence of multicollinearity.
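A small sketch of the idea: decompose the model's R² among correlated regressors by averaging each predictor's marginal R² gain over all subsets of the others, with the usual Shapley weights. Exact enumeration is feasible for a handful of predictors; the data below are synthetic.

```python
# Sketch of Shapley value decomposition of R^2 among correlated regressors:
# each predictor's share is its weighted average marginal increase in R^2
# over all subsets of the remaining predictors (exact enumeration).
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)   # strongly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + x3 + rng.normal(size=n)

def r2(features):
    if not features:
        return 0.0
    Xi = X[:, list(features)]
    return LinearRegression().fit(Xi, y).score(Xi, y)

p = X.shape[1]
shares = np.zeros(p)
for j in range(p):
    others = [k for k in range(p) if k != j]
    for size in range(p):
        for S in combinations(others, size):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            shares[j] += weight * (r2(S + (j,)) - r2(S))

print("total R^2:", round(r2(tuple(range(p))), 3))
print("Shapley shares:", shares.round(3))   # the shares sum to the total R^2
```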
Conference Paper
Full-text available
We present and study the Contribution-Selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the Multiperturbation Shapley Analysis, a framework which relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of datasets.
Article
Full-text available
Many multiagent domains where cooperation among agents is crucial to achieving a common goal can be modeled as coalitional games. However, in many of these domains, agents are unequal in their power to affect the outcome of the game. Prior research on weighted voting games has explored power indices, which reflect how much “real power” a voter has. Although primarily used for voting games, these indices can be applied to any simple coalitional game. Computing these indices is known to be computationally hard in various domains, so one must sometimes resort to approximate methods for calculating them. We suggest and analyze randomized methods to approximate power indices such as the Banzhaf power index and the Shapley–Shubik power index. Our approximation algorithms do not depend on a specific representation of the game, so they can be used in any simple coalitional game. Our methods are based on testing the game’s value for several sample coalitions. We show that no approximation algorithm can do much better for general coalitional games, by providing lower bounds for both deterministic and randomized algorithms for calculating power indices. We also provide empirical results regarding our method, and show that it typically achieves much better accuracy and confidence than those required.
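The sampling idea is straightforward to sketch for the Banzhaf index of a toy weighted voting game: draw random coalitions of the other players and count how often player i is pivotal (turns a losing coalition into a winning one).

```python
# Sketch of the randomized approximation of the Banzhaf power index for a
# weighted voting game: sample random coalitions of the other players and
# estimate the probability that player i is pivotal.
import random

random.seed(0)
weights = [4, 3, 2, 1]   # toy weighted voting game
quota = 6

def wins(coalition):
    return sum(weights[j] for j in coalition) >= quota

def banzhaf_estimate(i, m=20000):
    others = [j for j in range(len(weights)) if j != i]
    pivotal = 0
    for _ in range(m):
        coalition = {j for j in others if random.random() < 0.5}
        if not wins(coalition) and wins(coalition | {i}):
            pivotal += 1   # adding player i turned a loss into a win
    return pivotal / m

print([round(banzhaf_estimate(i), 3) for i in range(len(weights))])
```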
Article
Full-text available
A few applications of the Shapley value are described. The main choice criterion is to look at quite diversified fields, to appreciate how wide is the terrain that has been explored and colonized using this and related tools.
Article
Full-text available
We present and study the contribution-selection algorithm (CSA), a novel algorithm for feature selection. The algorithm is based on the Multiperturbation Shapley Analysis (MSA), a framework that relies on game theory to estimate usefulness. The algorithm iteratively estimates the usefulness of features and selects them accordingly, using either forward selection or backward elimination. It can optimize various performance measures over unseen data, such as accuracy, balanced error rate, and area under the receiver operating characteristic curve. Empirical comparison with several other existing feature selection methods shows that the backward elimination variant of CSA leads to the most accurate classification results on an array of data sets.
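A heavily down-scaled sketch of the backward-elimination variant: estimate each remaining feature's permutation-sampled contribution to cross-validated accuracy, then repeatedly drop the least useful feature. The classifier, data, and sample counts are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of the backward-elimination variant of CSA: repeatedly
# estimate each remaining feature's (sampled) Shapley-style contribution to
# cross-validated accuracy, then eliminate the least useful feature.
import random
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

random.seed(0)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)

def accuracy(feature_set):
    if not feature_set:
        return 0.5  # chance level for a balanced binary problem
    Xi = X[:, sorted(feature_set)]
    return cross_val_score(LogisticRegression(max_iter=500), Xi, y, cv=3).mean()

def contributions(features, n_perms=5):
    contrib = {f: 0.0 for f in features}
    for _ in range(n_perms):
        perm = random.sample(features, len(features))
        s, prev = set(), accuracy(set())
        for f in perm:
            s.add(f)
            cur = accuracy(s)
            contrib[f] += (cur - prev) / n_perms  # averaged marginal gain
            prev = cur
    return contrib

features = list(range(X.shape[1]))
while len(features) > 3:                 # keep the 3 most useful features
    c = contributions(features)
    features.remove(min(c, key=c.get))   # eliminate the weakest contributor
print("selected features:", sorted(features))
```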
Article
Algorithms for NP-complete problems often have different strengths and weaknesses, and thus algorithm portfolios often outperform individual algorithms. It is surprisingly difficult to quantify a component algorithm's contribution to such a portfolio. Reporting a component's standalone performance wrongly rewards near-clones while penalizing algorithms that have small but distinct areas of strength. Measuring a component's marginal contribution to an existing portfolio is better, but penalizes sets of strongly correlated algorithms, thereby obscuring situations in which it is essential to have at least one algorithm from such a set. This paper argues for analyzing component algorithm contributions via a measure drawn from coalitional game theory, the Shapley value, and yields insight into a research community's progress over time. We conclude with an application of the analysis we advocate to SAT competitions, yielding novel insights into the behaviour of algorithm portfolios, their components, and the state of SAT solving technology.
Article
One of the fundamental research challenges in network science is centrality analysis, i.e., identifying the nodes that play the most important roles in the network. In this article, we focus on the game-theoretic approach to centrality analysis. While various centrality indices have recently been proposed based on this approach, it is still unknown how general the game-theoretic approach to centrality is, and what distinguishes some game-theoretic centralities from others. In this article, we attempt to answer this question by providing the first axiomatic characterization of game-theoretic centralities. Specifically, we show that every possible centrality measure can be obtained following the game-theoretic approach. Furthermore, we study three natural classes of game-theoretic centrality, and prove that they can be characterized by certain intuitive properties pertaining to the well-known notion of Fairness due to Myerson.
Article
Due to the growing volume of electronic payments, the monetary strain of credit-card fraud is turning into a substantial challenge for financial institutions and service providers, thus forcing them to continuously improve their fraud detection systems. However, modern data-driven and learning-based methods, despite their popularity in other domains, only slowly find their way into business applications. In this paper, we phrase the fraud detection problem as a sequence classification task and employ Long Short-Term Memory (LSTM) networks to incorporate transaction sequences. We also integrate state-of-the-art feature aggregation strategies and report our results by means of traditional retrieval metrics. A comparison to a baseline random forest (RF) classifier showed that the LSTM improves detection accuracy on offline transactions where the card-holder is physically present at a merchant. Both the sequential and non-sequential learning approaches benefit strongly from manual feature aggregation strategies. A subsequent analysis of true positives revealed that both approaches tend to detect different frauds, which suggests a combination of the two. We conclude our study with a discussion on both practical and scientific challenges that remain unsolved.
Chapter
With the advancements in various data mining and social network-related approaches, datasets with a very high feature dimensionality are often used. Various information-theoretic approaches have been tried to select the most relevant set of features and hence bring down the size of the data. Most of the time, these approaches try to find a way to rank the features, so as to select or remove a fixed number of them. These principles usually assume some probability distribution for the data. Such approaches also fail to capture the individual contribution of every feature in a given set of features. In this paper, we propose an approach which uses the Relief algorithm and cooperative game theory to solve the problems mentioned above. The approach was tested on the NIPS 2003 and UCI datasets using different classifiers, and the results were comparable to state-of-the-art methods.
Article
Independence between detectors is normally assumed in order to simplify the algorithms and techniques used in decision fusion. In this paper, we derive the optimum fusion rule for N non-independent detectors in terms of the individual probabilities of detection and false alarm and of defined dependence factors. This is of interest for the implementation of the optimum detector, for the incorporation of specific dependence models, and for gaining insights into the implications of dependence. The latter is illustrated with a detailed analysis of the case of two equally-operated non-independent detectors. We show, for example, that not every dependence model is compatible with an arbitrary point of operation of the detectors, and that the optimality of the counting rule is preserved in the presence of dependence if the individual detectors are "good enough". We also derive the expressions for the probability of detection and false alarm after fusion of dependent detectors. The theoretical results are verified in a real data experiment with acoustic signals.
Article
Every year billions of euros are lost worldwide due to credit card fraud, forcing financial institutions to continuously improve their fraud detection systems. In recent years, several studies have proposed the use of machine learning and data mining techniques to address this problem. However, most studies used some sort of misclassification measure to evaluate the different solutions, and did not take into account the actual financial costs associated with the fraud detection process. Moreover, when constructing a credit card fraud detection model, it is very important to extract the right features from the transactional data. This is usually done by aggregating the transactions in order to observe the spending behavioural patterns of the customers. In this paper we expand the transaction aggregation strategy and propose to create a new set of features based on analyzing the periodic behaviour of the time of a transaction, using the von Mises distribution. Then, using a real credit card fraud dataset provided by a large European card processing company, we compare state-of-the-art credit card fraud detection models and evaluate how the different sets of features impact the results. By including the proposed periodic features in the methods, the results show an average increase in savings of 13%.
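The periodic-feature idea can be sketched with scipy's von Mises distribution: map times of day to angles on the circle, fit the distribution to a card-holder's history, and score new transaction times by their density under the fit. The history below is synthetic, and fixing scale=1 is assumed for the circular fit.

```python
# Sketch of the periodic-time feature: transaction times of day are mapped to
# angles, a von Mises distribution is fitted to a card-holder's history, and
# new transactions are scored by their density under that fit.
import numpy as np
from scipy.stats import vonmises

rng = np.random.default_rng(0)
# Toy history: a card-holder who usually pays around 18:00 (+/- ~1.5 h).
hours = (18 + 1.5 * rng.normal(size=200)) % 24
angles = 2 * np.pi * hours / 24            # map [0, 24) onto [0, 2*pi)

kappa, loc, scale = vonmises.fit(angles, fscale=1)  # fix scale=1 on the circle

def time_density(hour):
    """Density of a transaction time under the card-holder's fitted pattern."""
    return vonmises.pdf(2 * np.pi * hour / 24, kappa, loc=loc, scale=1)

print("18:00 ->", round(time_density(18.0), 3))   # typical hour: high density
print("04:00 ->", round(time_density(4.0), 3))    # unusual hour: low density
```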
Article
We present a sensitivity analysis-based method for explaining prediction models that can be applied to any type of classification or regression model. Its advantage over existing general methods is that all subsets of input features are perturbed, so interactions and redundancies between features are taken into account. Furthermore, when explaining an additive model, the method is equivalent to commonly used additive model-specific methods. We illustrate the method's usefulness with examples from artificial and real-world data sets and an empirical analysis of running times. Results from a controlled experiment with 122 participants suggest that the method's explanations improved the participants' understanding of the model.
Conference Paper
Models for the processes by which ideas and influence propagate through a social network have been studied in a number of domains, including the diffusion of medical and technological innovations, the sudden and widespread adoption of various strategies in game-theoretic settings, and the effects of "word of mouth" in the promotion of new products. Recently, motivated by the design of viral marketing strategies, Domingos and Richardson posed a fundamental algorithmic problem for such social network processes: if we can try to convince a subset of individuals to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target? We consider this problem in several of the most widely studied models in social network analysis. The optimization problem of selecting the most influential nodes is NP-hard here, and we provide the first provable approximation guarantees for efficient algorithms. Using an analysis framework based on submodular functions, we show that a natural greedy strategy obtains a solution that is provably within 63% of optimal for several classes of models; our framework suggests a general approach for reasoning about the performance guarantees of algorithms for these types of influence problems in social networks. We also provide computational experiments on large collaboration networks, showing that in addition to their provable guarantees, our approximation algorithms significantly outperform node-selection heuristics based on the well-studied notions of degree centrality and distance centrality from the field of social networks.
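A minimal sketch of the greedy strategy on a toy graph, with the expected spread under the independent cascade model estimated by Monte Carlo simulation; the graph and activation probabilities are illustrative assumptions.

```python
# Minimal sketch of the greedy seed-selection strategy: repeatedly add the
# node with the largest marginal gain in expected spread, estimated by Monte
# Carlo simulation of the independent cascade model on a toy graph.
import random

random.seed(0)
# Toy directed graph: node -> list of (neighbour, activation probability).
graph = {0: [(1, 0.4), (2, 0.4)], 1: [(3, 0.3)], 2: [(3, 0.3), (4, 0.2)],
         3: [(5, 0.5)], 4: [(5, 0.5)], 5: []}

def simulate_spread(seeds):
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p in graph[u]:
            if v not in active and random.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)

def expected_spread(seeds, n_sims=500):
    return sum(simulate_spread(seeds) for _ in range(n_sims)) / n_sims

def greedy_seeds(k):
    seeds = set()
    for _ in range(k):
        best = max((n for n in graph if n not in seeds),
                   key=lambda n: expected_spread(seeds | {n}))
        seeds.add(best)   # largest marginal gain; the spread is submodular
    return seeds

print("chosen seeds:", greedy_seeds(2))
```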
Article
The transferable belief model is a model to represent quantified beliefs based on the use of belief functions, as initially proposed by Shafer. It is developed independently from any underlying related probability model. We summarize our interpretation of the model and present several recent results that characterize the model. We show how rational decision must be made when beliefs are represented by belief functions. We explain the origin of the two Dempster's rules that underlie the dynamic of the model through the concept of specialization and least commitment. We present the canonical decomposition of any belief functions, and discover the concept of 'debt of beliefs'. We also present the generalization of the Bayesian Theorem to belief functions.
Article
In the following paper we offer a method for the a priori evaluation of the division of power among the various bodies and members of a legislature or committee system. The method is based on a technique of the mathematical theory of games, applied to what are known there as "simple games" and "weighted majority games." We apply it here to a number of illustrative cases, including the United States Congress, and discuss some of its formal properties. The designing of the size and type of a legislative body is a process that may continue for many years, with frequent revisions and modifications aimed at reflecting changes in the social structure of the country; we may cite the role of the House of Lords in England as an example. The effect of a revision usually cannot be gauged in advance except in the roughest terms; it can easily happen that the mathematical structure of a voting system conceals a bias in power distribution unsuspected and unintended by the authors of the revision. How, for example, is one to predict the degree of protection which a proposed system affords to minority interests? Can a consistent criterion for "fair representation" be found? It is difficult even to describe the net effect of a double representation system such as is found in the U.S. Congress (i.e., by states and by population), without attempting to deduce it a priori. The method of measuring "power" which we present in this paper is intended as a first step in the attack on these problems.
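For small committees the index described here can be computed exactly by enumerating all voting orders; the weights and quota below are illustrative.

```python
# Sketch of the a priori power evaluation described above: the Shapley-Shubik
# index of a player is the fraction of voting orders in which that player is
# pivotal, i.e., pushes the accumulated weight past the quota. Exact
# enumeration is feasible for small committees.
from itertools import permutations

weights = [4, 3, 2, 1]   # toy weighted majority game
quota = 6

def shapley_shubik(weights, quota):
    n = len(weights)
    pivots = [0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        total = 0
        for player in order:
            total += weights[player]
            if total >= quota:       # this player tips the coalition to a win
                pivots[player] += 1
                break
    return [p / len(orders) for p in pivots]

print(shapley_shubik(weights, quota))   # the indices sum to 1
```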
Article
Credit card fraud is a serious and growing problem. While predictive models for credit card fraud detection are in active use in practice, reported studies on the use of data mining approaches for credit card fraud detection are relatively few, possibly due to the lack of available data for research. This paper evaluates two advanced data mining approaches, support vector machines and random forests, together with the well-known logistic regression, as part of an attempt to better detect (and thus control and prosecute) credit card fraud. The study is based on real-life data of transactions from an international credit card operation.
Article
The Shapley value is a key solution concept for coalitional games in general and voting games in particular. Its main advantage is that it provides a unique and fair solution, but its main drawback is the complexity of computing it (e.g., for voting games this complexity is #P-complete). However, given the importance of the Shapley value and of voting games, a number of approximation methods have been developed to overcome this complexity. Among these, Owen's multi-linear extension method is the most time efficient, being linear in the number of players. Now, in addition to speed, the other key criterion for an approximation algorithm is its approximation error. On this dimension, the multi-linear extension method is less impressive. Against this background, this paper presents a new approximation algorithm, based on randomization, for computing the Shapley value of voting games. This method has time complexity linear in the number of players, but has an approximation error that is, on average, lower than Owen's. In addition to this comparative study, we empirically evaluate the error of our method and show how the different parameters of the voting game affect it. Specifically, we show the following effects. First, as the number of players in a voting game increases, the average percentage error decreases. Second, as the quota increases, the average percentage error decreases. Third, the error differs for players with different weights; players with weight closer to the mean weight have a lower error than those with weight further away. We then extend our approximation to the more general k-majority voting games and show that, for n players, the method has time complexity O(k²n), and we give an upper bound on its approximation error.
A method for evaluating the distribution of power in a committee system
  • Lloyd S. Shapley
  • Martin Shubik
Lloyd S. Shapley and Martin Shubik. A method for evaluating the distribution of power in a committee system. American Political Science Review, 48(03):787-792, 1954.
Explaining prediction models and individual predictions with feature contributions
  • Erik Štrumbelj
  • Igor Kononenko
Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3):647-665, 2014.
Bounding the estimation error of sampling-based Shapley value approximation
  • Sasan Maleki
  • Long Tran-Thanh
  • Greg Hines
  • Talal Rahwan
  • Alex Rogers
Sasan Maleki, Long Tran-Thanh, Greg Hines, Talal Rahwan, and Alex Rogers. Bounding the estimation error of sampling-based Shapley value approximation. arXiv preprint arXiv:1306.4265, 2013.
System and method to manage the detection of fraud in a system of financial transactions (patent FR3065558A1)
  • Olivier Caelen
  • Gabriele Gianini
  • Ernesto Damiani
Olivier Caelen, Gabriele Gianini, and Ernesto Damiani. System and method to manage the detection of fraud in a system of financial transactions. Patent FR3065558A1.