Conference Paper

Fake Profiles Identification on Social Networks With Bio Inspired Algorithm

Authors:
  • LabRI-SBA Lab. Ecole Superieure en Informatique Sidi Bel Abbes, Algeria
  • Ecole Superieure en Informatique, Sidi Bel-Abbes, Algeria

Abstract

All around the world, people are drawn to online social networks such as Facebook or Twitter. However, the more frequently these networks are used, the more security, integrity and confidentiality issues arise. Now more than ever, it is important to make sure that one is following the right account or dealing with a real consumer on any online social network, in order to avoid dangerous and harmful situations. This paper proposes an approach for detecting fake profiles on social media, based on the hybridization of a machine learning algorithm and a bio-inspired algorithm. To detect fake profiles, the proposed approach uses a dataset from the Facebook social network. The hybrid approach consists of two stages. The first stage uses the Satin Bowerbird Optimization algorithm to find the best bower. In the second stage, this best bower serves as the initial centroid for the k-means clustering algorithm, which then detects the profile types accurately. Compared with well-known machine learning algorithms, the proposed approach outperforms them.
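The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the full text is not available here, so the bower encoding (k centroids flattened into one vector), the cost function (sum of squared distances to the nearest centroid), the simplified SBO-style update, and all parameter values (`n_bowers`, `alpha`, `p_mut`, `z`) are assumptions, and the toy data stands in for real profile features.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def unflatten(bower, dim):
    return [bower[i:i + dim] for i in range(0, len(bower), dim)]

def sbo_initial_centroids(points, k, n_bowers=20, iters=50, alpha=0.94,
                          p_mut=0.05, z=0.02, seed=0):
    """Stage 1 (assumed form): SBO-style search for a good set of k centroids."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    cost = lambda b: sse(points, unflatten(b, dim))
    # Each bower encodes k centroids, initialised from random data points.
    bowers = [[v for p in rng.sample(points, k) for v in p]
              for _ in range(n_bowers)]
    bowers.sort(key=cost)
    for _ in range(iters):
        fits = [1.0 / (1.0 + cost(b)) for b in bowers]
        total = sum(fits)
        probs = [f / total for f in fits]
        elite = bowers[0]
        new = []
        for b in bowers:
            cand = list(b)
            for d in range(len(b)):
                # Roulette-wheel choice of a bower to imitate in this dimension.
                j = rng.choices(range(n_bowers), weights=probs)[0]
                step = alpha / (1.0 + probs[j])
                cand[d] = b[d] + step * ((bowers[j][d] + elite[d]) / 2.0 - b[d])
                if rng.random() < p_mut:  # occasional Gaussian mutation
                    cand[d] += rng.gauss(0.0, z * (hi[d % dim] - lo[d % dim]))
            new.append(cand)
        # Elitism: keep the best n_bowers among old and new bowers.
        bowers = sorted(bowers + new, key=cost)[:n_bowers]
    return unflatten(bowers[0], dim)

def kmeans(points, centroids, max_iter=100):
    """Stage 2: standard k-means (Lloyd's algorithm) from the given centroids."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)),
                      key=lambda i: sq_dist(p, centroids[i])) for p in points]
        updated = []
        for i, c in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == i]
            # An empty cluster keeps its previous centroid.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else c)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical two-feature profiles forming two groups.
    data = ([[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(30)] +
            [[rng.gauss(10, 0.3), rng.gauss(10, 0.3)] for _ in range(30)])
    init = sbo_initial_centroids(data, k=2)
    final, labels = kmeans(data, init)
    print(final)
```

Seeding k-means this way targets its known sensitivity to the initial centroids; whether the paper uses this exact encoding or cost function is not stated in the abstract.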

... Mihalcea and Tarau developed TextRank [13], a graph-based ranking model for text processing, and showcased its effective application in natural language processing tasks, including keyword and sentence extraction. ...
... To extract the key terms from a text, we calculate the weight of each lexical unit in the text with the TextRank (TR) algorithm [13]. We can take a text as a directed graph. ...
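The weighting step mentioned in the excerpt above can be illustrated with a small sketch. It assumes the directed-graph reading given there: each word points to the words that follow it within a small window, and scores are iterated with the usual damping factor of 0.85. The function name `textrank`, the window size, and the sample sentence are illustrative choices, not taken from the cited work.

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iters=50):
    """Score each distinct token with a TextRank-style iteration."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    # Directed edge from each token to the tokens following it in the window.
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                out_edges[w].add(v)
                in_edges[v].add(w)
    vocab = set(tokens)
    score = {w: 1.0 for w in vocab}
    for _ in range(iters):
        # S(w) = (1 - d) + d * sum over predecessors u of S(u) / out-degree(u)
        score = {
            w: (1 - damping) + damping * sum(
                score[u] / len(out_edges[u]) for u in in_edges[w])
            for w in vocab
        }
    return score

if __name__ == "__main__":
    text = "fake profile detection uses profile features and profile clustering"
    scores = textrank(text.split())
    for word, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(word, round(s, 3))
```

Key terms would then be the highest-scoring tokens; a real pipeline would typically filter stop words and parts of speech first.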
Article
Numerous contributors share social problem ideas through open innovation. However, manually analyzing and processing many ideas, extracting information, and identifying expert ideas is difficult. Natural language processing can analyze and process crowd ideas. Our solution outsources creative tasks to crowds and automatically processes ideas. After processing unstructured texts, our solution analyzes text dependencies to extract syntactic sets. The third step, named entity recognition, detects proper names, organization names, locations, countries, etc. in texts. Key term extraction summarizes a text's meaning in the most useful key terms. Next, we find the most important sentences in texts to summarize ideas. Our sixth step extracts text patterns using the previous steps' results. The final step is identifying expert idea sources based on crowd idea patterns. We prove our method works in two studies. First, we compared named entity extraction and syntactic sets to human judgment. Second, we assessed key term extraction to a natural language processing API. Finally, we asked an economics expert to identify expert ideas in web form responses and compared our results with an online API this contributor used. Our method for textual data exploration produces promising results.
Article
Online social media platforms today have many more users than ever before. This has increased the trend of fake profiles, which harms both social and business entities, as fraudsters use images of people to create new fake profiles. However, most of the proposed detection methods are outdated and not accurate enough, with an average accuracy of 83%. Our proposed solution to this problem is a Spark ML-based project that can predict fake profiles with higher accuracy than other existing methods of profile recognition. Our project uses Spark ML libraries, including the Random Forest Classifier, and plotting tools. We describe our proposed model diagram and present our results graphically, using a confusion matrix, learning curve and ROC plot, for better understanding. The findings illustrate that the proposed system has an accuracy of 93% in finding fake profiles on social media platforms, while there is a 7% false positive rate, in which our system fails to correctly identify a fake profile.
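For reference, accuracy and error rates of the kind quoted above are usually derived from a confusion matrix. The sketch below uses made-up labels (1 = fake, 0 = genuine) and hypothetical function names; it does not reproduce the reported figures. In the standard definitions, a false positive is a genuine profile flagged as fake, while a fake profile that goes undetected is a false negative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the 'fake' class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def rates(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of genuine profiles wrongly flagged as fake.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # Share of fake profiles that were missed.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # illustrative labels only
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(rates(y_true, y_pred))
```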
Article
Social media platforms like Twitter have become common tools for disseminating and consuming news because of the ease with which users can access it. This paper focuses on the identification of false news and the use of cutting-edge detection methods at the news, user, and social levels. It proposes a taxonomy of fake news detection, examines a variety of cutting-edge methods for spotting false news, and discusses their drawbacks. It also explores approaches to detecting and recognizing false news, such as credibility-based, time-based, and social context-based approaches, as well as approaches based on the substance of the news itself. Lastly, the paper examines various datasets used for detecting fake news and proposes an algorithm.
Article
A prominent issue at present is that organizations from different domains are struggling to find effective solutions for detecting online fake news. Distinguishing fake information on the internet is challenging, as it is often written to deceive users. Compared with many machine learning techniques, deep learning-based techniques are capable of detecting fake news more accurately. Previous review papers were based on data mining and machine learning techniques and scarcely explored deep learning techniques for fake news detection. In particular, emerging deep learning-based approaches such as Attention, Generative Adversarial Networks, and Bidirectional Encoder Representations from Transformers are absent from previous surveys. This study attempts to investigate advanced and state-of-the-art fake news detection mechanisms in depth. We begin by highlighting the consequences of fake news. Then, we discuss the datasets used in previous research and their NLP techniques. A comprehensive overview of deep learning-based techniques is provided, organizing representative methods into various categories. The prominent evaluation metrics in fake news detection are also discussed. Finally, we suggest recommendations to improve fake news detection mechanisms in future research.
Article
With the arrival of the Internet and social media, while many people have benefitted from the vast sources of information available, there has also been an enormous rise in cyber-crimes, mainly targeted towards women. According to a 2019 report in the Economic Times, India witnessed a 457% rise in cybercrime in the five-year span between 2011 and 2016. Most speculate that this is due to the impact of social media such as Facebook, Instagram and Twitter on our daily lives. While these certainly help in building a sound social network, creating a user account on these sites usually requires just an email ID. A real-life person can create multiple fake IDs, and hence impostors can easily be made. Unlike the real-world scenario, where multiple rules and regulations are imposed to identify oneself in a unique manner (for example, when issuing one's passport or driver's license), admission to the virtual world of social media does not require any such checks. In this paper, we study the different accounts of Instagram in particular and try to assess an account as fake or real using Machine Learning techniques, namely Logistic Regression and the Random Forest Algorithm.
Article
Decision tree classifiers are regarded as one of the most well-known methods for representing classifiers in data classification. Researchers from various fields and backgrounds, such as machine learning, pattern recognition, and statistics, have considered the problem of extending a decision tree from available data. The use of decision tree classifiers has been proposed in many ways in fields such as medical disease analysis, text classification, user smartphone classification, images, and many more. This paper provides a detailed review of decision trees. Furthermore, the specifics of the surveyed papers, such as the algorithms or approaches used, the datasets, and the outcomes achieved, are evaluated and outlined comprehensively. In addition, all of the approaches analyzed are discussed to illustrate the themes of the authors and identify the most accurate classifiers. Finally, the uses of different types of datasets are discussed and their findings are analyzed.
Article
Nowadays, Twitter has become one of the fastest-growing Online Social Networks (OSNs) for data sharing frameworks and microblogging. It attracts millions of users worldwide, and subscribers communicate with each other through posts and messages known as "tweets". The open structure and behaviour of Twitter make it vulnerable to attacks from fake accounts and a large number of automated programs, known as "bots". Bots are regarded as malicious because they send spam to users of social networks over the internet. Data security and privacy are among the most critical issues for social network users, as the protection and fulfilment of these requirements strengthen the network's interest and, ultimately, its credibility. To overcome these issues, we need to build an efficient model to detect and classify fake Twitter accounts. This paper presents a new approach with dual functions, namely to identify and classify Twitter bots based on ontological engineering and Semantic Web Rule Language (SWRL) rules. Web Ontology Language (OWL), SWRL rules, and reasoners are deployed to inductively learn the rules that distinguish a fake account (bot) from a real one, as well as to classify fake accounts into fake followers or spam bots. Our approach could properly identify fake accounts with an accuracy of 97% in the first stage, after which these fake accounts were classified into spam or fake follower bots with an accuracy of 94.9%. Furthermore, it has been found that the ontology classifier is a more interpretable model that offers straightforward and human-interpretable decision rules, as compared to other machine learning classifiers.
Article
Data clustering is an important data analysis and data mining tool in many fields such as pattern recognition and image processing. The goal of data clustering is to optimally organize similar objects into clusters. Grey wolf optimizer is a newly introduced optimization algorithm with inspiration from the social behavior of gray wolves. In this work, we propose a modified gray wolf optimizer to tackle some of the challenges in meta-heuristic algorithms. These modifications include a balanced approach to the exploration and exploitation stages of the algorithm as well as a local search around the best solution found. The performance of the proposed algorithm is compared to seven other clustering methods on nine data sets from the UCI machine learning laboratory. Experimental results demonstrate the competence of the proposed algorithm in solving data clustering problems. Overall, the intra-cluster distance of the proposed algorithm is lower than other algorithms and gives an average error rate of 11.22% which is the lowest among all.
Article
The k-means algorithm is generally the most known and used clustering method, and various extensions of k-means have been proposed in the literature. Although it is an unsupervised learning approach to clustering in pattern recognition and machine learning, the k-means algorithm and its extensions are always influenced by initializations and require the number of clusters a priori. That is, the k-means algorithm is not exactly an unsupervised clustering method. In this paper, we construct an unsupervised learning schema for the k-means algorithm so that it is free of initializations and parameter selection and can also simultaneously find an optimal number of clusters. That is, we propose a novel unsupervised k-means (U-k-means) clustering algorithm that automatically finds an optimal number of clusters without any initialization or parameter selection. The computational complexity of the proposed U-k-means clustering algorithm is also analyzed. Comparisons between the proposed U-k-means and other existing methods are made. Experimental results and comparisons demonstrate these good aspects of the proposed U-k-means clustering algorithm.
Article
Today, OSNs (Online Social Networks) are considered the most common platforms on the Internet. They play a substantial role in helping internet users carry out everyday activities such as reading news, sharing content, reviewing products, posting messages, and discussing events. Unfortunately, new attacks have been identified on OSNs, where different types of spammers exist. These cyber-criminals, including online fraudsters, sexual predators, catfishes, social bots, and advertising campaigners, abuse OSNs in different ways, especially by creating fake profiles to carry out scams and spread their content. These malicious identities are damaging to both service providers and users. From the viewpoint of OSN service providers, fake profiles cause a loss of bandwidth and harm the overall reputation of the network. Discovering and stopping these harmful users therefore requires more sophisticated automated methods and a tremendous amount of manpower. This paper explains different kinds of OSN risk generators such as cloned profiles, compromised profiles, and online bots (spam-bots, chat-bots, and social-bots). In addition, it presents several classifications of features that have been used for training classifiers to discover fake profiles. We try to show the different methods used to detect each kind of malicious profile. This paper also tries to show which types of profile attack are the most dangerous and the most popular in OSNs.
Article
This work presents a machine learning model that utilizes a set of supervised and unsupervised mining algorithms for detecting fake Facebook profiles. Specifically, three supervised algorithms (ID3 decision tree, k-NN, and SVM) and two unsupervised algorithms (k-Means and k-Medoids) are implemented using RapidMiner© Studio with a set of 12 behavioral and non-behavioral attributes provided in Facebook users' profiles. Because of the strict privacy settings of Facebook, a special tool (CRAWLER) was developed specifically to collect the related data. This results in a dataset of 982 profiles that is used to carry out two experiments, with and without removing the profiles with missing values, to determine which technique performs best. Results showed that the supervised algorithms achieve better accuracy rates than the unsupervised algorithms in both experiments. In particular, the ID3 algorithm outperforms the other classifiers, while k-Medoids registered the lowest accuracy rate in the detection process.
Article
In this paper, an efficient meta-heuristic satin bowerbird optimization (SBO) algorithm is presented for congestion management (CM) in the deregulated power system. The main objective of CM is to relieve congestion in the transmission lines using a generation rescheduling-based approach, while satisfying all the constraints with minimum congestion cost. The SBO is a nature-inspired algorithm, developed based on the ‘male-attracts-the-female for breeding’ principle of the specialized stick structure mechanism of satin birds. The proposed approach is effectively tested on small and large test systems, namely, modified IEEE 30-bus, modified IEEE 57-bus, and IEEE 118-bus test systems. The constraints like line loading, line limits, generator limits, and bus voltage impact, etc., are incorporated into this study. The proposed technique gives superior results with regards to congestion cost and losses compared with various recent optimization algorithms.
Article
A massive growth in the scale of data has been observed in recent years, and it is a key factor of the Big Data scenario. Big Data can be defined as data of high volume, velocity and variety that require new high-performance processing. Addressing big data is a challenging and time-demanding task that requires a large computational infrastructure to ensure successful data processing and analysis. The presence of data preprocessing methods for data mining in big data is reviewed in this paper. The definition, characteristics, and categorization of data preprocessing approaches in big data are introduced. The connection between big data and data preprocessing throughout all families of methods and big data technologies is also examined, including a review of the state of the art. In addition, research challenges are discussed, with a focus on developments in different big data frameworks, such as Hadoop, Spark and Flink, and on encouraging substantial research efforts in some families of data preprocessing methods and in applications to new big data learning paradigms.
Article
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method. The approach, which combines several randomized decision trees and aggregates their predictions by averaging, has shown excellent performance in settings where the number of variables is much larger than the number of observations. Moreover, it is versatile enough to be applied to large-scale problems, is easily adapted to various ad-hoc learning tasks, and returns measures of variable importance. The present article reviews the most recent theoretical and methodological developments for random forests. Emphasis is placed on the mathematical forces driving the algorithm, with special attention given to the selection of parameters, the resampling mechanism, and variable importance measures. This review is intended to provide non-experts easy access to the main ideas.
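The core idea described above, combining several randomized trees and averaging their predictions, can be sketched in a few lines. This is a deliberately reduced illustration rather than Breiman's full algorithm: each "tree" is a depth-one stump, one random feature is drawn per tree instead of per split, and the function names and toy data are assumptions.

```python
import random

def fit_stump(X, y, feature):
    """Best single-threshold split on one feature, minimising squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((t - lm) ** 2 for t in left) +
               sum((t - rm) ** 2 for t in right))
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    if best is None:  # the feature is constant in this bootstrap sample
        mean = sum(y) / len(y)
        return (feature, 0.0, mean, mean)
    return (feature, best[1], best[2], best[3])

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        feature = rng.randrange(n_features)         # random feature choice
        forest.append(fit_stump([X[i] for i in idx],
                                [y[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Aggregate the individual predictions by averaging."""
    votes = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(votes) / len(votes)

if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0],
         [10.0, 2.0], [11.0, 5.0], [12.0, 3.0], [13.0, 1.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    forest = fit_forest(X, y)
    print(predict(forest, [1.0, 3.0]), predict(forest, [12.0, 3.0]))
```

Averaging over bootstrap resamples and random features is what reduces the variance of the individual trees; production implementations grow deeper trees and sample features at every split.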
Chapter
Millions of people use social media platforms like Twitter, Facebook, and Instagram all around the world. People are drawn to these social media sites, and as the prevalence of social media grows, so do the security and privacy concerns that come with it. Nowadays, it is critical to ensure that we are following the correct social media account or purchasing a product from the actual consumer, as malicious users can be extremely harmful. This paper proposes a hybrid method for detecting fake social media user accounts, using a dataset from the Instagram social media platform. The hybrid approach has two stages. The first stage uses Principal Component Analysis (PCA), which turns the original variables into new uncorrelated variables. The second stage applies five classification algorithms to obtain accurate results: fake profiles are detected using naive Bayes, artificial neural networks (ANN), support vector machine (SVM), logistic regression, and K-nearest neighbors (KNN). When the classification performances of these approaches are compared, the artificial neural network outperforms the others.
Keywords: Hybrid, PCA, Machine learning, Instagram, ANN, Confusion matrix
Chapter
Online Social Networks (OSNs) are the most popular web services nowadays. They provide users with different kinds of services. Anyone can create an account on an OSN such as Facebook, using an email and password for registration. In addition, a single user can own one or more accounts. However, this feature has many disadvantages and security drawbacks, such as the creation of fake accounts. A fake account is a profile that exists on an OSN but is missing identity information such as first name, last name, profile photo, and other profile attributes. Owners of fake accounts exploit them for malicious internet activities like phishing, hacking, and more. Recently, this problem has attracted considerable attention from the research community, and many approaches have emerged for fake account detection on OSNs. However, despite its importance, this field of research still lacks a systematic review. In this paper, we present a survey of sixteen research studies that have proposed solutions to the fake account detection problem. We analyze the following themes: social networking platforms, evaluation metrics, machine learning algorithms and models, data scale, features, and result accuracy.
Chapter
Today, social media data spread more swiftly, which can be beneficial or destructive in various circumstances. Because of their widespread use, OSNs have become a platform for spammers to distribute undesirable content. Many security breaches have been noticed with the increasing frequency of cyber-attacks. Cyber-attacks are making news, and consumers are turning their trust in online social networks against them. These attacks have far-reaching consequences. This necessitates identifying such fictitious users to keep the trust of the social networks’ users intact. In this paper, a method based on machine learning classification algorithms for specifically tagging fake Twitter profiles has been proposed. Fake users are spam profiles that appear to be actual users and are responsible for tarnishing the reputation of legitimate users. Data from about 8 K users have been used to validate the suggested system for this purpose. The accuracy of the proposed technique is around 83.5% with logistic regression classifier.
Article
In the world of work, the presence of high-performing employees is a benchmark of a company's progress. The best employees are usually determined by looking at their performance, e.g. craft, discipline and other achievements. The goal is to optimize decision making regarding the best employees. The models obtained for employee prediction are tested on a real data set provided by IBM analytics, which includes 29 features and about 22005 samples. In this paper, we try to build a system that predicts employee attribution based on a collection of employee data from the Kaggle website. We use four machine learning algorithms, namely KNN (K-Nearest Neighbors), Naïve Bayes, Decision Tree, and Random Forest, plus two ensemble techniques, namely stacking and bagging. Results are expressed in terms of classic metrics, and the algorithm that produces the best result for the available data set is the Random Forest classifier. It yields the best withdrawals (0.88), as good as the stacking and bagging methods, which reach the same value.
Chapter
The social network, a crucial part of our life, is plagued by online impersonation and fake accounts. Facebook, Instagram, and Snapchat are the most well-known social networking sites. Fake profiles are mostly used by intruders to carry out malicious activities, such as harming individuals, identity theft, and privacy intrusion in online social networks (OSNs). Hence, recognizing whether an account is genuine or fake is one of the critical problems in OSNs. In this paper, we propose a model that could be used to classify an account as fake or genuine. This model uses the random forest method as a classification technique and can process a large dataset of accounts at once, eliminating the need to evaluate each account manually. Our problem can be framed as a classification or a clustering problem. As this is an automatic detection method, it can be applied easily by online social networks that have a large number of profiles, which cannot be examined manually.
Chapter
Satin bower bird optimizer (SBO) is a very recent meta-heuristic algorithm used in optimization problems. In this work, an investigation is done on the efficacy of SBO in tuning a PI controller parameter for improving voltage stability in an autonomous AC microgrid. Intelligent microgrids are in fact the building blocks of a smart grid. For a microgrid operating in the autonomous mode, maintaining frequency and voltage stability under different disturbance conditions are major causes of concern. Droop control is the most accepted control for ensuring proper current sharing among parallel converters. The AC bus voltage is affected by the change in reactive power demand, and maintaining the bus voltage stability by minimizing the voltage deviation error is studied in this work. Here, a PI controller is employed whose gain is optimally tuned by SBO. The effectiveness is further tested by comparing the performance with controller gain set to values above and below the optimally tuned values.
Article
Online Social Networks (OSN) are popular applications for sharing various data, including text, photos, and videos. However, fake accounts are one of the obstacles in current OSN systems. Attackers exploit fake accounts to distribute misleading information such as malware, viruses, or malicious URLs. Inspired by the big successes of deep learning in computer vision, mainly in automatic feature extraction and representation, we propose DeepProfile, a deep neural network (DNN) algorithm to deal with fake account issues. Instead of using standard machine learning, we construct a dynamic CNN to train a learning model for fake profile classification. Notably, we propose a novel pooling layer to optimize the neural network performance in the training process. As the experiments demonstrate, we obtain a promising result, with better accuracy and smaller loss than common learning algorithms in a malicious account classification task.
Article
Machine learning with maximization (support) of separating margin (vector), called support vector machine (SVM) learning, is a powerful classification tool that has been used for cancer genomic classification or subtyping. Today, as advancements in high-throughput technologies lead to production of large amounts of genomic and epigenomic data, the classification feature of SVMs is expanding its use in cancer genomics, leading to the discovery of new biomarkers, new drug targets, and a better understanding of cancer driver genes. Herein we reviewed the recent progress of SVMs in cancer genomic studies. We intend to comprehend the strength of the SVM learning and its future perspective in cancer genomic applications.
Article
Accurate software development effort estimation is crucial to efficient planning of software projects. Due to complex nature of software projects, development effort estimation has become a challenging issue which must be seriously considered at the early stages of project. Insufficient information and uncertain requirements are the main reasons behind unreliable estimations in this area. Although numerous effort estimation models have been proposed during the last decade, accuracy level is not satisfying enough. This paper presents a new model based on a combination of adaptive neuro-fuzzy inference system (ANFIS) and satin bower bird optimization algorithm (SBO) to reach more accurate software development effort estimations. SBO is a novel optimization algorithm proposed to adjust the components of ANFIS through applying small and reasonable changes in variables. The proposed hybrid model is an optimized neuro-fuzzy based estimation model which is capable of producing accurate estimations in a wide range of software projects. The proposed optimization algorithm is compared against other bio inspired optimization algorithms using 13 standard test functions including unimodal and multimodal functions. Moreover, the proposed hybrid model is evaluated using three real data sets. Results show that the proposed model can significantly improve the performance metrics.