Chapter

A Review on Preprocessing Techniques for Noise Reduction in PET-CT Images for Lung Cancer


Abstract

Cancer is one of the leading causes of death worldwide. According to the World Health Organization, lung cancer was the most common cause of cancer death in 2020, with over 1.8 million deaths. Lung cancer mortality can, however, be reduced with early detection and treatment. Early detection requires screening and accurate delineation of the tumor for staging and treatment planning. Owing to advances in medicine, nuclear medicine has moved to the forefront of precise lung cancer diagnosis, and PET/CT is currently the preferred diagnostic modality for lung cancer detection. However, variable results and noise in the imaging modalities, together with the complexity of the lung as an organ, make it challenging to identify lung tumors in clinical images. In addition, factors such as respiration can blur the images and introduce other artifacts. Although nuclear medicine is at the forefront of diagnosing, evaluating, and treating various diseases, it is highly dependent on image quality, which has motivated approaches such as the fusion of modalities to evaluate the disease. Fusion of diagnostic modalities is accurate only when well-processed images are acquired, which is challenging because of differences between diagnostic machines and the external and internal factors associated with lung cancer patients. Current works focus on single imaging modalities for lung cancer detection, and no specific techniques have been identified individually for PET and for CT images to attain effective, noise-free hybrid imaging for lung cancer detection. The survey shows that several image preprocessing filters are used for different noise types; however, successful preprocessing requires identifying the types of noise present in PET and CT images and the techniques that perform well for each modality. Therefore, the primary aim of this review is to identify efficient preprocessing techniques for noise and artifact removal in PET/CT images that preserve the critical features of the tumor for accurate lung cancer diagnosis.

Conference Paper
Full-text available
Epilepsy diagnosis has been one of the critical subjects of research in the health field in recent years. The aim of this paper is to detect and classify two types of seizures, tonic and myoclonic, based on surface electromyography (sEMG) signals recorded from eight sEMG electrodes placed on the biceps brachii, flexor carpi ulnaris, gastrocnemius, and quadriceps muscles. Several time-domain features were investigated, including waveform length, mean absolute value, variance, mobility, complexity, kurtosis, skewness, simple square integral, integrated EMG, root mean square, average amplitude change, standard deviation, and entropy. These features are used as inputs to an Artificial Neural Network (ANN) classifier to determine whether a data segment corresponds to a seizure. Following the development of the ANN model, a performance evaluation in terms of accuracy is presented to validate the results, reaching an accuracy of 93.33%. The overall performance of the presented machine learning algorithm is sufficient for clinical implementation.
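A minimal sketch of the general idea described above: a few of the named time-domain sEMG features computed per window and fed to a small neural network. The feature definitions, toy data, and network size are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def time_domain_features(window: np.ndarray) -> np.ndarray:
    """Compute simple time-domain features for one sEMG window (1-D array)."""
    mav = np.mean(np.abs(window))             # mean absolute value
    rms = np.sqrt(np.mean(window ** 2))       # root mean square
    wl = np.sum(np.abs(np.diff(window)))      # waveform length
    var = np.var(window)                      # variance
    ssi = np.sum(window ** 2)                 # simple square integral
    return np.array([mav, rms, wl, var, ssi])

# Toy stand-in data: 200 windows of 256 samples with binary seizure labels.
rng = np.random.default_rng(0)
windows = rng.normal(size=(200, 256))
labels = rng.integers(0, 2, size=200)

X = np.vstack([time_domain_features(w) for w in windows])
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```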
Article
Full-text available
The study of network traffic identification is not only important for network management but also crucial for monitoring network security issues. Current traffic classification tasks, including protocol identification, application identification, and traffic characterization, already achieve good results. However, existing classification methods can hardly distinguish between encrypted and non-encrypted compressed traffic. In this paper, we propose an entropy-based feature extraction algorithm for encrypted and non-encrypted compressed traffic classification that uses the entropy of fixed-length packet payloads. For a fixed-length binary sequence from the packet payload, the algorithm uses an 8-bit sliding window and slides it across different bit offsets to obtain different sequences. It then calculates the serial binary entropy of these sequences, and the resulting entropy vector is used as the feature vector of the original sequence. The feature vectors of encrypted and non-encrypted compressed traffic sequences are used as input to a support vector machine or random forest for training and classification. The experimental results show that the proposed feature extraction algorithm can distinguish well between encrypted traffic and non-encrypted compressed traffic. When the packet payload length is 1444 bytes, it reaches a high classification accuracy (about 97.90%).
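A hedged sketch of the entropy-vector idea: slide an 8-bit window over the payload bit string at different bit offsets and compute the Shannon entropy of the resulting 8-bit symbols at each offset. The exact construction used by the authors may differ; this only illustrates the general feature, which could then feed an SVM or random forest.

```python
import math
import os
from collections import Counter

def payload_entropy_vector(payload: bytes, window_bits: int = 8) -> list[float]:
    """One Shannon-entropy value per bit offset of an 8-bit sliding window."""
    bits = "".join(f"{b:08b}" for b in payload)
    vector = []
    for offset in range(window_bits):
        symbols = [bits[i:i + window_bits]
                   for i in range(offset, len(bits) - window_bits + 1, window_bits)]
        counts = Counter(symbols)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        vector.append(entropy)
    return vector

# Example: a repetitive payload has low entropy at every offset, while a
# pseudo-random payload approaches 8 bits of entropy per symbol.
print(payload_entropy_vector(b"AAAA" * 100))
print(payload_entropy_vector(os.urandom(400)))
```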
Conference Paper
Full-text available
The 21st century has been called the age of information technology. Social applications such as Facebook, Twitter, and Instagram have become quick and massive media for spreading news over the internet. At the same time, the wide spread of low-quality news containing intentionally false information is wreaking havoc and causing damage to society, in some cases to the extent of costing lives. Such news is termed fake news, and detecting fake news spreaders is drawing more attention these days because fake news can manipulate communities' minds as well as social trust. To date, many studies have been conducted in this area, most of them based on machine learning and deep learning approaches. In this paper, we propose a Universal Language Model Fine-Tuning (ULMFiT) model based on transfer learning to detect potential fake news spreaders on Twitter. The proposed model uses wiki text data to train the language model to capture general features of the language, and this knowledge is transferred to build a classifier, using the fake news spreaders dataset provided by PAN 2020, to identify fake news spreaders. The results obtained on the PAN 2020 fake news dataset are encouraging.
Article
Full-text available
Teaching-learning-based optimization (TLBO) is a population-based metaheuristic search algorithm inspired by the teaching and learning process in a classroom. It has been successfully applied to many scientific and engineering applications in the past few years. In the basic TLBO and most of its variants, all the learners have the same probability of getting knowledge from others. However, in the real world, learners are different, and each learner's learning enthusiasm is not the same, resulting in different probabilities of acquiring knowledge. Motivated by this phenomenon, this study introduces a learning enthusiasm mechanism into the basic TLBO and proposes a learning enthusiasm based TLBO (LebTLBO). In the LebTLBO, learners with good grades have high learning enthusiasm, and they have large probabilities of acquiring knowledge from others; by contrast, learners with bad grades have low learning enthusiasm, and they have relatively small probabilities of acquiring knowledge from others. In addition, a poor student tutoring phase is introduced to improve the quality of the poor learners. The proposed method is evaluated on the CEC2014 benchmark functions, and the computational results demonstrate that it offers promising results compared with other efficient TLBO and non-TLBO algorithms. Finally, LebTLBO is applied to solve three optimal control problems in chemical engineering, and the competitive results show its potential for real-world problems.
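For orientation, a minimal sketch of the basic TLBO loop (teacher and learner phases) minimizing the sphere function. The learning-enthusiasm weighting and the poor-student tutoring phase that distinguish LebTLBO are not reproduced here; population size, bounds, and iteration count are arbitrary choices.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

rng = np.random.default_rng(1)
pop, dim, iters = 20, 5, 100
X = rng.uniform(-5, 5, size=(pop, dim))
fit = np.array([sphere(x) for x in X])

for _ in range(iters):
    # Teacher phase: move learners toward the best solution, away from the mean.
    teacher = X[np.argmin(fit)]
    mean = X.mean(axis=0)
    tf = rng.integers(1, 3)                            # teaching factor in {1, 2}
    X_new = X + rng.random((pop, dim)) * (teacher - tf * mean)
    fit_new = np.array([sphere(x) for x in X_new])
    improved = fit_new < fit
    X[improved], fit[improved] = X_new[improved], fit_new[improved]

    # Learner phase: each learner interacts with a randomly chosen peer.
    for i in range(pop):
        j = rng.integers(pop)
        if j == i:
            continue
        step = (X[i] - X[j]) if fit[i] < fit[j] else (X[j] - X[i])
        cand = X[i] + rng.random(dim) * step
        f = sphere(cand)
        if f < fit[i]:
            X[i], fit[i] = cand, f

print("best value found:", fit.min())
```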
Article
Full-text available
With the rapid development of human society, the urbanization of the world's population is also progressing rapidly. Urbanization has brought many challenges and problems to the development of cities. For example, the urban population is under excessive pressure, various natural resources and energy are increasingly scarce, and environmental pollution is increasing. The original urban model therefore has to be changed to enable people to live in greener and more sustainable cities, thus providing them with a more convenient and comfortable living environment. The new urban framework, the smart city, provides excellent opportunities to meet these challenges while solving urban problems at the same time. At this stage, many countries are actively responding to calls for smart city development plans. This paper investigates the current stage of the smart city. First, it introduces the background of smart city development and gives a brief definition of the concept of the smart city. Second, it describes the framework of a smart city in accordance with the given definition. Finally, various intelligent algorithms to make cities smarter, along with specific examples, are discussed and analyzed.
Article
Full-text available
Improper healthcare waste (HCW) management poses significant risks to the environment, human health, and socioeconomic sustainability due to the infectious and hazardous nature of HCW. This research aims at rendering a comprehensive landscape of the body of research on HCW management by (i) mapping the scientific development of HCW research, (ii) identifying the prominent HCW research themes and trends, and (iii) providing a research agenda for HCW management towards a circular economy (CE) transition and sustainable environment. The analysis revealed four dominant HCW research themes: (1) HCW minimization, sustainable management, and policy-making; (2) HCW incineration and its associated environmental impacts; (3) hazardous HCW management practices; and (4) HCW handling and occupational safety and training. The results showed that the healthcare industry, despite its potential to contribute to the CE transition, has been overlooked in the CE discourse due to the single-use mindset of the healthcare industry in the wake of the infectious, toxic, and hazardous nature of HCW streams. The findings shed light on the HCW management domain by uncovering the current status of HCW research, highlighting the existing gaps and challenges, and providing potential avenues for further research towards a CE transition in the healthcare industry and HCW management.
Article
Full-text available
COVID-19 was declared a global pandemic by the World Health Organization in March 2020 and had infected more than 4 million people worldwide, with over 300,000 deaths, by early May 2020. Many researchers around the world incorporated various prediction techniques, such as the Susceptible-Infected-Recovered model, the Susceptible-Exposed-Infected-Recovered model, and the Autoregressive Integrated Moving Average (ARIMA) model, to forecast the spread of this pandemic. The ARIMA technique was not heavily used in forecasting COVID-19 due to the claim that it is not suitable for complex and dynamic contexts. The aim of this study is to test how accurate the ARIMA best-fit model predictions were against the actual values reported after the entire prediction period had elapsed. We investigate and validate the accuracy of an ARIMA model over a relatively long period of time using Kuwait as a case study. We started by optimizing the parameters of our model to find a best fit by examining autocorrelation function and partial autocorrelation function charts, as well as different accuracy measures. We then used the best-fit model to forecast confirmed and recovered cases of COVID-19 throughout the different phases of Kuwait's gradual preventive plan. The results show that, despite the dynamic nature of the disease and constant revisions made by the Kuwaiti government, the actual values for most of the observed time period were well within the bounds of our selected ARIMA model's prediction at a 95% confidence interval. Pearson's correlation coefficient between the forecast points and the actual recorded data was found to be 0.996, indicating that the two sets are highly correlated. The accuracy of the prediction provided by our ARIMA model is both appropriate and satisfactory.
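A hedged sketch of an ARIMA fit-and-forecast step with statsmodels, in the spirit of the study; the (p, d, q) order and the synthetic series below are placeholders, not the model the authors selected for Kuwait's data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic cumulative-case series standing in for reported COVID-19 counts.
rng = np.random.default_rng(0)
daily = np.cumsum(rng.poisson(lam=50, size=120))
series = pd.Series(daily, index=pd.date_range("2020-03-01", periods=120))

model = ARIMA(series, order=(2, 2, 1))         # candidate order from ACF/PACF study
result = model.fit()
forecast = result.get_forecast(steps=14)
print(forecast.predicted_mean.round(0))        # point forecasts
print(forecast.conf_int(alpha=0.05).round(0))  # 95% prediction interval
```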
Article
Full-text available
Retail fashion is the fastest-growing e-commerce sector; however, the industry is facing a serious issue with returns for purchases made online. Retailers are slowly but surely embracing new online tools to help consumers make smarter fit decisions. The current study investigates the influence that an online sizing technology, True Fit®, has on consumer confidence when making sizing decisions online and on the eventual intention to use True Fit®. The study uses an adapted Technology Acceptance Model and includes the following variables: perceived ease of use of True Fit®, perceived usefulness of True Fit®, convenience of using True Fit®, and intention to use True Fit®. Data were collected via an online survey (n = 577) and analysed using descriptive statistics, factor analysis, and Structural Equation Modelling; all the hypotheses were supported. The paper adds insights into the subject of online sizing and provides recommendations for future studies.
Article
Full-text available
In the e-commerce and financial industries, AI has been deployed to achieve better customer experience, efficient supply chain management, improved operational efficiency, and reduced mate size, with the main goal of designing standard, reliable product quality control methods and the search for new ways of reaching and serving customers while maintaining low cost. Machine learning and deep learning are two of the most often used AI approaches. Individuals, businesses, and government agencies utilize these models to anticipate and learn from data. Machine learning models for the complexity and diversity of data in the food industry are being developed at the moment. This article discusses machine learning and artificial intelligence applications in e-commerce, corporate management, and finance. Sales growth, profit maximization, sales forecast, inventory management, security, fraud detection, and portfolio management are some of the major uses.
Data
Full-text available
A collection of words across multiple categories and languages. The languages present are - English, Hindi, Marathi, Punjabi, Kannada, Tamil, Telugu, and Sanskrit.
Article
Full-text available
Medical services inevitably generate healthcare waste (HCW) that may become hazardous to healthcare staff, patients, the population, and the environment. In most developing countries, HCW disposal management has become one of the fastest-growing challenges for urban municipalities and healthcare providers. Determining the location of HCW disposal centers is a relatively complex process due to the involvement of various alternatives and criteria and strict government guidelines about the disposal of HCW. The objective of the paper is to introduce the WASPAS (weighted aggregated sum product assessment) method with Fermatean fuzzy sets (FFSs) for the HCW disposal location selection problem. This method combines a score function, an entropy measure, and the classical WASPAS approach within the FFS context. Next, a combined procedure using entropy and the score function is proposed to estimate the criteria weights. To do this, a novel score function with desirable properties and some entropy measures are introduced under the FFS context. Further, an illustrative case study of the HCW disposal location selection problem on FFSs is established, which evidences the practicality and efficacy of the developed approach. A comparative discussion and sensitivity analysis are made to monitor the permanence of the introduced framework. The final results confirm that the proposed methodology can effectively handle the ambiguity and inaccuracy in the decision-making procedure of HCW disposal location selection.
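A hedged sketch combining a commonly used Fermatean fuzzy score function, s(mu, nu) = mu^3 - nu^3, with the classical WASPAS aggregation Q_i = lambda * WSM_i + (1 - lambda) * WPM_i. The paper's own score function, entropy-based weights, and FFS arithmetic are richer than this simplified, crisp illustration; the matrix, weights, and lambda below are made up.

```python
import numpy as np

# Decision matrix: 3 candidate disposal sites x 2 criteria; each entry is a
# Fermatean fuzzy pair (membership mu, non-membership nu) with mu^3 + nu^3 <= 1.
ff_matrix = np.array([
    [(0.9, 0.3), (0.6, 0.5)],
    [(0.7, 0.4), (0.8, 0.2)],
    [(0.5, 0.6), (0.9, 0.1)],
])
weights = np.array([0.6, 0.4])
lam = 0.5

mu, nu = ff_matrix[..., 0], ff_matrix[..., 1]
score = (mu ** 3 - nu ** 3 + 1) / 2          # rescale the score to [0, 1]

wsm = (score * weights).sum(axis=1)          # weighted sum model
wpm = np.prod(score ** weights, axis=1)      # weighted product model
q = lam * wsm + (1 - lam) * wpm              # WASPAS joint utility

print("WASPAS utilities:", np.round(q, 3))
print("ranking (best first):", np.argsort(-q) + 1)
```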
Article
Full-text available
Generative adversarial networks (GANs) are among the most popular generative frameworks and have achieved compelling performance. They follow an adversarial approach in which two deep models, a generator and a discriminator, compete with each other. In this paper, we propose a generative adversarial network with best hyper-parameter selection to generate fake images of the digits 1–9 with the generator and train the discriminator to decide whether the generated images are fake or real. A genetic algorithm (GA) technique was used to adapt the GAN hyper-parameters; the resulting algorithm is named GANGA: generative adversarial network with genetic algorithm. The resulting algorithm achieved high performance; it was able to reach a loss value of zero for the generator and the discriminator separately. An Anaconda environment with the TensorFlow library was used, with Python as the programming language together with the needed libraries. The implementation was done using the MNIST dataset to validate the work. The proposed method lets the genetic algorithm choose the best hyper-parameter values by minimizing a cost function, such as a loss function, or maximizing an accuracy function, in order to find the best values of the learning rate, batch normalization, number of neurons, and the dropout-layer parameter.
Article
Full-text available
Twitter is one of the most popular micro-blogging social media platforms and has millions of users. Due to its popularity, Twitter has been targeted by different attacks such as spreading rumors, phishing links, and malware. Tweet-based botnets represent a serious threat to users as they can launch large-scale attacks and manipulation campaigns. To deal with these threats, big data analytics techniques, particularly shallow and deep learning techniques, have been leveraged in order to accurately distinguish between human accounts and tweet-based bot accounts. In this paper, we discuss existing techniques and provide a taxonomy that classifies the state of the art of tweet-based bot detection techniques. We also describe the shallow and deep learning techniques for tweet-based bot detection, along with their performance results. Finally, we present and discuss the challenges and open issues in the area of tweet-based bot detection.
Article
Full-text available
Intrusion Detection Systems (IDSs) have received increasing attention for safeguarding the vital information in an organization's network. Hackers can easily enter a secured network through loopholes and smart attacks. In such situations, distinguishing attacks from normal packets is tedious, challenging, time-consuming, and highly technical. As a result, different algorithms with varying learning and training capacities have been explored in the literature. However, the existing intrusion detection methods could not meet the desired performance requirements. Hence, this work proposes a new intrusion detection technique using a Deep Autoencoder with Fruitfly Optimization. Initially, missing values in the dataset are imputed with the Fuzzy C-Means Rough Parameter (FCMRP) algorithm, which handles the imprecision in datasets by exploiting fuzzy and rough sets while preserving crucial information. Then, robust features are extracted from an autoencoder with multiple hidden layers. Finally, the obtained features are fed to a Back Propagation Neural Network (BPN) to classify the attacks. Furthermore, the neurons in the hidden layers of the Deep Autoencoder are optimized with a population-based Fruitfly Optimization algorithm. Experiments have been conducted on the NSL_KDD and UNSW-NB15 datasets. The computational results of the proposed intrusion detection system using the Deep Autoencoder with BPN are compared with Naive Bayes, Support Vector Machine (SVM), Radial Basis Function Network (RBFN), BPN, and an autoencoder with Softmax. Article highlights: a hybridized model using a Deep Autoencoder with Fruitfly Optimization is introduced to classify the attacks; missing values are imputed with the Fuzzy C-Means Rough Parameter method; the discriminant features are extracted using a Deep Autoencoder with more hidden layers.
Article
Full-text available
The interval-valued Fermatean fuzzy set is an extension of the Fermatean fuzzy set; it combines the interval-valued fuzzy set and the Fermatean fuzzy set. In this paper we propose the notion of the interval-valued Fermatean fuzzy set: a pair of interval numbers such that the sum of the cubes of the upper bounds is less than or equal to one. Some basic properties of interval-valued Fermatean fuzzy sets are studied. We introduce the concepts of the interval-valued Fermatean fuzzy Γ-subsemihypergroup and the interval-valued Fermatean fuzzy (bi, interior) Γ-hypersemigroup. Relations between these Γ-hyperideals are also discussed with suitable examples. Finally, the inverse image of an interval-valued Fermatean fuzzy set is established, and it is proved that the inverse image of an interval-valued Fermatean fuzzy (bi, interior) Γ-hyperideal is also an interval-valued Fermatean fuzzy (bi, interior) Γ-hyperideal.
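A hedged formalization of the defining condition stated in the abstract, written in standard notation: each element carries an interval-valued membership and non-membership degree whose upper bounds satisfy a cubic constraint. The symbols below are conventional choices, not necessarily the paper's exact notation.

```latex
\[
A = \bigl\{\, \langle x,\; [\mu_A^{-}(x), \mu_A^{+}(x)],\; [\nu_A^{-}(x), \nu_A^{+}(x)] \rangle : x \in X \,\bigr\},
\qquad
\bigl(\mu_A^{+}(x)\bigr)^{3} + \bigl(\nu_A^{+}(x)\bigr)^{3} \le 1 .
\]
```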
Article
Full-text available
With the rapid increase in communication technologies and smart devices, an enormous surge in data traffic has been observed. A huge amount of data is generated every second by different applications, users, and devices. This rapid generation of data has created the need for solutions that can analyze how data changes over time in unforeseen ways despite resource constraints. These unforeseeable changes in the underlying distribution of streaming data over time are identified as concept drifts. This paper presents a novel approach named ElStream that detects concept drift using ensemble and conventional machine learning techniques on both real and artificial data. ElStream utilizes a majority-voting technique in which only the optimum classifiers vote for the decision. Experiments were conducted to evaluate the performance of the proposed approach. According to the experimental analysis, the ensemble learning approach provides consistent performance for both artificial and real-world data sets. The experiments show that ElStream improves accuracy by 12.49%, 11.98%, 10.06%, 1.2%, and 0.33% for the PokerHand, LED, Random RBF, Electricity, and SEA datasets, respectively, compared with previous state-of-the-art studies and conventional machine learning algorithms.
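A hedged sketch of the general chunk-based idea only: several incremental learners are kept, each new chunk is first used to evaluate them (prequential test-then-train), the currently best-performing learner is the one allowed to "vote", and a sharp accuracy drop is treated as possible concept drift. ElStream's actual drift detector and voting rule are more elaborate; the drift threshold and synthetic stream below are assumptions.

```python
import numpy as np
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
models = {"sgd": SGDClassifier(random_state=0),
          "perceptron": Perceptron(random_state=0),
          "gnb": GaussianNB()}
classes = np.array([0, 1])
acc = {name: None for name in models}          # running accuracy per learner

def make_chunk(drifted: bool, n: int = 200):
    """Synthetic stream whose labeling concept flips when drifted=True."""
    X = rng.normal(size=(n, 5))
    w = np.array([1.0, -1.0, 2.0, 0.0, 0.0]) if not drifted else np.array([0.0, 0.0, -2.0, 1.0, 1.0])
    return X, (X @ w > 0).astype(int)

for t in range(20):
    X, y = make_chunk(drifted=(t >= 10))       # concept changes at chunk 10
    for name, model in models.items():
        if acc[name] is not None:              # prequential: test before training
            score = model.score(X, y)
            if score < acc[name] - 0.2:
                print(f"chunk {t}: possible drift signalled by {name}")
            acc[name] = 0.9 * acc[name] + 0.1 * score
        model.partial_fit(X, y, classes=classes)
        if acc[name] is None:
            acc[name] = model.score(X, y)
    best = max(acc, key=acc.get)               # only the current best learner votes
print("classifier selected for voting:", best)
```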
Article
Full-text available
As COVID-19 hounds the world, the common cause of finding a swift solution to manage the pandemic has brought together researchers, institutions, governments, and society at large. The Internet of Things (IoT); Artificial Intelligence (AI), including Machine Learning (ML) and big data analytics; robotics; and blockchain are the four decisive areas of technological innovation that have been ingeniously harnessed to fight this pandemic and future ones. While these highly interrelated smart and connected health technologies cannot resolve the pandemic overnight and may not be the only answer to the crisis, they can provide greater insight into the disease and support frontline efforts to prevent and control the pandemic. This paper provides a blend of discussions on the contribution of these digital technologies, proposes several complementary and multidisciplinary techniques to combat COVID-19, offers opportunities for more holistic studies, and aims to accelerate knowledge acquisition and scientific discoveries in pandemic research. First, four areas where IoT can contribute are discussed, namely, i) tracking and tracing, ii) Remote Patient Monitoring (RPM) by Wearable IoT (WIoT), iii) Personal Digital Twins (PDT), and iv) a real-life use case: an ICT/IoT solution in Korea. Second, the role and novel applications of AI are explained, namely: i) diagnosis and prognosis, ii) risk prediction, iii) vaccine and drug development, iv) research datasets, v) early warnings and alerts, vi) social control and fake news detection, and vii) communication and chatbots. Third, the main uses of robotics and drone technology are analyzed, including i) crowd surveillance, ii) public announcements, iii) screening and diagnosis, and iv) essential supply delivery. Finally, we discuss how Distributed Ledger Technologies (DLTs), of which blockchain is a common example, can be combined with other technologies for tackling COVID-19.
Article
Full-text available
The rapid spread of coronavirus disease has become one of the worst disruptive disasters of the century around the globe. To fight against the spread of this virus, clinical image analysis of chest CT (computed tomography) images can play an important role in accurate diagnosis. In the present work, a bi-modular hybrid model is proposed to detect COVID-19 from chest CT images. In the first module, we use a Convolutional Neural Network (CNN) architecture to extract features from the chest CT images. In the second module, we use a bi-stage feature selection (FS) approach to find the most relevant features for the prediction of COVID and non-COVID cases from the chest CT images. In the first stage of FS, we apply a guided FS methodology employing two filter methods, Mutual Information (MI) and Relief-F, for the initial screening of the features obtained from the CNN model. In the second stage, the Dragonfly Algorithm (DA) is used for further selection of the most relevant features. The final feature set is used for the classification of COVID-19 and non-COVID chest CT images using a Support Vector Machine (SVM) classifier. The proposed model has been tested on two open-access datasets, the SARS-CoV-2 CT images and COVID-CT datasets, and shows substantial prediction rates of 98.39% and 90.0% on these datasets, respectively. The proposed model has been compared with a few past works on the prediction of COVID-19 cases. The supporting codes are uploaded at the GitHub link: https://github.com/Soumyajit-Saha/A-Bi-Stage-Feature-Selection-on-Covid-19-Dataset
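A hedged sketch of only the first filter stage named above: mutual-information screening of CNN-derived features followed by an SVM. The Relief-F filter and the Dragonfly-algorithm wrapper stage are not shown, and the synthetic matrix stands in for deep features extracted from chest CT images.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for CNN feature vectors (one row per CT image).
X, y = make_classification(n_samples=400, n_features=512, n_informative=40,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=64),   # MI-guided initial screening
    SVC(kernel="rbf", C=1.0),                 # final COVID / non-COVID classifier
)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```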
Article
Full-text available
Background: Fatty liver disease (FLD) has become a rampant condition associated with a high rate of morbidity and mortality in the population. Early prediction of FLD would allow patients to take the necessary preventive, diagnostic, and treatment measures. The main objective of this research is to develop a machine learning (ML) model to predict FLD that can help clinicians classify individuals at high risk of FLD and support novel diagnosis, management, and prevention of FLD. Methods: A total of 3,419 subjects were recruited, with 845 having been screened for FLD. Classification models were used in the detection of the disease, including logistic regression (LR), random forest (RF), artificial neural networks (ANNs), k-nearest neighbors (KNNs), extreme gradient boosting (XGBoost), and linear discriminant analysis (LDA). Predictive accuracy was assessed by the area under the curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. Results: We demonstrated that ML models give more accurate predictions, with the best accuracy reaching 0.9415 for the XGBoost model. Feature importance analysis not only confirmed some well-known FLD risk factors but also identified several novel features for predicting the risk of FLD, such as hemoglobin. Conclusion: By implementing the XGBoost model, physicians can efficiently identify FLD in general patients, which would help in the prevention, early treatment, and management of FLD.
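A hedged sketch of fitting a gradient-boosted classifier and reporting the kinds of metrics listed above (AUC, sensitivity, specificity) on synthetic data; the clinical features, tuning, and cohort are the study's own and are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the clinical feature table (imbalanced classes).
X, y = make_classification(n_samples=3000, n_features=20, weights=[0.75, 0.25],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                      random_state=0)
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("AUC:        ", round(roc_auc_score(y_te, prob), 4))
print("sensitivity:", round(tp / (tp + fn), 4))
print("specificity:", round(tn / (tn + fp), 4))
```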
Article
Full-text available
Blockchain is a revolutionary technology that is making a great impact on modern society due to its transparency, decentralization, and security properties. Blockchain gained considerable attention due to its very first application, cryptocurrencies, e.g., Bitcoin. In the near future, blockchain technology is set to transform the way we live, interact, and do business. Recently, academics, industrialists, and researchers have been aggressively investigating different aspects of blockchain as an emerging technology. Unlike other blockchain surveys focusing on either its applications, challenges, characteristics, or security, we present a comprehensive survey of blockchain technology's evolution, architecture, development frameworks, and security issues. We also present a comparative analysis of frameworks, a classification of consensus algorithms, and an analysis of the security risks and cryptographic primitives that have been used in blockchain so far. Finally, this paper elaborates on key future directions, novel use cases, and open research challenges that could be explored by researchers to make further advances in this field.
Article
Full-text available
Huge amounts of educational data are being produced, and a common challenge that many educational organizations confront is finding an effective method to harness and analyze this data for continuously delivering enhanced education. Nowadays, educational data is evolving and has become large in volume, wide in variety, and high in velocity. This data needs to be handled efficiently to extract value and make informed decisions. To that end, this paper treats such data as a big data challenge and presents a comprehensive platform tailored to educational big data analytical applications. It further presents an effective environment for non-data scientists and people in the educational sector to run their demanding educational big data applications. The implementation stages of the educational big data platform on a cloud computing platform and the organization of educational data in a data lake architecture are highlighted. Furthermore, two analytical applications are performed to test the feasibility of the presented platform in discovering knowledge that can potentially benefit educational institutions.
Article
Full-text available
The notion of smart cities has continued to evolve, as global implementations are challenged by numerous technological, economic, and governmental obstacles. Moreover, the synergy of the Internet of Things (IoT) and big data technologies could open promising horizons for smart city development that have not yet been explored. Thus, the current research aims to address the essence of smart cities. To this end, the concept of smart cities is first briefly overviewed; then, their properties and specifications, as well as their generic architecture, compositions, and real-world implementations, are addressed. Furthermore, possible challenges and opportunities in the field of smart cities are described. The numerous issues and challenges introduced in this study, such as analytics and the use of big data in smart cities, can enhance the development of applications of the above-mentioned technologies. Hence, this study paves the way for future research on the issues and challenges of big data applications in smart cities.
Article
Full-text available
Advancements in computer and multimedia technology and the introduction of the World Wide Web have increased the volume of image databases and collections, for example medical imagery, digital libraries, and art galleries, which in total contain millions of images. Retrieving images from such huge databases by traditional methods such as text-based image retrieval, color histograms, and the chi-square distance may take a long time to return the desired images. It is therefore necessary to develop an effective image retrieval system that can handle these huge amounts of images at once. The main purpose is to build a robust system that builds, executes, and responds to data in an efficient manner. A Content-Based Image Retrieval (CBIR) system has been developed as an efficient image retrieval tool in which a user can provide a query to the system to retrieve the desired image from the image collection. Moreover, web development and transmission networks continue to emerge, and the number of images available to users continues to grow. We propose an effective deep learning framework based on Convolutional Neural Networks (CNN) and a Support Vector Machine (SVM) for fast image retrieval. The proposed architecture extracts features using the CNN and performs classification using the SVM. The results demonstrate the robustness of the system.
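A hedged sketch of the feature-extraction idea: a pretrained CNN (ResNet50 here, assumed as a stand-in for the paper's network) turns each image into a fixed-length descriptor, and an SVM is trained on those descriptors; retrieval can then rank database images by classifier score or descriptor distance. The toy random images only show the plumbing.

```python
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

extractor = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def describe(images: np.ndarray) -> np.ndarray:
    """images: (n, 224, 224, 3) uint8 array -> (n, 2048) feature matrix."""
    return extractor.predict(preprocess_input(images.astype("float32")), verbose=0)

# Toy stand-in for an image collection with two categories.
rng = np.random.default_rng(0)
images = rng.integers(0, 255, size=(40, 224, 224, 3), dtype=np.uint8)
labels = np.repeat([0, 1], 20)

features = describe(images)
svm = SVC(kernel="linear").fit(features, labels)
print("training accuracy:", svm.score(features, labels))
```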
Article
Full-text available
The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of them focused on a specific type of news (such as political news), which raises the question of dataset bias in the models used. In this research, we conducted a benchmark study to assess the performance of different applicable machine learning approaches on three different datasets, for which we accumulated the largest and most diversified one. We explored a number of advanced pre-trained language models for fake news detection along with traditional and deep learning ones and, to the best of our knowledge, compared their performances from different aspects for the first time. We find that BERT and similar pre-trained models perform best for fake news detection, especially with very small datasets. Hence, these models are a significantly better option for languages with limited electronic content, i.e., training data. We also carried out several analyses based on the models' performance, the article's topic, and the article's length, and discussed the different lessons learned from them. We believe that this benchmark study will help the research community explore the subject further and help news sites and blogs select the most appropriate fake news detection method.
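A hedged sketch of scoring one article with a BERT-style classifier via Hugging Face Transformers. The checkpoint name and label meanings below are generic placeholders, not the fine-tuned models benchmarked in the paper; in practice the head would first be fine-tuned on a labeled fake-news dataset.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"           # placeholder; fine-tune before real use
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
model.eval()

text = "Scientists confirm that chocolate cures all known diseases."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()
print({"real": float(probs[0]), "fake": float(probs[1])})   # illustrative labels
```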
Article
Full-text available
Recently, a huge amount of online consumer reviews (OCRs) has been generated through social media, web content, and microblogs. This scale of big data cannot be handled by traditional methods. Sentiment analysis (SA), or opinion mining, is emerging as a powerful and efficient tool in big data analytics and in improving decision making. This paper introduces a novel method that integrates neutrosophic set (NS) theory into the SA technique and multi-attribute decision making (MADM) to rank different products based on numerous online reviews. The method consists of two parts: determining sentiment scores of the online reviews based on the SA technique, and ranking alternative products via NS theory. In the first part, the online reviews of the alternative products concerning multiple features are crawled and pre-processed. A neutral lexicon consisting of 228 neutral words and phrases is compiled, and the Valence Aware Dictionary and sEntiment Reasoner (VADER) is adapted to handle the neutral data. The compiled neutral lexicon and the adapted VADER are used to build a novel adaptation called Neutro-VADER. Neutro-VADER assigns positive, neutral, and negative sentiment scores to each review concerning a product feature. At this stage, the novel idea is to treat the positive, neutral, and negative sentiment scores as the truth, indeterminacy, and falsity membership degrees of a neutrosophic number. The overall performance of each alternative concerning each feature, based on a neutrosophic number, is measured. In the second part, the ranking of alternatives is evaluated through the simplified neutrosophic number weighted averaging (SNNWA) operator and cosine similarity measure methods. A case study with real datasets (Twitter datasets) is provided to illustrate the application of the proposed method. The results show good performance in handling the neutral data at both the SA stage and the ranking stage. In the SA stage, the findings show that Neutro-VADER in the proposed method can deal successfully with all types of uncertainty, including indeterminacy, compared with the traditional VADER used in the other methods. In the ranking stage, the results show great similarity and consistency with other ranking methods such as the PROMETHEE II, TOPSIS, and TODIM methods.
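A hedged sketch of the first part of the pipeline: standard VADER (used here as a stand-in for the paper's Neutro-VADER) produces positive/neutral/negative scores that are read as the truth, indeterminacy, and falsity degrees of a neutrosophic number, then averaged per product and collapsed with one commonly used crisp score function. The reviews, products, and score function are illustrative assumptions, not the SNNWA operator of the paper.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

reviews = {
    "phone A": ["Great battery life and display", "Camera is okay, nothing special"],
    "phone B": ["Terrible build quality", "It works, I guess"],
}

def neutrosophic_score(t: float, i: float, f: float) -> float:
    # One common crisp score for a single-valued neutrosophic number (T, I, F).
    return (2 + t - i - f) / 3

for product, texts in reviews.items():
    triples = []
    for text in texts:
        s = analyzer.polarity_scores(text)          # keys: pos, neu, neg, compound
        triples.append((s["pos"], s["neu"], s["neg"]))
    t, i, f = (sum(v) / len(v) for v in zip(*triples))
    print(product, "T/I/F =", (round(t, 3), round(i, 3), round(f, 3)),
          "score =", round(neutrosophic_score(t, i, f), 3))
```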
Article
Full-text available
With the increasingly widespread deployment of advanced metering infrastructure, electric load clustering is becoming more essential for its great potential in analyzing consumers' energy consumption patterns and preferences through data mining. A variety of electric load clustering techniques have been put into practice to obtain the distribution of load data, observe the characteristics of load clusters, and classify the components of the total load. This can support the development of related techniques and research in the smart grid, such as demand-side response. This paper summarizes the basic concepts and the general process of electric load clustering. Several similarity measurements and five major categories of electric load clustering are then comprehensively summarized along with their advantages and disadvantages. Afterwards, eight indices widely used to evaluate the validity of electric load clustering are described. Finally, vital applications are discussed thoroughly along with future trends, including tariff design, anomaly detection, load forecasting, data security, and big data.
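A hedged sketch of one common load-clustering setup of the kind the review surveys: daily load profiles are grouped with k-means, and the clustering validity is checked with a silhouette index, one example of the evaluation indices mentioned. The synthetic profiles and the choice of algorithm and index are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
hours = np.arange(24)

# Synthetic daily profiles: morning-peak, evening-peak, and flat consumers.
morning = 2 + np.exp(-0.5 * ((hours - 8) / 2) ** 2)[None, :] + rng.normal(0, 0.1, (100, 24))
evening = 2 + np.exp(-0.5 * ((hours - 19) / 2) ** 2)[None, :] + rng.normal(0, 0.1, (100, 24))
flat = 2.5 + rng.normal(0, 0.1, (100, 24))
profiles = np.vstack([morning, evening, flat])

best = None
for k in range(2, 7):                      # pick the cluster count by silhouette index
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)
    score = silhouette_score(profiles, labels)
    print(f"k={k}: silhouette={score:.3f}")
    if best is None or score > best[1]:
        best = (k, score)
print("selected number of clusters:", best[0])
```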
Article
Full-text available
The popularity of digital histopathology is growing rapidly with the development of computer-aided disease diagnosis systems. However, color variations due to manual cell sectioning and stain concentration make various digital pathological image analysis tasks, such as histopathological image segmentation and classification, challenging. Hence, normalization of these variations is needed to obtain promising results. The proposed research introduces a reliable and robust new complete color normalization method addressing the problems of color and stain variability. The new complete color normalization involves three phases, namely enhanced fuzzy illuminant normalization, fuzzy-based stain normalization, and modified spectral normalization. Extensive simulations are performed and validated on histopathological images. The presented algorithm outperforms the existing conventional normalization methods by overcoming certain limitations and challenges. As per the experimental quality metrics and comparative analysis, the proposed algorithm performs efficiently and provides promising results.
Article
In any service-providing organization, it is necessary to answer current or future customers' questions to keep them informed and to maintain a good relationship with them. These organizations usually have call centers with hundreds or thousands of employees who reply to customers through chat (i.e., text) or phone calls. Such traditional systems need a large number of employees, which eventually increases the overall expenditure and manpower requirement of the organization. This paper aims at minimizing the manpower requirement, i.e., decreasing the number of employees, by implementing a chat service. This chatbot concept has been proposed for implementation in a college website that will automatically reply to the queries of stakeholders such as students and parents without any human interaction. For the implementation of the chatbot, a sequence-to-sequence model, which is a deep neural network, is used.
Conference Paper
Evolving digital transformation has exacerbated cybersecurity threats globally. Digitization opens the doors wider to cybercriminals. Cyberthreats often first appear in the form of phishing aimed at stealing confidential user credentials. Usually, hackers influence users through phishing in order to gain access to the organization's digital assets and networks. After a security breach, cybercriminals can execute ransomware attacks, obtain unauthorized access, shut down systems, and even demand a ransom for releasing access. Phishers circumvent anti-phishing software and techniques with dodging tactics. Though threat intelligence and behavioural analytics systems help organizations spot unusual traffic patterns, the best practice to prevent phishing attacks is still defense in depth. In this perspective, the proposed research work develops a model to detect phishing attacks using machine learning (ML) algorithms such as random forest (RF) and decision tree (DT). A standard, legitimate dataset of phishing attacks from Kaggle was used for ML processing. To analyze the attributes of the dataset, the proposed model uses feature selection algorithms such as principal component analysis (PCA). Finally, a maximum accuracy of 97% was achieved with the random forest algorithm.
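A hedged sketch of the reported pipeline shape: PCA over the phishing-dataset attributes followed by a random forest. The Kaggle dataset itself is replaced here by synthetic data, the component and tree counts are arbitrary, and the 97% figure is the paper's result, not this toy's.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for phishing vs. legitimate feature vectors.
X, y = make_classification(n_samples=5000, n_features=30, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    PCA(n_components=10),                          # dimensionality reduction
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_tr, y_tr)
print("test accuracy:", round(model.score(X_te, y_te), 4))
```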
Chapter
People in the modern world are attracted to smart working and earning environments rather than a long-term perspective. The goal of this work is to address the challenge of providing better inputs to customers interested in investing in the share market so that they can earn better returns on their investments. The Twitter social networking site is chosen to develop the proposed environment, as a majority of customers tweet about their opinions. A huge set of data across various companies, taking inputs from Twitter, is processed and stored in a cloud environment for efficient analysis and assessment. A statistical measure is used to signal the worth of investing in a particular stock based on the outcomes obtained. Also, rather than ignoring missing values and unstructured data, the proposed work analyzes every single entity to enable customers to make worthy decisions. Tweets in the range of 1 to 100,000 are taken for analysis, and it is observed from the results that, for a maximum of 100,000 tweets, the number of missing values is identified as 2,524; the statistical measure used to fill in the missing values is calculated based on the particular missing data record, the count of all data records, and the total number of records. If the outcome of the measure is negative, then proceeding with an investment is not recommended. The findings of this work will help share market investors earn better profits.
Article
A lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In recent works, the texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, however, the reconstructed facial texture still cannot capture high-frequency details. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful facial texture prior from a large-scale 3D texture dataset. Then, we revisit the original 3D Morphable Models (3DMMs) fitting, making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image, but under a new perspective. In order to be robust to initialisation and expedite the fitting process, we propose a novel self-supervised regression-based approach. We demonstrate excellent results in photorealistic and identity-preserving 3D face reconstructions and achieve, for the first time to the best of our knowledge, facial texture reconstruction with high-frequency details.
Chapter
Synaptic disturbance in the prefrontal portion of the brain induces epileptic seizures. Electroencephalography is a noninvasive tool for diagnosing different brain disorders, including seizures. Visual inspection of these signals for the prognosis of a disease is very time-consuming. Therefore, automated epileptic seizure detection methods are adopted by medical practitioners for fast analysis and accurate detection of the disease. An integrated method is proposed in this work, in which the wavelet packet decomposition (WPD) method is implemented for time-frequency transformation and the features are then extracted. After feature extraction, four different classification models are compared using a balanced train-test split, with 70% of the data used for training and 30% for testing to validate the models. The results reveal that WPD with SVM has an accuracy of 96%, which outperforms the other conventional models applied to the benchmark Bonn University EEG dataset for seizure prediction.
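A hedged sketch of the pipeline shape: wavelet packet decomposition of each EEG segment, sub-band energy features, and an SVM evaluated on a held-out split. The wavelet, decomposition level, energy features, and toy signals are illustrative choices, not the paper's exact setup or the Bonn data.

```python
import numpy as np
import pywt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def wpd_energy_features(signal: np.ndarray, wavelet: str = "db4", level: int = 3):
    """Energy of each terminal WPD sub-band for one EEG segment."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")
    return np.array([np.sum(node.data ** 2) for node in nodes])

# Toy EEG-like segments: class 1 carries extra low-frequency power.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
segments, labels = [], []
for k in range(200):
    y = k % 2
    seg = rng.normal(0, 1, 256) + y * 2.0 * np.sin(2 * np.pi * 3 * t)
    segments.append(wpd_energy_features(seg))
    labels.append(y)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(segments), np.array(labels),
                                          test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", round(clf.score(X_te, y_te), 3))
```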
Article
Around the world, the technological and industrial revolution is being accelerated by the widespread application of new-generation information and communication technologies, such as AI, the IoT (Internet of Things), and blockchain technology. Artificial intelligence has attracted much attention from government, industry, and academia. In this study, popular articles published in recent years that relate to artificial intelligence are selected and explored. This study aims to provide a review of artificial intelligence based on industry information integration. It presents an overview of the scope of artificial intelligence covering its background, drivers, technologies, and applications, as well as logical opinions regarding the development of artificial intelligence. This paper may play a role in AI-related research and should provide important insights for practitioners in the real world. The main contribution of this study is that it clarifies the state of the art of AI for future study.
Article
The advancement and introduction of computing technologies have proven highly effective and have resulted in the production of large amounts of data to be analyzed. However, there is much concern about the privacy protection of the gathered data, which may be exploited or exposed to the public. Although there are many methods for preserving this information, they are not completely scalable or efficient and also have issues with privacy or data utility. This proposed work therefore provides a solution to such issues with an effective perturbation algorithm for big data by means of an optimal geometric transformation. The proposed work has been examined and tested for accuracy, attack resistance, scalability, and efficiency with the help of 5 classification algorithms and 9 datasets. Experimental analysis indicates that the proposed work is more successful in terms of attack resistance, scalability, execution speed, and accuracy when compared with other algorithms used for privacy preservation.
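A hedged sketch of the general geometric-transformation idea: multiplying the numeric records by a random orthogonal (rotation) matrix preserves pairwise Euclidean distances, so distance-based classifiers behave similarly on the perturbed data. The "optimal" transformation the paper derives is not reproduced; this only shows the utility-preservation effect on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=12, random_state=0)

# Random rotation via QR decomposition of a Gaussian matrix (orthogonal Q).
Q, _ = np.linalg.qr(rng.normal(size=(X.shape[1], X.shape[1])))
X_perturbed = X @ Q

knn = KNeighborsClassifier(n_neighbors=5)
print("accuracy on original :", cross_val_score(knn, X, y, cv=5).mean().round(3))
print("accuracy on perturbed:", cross_val_score(knn, X_perturbed, y, cv=5).mean().round(3))
```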